The Strawberry Question

LLMs like ChatGPT, Claude, Deepseek, and others are the talk of the town. They have been praised for solving PhD-level problems and even the most difficult Codeforces questions. However, they still struggle with something as simple as counting?

One of the most commonly asked prompt to test their intelligence is: "How many r's are there in the word 'strawberry'?" For example, if you ask GPT-4o mini this question, it will respond with 2.

GPT-4o acts a little smart, but if prompted with a similar question, it may lose track.

Reason:

One of the most widely given explanations for this phenomenon is that LLMs respond based on patterns in their training data. For instance, there could have been a dataset where the count of "r's" in the word "strawberry" was listed as 2. However, this is not the case. The main reason why LLMs struggle with counting is that they don't directly "see" individual letters. They interpret our input prompt as separate tokens. This can be visualized via https://tiktokenizer.vercel.app:

LLM sees three separate words: 'st', 'raw', and 'berry', rather than the complete word "strawberry" or its individual letters s-t-r-a-w-b-e-r-r-y.

Lets see what does Reasoning models of ChatGPT responds with this question.

Response of o1:

Response of o3-mini:

These reasoning models and Claude are mostly smart in answering these questions, however if we change our prompt a bit, it fails.

Response of Claude 3.5 Sonnet:

The reason behind this response of Claude is that all of the compute is spent on a single token, which is not enough for the model to respond accurately. To formulate a proper solution, it needs other token (words) to make an intermediate solution.

Conclusion:

Ultimately, the issue boils down to the fact that LLMs work by predicting the next best word based on probabilities, which are determined by their parameters. Additionally, these parameters are shaped by:

The quality of the dataset the model was trained on.
The reinforcement learning techniques used to enhance the response mechanism. We have not yet figured out how to tokenize individual letters within a single word in a way that improves counting accuracy.

Solution:

So, what is the solution to make these responses more reliable? We can't blindly trust a model that is merely good at predicting the next best word. Additionally, we can't manually count letters every time.

The best possible solution is to instruct the model to use code to find the solution to such problems. By doing so, the model can correct itself and provide an accurate answer. Here's how the model would correct itself:

PS: I would also like to quote the thinking text of Deepseek R1 (thinking model) when asked for the same question:

Okay, let's see. The user is asking how many times the letter "r" appears in the word "strawberry". Hmm, I need to make sure I count correctly. Let me start by writing out the word: S-T-R-A-W-B-E-R-R-Y. Now, I'll go through each letter one by one. First letter: S — no. Second: T — no. Third: R — that's one. Fourth: A — no. Fifth: W — no. Sixth: B — no. Seventh: E — no. Eighth: R — that's two. Ninth: R — wait, is there another R after that? Let me check the spelling again. Strawberry. S-T-R-A-W-B-E-R-R-Y. So after the E, there are two R's? Let me count the letters again: 1 (S), 2 (T), 3 (R), 4 (A), 5 (W), 6 (B), 7 (E), 8 (R), 9 (R), 10 (Y). Oh right, after E comes R, R, then Y. So that's two R's in a row. So total R's are at position 3, 8, and 9? Wait, no. Position 3 is the first R, then position 8 is the second R, and position 9 is the third R? Wait, no. Wait, let's break down the letters:

S

T

R

A

W

B

E

R

R

Y

So that's three R's? But wait, when I spell "strawberry", isn't it S-T-R-A-W-B-E-R-R-Y? So after the E, there's two R's. So that's the third R at position 8 and the fourth R at position 9? Wait, no. Let me count again. Let's list them:

S

T

R (1)

A

W

B

E

R (2)

R (3)

Y

Wait, that would make three R's. But I thought strawberry has two R's. Maybe I'm miscounting. Let me check the correct spelling. The correct spelling of strawberry is S-T-R-A-W-B-E-R-R-Y. So yes, after the B and E, there are two R's. So total R's: one at position 3, and two at positions 8 and 9. That's three R's. But wait, I'm pretty sure the correct count is two. Maybe I'm splitting the letters wrong. Let me confirm. Wait, perhaps the letters are S-T-R-A-W-B-E-R-R-Y. So the letters are: S (1), T (2), R (3), A (4), W (5), B (6), E (7), R (8), R (9), Y (10). So that's three R's? But that can't be right. Wait, maybe I'm adding an extra R. Let me check an actual dictionary. Strawberry: S-T-R-A-W-B-E-R-R-Y. So yes, two R's after the E. So positions 8 and 9. So total R's: one from the third letter (R) and two from positions 8 and 9. Wait, no. Wait, the third letter is R, and then after E, there are two R's. So that's three R's total. But I think the correct answer is two. Wait, maybe the spelling is S-T-R-A-W-B-E-R-Y. No, that would be one R. But no, strawberry is spelled with two R's. Let me think again. Maybe the user is tricking me. Let me look up the spelling. According to standard English, "strawberry" is spelled S-T-R-A-W-B-E-R-R-Y. So yes, two R's after the E. So total R's: one in the beginning (third letter) and two after E. Wait, no. Wait, the third letter is R, then later two R's. So that's three R's. But that's not correct. Wait, perhaps I'm miscounting. Let me write out the letters again: S, T, R, A, W, B, E, R, R, Y. So that's 10 letters. The R's are at positions 3, 8, and 9. So three R's. But I think the correct answer is two. Wait, maybe I'm wrong. Let me check again. If I spell it out: S-T-R-A-W-B-E-R-R-Y. That's S, T, R, A, W, B, E, R, R, Y. So yes, three R's. But that doesn't seem right. Wait, maybe the correct spelling only has two R's. Let me verify. Checking online, the correct spelling of strawberry is indeed S-T-R-A-W-B-E-R-R-Y, which includes two R's at the end. Wait, no. Let me count: S (1), T (2), R (3), A (4), W (5), B (6), E (7), R (8), R (9), Y (10). So positions 3, 8, and 9. So three R's. But that contradicts my initial thought. Wait, maybe I'm confusing it with "strawberry" having two R's. Let me check a dictionary. According to the Oxford English Dictionary, "strawberry" is spelled S-T-R-A-W-B-E-R-R-Y. So yes, two R's after the E. So that's one R at position 3 and two R's at positions 8 and 9. So total of three R's. Wait, that's correct. So the answer should be 3. But I might have made a mistake here. Alternatively, maybe the user is expecting two. Let me confirm once more. Let's see: S-T-R-A-W-B-E-R-R-Y. Breaking it down: S, T, R, A, W, B, E, R, R, Y. So that's three R's. Therefore, the correct count is three. But I'm now confused because I thought it was two. Maybe I was wrong before. Let me check another source. For example, on Merriam-Webster's website, the spelling is "straw·ber·ry", which has two R's. Wait, but when you spell it out, it's S-T-R-A-W-B-E-R-R-Y. So that's three R's. Wait, perhaps the phonetic spelling is different. Wait, no. The actual letters are three R's. So the answer is three. But I need to be absolutely sure. Let me write it out again: S-T-R-A-W-B-E-R-R-Y. Letters: 1. S, 2. T, 3. R, 4. A, 5. W, 6. B, 7. E, 8. R, 9. R, 10. Y. So yes, three R's. Therefore, the answer should be 3. But I might have to make sure that the user isn't tricking me, but I think the correct count is three. Wait, but maybe I'm overcomplicating. Let me just count again: S (no), T (no), R (1), A (no), W (no), B (no), E (no), R (2), R (3), Y (no). So three R's. Therefore, the answer is 3.