
Gemini only likes ASCII

Recently, the team was migrating some LLM workloads to Gemini. We had rolled it out to 25% of our users when we noticed something very strange: roughly 3% of the time, Gemini (both gemini-2.5-pro and gemini-2.5-flash) would produce invalid JSON even with structured output enabled. Looking closer, it turned out Gemini would get stuck in a loop, generating \t or \n tokens until it hit the max token limit.

For example,

"{"food": "Trader Joe\n\n\n\n\n\n\n\n\n\n\n\n\n\n.....

Indeed, this bug seems to have existed at least as early as December 2024 in gemini-1.5-pro.

The root cause isn’t 100% clear, and the bug isn’t perfectly reproducible, but we noticed that Gemini would reliably get into this state when both of the following were true:

  1. Structured output was enabled
  2. A non-ASCII character was present somewhere in the prompt

For our specific app, this usually happened when customers:

  1. Tapped iOS keyboard suggestions for brand names like McDonald's, which would insert a curly apostrophe (’) instead of a straight one.
  2. Copy-pasted recipes or nutrition information that contained non-ASCII characters like bullet points (•).
  3. Typed accented letters or other special characters, since we have customers globally.

The Fix

The fix is to make sure everything you send to Gemini is ASCII-only. This may not work for you. If you absolutely need non-ASCII characters, then your only option is to not use structured output. It’s time to go back to the LLM stone age: tell Gemini to output in a specific format and write robust parsing logic.
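If you do go the no-structured-output route, the parsing side could look something like the sketch below. It is only an illustration: `extractJsonObject` is a hypothetical helper name, and it assumes the model was asked to reply with a single JSON object, possibly wrapped in chatter or a markdown fence.

```typescript
// Hypothetical helper, not part of any SDK.
// Pulls the outermost {...} out of a free-form model reply and parses it.
// Returns null instead of throwing so the caller can decide how to recover.
function extractJsonObject(raw: string): unknown {
  // Strip markdown fence markers (three backticks, optionally followed by "json").
  const unfenced = raw.replace(/`{3}(?:json)?/gi, "");
  const start = unfenced.indexOf("{");
  const end = unfenced.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(unfenced.slice(start, end + 1));
  } catch {
    return null;
  }
}
```

You would still want to validate the parsed value against your own schema before trusting it, since nothing here checks field names or types.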

You may be able to stream tokens from Gemini and detect when it gets into this state, and then perform a retry. However, for our use case, this did not work:

  1. If a Gemini request ends up in this state, a retry is likely to end up in the same state again.
  2. The retries add way too much latency for our real-time use case. And because a retry was likely to hit the same state, we would just keep retrying.
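Even though retrying didn't help us, detecting the loop early can at least stop you from paying for a full max-tokens response. Here is a rough sketch of a detector you could feed streamed text chunks into. `RunawayWhitespaceDetector` and the threshold of 20 are assumptions for illustration, not anything from the Gemini SDK. It counts both raw newline/tab characters and their JSON-escaped forms, since either could show up depending on where in the JSON the loop starts.

```typescript
// Hypothetical helper: flags a long unbroken run of \n or \t in streamed output.
class RunawayWhitespaceDetector {
  private tail = "";
  private readonly pattern: RegExp;
  private readonly keep: number;

  constructor(maxRun: number = 20) {
    // maxRun or more consecutive units, each either an escaped "\n"/"\t"
    // (backslash + letter) or a raw newline/tab character.
    this.pattern = new RegExp(`(?:\\\\[nt]|[\\n\\t]){${maxRun},}`);
    // Enough trailing characters to carry a partial run across chunk boundaries.
    this.keep = maxRun * 2;
  }

  /** Feed the next chunk of text; returns true once a runaway run is seen. */
  push(chunk: string): boolean {
    const window = this.tail + chunk;
    const hit = this.pattern.test(window);
    this.tail = window.slice(-this.keep);
    return hit;
  }
}
```

In use, you would call `push` with each chunk's text from whatever streaming API you use and abort the request as soon as it returns true.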

System Instructions

If you only need ASCII characters, don’t forget to check that all of your system instructions are ASCII too. Depending on which tool you used to author them, copy/pasting could silently introduce non-ASCII characters.

Ask your favourite AI to write a script that will scan all code for non-ASCII characters to verify you caught them all.
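For reference, the core of such a script is small. The sketch below is one possible shape, with `findNonAscii` and `NonAsciiHit` as made-up names: it takes the text of a file and reports where each non-ASCII character sits. Reading the files and walking the repo is left out and would depend on your setup.

```typescript
// Hypothetical scanner core: reports every character outside the ASCII range.
interface NonAsciiHit {
  line: number;      // 1-based line number
  column: number;    // 1-based column, counted in UTF-16 code units
  char: string;      // the offending character
  codePoint: string; // e.g. "U+2019"
}

function findNonAscii(text: string): NonAsciiHit[] {
  const hits: NonAsciiHit[] = [];
  text.split(/\r?\n/).forEach((lineText, i) => {
    let column = 1;
    // Iterating a string with for...of yields whole code points.
    for (const char of lineText) {
      const cp = char.codePointAt(0)!;
      if (cp > 0x7f) {
        const hex = cp.toString(16).toUpperCase().padStart(4, "0");
        hits.push({ line: i + 1, column, char, codePoint: "U+" + hex });
      }
      column += char.length;
    }
  });
  return hits;
}
```

Running this over each prompt file and failing CI when the result is non-empty would keep new non-ASCII characters from sneaking back in.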

User Input

Depending on your use case, the solution may be as easy as ours. I vibe-coded a TypeScript function, with some tests, to normalize users’ input. Your normalization function may need to differ for your use case, but this should cover most cases.

You can see it here.
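To give a feel for the approach, here is a minimal sketch of an ASCII normalizer. This is not the linked function; `toAscii` and the `REPLACEMENTS` table are illustrative assumptions. It maps a few common typographic characters to ASCII lookalikes, strips accents via Unicode NFKD decomposition, and drops whatever is still non-ASCII.

```typescript
// Hypothetical replacement table; extend it for the characters your users actually send.
const REPLACEMENTS: Record<string, string> = {
  "\u2018": "'", "\u2019": "'", // curly single quotes
  "\u201C": '"', "\u201D": '"', // curly double quotes
  "\u2013": "-", "\u2014": "-", // en dash and em dash
  "\u2022": "-",                // bullet point
};

function toAscii(input: string): string {
  let out = "";
  for (const ch of input) {
    out += REPLACEMENTS[ch] ?? ch;
  }
  return out
    .normalize("NFKD")             // split accented letters into base letter + combining mark
    .replace(/[^\x00-\x7F]/g, ""); // drop combining marks and anything else outside ASCII
}

// toAscii("McDonald\u2019s") returns "McDonald's"
```

Note that the last step is lossy: text in non-Latin scripts is removed entirely, so this only makes sense if ASCII is genuinely enough for your app.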

Conclusion

After the fix, we’ve had 0 issues and onboarded 100% of our customers to the new Gemini workloads!