  • chatgpt
  • ai
  • prompt
  • i18n

Long-context JSON translation with ChatGPT

Learn how to use the latest ChatGPT model and JSON mode to translate a JSON object with long context and stream the output back to JSON.

Gao
Founder

Introduction

It has been a while since our last post Efficient internationalization with ChatGPT, where we showed how to use ChatGPT to translate a JSON object. As the model has evolved, its translation capabilities have improved significantly, with a longer context window, increased max output tokens, and new features like JSON mode that make the developer experience even better.

Increased max output tokens

Let's see a quick comparison of the last two versions of the model:

| Model | Description | Context window | Max output tokens | Training data |
| --- | --- | --- | --- | --- |
| gpt-4o-2024-05-13 | gpt-4o currently points to this version. | 128,000 tokens | 4,096 tokens | Up to Oct 2023 |
| gpt-4o-2024-08-06 | Latest snapshot that supports Structured Outputs | 128,000 tokens | 16,384 tokens | Up to Oct 2023 |

The most significant change is the increase in maximum output tokens from 4,096 to 16,384, which suits the translation scenario perfectly, since translated output is usually about the same length as the input, or longer. Meanwhile, the price even went down compared to the previous 32K-token model (which was expensive).

JSON mode

As the name suggests, JSON mode is very useful when you want to ensure that the output is a valid JSON object. For the new model, you can even define the schema of the output JSON object using Structured Outputs.
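To make this concrete, here is a sketch of what a Structured Outputs `response_format` fragment looks like (the field names follow the Chat Completions API; the `translation` schema and its properties are illustrative, not part of the original post):

```typescript
// Instead of { type: "json_object" }, Structured Outputs lets you pass a
// JSON Schema that the model's output must conform to. The schema below is
// a made-up example for a two-key locale file.
const responseFormat = {
  type: "json_schema",
  json_schema: {
    name: "translation",
    strict: true,
    schema: {
      type: "object",
      properties: {
        title: { type: "string" },
        cta: { type: "string" },
      },
      required: ["title", "cta"],
      additionalProperties: false,
    },
  },
} as const;
```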

Get started

Initialization

We'll use OpenAI's Node.js SDK v4.56.0 in this post for demonstration. Feel free to use any other method or language you prefer.

First, we need to initialize the OpenAI client:
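A minimal initialization sketch, assuming the API key lives in the `OPENAI_API_KEY` environment variable:

```typescript
import OpenAI from "openai";

// The SDK reads OPENAI_API_KEY from the environment by default; passing it
// explicitly here just makes the dependency visible.
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});
```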

I strongly recommend using streaming mode for long-context translation: it's more efficient, and you don't have to wait a long time (for example, a minute per request) to see any output. The code will look like this:

Let's write some prompts

We can use the system role to instruct the model for its job:
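A sketch of such a system prompt follows; the exact wording is an assumption, but the key point is that the final instruction explicitly demands JSON output:

```typescript
// Target locale code; replace "es" with yours.
const targetLocale = "es";

// A plausible set of instructions for the translator role. Note the last
// line: in JSON mode the model must be explicitly told to produce JSON.
const systemPrompt = [
  `You are a professional translator. Translate the values of the JSON object provided by the user into the locale "${targetLocale}".`,
  "Keep all keys exactly as they are; translate only the values.",
  "Preserve placeholders such as {name} and any HTML tags unchanged.",
  "If a value is already in the target language, leave it as is.",
  "Always respond with a valid JSON object.",
].join("\n");
```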

Remember to replace "es" with your target locale code. In our experience, there are several instructions we need to give the model for better output and less human intervention:

You may wonder why we need the last instruction. This is the explanation from OpenAI:

When using JSON mode, you must always instruct the model to produce JSON via some message in the conversation, for example via your system message. If you don't include an explicit instruction to generate JSON, the model may generate an unending stream of whitespace and the request may run continually until it reaches the token limit.

With the above instructions, we can put the JSON object to be translated in the following message with the user role. Let's compose what we have so far:
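A sketch of the composed messages; the sample JSON object and the prompt wording are placeholders, not the original post's exact values:

```typescript
// A sample JSON object to translate; substitute your real locale file.
const source = {
  title: "Welcome back, {name}!",
  cta: "Get started",
};

const messages = [
  {
    role: "system" as const,
    content:
      'Translate the values of the JSON object provided by the user into the locale "es". ' +
      "Keep all keys unchanged and always respond with a valid JSON object.",
  },
  {
    role: "user" as const,
    // Send the object to translate as a JSON string in the user message.
    content: JSON.stringify(source),
  },
];
```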

Handling the output

Since we are using the streaming mode, we need to handle the output with the streaming style. Here is an example:
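The sketch below shows the general pattern: each streamed chunk carries a fragment of the output in `choices[0].delta.content` (the Chat Completions streaming shape), and you concatenate the fragments until the stream ends. The stream here is simulated so the snippet runs without a network call; with the SDK you would iterate the real stream returned by `openai.chat.completions.create({ stream: true, ... })`:

```typescript
type Chunk = { choices: { delta: { content?: string } }[] };

// Simulated stream yielding the JSON output in three fragments, mimicking
// the chunks the SDK would produce.
async function* fakeStream(): AsyncGenerator<Chunk> {
  for (const piece of ['{"greet', 'ing":"H', 'ola"}']) {
    yield { choices: [{ delta: { content: piece } }] };
  }
}

async function collectTranslation(stream: AsyncIterable<Chunk>) {
  let raw = "";
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? "";
    raw += delta;
    // Show partial output as it arrives instead of waiting for the end.
    process.stdout.write(delta);
  }
  // Thanks to JSON mode, the concatenated text is a complete JSON object.
  return JSON.parse(raw);
}

const translated = await collectTranslation(fakeStream());
```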

Conclusion

That's it! With the new ChatGPT model, JSON translation is even more efficient, and the translation quality is better too. I hope this post helps you gain some new insights into ChatGPT and JSON translation. See you in the next post!