Why is OpenAI more expensive in Dutch than in English?

It's clear that OpenAI is not free. But how does the pricing model work, and what exactly are you charged for? One factor is easy to overlook: the language you use in your project (no, not the programming language, but the spoken language) has a direct impact on usage costs.

What are tokens?

Tokens are the pieces of text that the model reads. Together, all possible tokens form the model's 'vocabulary'. A token can be as short as one character or as long as a whole word. Common words are kept whole, while rarer words are split into smaller subwords. The concept of tokens is central to many modern machine learning models for text processing: before text is fed to the model, the entire text is broken down into individual tokens. The number of tokens you feed to the model determines the usage costs. So the more tokens you send to the model, the more expensive it becomes.
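As a rough illustration of how token counts translate into costs, here is a minimal sketch. The price and the token counts are hypothetical placeholders chosen for easy arithmetic, not OpenAI's actual rates, and `estimate_cost` is a helper invented for this example.

```python
def estimate_cost(token_count: int, price_per_1k_tokens: float) -> float:
    """Return the cost of token_count tokens at a given price per 1,000 tokens."""
    return token_count / 1000 * price_per_1k_tokens


# Hypothetical figures, for illustration only.
PRICE_PER_1K_TOKENS = 0.002  # assumed price per 1,000 tokens
english_tokens = 1000        # assumed token count of an English text
dutch_tokens = 1500          # assumed token count of the same text in Dutch

english_cost = estimate_cost(english_tokens, PRICE_PER_1K_TOKENS)
dutch_cost = estimate_cost(dutch_tokens, PRICE_PER_1K_TOKENS)

print(f"English: {english_cost:.4f}")
print(f"Dutch:   {dutch_cost:.4f}")
```

Under these assumptions the same content costs half as much again in Dutch, purely because it takes more tokens to express.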


The tokenizer used for OpenAI's GPT models is optimized for the English language. Because the tokenizer is based on more English texts, English words appear more frequently in its training data, so it splits English sentences into larger parts and therefore into fewer tokens. The same sentence in Dutch is typically split into more, smaller tokens.

This doesn't mean that the model is worse at interpreting Dutch text input. It is mainly a matter of efficiency: the tokenizer has simply been trained most extensively on English.
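To make the effect tangible, here is a toy sketch. It is not OpenAI's tokenizer: the real one uses byte pair encoding learned from large amounts of text, while this example uses a simple greedy longest-match over a tiny hand-written vocabulary. The vocabulary, the function name and the sentences are all made up for illustration. The vocabulary contains whole English words but only fragments of Dutch ones, which mimics a tokenizer that has seen mostly English.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text into the longest pieces found in vocab, left to right.

    Falls back to a single character when no vocabulary piece matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            # Nothing matched: emit one character as its own token.
            tokens.append(text[i])
            i += 1
    return tokens


# Toy vocabulary: whole English words, only fragments of Dutch words.
TOY_VOCAB = {"the", "weather", "is", "nice", " ", "we", "er", "mo", "oi"}

english = greedy_tokenize("the weather is nice", TOY_VOCAB)
dutch = greedy_tokenize("het weer is mooi", TOY_VOCAB)

print(len(english), english)
print(len(dutch), dutch)
```

With this toy vocabulary the English sentence becomes 7 tokens and the Dutch sentence 11, even though the Dutch sentence has fewer characters. The exact numbers mean nothing for the real tokenizer; the point is only that words missing from the vocabulary fall apart into more pieces.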

What does this look like in practice?

To illustrate the difference, we tokenized an English text and a Dutch text and highlighted the result with colors. Each colored block is one token:

An English text tokenized, visualized with colors.
A Dutch text tokenized, visualized with colors.

Are tokens the only thing that affects costs?

Basically, yes: the price of a request to the model is based on tokens. However, during a session or chat conversation, the context of the conversation is also tracked. Every time the chat requires another message exchange, the entire built-up context from that session (both the earlier questions and the earlier answers) is included again. It is therefore important that the model arrives at a good answer as quickly as possible, not only for the user experience but also for the costs.
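The effect of resending the context can be sketched with a small calculation. This is a simplified model under stated assumptions: every request resends the full history, questions and answers are counted at the same rate, and extras such as system prompts or per-message overhead are ignored. The function name and the token counts are hypothetical.

```python
def total_billed_tokens(turns: list[tuple[int, int]]) -> int:
    """Total tokens billed over a conversation that resends its history.

    Each turn is a (question_tokens, answer_tokens) pair.
    """
    billed = 0
    history = 0  # tokens of all earlier questions and answers
    for question_tokens, answer_tokens in turns:
        prompt = history + question_tokens  # the history is sent again
        billed += prompt + answer_tokens
        history = prompt + answer_tokens    # the answer joins the history
    return billed


# Three exchanges of 50 question tokens and 100 answer tokens each.
turns = [(50, 100), (50, 100), (50, 100)]

with_context = total_billed_tokens(turns)
without_context = sum(q + a for q, a in turns)

print(with_context)     # 900
print(without_context)  # 450
```

In this example three short exchanges already cost twice as many tokens as the messages themselves, and the gap widens with every additional exchange.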

What can (open)AI do for my organization?

AI has many applications, the best known of which is ChatGPT. The power of AI in your organization depends on the quality of your data. As part of our software research, we always look at all the possibilities. We would be happy to discuss with you which options are available for your organization!

Bram Wenting

Bram Wenting is co-owner of SST Software and SST Labs.

