Valérian de Thézan de Gaussan · Data Engineering for process-heavy organizations

Is your OpenAI bill too high?

You can save 50% by using the Batch API.


The GPT models can get quite expensive once they are heavily used in your organization. Especially GPT-4, which, at the time of writing, costs $10 per 1 million input tokens and $30 per 1 million output tokens.

If you use these models in batch jobs, where you don’t need the response right away, there is a way to cut your bill by 50%.

Here’s how:

Introduction to the OpenAI Batch API in Python

The Batch API is designed for bulk processing of tasks. It’s particularly useful when you have a large dataset to process cost-effectively and don’t need the responses at request time. You send a batch of prompts in one request and come back later to fetch the results.

Setting Up

To start using the OpenAI Batch API, you need to set up your Python environment:

  1. Install the OpenAI Python package, if it is not already installed:
     pip install openai
    
  2. Ensure you have your API key from OpenAI. This key is required to authenticate your requests.
     import openai

     # Prefer reading the key from the OPENAI_API_KEY environment variable
     # rather than hard-coding it.
     client = openai.OpenAI(api_key='your-api-key')
    
  3. You need a list of tasks that you want to process. Each task follows this template:
     task = {
         # custom_id must be unique within the batch; you will use it
         # to match each result back to its input.
         "custom_id": f"task-{task_id}",
         "method": "POST",
         "url": "/v1/chat/completions",
         # The body follows the Chat Completions API request format.
         "body": {
             "model": "gpt-3.5-turbo",
             "temperature": 0.1,
             "response_format": {
                 "type": "json_object"
             },
             "messages": [
                 {
                     "role": "system",
                     "content": "system prompt"
                 },
                 {
                     "role": "user",
                     "content": "user prompt"
                 }
             ],
         }
     }
    

    Customize the body of this template following the Chat Completions API reference.

  4. Write all of these tasks to a JSON Lines (.jsonl) file, one task per line:
     import json

     file_name = "batch_file.jsonl"

     with open(file_name, 'w') as file:
        for task in tasks:
           file.write(json.dumps(task) + '\n')
    
  5. Upload the file to OpenAI:
    with open(file_name, "rb") as file:
       batch_file = client.files.create(
          file=file,
          purpose="batch"
       )
    
  6. Create the batch job
    batch_job = client.batches.create(
       input_file_id=batch_file.id,
       endpoint="/v1/chat/completions",
       completion_window="24h"
    )
    
  7. Wait up to 24 hours. In practice it’s usually less, since batches are often processed faster than the completion window, so you can poll the API from time to time (see the end-to-end sketch after this list):
    batch_job = client.batches.retrieve(batch_job.id)
    print(batch_job.status)  # "completed" means the job is done; other statuses
                             # include "validating", "in_progress", "failed" and "expired".
    
  8. Finally, retrieve the results:
    result_file_id = batch_job.output_file_id
    result = client.files.content(result_file_id).content  # This will be in the json-line format.
       
    # I advise you to then dump that to a file.
    result_file_name = "results.jsonl"
    
    with open(result_file_name, 'wb') as file:
       file.write(result)
    
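Putting it all together: here is a minimal end-to-end sketch of steps 3 through 8, with a simple polling loop. The example prompts, the 60-second polling interval, and the bare-bones error handling are illustrative assumptions, not prescriptions.

    import json
    import time

    import openai

    client = openai.OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Hypothetical inputs; replace with your own data.
    user_prompts = ["Summarize document A.", "Summarize document B."]

    tasks = [
        {
            "custom_id": f"task-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-3.5-turbo",
                "messages": [
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": prompt},
                ],
            },
        }
        for i, prompt in enumerate(user_prompts)
    ]

    # Write the tasks to a JSON Lines file and upload it.
    file_name = "batch_file.jsonl"
    with open(file_name, "w") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")

    with open(file_name, "rb") as f:
        batch_file = client.files.create(file=f, purpose="batch")

    # Create the batch job.
    batch_job = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )

    # Poll until the job reaches a terminal status.
    while True:
        batch_job = client.batches.retrieve(batch_job.id)
        if batch_job.status in ("completed", "failed", "expired", "cancelled"):
            break
        time.sleep(60)

    # Download the results if the job succeeded.
    if batch_job.status == "completed":
        result = client.files.content(batch_job.output_file_id).content
        with open("results.jsonl", "wb") as f:
            f.write(result)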

Final note: results are unordered and will not necessarily match your input batch order. Use the custom_id field on each output line to match results back to their inputs, as sketched below.
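
A minimal sketch of that post-processing pass, assuming the results.jsonl file written in step 8 and the documented batch output line format:

    import json

    results_by_id = {}
    with open("results.jsonl") as f:
        for line in f:
            record = json.loads(line)
            if record.get("error"):
                continue  # skip failed requests; inspect record["error"] as needed
            # Each output line carries the custom_id of its input task.
            content = record["response"]["body"]["choices"][0]["message"]["content"]
            results_by_id[record["custom_id"]] = content

    print(results_by_id.get("task-0"))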

Tips for Further Cost Savings

  • Efficient prompt design can reduce the number of tokens processed, thereby lowering costs (see the token-counting sketch below).
  • Keep an eye on your API usage; understanding your usage patterns helps you optimize further.
  • Use the model that is best suited for your task while remaining cost-effective; sometimes a simpler model suffices. For comparison, GPT-4 is roughly 20x more expensive than GPT-3.5.
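
On the first tip: you can estimate token counts (and thus cost) before submitting a batch. A minimal sketch using the tiktoken library; the price per token is derived from the GPT-3.5 pricing implied above and should be checked against current rates:

    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

    # Rough estimate: count tokens in the message contents
    # (ignores the small per-message overhead).
    prompt = "system prompt" + "user prompt"
    n_tokens = len(enc.encode(prompt))

    # Assumes $0.50 per 1 million input tokens (20x cheaper than GPT-4's $10).
    print(f"{n_tokens} tokens ~= ${n_tokens * 0.50 / 1_000_000:.6f} of input cost")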

For more details, you can always refer to the official OpenAI API documentation: OpenAI Batch API Reference