Fine-tune OpenAI's GPT-4o mini LLM AI model for free


Fine-tuning GPT-4o mini offers immense potential for businesses and developers looking to customize AI applications. As an advanced and cost-effective model, GPT-4o mini excels in tasks requiring textual intelligence, making it a prime candidate for customized implementations.

By fine-tuning this model, you can enhance its performance to better align with your specific use cases, such as improving customer interactions, generating more accurate responses, and developing specialized functionalities.

This blog will guide you through the steps and best practices for effectively fine-tuning GPT-4o mini, ensuring you maximize its capabilities for your unique needs.

And the best part: Fine-tuning GPT-4o mini is free! At least until 2024-09-23.

Why GPT-4o mini?

The GPT-4o mini is a compact AI language model that has been developed as an alternative to larger, more resource-intensive models. It's designed to strike a balance between performance and cost-effectiveness, aiming to meet the needs of developers and businesses who require AI capabilities but may face budget or computational constraints.

  • Cost Considerations: GPT-4o mini may offer reduced computational costs compared to larger models. This could make it a viable option for businesses and developers working with limited budgets, though exact savings would depend on specific use cases and implementation. In numbers, GPT-4o mini is 66x (!!!) cheaper than the original GPT-4 and 33x cheaper than GPT-4o.

  • Performance: While smaller than some alternatives, GPT-4o mini aims to perform adequately on common language tasks. Its capabilities in areas like natural language processing and function calling should be evaluated against specific project requirements. Various benchmarks show GPT-4o mini being on par with the original GPT-4 model - which is just incredible.

  • Adaptability: Like many language models, GPT-4o mini can be fine-tuned for specific applications. This allows the model to potentially outperform much larger and more expensive models for specific use-cases.

How much does fine-tuning GPT-4o mini cost?

Fine-tuning GPT-4o mini is free until 2024-09-23. After this date, training costs $3 per 1 million training tokens.

Using a fine-tuned GPT-4o mini model is also quite cost-efficient:

  • $0.30 per 1 million input tokens
  • $1.20 per 1 million output tokens

All in all, fine-tuning an LLM has never been this cheap.
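To make these prices concrete, here is a small sketch that turns token counts into dollar amounts. The prices are the ones quoted above. The function names and the example workload are made up for illustration, and the sketch assumes that training is billed for every token in the file once per epoch, so check the current pricing page before relying on it.

```python
# Prices quoted in this post, in USD per 1 million tokens.
TRAINING_PRICE = 3.00
INPUT_PRICE = 0.30
OUTPUT_PRICE = 1.20


def training_cost(tokens_in_file: int, epochs: int = 3) -> float:
    """Estimate training cost, assuming each epoch bills all tokens in the file."""
    return tokens_in_file * epochs / 1_000_000 * TRAINING_PRICE


def usage_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate inference cost for calls to a fine-tuned model."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


# Hypothetical workload: a 500,000-token training file trained for 3 epochs,
# followed by 2 million input tokens and 500,000 output tokens of usage.
print(f"training: ${training_cost(500_000, epochs=3):.2f}")
print(f"usage:    ${usage_cost(2_000_000, 500_000):.2f}")
```

For this hypothetical workload, training comes to $4.50 and usage to $1.20.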

How to fine-tune GPT-4o mini

Enough theory, let's get started with the actual fine-tuning process.

Format of dataset for fine-tuning GPT-4o mini

Creating a high-quality training set in the right format is the most important step for a successful fine-tuning run. The quality and relevance of your training data significantly influence the performance of your fine-tuned model. For the OpenAI API, the data must be provided in JSONL (JSON Lines) format.

JSONL is a text-based format where each line represents a separate, valid JSON object. This structure makes it efficient for processing large datasets line by line.

The content of the JSONL file itself is rather simple. For text completion models such as davinci-002, you need prompt/completion pairs. For chat models such as gpt-4o-mini, you need arrays of messages with the roles "user" and "assistant", as known from the OpenAI API request payload.

Example of JSONL format for text completion models:

{"prompt": "Summarize the main points of the article:", "completion": "The article discusses..."}
{"prompt": "Translate the following sentence to French:", "completion": "La traduction est..."}
{"prompt": "What are the key features of Python?", "completion": "Python's key features include..."}

For this guide, we use a chat model and therefore need to create the chat model format. For demonstration purposes, we'll use a small dataset with just a few examples. The data consists of customer service inquiries and appropriate responses.

Sample data for this guide, which also shows the general format for fine-tuning OpenAI chat models:

{
  "messages": [{"role": "system", "content": "You are a customer service representative for an online bookstore."},
    {"role": "user", "content": "How long does shipping usually take?"},
    {"role": "assistant", "content": "Shipping times typically range from 3-5 business days for standard delivery. Express shipping options are available for faster delivery."}]
}
{
  "messages": [{"role": "system", "content": "You are a customer service representative for an online bookstore."},
    {"role": "user", "content": "Can I return a book if I don't like it?"},
    {"role": "assistant", "content": "Yes, we offer a 30-day return policy for books in new condition. Please keep the original receipt for your return."}]
}

Remember, this is a minimal dataset for demonstration purposes. In practice, you'd want a larger, more diverse set of examples to effectively fine-tune your model.

NOTE: These examples are formatted in a way to be easier to read. For the actual dataset, make sure to have each JSON dictionary on a single line. Like so:

{ "messages": [{"role": "system", "content": "You are a customer service representative for an online bookstore."}, {"role": "user", "content": "Can I return a book if I don't like it?"}, {"role": "assistant", "content": "Yes, we offer a 30-day return policy for books in new condition. Please keep the original receipt for your return."}] }
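Rather than flattening each example by hand, you can let a JSON library produce the one-line form. The following is a minimal sketch using the sample data from this guide. The helper names (to_chat_example, to_jsonl, save_jsonl) and the file name are illustrative assumptions, not part of any OpenAI tooling.

```python
import json
from pathlib import Path

SYSTEM_PROMPT = "You are a customer service representative for an online bookstore."


def to_chat_example(question: str, answer: str) -> dict:
    """Wrap one question/answer pair in the chat fine-tuning structure."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }


def to_jsonl(examples: list) -> str:
    """Serialize each example as one JSON object per line."""
    # json.dumps escapes newlines inside strings, so every object stays on one line.
    return "".join(json.dumps(e, ensure_ascii=False) + "\n" for e in examples)


def save_jsonl(examples: list, path: str) -> None:
    """Write the JSONL text to disk, e.g. save_jsonl(examples, "training_data.jsonl")."""
    Path(path).write_text(to_jsonl(examples), encoding="utf-8")


examples = [
    to_chat_example(
        "How long does shipping usually take?",
        "Shipping times typically range from 3-5 business days for standard delivery. "
        "Express shipping options are available for faster delivery.",
    ),
    to_chat_example(
        "Can I return a book if I don't like it?",
        "Yes, we offer a 30-day return policy for books in new condition. "
        "Please keep the original receipt for your return.",
    ),
]

jsonl_text = to_jsonl(examples)
print(jsonl_text)
```

Generating the file this way also avoids subtle quoting mistakes, since the library handles escaping of quotes and line breaks inside the message content.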

NOTE 2: You'll need at least 10 examples for fine-tuning.
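Before uploading, it can be worth running a quick sanity check over the file. The sketch below checks only a few basics: every line parses as JSON, has a "messages" list with known roles and string content, ends with an assistant message, and the file reaches the minimum of 10 examples mentioned above. The rules are a simplified assumption about the chat format (for example, they ignore tool calls), and validate_jsonl is a hypothetical helper, not an OpenAI function.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}
MIN_EXAMPLES = 10  # minimum number of examples stated in this guide


def validate_jsonl(text: str) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    count = 0
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        count += 1
        try:
            example = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f"line {number}: invalid JSON ({err.msg})")
            continue
        messages = example.get("messages") if isinstance(example, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {number}: expected a non-empty 'messages' list")
            continue
        for message in messages:
            if not isinstance(message, dict) or message.get("role") not in ALLOWED_ROLES:
                problems.append(f"line {number}: message with unknown or missing role")
            elif not isinstance(message.get("content"), str):
                problems.append(f"line {number}: message content must be a string")
        last = messages[-1]
        if not (isinstance(last, dict) and last.get("role") == "assistant"):
            problems.append(f"line {number}: last message should come from the assistant")
    if count < MIN_EXAMPLES:
        problems.append(f"only {count} examples, need at least {MIN_EXAMPLES}")
    return problems


good_line = json.dumps({"messages": [
    {"role": "user", "content": "Can I return a book if I don't like it?"},
    {"role": "assistant", "content": "Yes, we offer a 30-day return policy."},
]})

# Repeating one line ten times only serves to reach the minimum count in this demo.
print(validate_jsonl("\n".join([good_line] * 10)))
print(validate_jsonl(good_line + "\n{broken"))
```

The first call prints an empty list. The second reports the broken second line and the insufficient number of examples.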

How much data is needed for LLM fine-tuning?

The amount of data needed for effective LLM fine-tuning can vary widely depending on several factors:

  • Complexity of the task: More complex tasks generally require more data.

  • Desired performance level: Higher performance expectations often necessitate more training data.

  • Base model size and capabilities: Larger models may require more data to fine-tune effectively.

  • Domain specificity: Highly specialized tasks might need more domain-specific examples.

As a general guideline:

  • Minimum: OpenAI requires at least 10 examples and reports that clear improvements typically start at around 50 to 100 examples, but even this is often insufficient for complex tasks.

  • Typical range: Many successful fine-tuning projects use between 500 and 10,000 examples.

  • Upper end: Some large-scale projects may use 100,000 or more examples, especially for diverse or complex tasks.

It's important to note that quality often trumps quantity. A smaller dataset of high-quality, diverse examples can sometimes outperform a larger dataset of low-quality or repetitive examples. Additionally, it's crucial to continuously evaluate the model's performance as you increase the dataset size, as there can be diminishing returns beyond a certain point.

Note: This point is immensely important. In the good old machine learning days, one needed millions of rows of data. LLMs, however, are different: even small amounts of data can influence the output of a model. So make sure to have no (as in zero) wrong data, and a good but not enormous number of high-quality rows.

For optimal results, start with a moderately sized, high-quality dataset and incrementally increase it while monitoring performance improvements. This approach allows you to find the sweet spot between data quantity and model performance for your specific use case.
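Monitoring improvements as the dataset grows is easier when you compare against examples the model never saw during training. One common way to do this, sketched below, is to hold out a fixed share of the data as a validation set. The 20% share, the seed, and the split_examples helper are illustrative assumptions.

```python
import random


def split_examples(examples: list, validation_fraction: float = 0.2, seed: int = 42):
    """Shuffle deterministically and split into (training, validation) lists."""
    shuffled = list(examples)  # copy, so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# Stand-ins for real chat examples.
examples = [f"example-{i}" for i in range(50)]
train, validation = split_examples(examples)
print(len(train), len(validation))
```

With 50 examples this yields 40 for training and 10 for validation. Because the shuffle is seeded, the split is reproducible as long as the input order stays the same, which keeps the comparison between runs fair.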

Step-by-step guide to fine-tuning GPT-4o mini

  1. Navigate to the OpenAI fine-tuning dashboard.

    Figure: OpenAI fine-tuning dashboard

  2. Click on "Create". In the overlay that opens, configure the following settings:

    • Select gpt-4o-mini as the base model.
    • Training data: Upload your JSONL file, as described in the section above.
    • Select "None" for the validation data. If you have a separate validation dataset, you can upload it here. The validation dataset uses the same format as the training dataset and is used to evaluate the model's performance during and after training.
    • Suffix: This suffix is added to the model name after fine-tuning. This helps you to identify this specific fine-tuned model later on.
    • Seed: Set it to a specific number if you want reproducible results. Passing in the same seed and job parameters should produce the same results across fine-tuning runs.
    • Batch size: Set it to auto for your initial run. You can experiment with different batch sizes later to optimize performance. Batch size, however, mainly affects training speed and has much less influence on model performance. As the training runs on OpenAI infrastructure, leaving it at auto is a good choice.
    • Learning rate multiplier: Set it to auto for your initial run. The learning rate multiplier is a hyperparameter that controls how strongly the model's weights are updated during training. A smaller value can reduce overfitting and stabilize the training process, but values that are too small slow training down. In general, there are quite good ways to determine the learning rate automatically, so leaving it at auto is a good choice.

    Figure: Fine-tuning settings

  3. Click on "Create". You'll be presented with a running job and a summary of the current stages of the job.

    Figure: Fine-tuning job

  4. After the job is finished, you'll find a chart of the training loss of the model training process. In general you want the training loss to go down and reach a low settled number at the end. If your training loss is not decreasing, you'll need more/different training data, or you might have to adjust the learning rate.

    Figure: Training loss
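The same job can also be started from code instead of the dashboard. The sketch below mirrors the settings from step 2 using the files and fine-tuning endpoints of the OpenAI Python SDK. The helper names are made up, the base model snapshot is taken from the model name shown later in this guide, and the parameter names reflect my understanding of the v1 SDK, so verify them against the current API reference.

```python
from typing import Optional


def build_job_params(training_file_id: str, suffix: str, seed: Optional[int] = None) -> dict:
    """Mirror the dashboard settings as keyword arguments for the API call."""
    params = {
        "model": "gpt-4o-mini-2024-07-18",
        "training_file": training_file_id,
        "suffix": suffix,
        # "auto" lets OpenAI pick the value, as recommended for a first run.
        "hyperparameters": {
            "batch_size": "auto",
            "learning_rate_multiplier": "auto",
        },
    }
    if seed is not None:
        params["seed"] = seed  # only set a seed when reproducibility is wanted
    return params


def start_fine_tuning(client, jsonl_path: str, suffix: str, seed: Optional[int] = None):
    """Upload the training file and create the job.

    `client` is expected to be an openai.OpenAI instance.
    """
    with open(jsonl_path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="fine-tune")
    params = build_job_params(uploaded.id, suffix, seed)
    return client.fine_tuning.jobs.create(**params)


# Show the parameters without calling the API ("file-abc123" is a placeholder ID).
print(build_job_params("file-abc123", "customer-service", seed=42))
```

To actually start a job, call start_fine_tuning(OpenAI(), "training_data.jsonl", "customer-service", seed=42) with a valid API key. The returned job object carries an id that you can pass to client.fine_tuning.jobs.retrieve to check the job status.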

Using a fine-tuned OpenAI model

After completion of the job, at the very top of the jobs page, you'll find the model name. It is similar to ft:gpt-4o-mini-2024-07-18:pondhouse-data-og:customer-service:9oofdDLO.
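If you manage several fine-tuned models, it can help to split the model name into its parts. The sketch below assumes the layout ft:<base model>:<organization>:<suffix>:<id>, which is inferred only from the example name above and is not a documented guarantee. The field names are my own.

```python
from typing import NamedTuple


class FineTunedModel(NamedTuple):
    base_model: str
    organization: str
    suffix: str
    model_id: str


def parse_model_name(name: str) -> FineTunedModel:
    """Split a fine-tuned model name, assuming the 5-part 'ft:...' layout."""
    parts = name.split(":")
    if len(parts) != 5 or parts[0] != "ft":
        raise ValueError(f"unexpected fine-tuned model name: {name!r}")
    return FineTunedModel(*parts[1:])


model = parse_model_name(
    "ft:gpt-4o-mini-2024-07-18:pondhouse-data-og:customer-service:9oofdDLO"
)
print(model.base_model)  # the base model the job started from
print(model.suffix)      # the suffix chosen when creating the job
```

This makes it easy to see that the suffix set during job creation ("customer-service" here) ends up in the final model name.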

Head over to the OpenAI Chat Playground, and select your fine-tuned model from the drop-down in the top-left. Use the chat interface to interact with your model and test its capabilities.

One interesting fact about the playground screenshot: as you might notice, I asked a question similar to one in my training samples and got exactly the answer provided in the training data. This shows, first, that the training process worked and, second, that the model fits the training data very quickly. Imagine I had just one wrong training sample: the model would take it as fact. So make sure to have no wrong data!

Figure: OpenAI playground with fine-tuned model

If you want to use the model in your own application, you can do so by simply using the normal OpenAI SDK:

from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

completion = client.chat.completions.create(
    # Replace this with the model name shown on your fine-tuning job page.
    model="ft:gpt-4o-mini-2024-07-18:pondhouse-data-og:customer-service:9oofdDLO",
    messages=[
        {"role": "system", "content": "You are a customer service representative for an online bookstore. Provide helpful and friendly assistance to customers with their inquiries."},
        {"role": "user", "content": "I received a damaged book. What should I do?"},
    ],
)

# Print the text of the first response choice.
print(completion.choices[0].message.content)

When should I use fine-tuning vs embeddings / retrieval augmented generation?

Embeddings with retrieval augmented generation (RAG) are best suited for cases where you have a large quantity of documents and context that you want the model to understand.

Generally speaking, don't use fine-tuning if you want to add new knowledge to a model - use RAG instead.

If you want to alter the behaviour, output format or general attitude of the model, fine-tuning is the better approach.

Retrieval strategies are not an alternative to fine-tuning and can in fact be complementary to it.
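To illustrate how the two approaches can work together: retrieval supplies the facts, while the fine-tuned model supplies the tone and output format. The sketch below only shows the step where retrieved passages are placed into the request messages. The retrieval itself (for example an embedding search) is assumed to exist elsewhere and to return a list of strings. The build_messages helper, the prompt layout, and the example passage are all made up for illustration.

```python
def build_messages(system_prompt: str, retrieved_passages: list, question: str) -> list:
    """Combine retrieved context and the user question into chat messages."""
    if retrieved_passages:
        # Number the passages so the model can tell them apart.
        context = "\n\n".join(
            f"[{i}] {passage}" for i, passage in enumerate(retrieved_passages, start=1)
        )
        user_content = f"Context:\n{context}\n\nQuestion: {question}"
    else:
        user_content = question
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]


messages = build_messages(
    "You are a customer service representative for an online bookstore.",
    ["Damaged items can be exchanged free of charge within 30 days."],  # made-up passage
    "I received a damaged book. What should I do?",
)
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed as the messages argument when calling the fine-tuned model. If you plan to combine both approaches, it is probably sensible to use the same context layout in your training examples, so the model sees the format it will later be used with.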

Conclusion

Fine-tuning GPT-4o mini presents a practical approach for businesses and developers to customize AI capabilities for specific use cases. This guide has walked you through the essential steps of the process, from preparing your dataset to implementing the fine-tuned model.

Key points to remember:

  1. Data quality is crucial. Ensure your training examples are accurate and relevant to your use case.

  2. Start with a moderate amount of high-quality data rather than aiming for large quantities.

  3. The fine-tuning process itself is straightforward using OpenAI's platform.

  4. GPT-4o mini offers a cost-effective alternative to larger models for many applications.

As with any AI implementation, it's important to set realistic expectations and thoroughly test your fine-tuned model. Results can vary depending on your specific use case and the quality of your training data.

While fine-tuning GPT-4o mini can enhance performance for targeted tasks, it's not a magic solution. It requires careful consideration of your objectives, thorough preparation of training data, and ongoing evaluation and refinement.

We encourage you to experiment with this process, keeping in mind both its potential benefits and limitations. Because fine-tuning is so cheap nowadays, experimenting is much easier than it used to be.
