Yesterday, Meta launched its latest AI model, Llama 3.1. At first glance, it appears as an iterative update to the Llama 3 model. However, Meta claims this new model will outperform all current models, including GPT-4 and even Claude 3.5 Sonnet, when it comes to benchmarks. To try it out, we took a dive deep into Meta AI’s Llama 3.1 to check how it stacks up against ChatGPT and Claude.
What Is Meta AI’s Llama 3.1
The ChatGPT used the third-generation Artificial Intelligence-model called GPT or Generative Pre-trained Transformers, which are the language models and frameworks designed to perform a broad range of tasks. Similarly, the AI model behind Meta AI is Llama. For each new version of Llama, Meta typically releases three variants for different purpose. With Llama 3.1, you can choose from models with 8B, 70B, and 405B parameters.
- 8B Parameter: A lightweight, ultra-fast model you can run anywhere.
- 70B Parameter: A balanced model, offering both speed and performance.
- 405B Parameter: A high-capacity model useful for complex tasks.
Llama 3.1 405B model is on par with the GPT 4o and Claude 3.5 Sonnet and even outperforms is categories like Math and long context.
And their 8B and 70B versions outperform Gemma 2 9B and GPT 3.5 Turbo, respectively.
Since Llama is an open-source model, it is completely free for everyone. You can download the model and use it offline without any restrictions. Developers can also integrate Llama 3.1 into their apps for free, as long as their apps have fewer than 70 million users. To put this in perspective, building an AI model with the same capabilities could cost over $100 million.
Key Features of Llama 3.1
- Excells in Benchmarks: Especially in math, reasoning and long context.
- Open Source: Llama 3.1 is free and open-source, unlike other models which have a message limit per day.
- Developer-Friendly: Allows fine-tuning, making it a better choice for developers looking to integrate AI into their apps or websites.
- Security and Privacy: Since Llama 3.1 can be run locally, it provides enhanced privacy and security compared to cloud-based AI models. Sensitive data never has to leave your device, but this applies only when you download the model on your device and run it locally.
Comparing Llama 3.1 With Claude and ChatGPT
I compared the Meta AI (Llama 3.1 405B parameter variant) with ChatGPT (GPT4o) and Claude (3.5 Sonnet) models in various aspects like code generation, speed, reasoning skills, etc. For the 405B version, I used the Hugging Face app, as the Meta AI website uses the 70B parameter model. Here are the results:
1. Code Generation
I asked Meta AI (Llama 3.1 405B variant), ChatGPT 4, and Claude 3.5 Sonnet to create a snake game using Python, including a score system.
Use Pygame library and write the code for snake game in Python, including the score system.
In this first test, Meta’s performance was disappointing compared to ChatGPT and Claude. Meta’s model created code with 3 to 4 naming errors that I had to fix manually. Even after correcting these errors, I couldn’t control the snake using my keyboard inputs. After several attempts to generate and fix the code, I finally got the game to run. But, it still lacked the scoring system.
On the other hand, ChatGPT and Claude produced code that worked without any issues and included the requested scoring system. Claude’s game was the best overall, with smoother controls compared to ChatGPT’s version, which had slightly finicky controls. Overall, Claude is the better AI model for coding because its generated UI is often clean and also provides the option to provide more instructions and improve the code with the help of artifacts feature.
We repeated the coding tests with JavaScript and other languages. While Meta’s output occasionally matched the other models, its code generation was hit or miss. I also tested code generation with the smaller 8B and 70B variants of Llama 3.1, and the experience was worse than expected. The 8B model, in particular, often produced output that got stuck in loops no matter how many times I tried.
2. Writing Stories and Emails
With the release of Claude 3.5 Sonnet, Claude has become the best model for generating human-like text and stories. It still stands out as the top choice for such works.
On the other hand, ChatGPT is good at generating articles, themes, and similar content. Meta’s writing style often feels odd and is difficult to fine-tune with prompts.
However, these preferences can be subjective, so I recommend you try all three models yourself, as you can test them for free. One noteworthy capability of Meta AI is its ability to write 10 sentences ending with a specific word. While this might seem simple, other language models like Claude and ChatGPT struggle to achieve it consistently.
3. Testing Reasoning Skills
Meta AI has outperformed Claude and ChatGPT in benchmarks for the reasoning and long context category. This suggests it should be much better at solving riddles or understanding complex questions. To test this, I provided a few riddles and conducted quizzes on the models. Here’s one example riddle I gave as a prompt:
You are blindfolded and 10 coins are placed in front of you. You can touch, but can't tell which side is up. There are 5 heads and 5 tails. Can you make two piles with the same number of heads? You can flip the coins any number of times.
In our testing, all three services performed similarly.
However, we observed that Meta AI provided accurate answers more often when solving complex math problems compared to the other options. Here’s one example of a functions and graphs question I asked all three models:
Given the function f(x)=2x3−3x2+x−5f(x) = 2x^3 - 3x^2 + x - 5f(x)=2x3−3x2+x−5, find the points where the graph intersects the x-axis.
While other chatbots have successfully solved even complex function problems, Meta AI was the only model that accurately answered the question and also provided detailed steps.
4. Conversational Skills
The biggest downside of Meta AI is the lack of enough conversational abilities. Meta focuses more on creating an open-source language model for developers rather than a consumer-focused AI chatbot. As a result, its tone is often bland and robotic. On the other hand, Claude adopts a more human-like approach, and ChatGPT falls somewhere in between.
However, when it comes to remembering the context of a topic, Meta AI and Claude excel compared to ChatGPT. This becomes evident when providing a series of commands to the AI. While both Claude and Meta AI can follow all the instructions, ChatGPT often forgets older instructions or struggles to incorporate new ones properly.
5. Generating Speed
When it comes to speed, Meta AI undoubtedly takes the crown. Its 8B parameter variant is the fastest AI model, generating results in a split second, whether it’s creating tables, finding information, or generating an email template. This 8B parameter model might be less capable when solving math or coding problems, but it is just as effective as other models like ChatGPT 3.5 Turbo or Gemini 1.5 Flash in many tasks.
I recommend using the Llama 3.1 8B variant on the Groq website, which focuses on delivering results as quickly as possible. Although there is no official data on the speed of the output, but Groq says the speed is around 450 tokens per second.
6. Running Locally Without Restrictions
As Llama is an open-source model, you can tweak or jailbreak it to generate results without censorship. More than the 405B and 70B parameter variants, I am excited about the 8B variant because it is so lightweight that I can even run it on my MacBook. However, result generation can slow down if you don’t have enough RAM and VRAM on your laptop.
You can download the AI models directly from the Meta AI website. They provide you with the AI model, which you can interact with either from the Terminal using commands or by integrating it into your application. Alternatively, you can download the Llama 3.1 models from the LM Studio app. This app allows you to download open-source AI models, including Meta’s Llama 3.1, and provides a chatbot interface to interact with it. This setup is completely local, and you can turn off the internet if you want to. By default, the model is not jailbroken and may not provide all answers without censorship. You can tweak the model if needed, but the process can be a bit technical.
Is Llama 3.1 Better Than Other Models?
Its 8B model is quite surprising with its speed, but apart from that, Llama 3.1 isn’t better than GPT-4 or Claude 3.5 Sonnet in most aspects. However, Meta AI is free and open-source, unlike other models which have a message limit per day. If you are a developer looking to incorporate AI into your app or website, Llama 3.1 is a better choice because it allows you to fine-tune the model, which isn’t an option with other models at the moment.