Choosing the right LLM for your RAG project
Leo Püttmann
5/22/2024
Barely a week goes by without a new, amazing model release. Claude 3, Llama 3, Phi-3, Command R+, Mixtral 8x22B, and GPT-4o are just a few of the most recent releases that have caught widespread attention. Rightly so, as we are entering an era where openly available models are getting dangerously close to, or in some cases even surpassing, the long-reigning LLM champion, GPT-4. OpenAI's flagship model, released more than a year ago, may not hold the throne for much longer. And for projects in the domain of Retrieval Augmented Generation (RAG), we dare to ask: does the choice of LLM even matter that much? Let’s find out.
Integrating models in our LLM agnostic RAG platform Cognition
We consider our platform to be LLM-agnostic, as you can easily integrate LLMs from multiple vendors, either with the help of our pre-built nodes or through custom code snippets.
By default, projects are set up with an OpenAI LLM, but switching to a service that is securely hosted by Azure is quick and easy. We will also be adding more pre-built LLM nodes to allow you to switch models even faster. And, of course, you can also switch to a custom implementation with our Python nodes.
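To illustrate what such a custom implementation can look like, here is a minimal sketch of a swappable backend interface in Python. The names (`LLMBackend`, `EchoBackend`, `answer`) are purely illustrative and not Cognition's actual API; a real node would call a vendor SDK inside `complete`.

```python
from typing import Protocol


class LLMBackend(Protocol):
    """Minimal interface that every vendor node implements."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in backend for testing; a real node would call a vendor SDK here."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


def answer(question: str, backend: LLMBackend) -> str:
    # Swapping vendors only means passing in a different backend object;
    # the rest of the pipeline stays untouched.
    return backend.complete(question)


print(answer("What is RAG?", EchoBackend()))  # prints "echo: What is RAG?"
```

Because the pipeline only depends on the `complete` signature, an OpenAI-, Azure-, or Mistral-backed node can be dropped in without touching the rest of the code.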
With that flexibility in mind, let's find out which model would be best suited for RAG tasks!
Evaluating models via the LMSys Arena
There are many benchmarks for evaluating the performance of LLMs, such as their ability to write code, do math, or reason logically. One flaw of these benchmarks is that their test data can end up in an LLM's training data. This makes it unclear whether models perform well because they are actually good, or because they simply memorized the answers to the tests.
Luckily, there is a place where results cannot be skewed so easily: the LLM Arena! In the arena, users prompt two randomly selected models side by side, without knowing which models they are talking to, and then vote for the answer they like better. This means model performance is evaluated on actual users and the real questions and tasks they have. The results are of course subjective, as things like a model's apparent friendliness or writing style can influence a user's vote. Nevertheless, we think the wisdom of the crowd has a lot of value.
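For intuition: the Arena leaderboard is built from these pairwise votes with an Elo-style rating system, the same idea used for ranking chess players (LMSys has also used the related Bradley-Terry model). A minimal sketch of a single rating update after one vote:

```python
def elo_update(r_a: float, r_b: float, a_wins: bool, k: float = 32.0) -> tuple[float, float]:
    """One Elo update after a head-to-head vote between models A and B."""
    # Expected score of A, given the current rating gap.
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    # The winner gains rating, the loser loses the same amount.
    new_a = r_a + k * (score_a - expected_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b


# Two models start at 1000; A wins the vote and gains rating.
print(elo_update(1000, 1000, a_wins=True))  # prints (1016.0, 984.0)
```

Over thousands of votes, these small updates converge to a stable ranking, even though each individual vote is subjective.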
You can try the Arena yourself here: Chat with Open Large Language Models (lmsys.org)
Note that the Arena evaluates the overall quality of a model, not just its RAG capabilities. That being said, if a model is perceived as very good by a lot of people, chances are it will also work well in a RAG use case.
The real bottleneck for RAG projects is (usually) not the LLM
Even though the LLM is really important, we have found that the model you choose is rarely the bottleneck. Retrieval Augmented Generation is a complex topic, and if you don’t get the 'retrieval' part right, the 'augmented generation' will fail no matter what model you use. Even a hypothetical GPT-7 could not answer a question correctly if the information needed to answer it was never retrieved.
A typical RAG pipeline consists of many pieces, with the LLM being the final one. Before that, however, a lot is going on. We often use smaller, efficient models like GPT-3.5 to rephrase search queries or extract important keywords, which yields much more accurate search results. Vector databases are fantastic at finding texts that are similar to a query, but similar is not the same as relevant, which is why we always use small reranking models to filter out search results that don’t help answer the question. We also offer custom-built reranking models for specific industries, like the insurance sector, for increased accuracy!
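To make the shape of such a pipeline concrete, here is a heavily simplified sketch in Python. Every stage is a stub: naive word overlap stands in for a vector DB, the reranker just truncates, and a string template stands in for the LLM; a real system would plug in actual models at each step.

```python
def rewrite_query(question: str) -> str:
    # A small model (e.g. GPT-3.5) would rephrase the question for search.
    return question.lower().rstrip("?")


def vector_search(query: str, docs: list[str], top_k: int = 3) -> list[str]:
    # Stand-in for a vector DB: rank docs by naive word overlap with the query.
    q_words = set(query.split())
    return sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))[:top_k]


def rerank(query: str, docs: list[str], keep: int = 1) -> list[str]:
    # A reranking model would score (query, doc) pairs; here we just keep the top hit.
    return docs[:keep]


def generate(question: str, context: list[str]) -> str:
    # The LLM is only the last step, grounded on the retrieved context.
    return f"Answer to '{question}' based on: {context[0]}"


docs = ["RAG combines retrieval with generation.", "Bananas are yellow."]
query = rewrite_query("What is RAG?")
context = rerank(query, vector_search(query, docs))
print(generate("What is RAG?", context))
```

The point of the sketch: if `vector_search` or `rerank` returns the wrong document, `generate` cannot recover, no matter how strong the final model is.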
Another thing to consider is, of course, the embedding model you use. There are embedding models that work across many languages and domains out of the box, but some projects might require a more specialized or fine-tuned model for the embedding step.
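Under the hood, embedding-based retrieval boils down to comparing vectors, most commonly with cosine similarity. A small self-contained sketch (the 3-dimensional vectors are made up for illustration; real embedding models produce hundreds or thousands of dimensions):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Embedding models map text to vectors; similar texts get similar vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy "embeddings" of a query and a document; a score near 1.0 means very similar.
query_vec = [0.9, 0.1, 0.0]
doc_vec = [0.8, 0.2, 0.1]
print(round(cosine_similarity(query_vec, doc_vec), 3))  # prints 0.984
```

A fine-tuned embedding model changes where texts land in this vector space so that domain-specific queries and their relevant documents end up close together.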
You can see that the LLM is really the cherry on top of the system that we can build in Kern AI Cognition. But, of course, we want to ensure that we get the shiniest and freshest cherry, so let's compare some LLM providers to see which ones are the best.
Comparing LLM Model providers
Let’s take a closer look at some specific companies and weigh the pros and cons of using them! One interesting observation is that open-source models are slowly catching up to the capabilities of proprietary models. That’s fantastic, as more choice and competition are always good for the user!
OpenAI - The obvious gold standard
More than a year after the initial release of ChatGPT, OpenAI still dominates the LLM space. With their recent release of the multimodal model GPT-4o, they demonstrated a commitment to pushing the boundaries of what LLMs can achieve.
The introduction of GPT-4 has set a new standard in the field. Its advanced capabilities in understanding and generating human-like text make it a solid choice for complex tasks within RAG pipelines. GPT-4's ability to handle intricate contexts, reason through problems, and generate high-quality responses makes it a go-to model for many developers and researchers.
However, it's not just about the latest and greatest. The smaller, more efficient GPT-3.5 continues to play a crucial role in many of our RAG pipelines. GPT-3.5 has become incredibly fast and cost-effective, making it perfect for tasks like translation, data transformation, or text classification. The model can handle a significant volume of data quickly and cost-effectively, making it an essential component in many RAG systems.
For European users, a big additional benefit of the OpenAI models is that they are available via Microsoft Azure, where you are guaranteed that the model and the data stay within the EU and are not used for training purposes.
Mistral AI - The new kid on the block
Mistral AI, a French start-up founded in mid-2023, has quickly become a notable player in the Large Language Model (LLM) landscape. The company's substantial fundraising, including a reported $400 million round in December 2023 and rumors of an upcoming $600 million round, reflects market confidence in its capabilities.
Mistral AI's model lineup showcases its innovative approach. The open-source Mistral 7B, despite its smaller size, outperformed larger Llama models, demonstrating impressive capabilities. While the 7B model may not be suitable for most Retrieval Augmented Generation (RAG) projects, Mistral AI's subsequent releases, such as the open-source mixture-of-experts models Mixtral 8x7B and the recent Mixtral 8x22B, offer compute-efficient solutions well suited for RAG tasks. The open models also offer an excellent performance-to-cost ratio, as you can see in the chart below, making them excellent choices for self-hosting.
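For intuition on why mixture-of-experts models are efficient at inference: a gating network routes each token to only a small subset of "experts", so only a fraction of the total parameters do work per token. A toy top-2 router (all names and scores below are made up for illustration):

```python
def route(gate_scores: dict[str, float], top_k: int = 2) -> list[str]:
    # Pick the top_k experts by gating score; only those run a forward pass
    # for this token, while the remaining experts stay idle.
    return sorted(gate_scores, key=gate_scores.get, reverse=True)[:top_k]


# Hypothetical gating scores for one token across four experts.
scores = {"expert_0": 0.1, "expert_1": 0.7, "expert_2": 0.05, "expert_3": 0.15}
print(route(scores))  # prints ['expert_1', 'expert_3']
```

In an 8x7B-style model, this is why per-token compute is closer to that of a dense ~13B model than to the full parameter count, although all experts still need to fit in memory.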
Mistral AI's proprietary models, like Mistral-large, also perform comparably to leading models such as GPT-4 in our experience, with GPT-4 being just slightly better in some cases. Mistral-large is also available via Microsoft Azure.
Anthropic - The Magnum Opus of LLMs?
Founded by ex-OpenAI members and backed by companies like Amazon, Anthropic is surely one of the bigger names in the world of LLMs. Their models Claude 1 and 2 significantly underperformed the OpenAI GPT models, but their big hit came with the release of Claude 3, which finally elevated the company into the ring of the most relevant LLM contenders.
Claude 3 comes in different versions with different sizes: Opus, Sonnet, and Haiku, with Opus being the largest and most capable model, surpassing even GPT-4 in many benchmarks and taking the throne in the LLM Arena for a while.
That top spot has since been reclaimed by newer versions of GPT-4, but Claude 3 remains a highly capable model that is well suited for RAG from a technical standpoint. We are definitely more than excited to see what Claude 4 will bring!
Cohere - Big open-source models!
This company has been in the language model space for a while and was co-founded by one of the authors of the famous paper 'Attention Is All You Need', which introduced the Transformer architecture.
Recently, the company released Command R+, an openly available model with an enormous 104B parameters, making it one of the larger open models. While the model offers stunning capabilities, its performance-to-cost ratio is significantly worse than the open alternatives from Meta AI or Mistral AI.
Still, the model would probably be a solid choice for many RAG projects, although we are quite confident that there are better options, such as Llama 3 or the Mistral AI models, when it comes to self-hosting your LLM.
HuggingFace (open-source) 🤗
If you like tinkering around, there are hundreds of open-source models available, many of which are fine-tuned or quantized versions of Meta's Llama models. The amazing thing here is that these models are freely available, and there are many fine-tunes for specific tasks or domains.
With the release of Llama 3, the open-source LLM community got another excellent addition, and especially the 70B variant can be seen as serious competition for OpenAI's GPT models. The only downside of the Llama models is Meta's custom license, which makes them a bit harder to recommend for commercial purposes.
HuggingFace also offers an inference API, which you can integrate into your projects as well.
Book your demo now
Are you ready to take your RAG project to the next level with Cognition? Our platform is LLM-agnostic, so you can easily integrate models from multiple vendors, including the latest and greatest from OpenAI, Mistral AI, and Anthropic. With our flexible and customizable pipeline, you can ensure that every component, from retrieval to generation, is optimized for your specific use case.
Book a personalized demo with us today and see firsthand how Cognition can help you achieve your RAG goals. And as a special bonus, we'll process your first document for free! Don't miss out on this opportunity to elevate your RAG project with Cognition.
Sign up for our newsletter to get the latest updates on LLMs.