Which AI Model Should You Use?

Voxjar lets you select between several Large Language Models when creating or editing your scorecards.

Our base model is fine-tuned to follow instructions and performs well in a call-scoring environment. It excels at simple questions.

We also support two of Meta's new Llama 3.1 models. Both are GPT-4-class models and excellent options.

  • The 70B model is a smaller model that offers great value for its level of intelligence. It excels at straightforward questions that require up to a moderate amount of nuance.

  • The 405B model is the full-power Llama 3.1. It is on par with GPT-4 and has the benefit of being open source. You can expect excellent performance across the board, including a high level of nuance in understanding your questions and requirements.

GPT-4 is the most capable large language model on the market. It is provided by OpenAI and has consistently set the standard. GPT-4 has excellent language support and a high level of nuance in understanding your questions.

Models can be changed by clicking the scorecard header and then selecting a model from the dropdown menu.

Each model has its strengths and weaknesses. Our goal is to give you the flexibility to find what's best for your use case and to deploy it at scale with just a few clicks.

When you combine Voxjar's prompt controls and testing loop, you can choose the right LLM and fully automate your call evaluations with confidence.

If there is another model that you would like us to support, let us know.

When to use GPT-4 or Llama 3.1 405B

GPT-4 is, by all measures, the most capable large language model as of this writing.

Because of that, you will likely get answers and reasoning based on a better understanding of your call data.

This will be especially noticeable in a handful of scenarios:

  • If your questions/answers require the AI to comprehend multiple topics

  • If your calls follow a less structured flow

  • If the AI will be scoring longer calls (30 minutes+)

  • If your calls are in a language other than English

Running a few tests comparing different models will quickly make it clear whether you need a GPT-4-level model.

GPT-4 and Llama 3.1 405B Caveats

There are a couple of tradeoffs that come with the capabilities of GPT-4-level models.

These are premium language models and use 2x the credits of other models.

They are also up to 10x slower than other models. This is usually not an issue, but testing and manually queued AI evaluations may be up to a minute slower.

