Salesforce Unveils LLM Benchmark to Optimise GenAI Implementations

The LLM Benchmark for CRM allows Salesforce customers to match each GenAI use case to the best-suited model.

Salesforce has announced the “world’s first” LLM (large language model) Benchmark for CRM. With the solution, Salesforce customers can rank LLMs on a leaderboard to select the best possible option for their chosen GenAI use case.

The LLM Benchmark enables this by leveraging a “comprehensive evaluation framework”, with criteria grouped into four categories: accuracy, speed, cost, and trust & safety.

With scores available across each category, Salesforce hopes businesses can make “smart decisions” when evaluating LLMs and use its open Einstein platform to deploy multiple models, each where it fits best.

So, for example, a service team may harness one LLM to draft customer replies and another to summarise customer conversations. However, the LLM Benchmark will only be available across service and sales use cases on release – with Salesforce planning to expand the offering across its CRM apps thereafter.
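To illustrate the idea of matching each use case to a model, here is a minimal Python sketch. It is not Salesforce's scoring method, which the announcement does not describe: the model names, the scores, the per-use-case weights, and the weighted-average formula are all hypothetical assumptions. Only the four category names come from the benchmark itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelScores:
    """Hypothetical scores, normalised to 0..1 where higher is better."""
    name: str
    accuracy: float
    speed: float
    cost: float          # higher means cheaper in this sketch
    trust_safety: float


def weighted_score(model: ModelScores, weights: dict[str, float]) -> float:
    """Weighted average of the category scores (an assumed formula)."""
    total = sum(getattr(model, category) * w for category, w in weights.items())
    return total / sum(weights.values())


def rank(models: list[ModelScores], weights: dict[str, float]) -> list[ModelScores]:
    """Return the models ordered best-first for one set of weights."""
    return sorted(models, key=lambda m: weighted_score(m, weights), reverse=True)


def best_model_per_use_case(
    models: list[ModelScores], use_case_weights: dict[str, dict[str, float]]
) -> dict[str, str]:
    """Pick the top-ranked model name for each use case."""
    return {uc: rank(models, w)[0].name for uc, w in use_case_weights.items()}


# Made-up data for illustration only.
MODELS = [
    ModelScores("model_a", accuracy=0.95, speed=0.4, cost=0.3, trust_safety=0.9),
    ModelScores("model_b", accuracy=0.6, speed=0.9, cost=0.9, trust_safety=0.6),
]

# Assumed priorities: reply drafting favours accuracy and safety,
# conversation summaries favour speed and cost.
USE_CASE_WEIGHTS = {
    "draft_reply": {"accuracy": 3, "speed": 1, "cost": 1, "trust_safety": 2},
    "summarise_conversation": {"accuracy": 1, "speed": 3, "cost": 3, "trust_safety": 1},
}

if __name__ == "__main__":
    print(best_model_per_use_case(MODELS, USE_CASE_WEIGHTS))
```

With these made-up numbers, the more accurate model wins for reply drafting and the faster, cheaper one wins for summaries, which mirrors the service-team example above.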

The CRM provider also promises to continually advance its evaluation of LLMs and – eventually – to rank fine-tuned models alongside iterations of flagship offerings like ChatGPT, Gemini, and Llama.

Sharing the news, Silvio Savarese, EVP & Chief Scientist at Salesforce AI Research, stated, “Salesforce’s new LLM Benchmark for CRM is a significant step forward in the way businesses assess their AI strategy within the industry. It not only provides clarity on next-generation AI deployment but also can accelerate time to value for CRM-specific use cases. Our commitment is to continuously evolve this benchmark to keep pace with technological advancements, ensuring it remains relevant and valuable.”

With the LLM Benchmark, businesses can not only monitor the best-performing LLM but also gain new insight into the performance of such deployments.

According to Salesforce, its offering stands out because it’s based on real-world data sets and human evaluations conducted by its employees and external customers. All that combined data feeds into a Tableau dashboard, where businesses can assess LLMs across use cases.