Business friendly LLM Leaderboard | Best LLMs for commercial use

Q: Why is LLaMA 2 ranked low on the leaderboard?

LLaMA2 70B is the most capable model in the list based on its performance benchmarks. However, it is ranked lower in the leaderboard because of the higher cost of fine-tuning and the conditional restrictions in its license, which may not be favorable for large-scale business adoption.

Q: Why is T5 ranked first despite its MMLU rank being 3rd on the list?

The ranks above indicate only a relative comparison. For example, if you want to use an open-source LLM for a commercial project without any usage restrictions and need to fine-tune the model for specific needs, you can use the ranks above as an indicator when choosing a model. Note that the models in the first two positions offer very good capability scores and high scores for ease of adoption. A higher ease-of-adoption value means that fine-tuning the model costs less money and computing power. Moreover, these models are released under the Apache 2.0 license, which permits unrestricted commercial usage. However, if you do not want additional fine-tuning and do not expect your application to have more than 700 million users, the LLaMA 2 models would be the best choice, as they offer far better capability scores.

Q: Why are certain models like Bloom, DLite, and GPT-J not included in the list?

Based on our observations, models with fewer than 10B parameters fall short in delivering the performance and reliability required for scalable business applications. Their efficacy is questionable, leading us to exclude models under this size threshold. Furthermore, while larger models such as Bloom have gained attention, they don't measure up in competence to LLaMA2. The financial burden of fine-tuning these substantial models can be significant, potentially discouraging businesses from integrating them into commercial applications.

Q: How can Accubits help me in adopting LLMs in my organisation?

At Accubits, we stand proudly as a front-runner in providing generative AI consulting and bespoke AI development services. We've rolled out several LLMs, making them open-source with commercial usage rights. Our consulting program assists businesses in unveiling the vast potential of AI. Moreover, our development services are dedicated to crafting products and solutions using our tried-and-true GAI adoption framework and model zoos. If AI integration into your business context seems confusing or if you're unsure where to start, don't hesitate to reach out. We're here to help.

Business Friendly LLMs Leaderboard

LLMs are game-changers for businesses. They're not just tools, but powerful catalysts capable of supercharging operational efficiency and accelerating growth.
The Problem: There are hundreds of LLMs available in the market, but most don't allow commercial use. Only a few are business-friendly, and finding them among the vast options can be time-consuming and challenging.
Our Solution: We created this business-friendly LLM leaderboard to help entrepreneurs easily identify the right open-source LLM with commercial usage rights.

Leaders

As of 8th August, the top 3 leaders in our business-friendly LLM leaderboard are T5, GenZ and UL2. Based on our scoring methodology, these models scored approximately 72, 70 and 60 respectively. The scoring methodology is explained below. The current leader is T5. With an MMLU score of 55.1, 11B parameters and an Apache 2.0 license, the model is the best fit for businesses that want to use an open-source LLM for a commercial project without any usage restrictions and need to fine-tune the model for specific needs. GenZ is ranked in second place. The model achieved SOTA in its category on the MT Bench benchmark, with 87% accuracy compared to ChatGPT, and is on par with the LLaMA 2 70B chat model, which is a 5X bigger model that requires 40X more GPU memory.

Leaderboard

Column notes: Capability Score is calculated based on the MMLU benchmark and general model performance. Ease of Adoption is calculated based on adoption cost and computing power requirements. Ag. Score is the aggregate score of the model. The Usability Score is described under Ranking Methodology.

Rank	Model	License	Capability Score	Usability Score	Ease of Adoption	Ag. Score
#1	T5 11B	Apache 2.0	55.1	8.9	89	72.05
#2	GenZ 13B	Apache 2.0	53.68	8.7	87	70.34
#3	UL2 20B	Apache 2.0	39.2	8	80	59.6
#4	Pythia 12B	Apache 2.0	26.76	8.8	88	57.38
#5	Open Assistant 12B	Apache 2.0	26.55	8.8	88	57.275
#6	Cerebras-GPT 13B	Apache 2.0	25.92	8.7	87	56.46
#7	LLaMA 2 13B	Custom	54.8	8.7	60.9	57.85
#8	GPT NeoX 20B	Apache 2.0	29.92	8	80	54.96
#9	LLaMA 2 34B	Custom	62.6	6.6	46.2	54.4
#10	Dolly 12B	MIT	25.92	8.8	61.6	43.76
#11	LLaMA 2 70B	Custom	68.9	3	21	44.95
#12	MPT-30B	CC BY-SA-3.0	47.93	7	28	37.965

Ranking Methodology

The models are ranked on three factors: the model's capability; its ease of adoption, which accounts for the cost of adopting the model; and its usability, which accounts for the availability of commercial usage rights.

The capability score is calculated using the MMLU (Massive Multitask Language Understanding) benchmark scores. A higher MMLU score indicates that the model is proficient in understanding language across a wide range of tasks. This proficiency suggests that the model has a robust capability that can benefit business applications. The MMLU score is normalized to 10 to calculate the capability score.

Ease of adoption is calculated based on the model size. The bigger the model, the more computing power it needs for fine-tuning, which means higher costs in adopting the model. Bigger models therefore receive lower scores for ease of adoption. For the evaluation, we only considered models bigger than 10B parameters. The score is calculated by mapping the parameter count inversely onto a scale of 0 to 100 and then normalizing it to a scale of 0 to 10 for simplicity in calculations.
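The inverse mapping described above can be sketched in a few lines of Python. The exact formula is not stated on this page, so the linear form `100 - parameters in billions` is an assumption; it is used here because it reproduces the 0 to 10 values listed in the leaderboard table (8.9 for an 11B model, 8.7 for 13B, 3 for 70B). The function name `size_score` is hypothetical.

```python
def size_score(params_billion: float) -> float:
    """Size-based score on a 0-10 scale (bigger model, lower score).

    Assumption: a linear inverse mapping, 100 minus the parameter
    count in billions, then divided by 10. The source does not give
    the formula; this form matches the values shown in the table.
    """
    if not 0 <= params_billion <= 100:
        raise ValueError("expected a parameter count between 0 and 100 billion")
    inverse_0_to_100 = 100.0 - params_billion  # inverse scale, 0 to 100
    return round(inverse_0_to_100 / 10.0, 1)   # normalized to 0 to 10


for name, size in [("T5 11B", 11), ("GenZ 13B", 13), ("LLaMA 2 70B", 70)]:
    print(f"{name}: {size_score(size)}")
```

Under this assumption, every additional billion parameters costs a model 0.1 points, so a 70B model scores far lower than an 11B one regardless of its benchmark results.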

The usability score is calculated based on the degree of restrictions present in the model’s license. Apache 2.0 has the highest score of 10, offering the most flexibility and benefits to business and commercial use cases. It is followed by MIT with a score of 7 and CC BY-SA-3.0 with a score of 4. Since LLaMA 2 is released under a custom license that has usage restrictions, its usability score is set to 7. The scores for each license are defined based on the relative freedom each offers businesses to launch commercial applications.
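As a minimal sketch, the license scores listed above can be kept in a lookup table. The scores themselves come from the paragraph above; the names `LICENSE_USABILITY` and `usability_score` are hypothetical, and the key `"Custom"` is simply the label the leaderboard table uses for the LLaMA 2 license.

```python
# License-to-score mapping, using the values stated in the methodology.
LICENSE_USABILITY = {
    "Apache 2.0": 10,
    "MIT": 7,
    "CC BY-SA-3.0": 4,
    "Custom": 7,  # LLaMA 2 custom license, as labeled in the table
}


def usability_score(license_name: str) -> int:
    """Return the usability score for a license, or fail loudly if unknown."""
    try:
        return LICENSE_USABILITY[license_name]
    except KeyError:
        raise ValueError(
            f"no usability score defined for license {license_name!r}"
        ) from None


for license_name in LICENSE_USABILITY:
    print(f"{license_name}: {usability_score(license_name)}")
```

Raising an error for an unlisted license, rather than returning a default, avoids silently scoring a license that was never evaluated.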

The aggregate score of the model is calculated by averaging the capability score, ease of adoption, and usability score. The aggregate score is a direct indicator of the rank of the model.
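The arithmetic behind the published aggregate scores can be checked with a short Python sketch. One caveat: the paragraph above describes an average of three scores, but the values in the leaderboard table are consistent with a slightly different reading, in which the size-based score is first weighted by the license score and the result is then averaged with the MMLU score. That reading is an inference from the table rows, not a formula stated by the authors, and the function names below are hypothetical.

```python
def license_weighted_ease(params_billion: float, usability: float) -> float:
    """Ease of adoption on a 0-100 scale.

    Assumption inferred from the table: (100 - size in billions),
    scaled by the license score divided by 10.
    """
    return (100.0 - params_billion) * usability / 10.0


def aggregate_score(mmlu: float, params_billion: float, usability: float) -> float:
    """Mean of the MMLU score and the license-weighted ease of adoption."""
    return (mmlu + license_weighted_ease(params_billion, usability)) / 2.0


# (model, MMLU, size in billions, license score) taken from the table
# and the methodology text.
rows = [
    ("T5 11B", 55.1, 11, 10),       # Apache 2.0
    ("LLaMA 2 13B", 54.8, 13, 7),   # Custom
    ("MPT-30B", 47.93, 30, 4),      # CC BY-SA-3.0
]
for name, mmlu, size, usability in rows:
    print(f"{name}: {aggregate_score(mmlu, size, usability):.3f}")
```

For the three rows shown, this reading gives 72.05, 57.85 and 37.965, which match the Ag. Score column of the table.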
