As of 8th August, the top three models on our business-friendly LLM leaderboard are T5, GenZ, and UL2. Under our scoring methodology, explained below, these models scored approximately 72, 70, and 60 respectively. The current leader, T5, has an MMLU score of 55.1 with 11B parameters and an Apache 2.0 license, making it the best fit for businesses that want to use an open-source LLM in a commercial project without usage restrictions and need to fine-tune the model for specific needs. GenZ is ranked second. It achieved SOTA in its category on the MT Bench benchmark, with 87% accuracy compared to ChatGPT, and is on par with the LLaMA2 70B chat model, which is a 5X bigger model that requires 40X more GPU memory.
Capability: calculated based on MMLU benchmark and general model performance.
Ease of Adoption: calculated based on adoption cost and computing power requirements.
Aggregate Score: aggregate score of the model.

|Rank|Model|License|Capability (MMLU)|Ease of Adoption|Ease of Adoption × Usability|Aggregate Score|
|---|---|---|---|---|---|---|
|#1|T5 11B|Apache 2.0|55.1|8.9|89|72.05|
|#2|GenZ 13B|Apache 2.0|53.68|8.7|87|70.34|
|#3|UL2 20B|Apache 2.0|39.2|8|80|59.6|
|#4|Pythia 12B|Apache 2.0|26.76|8.8|88|57.38|
|#5|Open Assistant 12B|Apache 2.0|26.55|8.8|88|57.275|
|#6|Cerebras-GPT 13B|Apache 2.0|25.92|8.7|87|56.46|
|#7|LLaMA 2 13B|Custom|54.8|8.7|60.9|57.85|
|#8|GPT NeoX 20B|Apache 2.0|29.92|8|80|54.96|
|#9|LLaMA 2 34B|Custom|62.6|6.6|46.2|54.4|
|#11|LLaMA 2 70B|Custom|68.9|3|21|44.95|
The models are ranked on three criteria: capability; ease of adoption, which accounts for the cost of adopting the model; and usability, which accounts for the availability of commercial usage rights.
The capability score is calculated from the model's MMLU (Massive Multitask Language Understanding) benchmark score. A higher MMLU score indicates that the model is proficient in understanding language across a wide range of tasks, which suggests a robust capability that can benefit business applications. The MMLU score is normalized to a scale of 10 to calculate the capability score.
Ease of adoption is calculated from the model size. The bigger the model, the more computing power it needs for fine-tuning, and so the higher the cost of adopting it. Bigger models therefore receive lower ease-of-adoption scores. For this evaluation, we considered only models bigger than 10B parameters. The score is calculated by mapping the parameter count inversely onto a scale of 0 to 100 and then normalizing it to a scale of 0 to 10 for simplicity in calculations.
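The ease-of-adoption values in the leaderboard table are consistent with a simple linear mapping: subtract the parameter count (in billions) from 100, then divide by 10. The sketch below assumes that mapping. It is inferred from the published numbers rather than stated in the methodology, and the function name `ease_of_adoption` and the clamping to the 0 to 10 range are illustrative choices.

```python
def ease_of_adoption(params_billions: float) -> float:
    """Map model size (billions of parameters) to a 0-10 ease-of-adoption score.

    Assumed mapping, inferred from the leaderboard values:
    first invert onto a 0-100 scale (100 - size), then divide by 10.
    """
    score_100 = 100.0 - params_billions
    # Clamping is an assumption so that models above 100B do not go negative.
    score_100 = max(0.0, min(100.0, score_100))
    return round(score_100 / 10.0, 2)


if __name__ == "__main__":
    # Model sizes taken from the leaderboard table.
    for name, size in [("T5", 11), ("GenZ", 13), ("UL2", 20), ("LLaMA 2", 70)]:
        print(f"{name} {size}B -> {ease_of_adoption(size)}")
```

Under this assumption, an 11B model scores 8.9 and a 70B model scores 3.0, matching the table.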
The usability score is calculated based on the degree of restriction in the model's license. Apache 2.0 has the highest score of 10, offering the most flexibility and benefits for business and commercial use cases, followed by MIT with a score of 7 and CC BY-SA-3.0 with a score of 4. Since LLaMA uses a custom license with usage restrictions, its usability score is set to 7. The score for each license reflects the relative freedom it gives businesses to launch commercial applications.
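The license scores above amount to a lookup table. A minimal sketch follows, assuming the license labels used in the leaderboard table ("Custom" stands for the LLaMA license); the names `USABILITY_SCORES` and `usability_score` are hypothetical.

```python
# License scores as stated in the methodology text.
# "Custom" is the label the leaderboard table uses for the LLaMA license.
USABILITY_SCORES = {
    "Apache 2.0": 10,
    "MIT": 7,
    "CC BY-SA-3.0": 4,
    "Custom": 7,
}


def usability_score(license_name: str) -> int:
    """Return the usability score for a license, or raise if none is defined."""
    try:
        return USABILITY_SCORES[license_name]
    except KeyError:
        # The methodology defines no score for other licenses, so fail loudly
        # instead of guessing one.
        raise ValueError(f"No usability score defined for {license_name!r}")


if __name__ == "__main__":
    for lic in ("Apache 2.0", "MIT", "CC BY-SA-3.0", "Custom"):
        print(f"{lic}: {usability_score(lic)}")
```

Raising on an unknown license, rather than returning a default, keeps a model with an unscored license from silently entering the ranking.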
The aggregate score of the model is calculated by averaging the capability score, ease of adoption, and usability score. The aggregate score is a direct indicator of the rank of the model.
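The aggregate values published in the leaderboard table can be reproduced by averaging two terms: the MMLU score on its original 0 to 100 scale, and the product of the ease-of-adoption and usability scores (each on a 0 to 10 scale). This formula is inferred from the table's numbers and is not spelled out in the methodology, so treat the sketch below as an assumption. The function name `aggregate_score` is hypothetical.

```python
def aggregate_score(mmlu: float, ease: float, usability: float) -> float:
    """Aggregate score under the assumed, table-derived formula.

    mmlu:      MMLU benchmark score on a 0-100 scale
    ease:      ease-of-adoption score on a 0-10 scale
    usability: license usability score on a 0-10 scale
    """
    adoption_term = ease * usability  # lands on a 0-100 scale
    return round((mmlu + adoption_term) / 2, 3)


if __name__ == "__main__":
    # (model, MMLU, ease, usability, published aggregate) from the table.
    rows = [
        ("T5 11B", 55.1, 8.9, 10, 72.05),
        ("GenZ 13B", 53.68, 8.7, 10, 70.34),
        ("LLaMA 2 13B", 54.8, 8.7, 7, 57.85),
        ("LLaMA 2 70B", 68.9, 3.0, 7, 44.95),
    ]
    for name, mmlu, ease, usability, published in rows:
        computed = aggregate_score(mmlu, ease, usability)
        # Check the computed value against the published one.
        assert abs(computed - published) < 1e-6, name
        print(f"{name}: computed {computed}, published {published}")
```

Because the usability score multiplies the ease-of-adoption score, a restrictive license scales down the whole adoption term. This is why LLaMA 2 13B lands well below GenZ 13B despite a similar MMLU score and identical size.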
Frequently Asked Questions
Here are some of the most common questions we get asked. If you have a question that isn't on this list, please don't hesitate to contact us. We're always happy to help! We'll get back to you within 24 hours.
LLaMA2 70B is the most capable model in the list based on its performance benchmarks. However, it is ranked lower in the leaderboard because of the higher cost of fine-tuning and the conditional restrictions in its license, which may not be favorable for large-scale business adoption.
The above ranks indicate only a relative comparison. For example, if you want to use an open-source LLM for a commercial project without any usage restrictions and need to fine-tune the model for specific needs, you can use the ranks above as an indicator when choosing a model. Note that the models in the top two positions offer very good capability scores and high ease-of-adoption scores. A higher ease-of-adoption score means it would cost you less money and computing power to fine-tune the model. Moreover, these models are under the Apache 2.0 license, which enables unrestricted commercial usage. However, if you do not need additional fine-tuning and do not expect your application to have more than 700 million users, then the LLaMA2 models would be the best choice, as they offer far better capability scores.
Based on our observations, models with fewer than 10B parameters fall short of the performance and reliability required for scalable business applications, so we excluded models under this size threshold. Furthermore, while larger models such as BLOOM have gained attention, they do not match LLaMA2 in capability. The cost of fine-tuning these larger models can also be significant, potentially discouraging businesses from integrating them into commercial applications.
At Accubits, we stand proudly as a front-runner in providing generative AI consulting and bespoke AI development services. We've rolled out several LLMs, making them open-source with commercial usage rights. Our consulting program assists businesses in unveiling the vast potential of AI. Moreover, our development services are dedicated to crafting products and solutions using our tried-and-true GAI adoption framework and model zoos. If AI integration into your business context seems confusing or if you're unsure where to start, don't hesitate to reach out. We're here to help.