Large Language Models (LLMs) have revolutionized natural language processing and have shown impressive results in various language tasks.
Problem: Many LLMs are available, but relevant information about them is scattered across the internet, which makes it extremely difficult to evaluate and compare them.
Solution: We created this leaderboard to help researchers easily identify the best open-source LLMs through an intuitive leadership quadrant graph. We evaluate open-source LLMs and rank them based on their capabilities and market adoption.


As of March 20, the top three leaders among open-source LLMs are GLM, Galactica, and T5. Based on our scoring methodology, which is explained below, these models scored 75, 62, and 60 points, respectively. The current leader is GLM, a General Language Model that is pretrained with an autoregressive blank-filling objective and can be fine-tuned for a variety of natural language understanding and generation tasks. The model is trained on a large and diverse text corpus. GLM-130B, with 130 billion parameters, has demonstrated cutting-edge performance on language tasks including question answering, sentiment analysis, and machine translation. Across a wide range of natural language understanding tasks and conditional and unconditional generation tasks, GLM outperforms BERT, T5, and GPT under comparable test conditions.


Rank | Model | Size | Architecture | Organization | Adoption Rating | Capability Rating | Model Score
#1 | LLaMA | 65B | Transformer, Autoregressive | Meta AI | 85 | 89 | 87
#3 | Galactica | 120B | Transformer | Meta AI | 47 | 58 | 53
#5 | GLM | 130B | Transformer, Autoregressive | Tsinghua University | 48 | 50 | 49
#6 | OPT | 175B | Transformer | Meta AI | 49 | 41 | 45
#9 | GPT-NeoX | 20B | Transformer, Autoregressive | EleutherAI | 42 | 33 | 38
#12 | Pythia | 12B | Decoder-only, Autoregressive | EleutherAI | 21 | 21 | 21

Adoption Rating: calculated from the number of forks and stars on the official model repository.
Capability Rating: calculated from the number of tasks and downstream tasks of the model.
Model Score: a weighted average of the model's adoption and capability ratings.
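As a quick consistency check, the Model Score column can be reproduced from the two rating columns as an equal-weight average. The sketch below assumes halves are rounded up (for example, 52.5 becomes 53), an assumption that matches every row shown; Python's built-in round() would instead give 52. The names `RATINGS` and `model_score` are illustrative only.

```python
import math

# (Adoption Rating, Capability Rating) per model, copied from the table above.
RATINGS = {
    "LLaMA": (85, 89),
    "Galactica": (47, 58),
    "GLM": (48, 50),
    "OPT": (49, 41),
    "GPT-NeoX": (42, 33),
    "Pythia": (21, 21),
}


def model_score(adoption: float, capability: float) -> int:
    """Equal-weight average of AR and CR, rounded half up (assumed rounding rule)."""
    return math.floor((adoption + capability) / 2 + 0.5)


for name, (ar, cr) in RATINGS.items():
    print(f"{name}: {model_score(ar, cr)}")
# LLaMA: 87, Galactica: 53, GLM: 49, OPT: 45, GPT-NeoX: 38, Pythia: 21
```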

Ranking Methodology

We only considered prominent open-source LLMs for this leaderboard. Note that the leaderboard should be treated only as a high-level indicator of overall performance; choosing the right model for a specific use case and set of business requirements calls for a more detailed analysis. The key parameters we used for scoring are:

  1. Benchmark results
  2. Model forks

Capability Rating (CR) is calculated as a weighted sum of the benchmark results (BR) published in each model's research paper.

Rank weights:
For performance ranks #1 to #5, rank weight = 3.

For performance ranks #6 to #10, rank weight = 2.

For performance ranks #11 to #20, rank weight = 1.
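A minimal sketch of the capability calculation follows. It assumes each benchmark result is multiplied by the weight of the rank the model achieved on that benchmark, and that ranks below #20 contribute nothing; neither detail is spelled out above. The function names and the benchmark numbers are hypothetical.

```python
from typing import Iterable, Tuple


def rank_weight(rank: int) -> int:
    """Map a performance rank to its weight: #1-#5 -> 3, #6-#10 -> 2, #11-#20 -> 1.

    Assumption: ranks below #20 are not covered by the methodology and get weight 0.
    """
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    if rank <= 5:
        return 3
    if rank <= 10:
        return 2
    if rank <= 20:
        return 1
    return 0


def raw_capability(results: Iterable[Tuple[float, int]]) -> float:
    """Weighted sum over (benchmark_result, performance_rank) pairs (assumed form)."""
    return sum(result * rank_weight(rank) for result, rank in results)


# Hypothetical benchmark results for one model: (score, rank on that benchmark).
example = [(80.0, 2), (65.0, 7), (40.0, 15), (30.0, 25)]
print(raw_capability(example))  # 410.0
```

With these hypothetical inputs the weighted sum is 80*3 + 65*2 + 40*1 + 30*0 = 410. How the raw sum is then scaled to the 0 to 100 range shown in the table is not stated, so the sketch stops at the raw value.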

Adoption Rating (AR) is calculated from model forks (MF), with that value penalized according to model performance. To calculate the adoption rating, we sum the normalized fork count and the capability score, then normalize the result to a 100-point scale. The Model Score is simply the average of the Adoption Rating and the Capability Rating.
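The adoption calculation can be sketched as follows. This assumes min-max normalization of fork counts and capability scores across the compared models, and reads "normalize to 100" as rescaling so the top model scores 100; both are assumptions. Summing the two normalized terms is what pulls down widely forked but weaker models. The model names and fork counts are hypothetical.

```python
from typing import Dict


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Scale values to [0, 1]; if all values are equal, map them to 0.0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def adoption_ratings(forks: Dict[str, float], capability: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical AR: normalized forks + normalized capability, rescaled so the top model is 100."""
    norm_forks = min_max(forks)
    norm_cap = min_max(capability)
    combined = {m: norm_forks[m] + norm_cap[m] for m in forks}
    top = max(combined.values())
    return {m: 100 * v / top if top else 0.0 for m, v in combined.items()}


# Hypothetical inputs, not real repository data.
forks = {"model_a": 1000, "model_b": 500, "model_c": 0}
capability = {"model_a": 90, "model_b": 50, "model_c": 50}
print(adoption_ratings(forks, capability))
# {'model_a': 100.0, 'model_b': 25.0, 'model_c': 0.0}
```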



