Llama 2

Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) that range in scale from 7 billion to 70 billion parameters. The fine-tuned LLMs, known as Llama 2-Chat, are specifically optimized for dialogue applications. These models surpass the performance of most open-source chat models on the benchmarks they were tested on.

An Overview of Llama 2

The architecture of Llama 2 is very similar to that of Llama 1, with the addition of grouped-query attention (used in the larger variants).

Llama 2 has double the context length of Llama 1 (4,096 tokens vs. 2,048).

It was trained on 2 trillion tokens.

Llama 2's pretraining used an extensive 2-trillion-token dataset drawn from publicly available content. Its fine-tuning involved publicly available instruction datasets and over a million new human-annotated examples.
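To put the 2-trillion-token corpus in perspective, we can sketch a rough training-compute estimate using the common 6·N·D rule of thumb (FLOPs ≈ 6 × parameters × training tokens). The rule itself is an assumption not stated in the article, which only gives model sizes and the token count:

```python
# Rough training-compute estimate via the 6*N*D rule of thumb
# (an approximation, not a figure reported for Llama 2).

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6 * n_params * n_tokens

TOKENS = 2e12  # 2 trillion pretraining tokens

for name, params in [("Llama 2 7B", 7e9), ("Llama 2 70B", 70e9)]:
    print(f"{name}: ~{train_flops(params, TOKENS):.1e} FLOPs")
```

Under this approximation, the 7B model costs on the order of 10^22 FLOPs to train and the 70B model roughly ten times that.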

Grouped-query attention (GQA) improves inference scalability.

Grouped Query Attention

Llama 2 uses grouped-query attention to improve inference scalability. In autoregressive decoding it is standard practice to cache the key and value pairs of previous tokens in the sequence to speed up attention computation; GQA shares each key/value head across a group of query heads, shrinking this cache and reducing memory bandwidth at inference time.
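The mechanism above can be sketched in a few lines of NumPy for a single decode step. The head counts and dimensions below are illustrative, not Llama 2's actual configuration; the point is that several query heads share one cached key/value head:

```python
import numpy as np

# Minimal sketch of grouped-query attention (GQA) for one token's decode
# step. 8 query heads share 2 cached K/V heads (4 queries per group).

n_q_heads, n_kv_heads, head_dim, seq_len = 8, 2, 16, 32
group = n_q_heads // n_kv_heads  # query heads per shared K/V head

rng = np.random.default_rng(0)
q = rng.standard_normal((n_q_heads, head_dim))                  # current token's queries
k_cache = rng.standard_normal((n_kv_heads, seq_len, head_dim))  # cached keys
v_cache = rng.standard_normal((n_kv_heads, seq_len, head_dim))  # cached values

out = np.empty_like(q)
for h in range(n_q_heads):
    kv = h // group                                   # map query head -> shared K/V head
    scores = k_cache[kv] @ q[h] / np.sqrt(head_dim)   # (seq_len,)
    w = np.exp(scores - scores.max())
    w /= w.sum()                                      # softmax over cached positions
    out[h] = w @ v_cache[kv]                          # weighted sum of cached values

print(out.shape)  # full set of query-head outputs from only 2 K/V caches
```

Compared with standard multi-head attention, the K/V cache here is 4× smaller (2 heads instead of 8), which is exactly the memory saving that makes decoding cheaper at scale.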

The model uses a new method, Ghost Attention (GAtt), for multi-turn consistency.

Ghost Attention (GAtt)

The model applies Ghost Attention after RLHF to maintain multi-turn consistency: a form of context distillation that helps the model remember the initial system instruction across turns. It is a novel method for making LLMs keep following instructions throughout a dialogue.
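The data-side idea behind GAtt can be illustrated with a small sketch: an instruction is attached to every user turn when sampling assistant responses, then dropped from all but the first turn in the final training sample, so the model learns to honor the instruction even when it is no longer repeated. The function and field names below are hypothetical, and this is a simplified mirror of the idea, not Meta's actual pipeline:

```python
# Illustrative sketch of the Ghost Attention (GAtt) data construction.
# Names are hypothetical; the real pipeline also zeroes the loss on
# intermediate turns, which is omitted here for brevity.

def build_gatt_sample(instruction, turns):
    """turns: list of (user_msg, assistant_msg) pairs, where the assistant
    replies are assumed to have been sampled with the instruction
    prepended to every user message."""
    sample = []
    for i, (user, assistant) in enumerate(turns):
        # Keep the instruction only on the first turn of the training sample.
        prompt = f"{instruction}\n{user}" if i == 0 else user
        sample.append({"user": prompt, "assistant": assistant})
    return sample

dialogue = [("Who are you?", "Arr, I be a pirate!"),
            ("What's 2+2?", "That be four, matey.")]
sample = build_gatt_sample("Always answer as a pirate.", dialogue)
print(sample[0]["user"])  # instruction + first user message
print(sample[1]["user"])  # later turns omit the instruction
```

Training on samples shaped this way teaches the model to behave as if the instruction were still present in later turns.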


  • Model Details

  • Key Highlights

  • Training Details

  • Benchmark Results

  • Sample Codes

  • Fine-tuning

  • Limitations

| Model | Size | Code | Commonsense Reasoning | World Knowledge | Reading Comprehension | Math | MMLU | BBH | AGI Eval |
|---|---|---|---|---|---|---|---|---|---|
| Llama 1 | 7B | 14.1 | 60.8 | 46.2 | 58.5 | 6.95 | 35.1 | 30.3 | 23.9 |
| Llama 1 | 13B | 18.9 | 66.1 | 52.6 | 62.3 | 10.9 | 46.9 | 37.0 | 33.9 |
| Llama 1 | 33B | 26.0 | 70.0 | 58.4 | 67.6 | 21.4 | 57.8 | 39.8 | 41.7 |
| Llama 1 | 65B | 30.7 | 70.7 | 60.5 | 68.6 | 30.8 | 63.4 | 43.5 | 47.6 |
| Llama 2 | 7B | 16.8 | 63.9 | 48.9 | 61.3 | 14.6 | 45.3 | 32.6 | 29.3 |
| Llama 2 | 13B | 24.5 | 66.9 | 55.4 | 65.8 | 28.7 | 54.8 | 39.4 | 39.1 |
| Llama 2 | 70B | 37.5 | 71.9 | 63.6 | 69.4 | 35.2 | 68.9 | 51.2 | 54.2 |

Llama 2 7B and 30B models outperform MPT models of the corresponding size in all categories besides the code benchmarks. Against the Falcon family, Llama 2 7B and 34B outperform Falcon 7B and 40B on all benchmark categories. Additionally, the Llama 2 70B model outperforms all open-source models.


When compared with closed-source LLMs, Llama 2 70B is close to GPT-3.5 on MMLU and GSM8K, but there is a significant gap on coding benchmarks. Llama 2 70B results are on par with or better than PaLM (540B) on almost all benchmarks. There is still a large gap in performance between Llama 2 70B and both GPT-4 and PaLM-2-L.

[Figure: Llama 2 benchmark comparison against closed-source models]