Flan-UL2

Flan-UL2 is an encoder-decoder Transformer model pre-trained on a massive dataset of text and code. It offers several notable improvements over the original UL2 model. First, it has a larger receptive field of 2048 tokens (up from 512 in UL2), which makes it more suitable for few-shot in-context learning. Second, it does not require mode-switch tokens, which makes it easier to use. Finally, it has been fine-tuned using the Flan instruction-tuning recipe and dataset collection, further improving its performance. Flan-UL2 excels across a variety of tasks, including question answering, text summarization, and natural language inference.
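
Because no mode-switch tokens are needed, running the model is straightforward. Below is a minimal inference sketch using the Hugging Face transformers library and the public google/flan-ul2 checkpoint; the prompt is illustrative, and the full 20B-parameter model requires substantial memory.

```python
# Minimal inference sketch for Flan-UL2 via Hugging Face transformers.
# Assumes the public "google/flan-ul2" checkpoint; enough memory is
# needed to hold the 20B-parameter model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2")

# No mode-switch token is required; an instruction-style prompt suffices.
prompt = "Answer the question: What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```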

An Overview of Flan-UL2

Flan-UL2 is an advanced large language model (LLM) created by Google AI. As of June 2023, it had demonstrated exceptional performance on a range of NLP benchmarks, with comprehensive evaluations encompassing tasks such as question answering, summarization, and natural language inference. Flan-UL2 achieved strong results across prominent benchmarks, including GLUE, SQuAD, and RACE.

It was trained on a dataset containing 1.56 TB of text and 600 GB of code.

20 billion parameters

Flan-UL2 has 20 billion parameters. This is far fewer than the 175 billion parameters of GPT-3, yet, as the results below show, it remains competitive with much larger instruction-tuned models.

Flan-UL2 also uses far less memory than GPT-3, owing to its much smaller parameter count and efficient encoder-decoder architecture.
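
To keep the footprint down further in practice, the weights can be loaded in half precision. The sketch below uses standard transformers options rather than anything Flan-UL2-specific: torch.bfloat16 roughly halves memory versus float32, and device_map="auto" requires the accelerate package.

```python
# Memory-conscious loading sketch: half-precision weights and automatic
# device placement. Roughly halves the footprint compared to float32.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-ul2",
    torch_dtype=torch.bfloat16,  # ~40 GB of weights instead of ~80 GB
    device_map="auto",           # requires `pip install accelerate`
)
tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")
```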

Efficient and scalable

It is built on the UL2 architecture, an efficient and scalable T5-style encoder-decoder design pre-trained with a mixture-of-denoisers objective, and instruction-tuned using the Flan collection. This design allows Flan-UL2 to run faster and use less memory than many other LLMs of comparable size.

Flan-UL2 can be used to train other LLMs, which can further improve their performance.
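
One common way to do this is knowledge distillation: Flan-UL2's generations serve as training targets for a smaller student model. The sketch below only collects (prompt, response) pairs; the prompts and output file name are illustrative, and the subsequent student training step is omitted.

```python
# Sketch: harvesting (prompt, response) pairs from Flan-UL2 to use as
# distillation targets for a smaller student model. Prompts and the
# output file name are illustrative.
import json
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2")

prompts = [
    "Summarize: The quick brown fox jumps over the lazy dog.",
    "Does 'the cat sat' entail 'an animal sat'? Answer yes or no.",
]

pairs = []
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=64)
    response = tokenizer.decode(output[0], skip_special_tokens=True)
    pairs.append({"prompt": prompt, "response": response})

# The resulting file can serve as supervised fine-tuning data for a student.
with open("distillation_pairs.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```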

Community-driven model

The open-source nature of Flan-UL2 makes it easy to extend the model with new features or capabilities. This allows users to customize the model to meet their specific needs.
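
For example, a common customization route for an open checkpoint like this is parameter-efficient fine-tuning with LoRA adapters via the Hugging Face peft library; the hyperparameters below are illustrative defaults, not values recommended by the model's authors.

```python
# Sketch: attaching LoRA adapters to Flan-UL2 for parameter-efficient
# customization with the Hugging Face peft library. Hyperparameters
# are illustrative, not official recommendations.
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                         # adapter rank
    lora_alpha=32,               # adapter scaling factor
    lora_dropout=0.05,
    target_modules=["q", "v"],   # T5-style attention projections
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction is trainable
```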

  • Introduction

  • Model Highlights

  • Training Details

  • Model Types

  • Fine-tuning Methods

  • Using the Model

  • Results

  • Other InstructEval Models

| Model | Parameters | Architecture | Initialization | Tasks |
| --- | --- | --- | --- | --- |
| Flan-UL2 | 20 billion | T5-style encoder-decoder (UL2) | Flan instruction tuning of the UL2 checkpoint | Natural language inference, question answering, summarization, translation, code generation, etc. |

| Model | MMLU | BBH | MMLU-CoT | BBH-CoT | Avg |
| --- | --- | --- | --- | --- | --- |
| FLAN-PaLM 62B | 59.6 | 47.5 | 56.9 | 44.9 | 49.9 |
| FLAN-PaLM 540B | 73.5 | 57.9 | 70.9 | 66.3 | 67.2 |
| FLAN-T5-XXL 11B | 55.1 | 45.3 | 48.6 | 41.4 | 47.6 |
| FLAN-UL2 20B | 55.7 (+1.1%) | 45.9 (+1.3%) | 52.2 (+7.4%) | 42.7 (+3.1%) | 49.1 (+3.2%) |