LLMs Explained: Bloom

The BigScience research workshop unveiled the BigScience Large Open-science Open-access Multilingual Language Model, a.k.a. Bloom. The model is based on a GPT-3-style decoder-only architecture and was trained on a dataset that spans 46 natural languages and 13 programming languages. It marks a significant step forward for open, multilingual natural language processing (NLP): for most of the languages in its dataset, Bloom is the first language model with more than 100 billion parameters. Although BigScience is still evaluating the model, early results suggest that Bloom can perform several NLP tasks with zero-shot learning.


An Overview of Bloom

Bloom is an autoregressive language model trained on a dataset that includes 46 natural languages and 13 programming languages. It was developed through a collaboration of hundreds of researchers from various organizations, including Facebook AI Research, Stanford University, and New York University.

It is one of the largest open-access language models available.

176B parameters

With 176B parameters, Bloom is one of the largest language models, and it is freely available to the public under the Responsible AI License.

The model is trained on 46 natural and 13 programming languages.

Trained on 59 Languages

The Bloom model was trained on the ROOTS corpus, a dataset that includes hundreds of sources in 46 natural and 13 programming languages.

Its CO2 emissions are very low compared to similar models (25 tons)

25 tons CO2eq Emissions

Bloom was trained on a low-carbon-intensity energy grid, resulting in about 25 tons of CO2eq emissions, which makes it one of the greenest models of its size.


About Model

BLOOM is an autoregressive large language model (LLM) that generates coherent, meaningful text, which makes it suitable for a wide range of applications. It is a decoder-only Transformer with 176 billion parameters, trained on the ROOTS corpus: 1.6 terabytes of multilingual text (about 350 billion tokens) drawn from hundreds of sources in 46 natural and 13 programming languages (59 in total).

One of Bloom's distinguishing features is zero-shot learning: it can perform well on NLP tasks that it was not specifically trained for. The model achieves competitive performance on various benchmarks, with stronger results after multitask prompted finetuning.

BLOOM was trained for almost four months on a cluster of 416 A100 80GB GPUs (384 used for training, with 32 held in reserve). The training process was open to the public, with logs available on TensorBoard. Development was coordinated by BigScience, an open research collaboration whose goal was the public release of an LLM.


Key highlights

Bloom is a large language model with 176B parameters, making it one of the largest language models available.
It is an open-access language model that is freely available to the public.
Trained on the ROOTS corpus, a dataset that includes hundreds of sources in 46 natural and 13 programming languages.
Achieves competitive performance on a wide variety of benchmarks, with stronger results after multitask prompted finetuning.
The models and code used to build Bloom are publicly released under the Responsible AI License, which is intended to promote the ethical and responsible use of AI technologies.

Training Details

Training data

Bloom was trained on 46 natural languages and 13 programming languages. The dataset consisted of 1.6TB of pre-processed text, converted into 350B unique tokens.

Model and checkpoint size

Bloom's bf16 weights take up 329GB, and the full checkpoint including optimizer states is 2.3TB. The tokenizer's vocabulary size is 250,680.
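A quick way to sanity-check the 329GB figure: bf16 stores each parameter in 2 bytes, so the weight size follows from the parameter count (176,247,271,424, listed under Large model size). This sketch computes only raw weight bytes and ignores any file-format overhead.

```python
# Parameter count reported for the full 176B model.
NUM_PARAMS = 176_247_271_424
BYTES_PER_BF16 = 2  # bfloat16 uses 16 bits per value


def weight_size(num_params: int, bytes_per_param: int) -> tuple[float, float]:
    """Return raw weight size in decimal gigabytes (1e9 bytes) and gibibytes (2**30 bytes)."""
    total_bytes = num_params * bytes_per_param
    return total_bytes / 1e9, total_bytes / 2**30


gb, gib = weight_size(NUM_PARAMS, BYTES_PER_BF16)
print(f"{gb:.1f} GB = {gib:.1f} GiB")  # 352.5 GB = 328.3 GiB
```

The binary-unit value is close to the reported 329GB, which suggests that figure is quoted in gibibytes; that reading is an inference, not something the source states.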

Training Procedure

BLOOM's learned subword tokenizer is trained using a byte-level Byte Pair Encoding (BPE) algorithm and a simple pre-tokenization rule with no normalization.
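The toy trainer below illustrates the core idea of byte-level BPE: start from raw UTF-8 bytes, so any string in any language can be encoded, then repeatedly merge the most frequent adjacent pair. It is a simplified sketch, not BLOOM's tokenizer code: it skips pre-tokenization and learns from a single string rather than a 1.6TB corpus, and the function names are hypothetical.

```python
from collections import Counter


def merge_pair(tokens: list[bytes], pair: tuple[bytes, bytes]) -> list[bytes]:
    """Replace every non-overlapping occurrence of `pair` (scanning left to right) with its concatenation."""
    merged, out, i = pair[0] + pair[1], [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def train_byte_bpe(text: str, num_merges: int) -> tuple[list[tuple[bytes, bytes]], list[bytes]]:
    """Learn up to `num_merges` merges on one string; return the merges and the final tokens."""
    # Base vocabulary: individual UTF-8 bytes, so no input is ever out of vocabulary.
    tokens = [bytes([b]) for b in text.encode("utf-8")]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        best_pair, count = pair_counts.most_common(1)[0]
        if count < 2:  # no adjacent pair repeats, so merging would not compress anything
            break
        merges.append(best_pair)
        tokens = merge_pair(tokens, best_pair)
    return merges, tokens


merges, tokens = train_byte_bpe("aaabdaaabac", num_merges=3)
print(tokens)  # [b'aaab', b'd', b'aaab', b'a', b'c']
```

Because the base vocabulary is the 256 possible byte values, text in any of the 59 languages can be encoded without unknown tokens; a non-ASCII character such as é simply starts out as two bytes that later merges may combine.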

Training time and resources

Training took about 4 months, with a throughput of about 150 TFLOP/s per GPU. The estimated cost of training was $2-5M.
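The throughput figure can be turned into an idealized training-time estimate: divide the total training compute by the cluster's sustained throughput. This sketch assumes the common rule of thumb of about 6 FLOPs per parameter per token (forward plus backward pass), one pass over the 350B-token dataset, and 384 GPUs training at once; none of these are BLOOM-specific figures from the source.

```python
SECONDS_PER_DAY = 86_400


def ideal_training_days(num_params: float, num_tokens: float, flops_per_gpu: float,
                        num_gpus: int, flops_per_param_token: float = 6.0) -> float:
    """Idealized wall-clock days: total training FLOPs divided by sustained cluster throughput.

    flops_per_param_token=6 is the common forward+backward approximation (an assumption).
    """
    total_flops = flops_per_param_token * num_params * num_tokens
    cluster_flops_per_second = flops_per_gpu * num_gpus
    return total_flops / cluster_flops_per_second / SECONDS_PER_DAY


days = ideal_training_days(
    num_params=176_247_271_424,
    num_tokens=350e9,      # assumption: one pass over the 350B-token dataset
    flops_per_gpu=150e12,  # ~150 TFLOP/s per GPU, as reported
    num_gpus=384,          # assumption: number of GPUs training at any one time
)
print(f"about {days:.0f} days")  # about 74 days
```

The result is shorter than the reported ~4 months, which is expected for a bound that assumes perfectly sustained throughput end to end; treat it only as an order-of-magnitude check.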

Model Types

There are several smaller versions of the Bloom model that have been trained on the same dataset. Here are the variations of the Bloom model based on parameter count:

Model	Parameters
bloom-560m	560 million
bloom-1b1	1.1 billion
bloom-1b7	1.7 billion
bloom-3b	3 billion
bloom-7b1	7.1 billion
bloom (176B)	176 billion

Business Applications

Bloom performs best on language modeling and question answering. You can use it to build business applications for use cases such as:

Language Modeling	Multilingual NLP
Text completion and prediction	Multilingual customer support
Sentiment analysis	Multilingual chatbots and virtual assistants
Text classification	Multilingual sentiment analysis
Language translation	Multilingual social media monitoring
Content generation and summarization	Multilingual search engines
Speech recognition and transcription	Multilingual voice assistants and speech recognition
Personalization and recommendation systems	
Information retrieval and search engines	Multilingual text summarization and classification
Fraud detection and spam filtering	Multilingual data analysis and visualization

Model Features

The model builds on a standard decoder-only Transformer and adds a few techniques that improve training stability and downstream performance. Here are some of the core features of Bloom.

Transformer architecture

The model is modified from Megatron-LM GPT2. It uses a decoder-only Transformer architecture, which lets it process all positions of an input sequence in parallel during training and capture long-range dependencies. This architecture has proven effective across a wide range of NLP tasks.
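To make the parallelism concrete, here is a minimal single-head causal self-attention in NumPy. One matrix product scores every query against every key, and an upper-triangular mask keeps each token from attending to later tokens, which is what makes a decoder-only model autoregressive. It is an illustrative sketch: BLOOM uses 112 heads per layer and ALiBi biases rather than this bare form, and the shapes and random weights below are arbitrary.

```python
import numpy as np


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over a whole sequence at once.

    x: (seq_len, d_model) input vectors; w_q, w_k, w_v: (d_model, d_head) projections.
    Returns the attended values (seq_len, d_head) and the attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # One matrix product scores every query against every key in parallel.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    seq_len = x.shape[0]
    # Mask keys in the future (j > i) so token i only attends to tokens 0..i.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over the key dimension; exp(-inf) becomes exactly 0.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)
```

Because of the mask, changing a later token never changes the outputs at earlier positions, which is why all positions can be trained in a single pass.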

Large model size

Bloom has 176,247,271,424 parameters, 3,596,615,680 embedding parameters, 70 layers, and 112 attention heads. Hidden layers are 14336-dimensional. Sequence length is 2048 tokens.
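These numbers are consistent with each other, and a short count reproduces the total exactly. The sketch assumes a GPT-style block with a fused QKV projection, an output projection, a 4x-wide MLP (all with biases), two LayerNorms per block, an extra LayerNorm after the embeddings and one at the end, and input and output embeddings that share weights. It also assumes 250,880 embedding rows, which is what the embedding count implies (3,596,615,680 / 14,336); that is 200 more than the tokenizer vocabulary, presumably padding. With 112 heads, each head is 14,336 / 112 = 128-dimensional. The function name is hypothetical.

```python
def bloom_param_count(hidden: int, layers: int, vocab_rows: int) -> dict:
    """Count parameters of a GPT-style decoder with tied input/output embeddings."""
    h = hidden
    attention = (3 * h * h + 3 * h) + (h * h + h)  # fused QKV projection + output projection
    mlp = (h * 4 * h + 4 * h) + (4 * h * h + h)    # up-projection to 4h + down-projection to h
    layer_norms = 2 * (2 * h)                       # two LayerNorms (weight + bias) per block
    per_block = attention + mlp + layer_norms
    embedding = vocab_rows * h                      # shared with the output layer (tied)
    extra_norms = 2 * (2 * h)                       # embedding LayerNorm + final LayerNorm
    total = layers * per_block + embedding + extra_norms
    return {"per_block": per_block, "embedding": embedding, "total": total}


counts = bloom_param_count(hidden=14336, layers=70, vocab_rows=250_880)
print(f"{counts['total']:,}")  # 176,247,271,424
```

Each block contributes 12h² + 13h parameters, so with h = 14,336 about 98% of the model lives in the 70 Transformer blocks rather than the embeddings.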

ALiBi Positional Embeddings

Instead of adding positional information at the embedding layer, ALiBi (Attention with Linear Biases) directly attenuates the attention scores based on the distance between keys and queries. This enables smoother training and better downstream performance.
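The sketch below computes ALiBi's per-head slopes and the resulting bias tensor, following the formulation in the ALiBi paper: head h adds -m_h × (i - j) to the score of query i and key j, so more distant keys are penalized more, and each head uses a different slope m_h. Since BLOOM has 112 heads, which is not a power of two, the slope function includes the paper's rule for that case. BLOOM's own implementation may arrange the computation differently; this is a sketch of the technique, not its code.

```python
import math

import numpy as np


def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes: a geometric sequence starting at 2**(-8/n) for n heads."""
    def power_of_two_slopes(n: int) -> list[float]:
        start = 2 ** (-8 / n)
        return [start ** (i + 1) for i in range(n)]

    if math.log2(num_heads).is_integer():
        return power_of_two_slopes(num_heads)
    # For head counts that are not powers of two (BLOOM has 112), use the slopes for the
    # nearest lower power of two, then fill in with every other slope of twice that size.
    closest = 2 ** math.floor(math.log2(num_heads))
    extra = alibi_slopes(2 * closest)[0::2][: num_heads - closest]
    return power_of_two_slopes(closest) + extra


def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    """Return a (num_heads, seq_len, seq_len) tensor of biases added to attention scores."""
    slopes = np.array(alibi_slopes(num_heads))
    pos = np.arange(seq_len)
    # j - i: zero on the diagonal, negative for past keys. Future keys (j > i) get positive
    # values here, but the causal mask removes them before the softmax.
    distance = pos[None, :] - pos[:, None]
    return slopes[:, None, None] * distance[None, :, :]


bias = alibi_bias(num_heads=8, seq_len=4)
print(bias[0])  # head 0 (slope 0.5): row i penalizes key j by 0.5 * (i - j)
```

Because the bias depends only on relative distance, no learned position embeddings are needed, and the same rule applies to sequences longer than those seen in training.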

Embedding LayerNorm

The model uses an additional layer normalization immediately after the embedding layer. The BigScience team found that this significantly improved training stability.
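In code the change is small: the embedding lookup is followed by a LayerNorm before the first Transformer block. The NumPy sketch below shows that ordering with a deliberately large-scale random embedding table; after the extra LayerNorm, each token vector has roughly zero mean and unit variance regardless of the table's scale. The sizes, the random table, eps=1e-5, and the function names are illustrative assumptions.

```python
import numpy as np


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each row to zero mean and unit variance, then apply a learned scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def embed_with_layernorm(token_ids, embedding_table, gamma, beta):
    """Look up token embeddings, then apply the extra LayerNorm before the first block."""
    hidden = embedding_table[token_ids]  # (seq_len, hidden_size)
    return layer_norm(hidden, gamma, beta)


rng = np.random.default_rng(0)
vocab_size, hidden_size = 100, 16
table = rng.normal(scale=10.0, size=(vocab_size, hidden_size))  # large-scale embeddings on purpose
out = embed_with_layernorm(np.array([3, 14, 15]), table, np.ones(hidden_size), np.zeros(hidden_size))
```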

Model Tasks

Text generation

Bloom predicts the next token in a sequence of text, which is the fundamental task behind language modeling and text generation. By repeating this prediction, Bloom can generate coherent and grammatically correct sentences.
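Generation is next-token prediction applied in a loop. The sketch below shows greedy decoding, which always appends the highest-scoring token, driven by a hypothetical toy model (a fixed bigram score table) so that it runs without downloading weights. With a real Bloom checkpoint in Hugging Face Transformers, `model.generate` runs this loop for you and also supports sampling strategies beyond greedy decoding.

```python
from typing import Callable, Optional, Sequence


def greedy_generate(
    next_token_logits: Callable[[list[int]], Sequence[float]],
    prompt: list[int],
    max_new_tokens: int,
    eos_id: Optional[int] = None,
) -> list[int]:
    """Autoregressive decoding: repeatedly append the highest-scoring next token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


# Hypothetical toy "model": row t holds the scores for the token that follows token t.
# Token 3 plays the role of end-of-sequence.
BIGRAM_SCORES = [
    [0.1, 2.0, 0.3, 0.0],
    [0.0, 0.1, 1.5, 0.2],
    [0.2, 0.0, 0.1, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]


def toy_logits(tokens: list[int]) -> list[float]:
    return BIGRAM_SCORES[tokens[-1]]


print(greedy_generate(toy_logits, prompt=[0], max_new_tokens=10, eos_id=3))  # [0, 1, 2, 3]
```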

Question-Answering

The model can understand the question, identify the relevant information from a given text, and generate an accurate and relevant answer to the question.
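With a causal model like Bloom, question answering over a given text is usually done by prompting: place the context and the question in the input and let the model continue after an answer cue. The helpers below are a minimal sketch of that formatting and of trimming the continuation. The "Context/Question/Answer" layout and the function names are illustrative assumptions, not a template from BigScience.

```python
def build_qa_prompt(context: str, question: str, examples: tuple = ()) -> str:
    """Build a zero-shot prompt, or a few-shot one if (context, question, answer) examples are given."""
    blocks = [f"Context: {c}\nQuestion: {q}\nAnswer: {a}" for c, q, a in examples]
    # The final block ends with the answer cue so the model's continuation is the answer.
    blocks.append(f"Context: {context}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(blocks)


def extract_answer(completion: str) -> str:
    """Keep only the first line of the continuation, since models often keep generating."""
    stripped = completion.strip()
    return stripped.splitlines()[0] if stripped else ""


prompt = build_qa_prompt(
    context="BLOOM was trained on the ROOTS corpus.",
    question="Which corpus was BLOOM trained on?",
)
print(prompt)
```

Few-shot examples usually help a base model follow the format; finetuned variants such as BLOOMZ tend to handle zero-shot prompts better.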

Information Extraction

Information extraction is a natural language processing task that involves automatically extracting structured information from unstructured or semi-structured data sources, such as text documents. The model can identify and extract relevant information from a given text, such as entities, relationships between entities, and events.
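Getting structured output from a generative model typically means asking for a fixed format and then parsing the completion defensively. The sketch below builds an entity-extraction prompt and keeps only lines that match a `TYPE: value` pattern. The entity types, prompt wording, regular expression, and the sample completion in the usage line are all hypothetical.

```python
import re

# Hypothetical output format requested from the model: one "TYPE: value" per line.
ENTITY_LINE = re.compile(r"^\s*(PERSON|ORG|LOCATION|DATE)\s*:\s*(.+?)\s*$")


def build_ie_prompt(text: str) -> str:
    return (
        "Extract the people, organizations, locations, and dates from the text. "
        "Write one per line as TYPE: value, using PERSON, ORG, LOCATION, or DATE.\n\n"
        f"Text: {text}\nEntities:\n"
    )


def parse_entities(completion: str) -> list[tuple[str, str]]:
    """Keep only lines that match the requested format; ignore anything else the model wrote."""
    entities = []
    for line in completion.splitlines():
        match = ENTITY_LINE.match(line)
        if match:
            entities.append((match.group(1), match.group(2)))
    return entities


# A made-up completion, used only to show the parsing step.
print(parse_entities("ORG: BigScience\nDATE: 2022\nSome extra chatter"))
```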

Summarization

The model can generate a shorter, condensed version of a longer text while retaining the most important information. Abstractive summarization involves generating new sentences that convey the main ideas of the original text.

Fine-tuning

Here are some of the fine-tuning techniques available for Bloom:

Multitask Finetuning

The architecture hyperparameters of the finetuned BLOOMZ models are kept the same as those of the BLOOM models. The finetuning hyperparameters are loosely based on T0 and FLAN. The learning rate is set by doubling the minimum learning rate of the pretrained model and rounding the result. For the smaller variants, the global batch size is increased fourfold to improve throughput. The models are finetuned for 13 billion tokens, but the checkpoint that is used is the best one according to a separate validation set.

Contrastive Finetuning

The 1.3 and 7.1 billion parameter BLOOM models were contrastively finetuned using the SGPT Bi-Encoder recipe, producing models that generate high-quality text embeddings.
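The heart of a contrastive bi-encoder recipe is an objective that pulls each query embedding toward its paired document and pushes it away from the other documents in the batch. The NumPy sketch below shows a generic in-batch InfoNCE loss plus position-weighted mean pooling, the pooling idea described for SGPT's decoder models (later tokens have seen more of the input, so they get larger weights). It is not the SGPT training code; the temperature of 0.05 and the function names are illustrative assumptions.

```python
import numpy as np


def weighted_mean_pool(hidden_states, attention_mask):
    """Position-weighted mean pooling: token at 1-based position p gets weight proportional to p.

    hidden_states: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    positions = np.arange(1, hidden_states.shape[1] + 1)[None, :] * attention_mask
    weights = positions / positions.sum(axis=1, keepdims=True)
    return (hidden_states * weights[:, :, None]).sum(axis=1)


def in_batch_contrastive_loss(query_emb, doc_emb, temperature=0.05):
    """InfoNCE: query i should match document i; the other documents in the batch are negatives."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    logits = q @ d.T / temperature  # scaled cosine similarities, shape (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct pairs sit on the diagonal; minimize their negative log-probability.
    return -np.mean(np.diag(log_probs))
```

A matched batch (each query close to its own document) yields a low loss, while a shuffled batch yields a high one, which is the signal that trains the embedding model.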

Multilingual Multitask Finetuning

Multilingual multitask finetuning of BLOOM models on the xP3 dataset significantly increases zero-shot performance. Multitask finetuning performance has also been shown to correlate with the number of finetuning datasets.

Sample Codes

Running the model on a GPU

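import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer. The full 176B checkpoint ("bigscience/bloom") needs hundreds
# of GB of GPU memory; a small variant such as "bigscience/bloom-560m" fits on a single GPU.
model_name = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# AutoModelForCausalLM includes the language-modeling head, so the outputs contain logits.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")
model.eval()

# Define your input text
input_text = "This is an example input sentence."

# Tokenize the input text (returns input_ids and attention_mask)
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

# Run the input through the model
with torch.no_grad():
    outputs = model(**inputs)

# Get the output logits: shape (batch_size, sequence_length, vocab_size)
logits = outputs.logits
print(logits.shape)

# Generate a short continuation of the input text
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(generated[0], skip_special_tokens=True))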

Model Limitations

Model may:

Overrepresent some viewpoints and underrepresent others
Contain stereotypes
Contain personal information
Generate hateful, abusive, or violent language
Generate discriminatory or prejudicial language
Generate content that may not be appropriate for all settings, including sexual content
Make errors, including producing incorrect information as if it were factual
Generate irrelevant or repetitive outputs
Induce users into attributing human traits to it, such as sentience or consciousness

Other LLMs

OPT

Meta AI first introduced the OPT (Open Pre-trained Transformer) language model and released it in the metaseq repository on May 3rd, 2022.

Galactica

Galactica is a large-scale language model developed by the research team at Meta Platforms, Inc.

LLaMA

Meta first introduced LLaMA (Large Language Model Meta AI) in February 2023.
