Guanaco

The Guanaco models are open-source chatbots fine-tuned with the 4-bit QLoRA method on LLaMA base models, using the OASST1 dataset. They are available in 7B, 13B, 33B, and 65B parameter sizes. The Guanaco dataset was developed with a specific focus on multilingual capability and a wide range of linguistic tasks. Beyond language-specific tasks, it introduces novel challenges aimed at strengthening the model's proficiency in English grammar analysis, natural language understanding, cross-lingual self-awareness, and explicit content recognition.
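
In practice, each Guanaco release is distributed as a LoRA adapter that is loaded on top of the corresponding LLaMA base model. The following is a minimal sketch using the Hugging Face transformers, peft, and bitsandbytes libraries; the repository IDs, prompt format, and generation settings are illustrative assumptions rather than details from this article.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Quantize the frozen base weights to 4-bit NF4, as in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Illustrative checkpoint IDs; substitute the ones you actually use.
base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")

# Attach the Guanaco LoRA adapter produced by QLoRA fine-tuning on OASST1.
model = PeftModel.from_pretrained(base, "timdettmers/guanaco-7b")

prompt = "### Human: What is QLoRA?\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))

The same pattern applies to the 13B, 33B, and 65B adapters; only the checkpoint IDs change.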


An Overview of Guanaco

Tim Dettmers et al. developed the Guanaco model within the UW NLP group; after it was open-sourced, Josephus Cheung contributed significantly to its further development, so credit for the model is now attributed to both. The primary objective behind Guanaco was to build a highly versatile multilingual language model capable of excelling at diverse tasks such as instruction following, question answering, text generation, language translation, and creative writing. On the Vicuna benchmark it outperforms all previously released openly available models, reaching 99.3% of ChatGPT's performance level while requiring only about 24 hours of fine-tuning on a single GPU.

It has exhibited performance competitive with commercial chatbot systems on the Vicuna and OpenAssistant benchmarks.

Multilingualism

The model is multilingual, meaning it can understand and generate text in multiple languages. This is a significant advantage, as it allows the model to be deployed in applications that span multiple languages.

On the Vicuna chatbot benchmark, the model reaches 99.3% of ChatGPT's performance level as judged by GPT-4; this is a relative quality score, not the fraction of questions answered correctly.

Large Size

The model builds upon the LLaMA base model, utilizing different parameter sizes to cater to various requirements. Specifically, the Guanaco model is available in sizes 7B, 13B, 33B, and 65B.

Thanks to QLoRA, the model can be fine-tuned on a single GPU, with even the 65B variant fitting in 48GB of memory, making it a cost-effective solution for firms; a minimal setup sketch follows below.
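
To illustrate why the memory footprint stays this small, here is a hedged sketch of a QLoRA-style training setup: the base weights are frozen in 4-bit NF4 with double quantization, and only the small LoRA adapter matrices are trained. The rank, alpha, dropout, and target modules below are illustrative values, not the exact configuration used for Guanaco.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,         # double quantization from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Illustrative base checkpoint; larger variants follow the same recipe.
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Only these low-rank adapter matrices receive gradients.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # adapters are a tiny fraction of total weights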

Open-source

It is made publicly available as an open-source resource, providing a wider range of users with unfettered access to its capabilities. As such, it enables individuals from diverse backgrounds to harness its potential.


Training Hyperparameters

Model Type    Dataset   Batch Size   Learning Rate   Max Steps   Sequence Length
Guanaco-7B    OASST1    16           2e-4            1875        512
Guanaco-13B   OASST1    16           2e-4            1875        512
Guanaco-33B   OASST1    16           1e-4            1875        512
Guanaco-65B   OASST1    16           1e-4            1875        512
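
To make the table concrete, here is a minimal sketch of how these hyperparameters might map onto Hugging Face TrainingArguments for the 7B and 13B configurations (the 33B and 65B rows use a 1e-4 learning rate). The output directory, optimizer, scheduler, and precision settings are assumptions; dataset preparation and the trainer wiring are omitted.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="guanaco-qlora",        # hypothetical output path
    per_device_train_batch_size=16,    # Batch Size column
    learning_rate=2e-4,                # 1e-4 for Guanaco-33B/65B
    max_steps=1875,                    # Max Steps column
    lr_scheduler_type="constant",      # assumed; QLoRA uses a constant schedule
    optim="paged_adamw_32bit",         # paged optimizer introduced with QLoRA
    bf16=True,                         # assumed compute precision
)

# The 512-token sequence length is enforced when tokenizing or packing the
# OASST1 examples, e.g. via a trainer's max_seq_length argument.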