CodeT5

CodeT5, developed by Salesforce Research, is a Transformer model that improves code understanding and generation by exploiting developer-assigned identifiers. The method adds a pre-training task that distinguishes which code tokens are identifiers, as well as a dual-generation task that uses user-written code comments. Experiments show that CodeT5 outperforms previous methods on a range of understanding and generation tasks and better captures code semantics.
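
To make the identifier-distinguishing task concrete, here is a minimal sketch of the kind of binary labels such a task learns to predict. It is an illustration only, not CodeT5's actual preprocessing: it assumes Python source, uses the standard-library tokenizer instead of the model's subword tokenizer, and the helper name tag_identifiers is hypothetical.

```python
import io
import keyword
import tokenize

# Token types that only describe layout and carry no code text of interest.
_SKIPPED = {
    tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
    tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT,
}


def tag_identifiers(source: str) -> list[tuple[str, int]]:
    """Return (token, label) pairs; label is 1 for identifiers, else 0.

    Hypothetical helper: a NAME token that is not a reserved keyword is
    treated as a developer-assigned identifier.
    """
    tagged = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        is_identifier = (
            tok.type == tokenize.NAME and not keyword.iskeyword(tok.string)
        )
        tagged.append((tok.string, int(is_identifier)))
    return tagged


example = "def add(a, b):\n    return a + b\n"
for token, label in tag_identifiers(example):
    print(f"{token!r:10} {label}")
```

In this example, add, a, and b are labeled 1, while def, return, and the punctuation are labeled 0. A tagging model is trained to reproduce labels of this kind from the code alone.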

An Overview of CodeT5

CodeT5 leverages the power of large-scale pre-training on code data, combined with fine-tuning on downstream code-related tasks, to improve the accuracy and efficiency of code-related applications.

SOTA results on 14 sub-tasks

The research paper on CodeT5 shows that it yields state-of-the-art results on the fourteen sub-tasks in CodeXGLUE.

8.35 million functions

CodeT5 is a recently developed encoder-decoder model designed for programming languages, and it has been pre-trained on a dataset of 8.35 million functions.

Over 99% F1 for all PLs

The researchers also evaluate identifier tagging performance and find that it exceeds 99% F1 for all programming languages (PLs), showing that CodeT5 can confidently distinguish identifiers in code.
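
For readers unfamiliar with the metric, the sketch below shows how an F1 score can be computed for a binary tagging task like this one, where label 1 marks an identifier. The function name binary_f1 and the toy labels are assumptions for illustration, not the evaluation code used in the paper.

```python
def binary_f1(gold: list[int], predicted: list[int]) -> float:
    """F1 score for the positive class (label 1) of a binary tagging task."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    true_pos = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    false_pos = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    false_neg = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)

    if true_pos == 0:
        # No correct positive predictions: precision or recall is zero.
        return 0.0

    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy labels: three identifiers in the gold data, one of them missed.
gold_labels = [0, 1, 0, 1, 0, 1]
predicted_labels = [0, 1, 0, 1, 0, 0]
print(f"{binary_f1(gold_labels, predicted_labels):.2f}")  # prints 0.80
```

A score above 99% means the model almost never misses an identifier and almost never labels a non-identifier token as one.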

  • About Model

  • Model Highlights

  • Training Details

  • Model Types

  • Key Results

  • Model Features

  • Model Tasks

  • Fine-tuning

  • Benchmark Results

  • Sample Codes

  • Limitations

  • Other LLMs

| Model | Parameters | Highlight |
| --- | --- | --- |
| CodeT5-small | 60 million | Smaller and faster than the original CodeT5 model, making it more efficient and easier to deploy on resource-constrained devices. |
| Dual-gen | Varies | Designed to generate code from both natural language and code inputs, allowing for tasks such as code completion with partial code input. |
| Multi-task | Varies | Can perform multiple code-related tasks simultaneously, allowing for more efficient and effective learning and generalization across tasks. |


| Task | Metric | Score |
| --- | --- | --- |
| Code summarization | BLEU-4 | 19.77 |
| Code generation | BLEU | 41.48 |
| Code generation | CodeBLEU | 44.1 |
| Code generation | EM | 22.7 |
| Code translation (Java to C#) | BLEU-4 | 84.03 |
| Code translation (C# to Java) | BLEU-4 | 79.87 |
| Code refinement (medium) | BLEU-4 | 87.64 |
| Code defect detection | Accuracy | 65.78 |
| Code clone detection | F1 | 97.2 |
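
Of the metrics in the table, EM (exact match) is the simplest to reproduce: it is the percentage of generated programs that are identical to the reference. The sketch below is one plausible way to compute it. The whitespace normalization is an assumption, since benchmarks differ in how strictly they compare strings, and the function name exact_match and the sample data are hypothetical.

```python
def exact_match(predictions: list[str], references: list[str]) -> float:
    """Percentage of predictions identical to their reference.

    Assumption: runs of whitespace are collapsed before comparison, so
    formatting differences alone do not count as a mismatch.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0

    def normalize(code: str) -> str:
        return " ".join(code.split())

    hits = sum(
        normalize(pred) == normalize(ref)
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * hits / len(references)


# Hypothetical model outputs and references, for illustration only.
preds = ["return a + b;", "int x = 1;", "i++;", "return null;"]
refs = ["return a  +  b;", "int x = 0;", "i += 1;", "return this;"]
print(exact_match(preds, refs))  # prints 25.0
```

Because EM gives no credit for near misses, it is typically much lower than BLEU or CodeBLEU on the same outputs, which is consistent with the code generation rows in the table.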