Code LLMs Explained: CodeBERT

CodeBERT is a pre-trained model developed by Microsoft Research, designed to understand and generate code in multiple programming languages as well as natural language text. It is based on the BERT (Bidirectional Encoder Representations from Transformers) architecture, which is a transformer-based model known for its success in natural language understanding tasks.
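For a hands-on feel, the minimal sketch below loads CodeBERT with the Hugging Face transformers library and encodes a natural-language/code pair into contextual token vectors. It assumes the publicly released microsoft/codebert-base checkpoint; the example sentence and function are made up for illustration.

import torch
from transformers import AutoTokenizer, AutoModel

# Assumption: the publicly released "microsoft/codebert-base" checkpoint.
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

# CodeBERT is trained on natural-language / programming-language (NL-PL) pairs,
# so the two segments are passed to the tokenizer together.
nl = "return the maximum of two numbers"
code = "def max_val(a, b): return a if a > b else b"

inputs = tokenizer(nl, code, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional contextual vector per token of the NL-PL pair.
print(outputs.last_hidden_state.shape)

The same encoder can then be fine-tuned for downstream tasks such as code search or code documentation generation.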

An Overview of CodeBERT

CodeBERT learns joint representations of natural language (NL) and programming language (PL) pairs, and can be fine-tuned for downstream tasks such as natural language code search and code documentation generation.

Achieved 0.428 MAP

On a dataset of 100K Java methods, CodeBERT achieved a mean average precision (MAP) of 0.428, significantly higher than the previous state-of-the-art model.
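To make the metric behind that figure concrete, here is a small, self-contained sketch of how mean average precision is computed over ranked retrieval results. The relevance lists are made-up toy data, not the 100K-Java-method benchmark itself.

def average_precision(ranked_relevance):
    """ranked_relevance: 0/1 relevance flags for retrieved items, best-ranked first."""
    hits, precisions = 0, []
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

def mean_average_precision(per_query_relevance):
    # MAP is the mean of the per-query average precisions.
    return sum(average_precision(q) for q in per_query_relevance) / len(per_query_relevance)

# Query 1: correct method ranked 2nd (AP = 0.5); query 2: ranked 1st (AP = 1.0).
print(mean_average_precision([[0, 1, 0], [1, 0, 0]]))  # 0.75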

SOTA performance

CodeBERT achieves state-of-the-art (SOTA) performance in both natural language code search and code documentation generation.

6 programming languages

CodeBERT is the first large NL-PL pre-trained model for multiple programming languages. On NL-PL probing, the results show that CodeBERT outperforms previous pre-trained models.

  • About Model

  • Model Highlights

  • Training Details

  • Model Types

  • Key Results

  • Model Features

  • Model Tasks

  • Fine-tuning

  • Benchmark Results

  • Sample Codes

  • Limitations

  • Other LLMs

Model              | Objective                      | Highlight
CODEBERT (RTD)     | Replaced Token Detection (RTD) | Replaces tokens with alternatives drawn from the same vocabulary, so the model learns richer contextual information and relationships.
CODEBERT (MLM)     | Masked Language Modeling (MLM) | Randomly masks some tokens and predicts them from the context of the remaining unmasked tokens.
CODEBERT (MLM+RTD) | MLM + RTD                      | Combines both objectives for more effective learning and better performance on downstream tasks.
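To make the MLM objective concrete, below is a minimal fill-mask sketch. It assumes the Hugging Face transformers fill-mask pipeline and the publicly released microsoft/codebert-base-mlm checkpoint; the masked snippet is a made-up example, not material from the paper.

from transformers import pipeline

# Assumption: the MLM-only variant released as "microsoft/codebert-base-mlm".
fill_mask = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

# Mask a single token of a small Python function and predict it from context,
# which is exactly what the MLM objective trains the model to do.
masked_code = "def add(a, b): return a <mask> b"
for prediction in fill_mask(masked_code, top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))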
Task                                | Dataset       | Score
Natural language code retrieval     | CodeSearchNet | 76
PL probing                          | CodeSearchNet | 85.66
PL probing (preceding context only) | CodeSearchNet | 59.12
NL probing                          | CodeSearchNet | 74.53
Code-to-documentation generation    | —             | 17.83 (BLEU-4)
Code-to-NL                          | —             | 22.36 (BLEU)
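As a rough illustration of the natural language code retrieval task in the first row, the sketch below encodes a query and two candidate snippets with CodeBERT and ranks them by cosine similarity. This is a simple zero-shot heuristic using mean-pooled embeddings, not the fine-tuned retrieval setup behind the reported score, and the query and snippets are made up.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(text):
    # Mean-pool the token vectors into a single sentence/snippet embedding.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)

query = "sort a list in descending order"
candidates = [
    "def sort_desc(xs): return sorted(xs, reverse=True)",
    "def read_file(path): return open(path).read()",
]

# Rank candidate snippets by cosine similarity to the query embedding.
scores = [torch.cosine_similarity(embed(query), embed(c), dim=0).item() for c in candidates]
for snippet, score in sorted(zip(candidates, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {snippet}")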