Evaluating Large Language Models in Class-Level Code Generation
2024, pp. 1–13
Xueying Du, Mingwei Liu, Kaixin Wang, Hanlin Wang, Junwei Liu, Yixuan Chen, Jiayi Feng, Chaofeng Sha, Xin Peng, Yiling Lou
Abstract
Recently, many large language models (LLMs) have been proposed, showing advanced proficiency in code generation. Meanwhile, many efforts have been dedicated to evaluating LLMs on code generation benchmarks such as HumanEval. Although very helpful for comparing different LLMs, existing evaluations focus on a simple code generation scenario (i.e., function-level or statement-level code generation), which mainly asks LLMs to generate a single code unit (e.g., a function or a statement) for a given natural language description. Such evaluation targets independent and often small-scale code units, thus leaving it unclear how LLMs perform in real-world software development scenarios.
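To illustrate the function-level scenario the abstract refers to, the sketch below shows a HumanEval-style task (an illustrative example, not quoted from the benchmark): the model receives only the signature and docstring as the prompt, generates the function body, and the completion is judged by unit tests.

```python
# Illustrative function-level task in the style of HumanEval (example names
# and tests are assumptions for illustration, not an actual benchmark item).

def has_close_elements(numbers, threshold):
    """Return True if any two numbers in the list are closer to each other
    than the given threshold."""
    # A body the model would be expected to generate:
    for i, a in enumerate(numbers):
        for b in numbers[i + 1:]:
            if abs(a - b) < threshold:
                return True
    return False

# The generated body is then validated against unit tests such as:
assert has_close_elements([1.0, 2.8, 3.0], 0.3) is True
assert has_close_elements([1.0, 2.0, 3.0], 0.5) is False
```

Class-level code generation, by contrast, asks the model to produce multiple interdependent methods within one class, so a single isolated function like the one above no longer captures the task.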