Our paper, Using Large Language Models to Generate JUnit Tests: An Empirical Study, has been accepted to the research track of the 28th International Conference on Evaluation and Assessment in Software Engineering (EASE 2024). In this work, we analyzed three models with different prompting techniques to generate JUnit tests for the HumanEval dataset and for real-world software. We evaluated the LLM-generated tests using compilation rates, test correctness, test coverage, and test smells. We found that while the models achieve high coverage on the small programming problems in the HumanEval dataset, they achieve considerably lower coverage on real-world software from the Evosuite dataset.