Efficient plagiarism detection for large code repositories
Abstract
Unauthorized re‐use of code by students is a widespread problem in academic institutions, and raises liability issues for industry. Manual plagiarism detection is time‐consuming, and current effective plagiarism detection approaches cannot be easily scaled to very large code repositories. While there are practical text‐based plagiarism detection systems capable of working with large collections, this is not the case for code‐based plagiarism detection. In this paper, we propose techniques for detecting plagiarism in program code using text similarity measures and local alignment. Through detailed empirical evaluation on small and large collections of programs, we show that our approach is highly scalable while maintaining similar levels of effectiveness to that of the popular JPlag and MOSS systems. Copyright © 2006 John Wiley & Sons, Ltd.
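As a rough illustration of what local alignment over program code can look like, the sketch below scores the best-matching region between two token sequences with the standard Smith-Waterman recurrence. It is not the paper's implementation: the `tokenize` helper, the regular expression it uses, and the `match`, `mismatch`, and `gap` values are all assumptions chosen for the example.

```python
import re


def tokenize(source):
    """Hypothetical tokenizer: identifiers, integers, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score between token lists a and b.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment here
                prev[j - 1] + step,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # skip a token of a
                curr[j - 1] + gap,   # skip a token of b
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    original = tokenize("int sum = a + b;")
    suspect = tokenize("int total = a + b; return total;")
    print(local_alignment_score(original, suspect))
```

Because the score is clamped at zero, a shared region is rewarded even when the surrounding code differs. The table costs time proportional to the product of the two sequence lengths, so a detector for a large repository would plausibly run it only on candidate pairs already shortlisted by a cheaper text similarity measure.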
Related Papers
- Academic Source Code Plagiarism Detection by Measuring Program Behavioral Similarity (2021), cited by 53
- Detecting Pervasive Source Code Plagiarism through Dynamic Program Behaviours (2020), cited by 13
- Evaluating the robustness of source code plagiarism detection tools to pervasive plagiarism-hiding modifications (2021), cited by 3