The volume of source code available on the Internet is astronomical. Detecting plagiarism requires comparing each submission against a large database of known documents, which can make systems designed to detect source code plagiarism unacceptably slow. We seek to improve VOCS, a plagiarism detection system, using partitional and density-based clustering as well as intelligent parallelism. In addition, we will attempt to increase the system's usability and usefulness by expanding its programming language support and building an intuitive web interface. Finally, we propose a hybrid approach that uses Program Dependence Graphs (PDGs) to detect well-disguised plagiarism more accurately and precisely.
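To illustrate how clustering can reduce runtime, here is a minimal sketch under stated assumptions. It is not VOCS's actual method. The known documents are partitioned offline with k-means (one common partitional algorithm), and a new submission is then compared only against the documents in its nearest cluster instead of the whole database. The keyword-count features, the `VOCAB` list, the farthest-first seeding, and the function names `features`, `kmeans`, and `candidates` are all hypothetical choices for illustration. Density-based clustering and parallelism are not shown.

```python
import math

# Hypothetical feature space: counts of a few keywords per document.
VOCAB = ["for", "while", "if", "return", "def", "class", "print", "import"]


def features(source):
    """Map a source file to a keyword-count vector (illustrative only)."""
    tokens = source.split()
    return [tokens.count(word) for word in VOCAB]


def init_centroids(vectors, k):
    """Farthest-first seeding: deterministic, spreads initial centroids apart."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(vectors, k, iterations=20):
    """Partitional clustering: Lloyd's k-means with Euclidean distance."""
    centroids = init_centroids(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def candidates(query_source, documents, centroids, labels):
    """Return only the documents in the cluster nearest to the query."""
    q = features(query_source)
    nearest = min(range(len(centroids)), key=lambda c: math.dist(q, centroids[c]))
    return [doc for doc, lab in zip(documents, labels) if lab == nearest]


if __name__ == "__main__":
    corpus = [
        "for x in a : for y in b : print x",
        "while n : for x in a : print n",
        "class A : def f ( ) : return 1",
        "class B : def g ( ) : return 2 def h ( ) : return 3",
    ]
    centroids, labels = kmeans([features(d) for d in corpus], k=2)
    print(labels)  # [0, 0, 1, 1]
    print(candidates("for k in c : print k", corpus, centroids, labels))
```

Only the documents returned by `candidates` would then go through the expensive detailed comparison. The trade-off is that a copied document assigned to a different cluster than the submission would be missed.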
Ohmann, Anthony, "Efficient Clustering-based Plagiarism Detection using IPPDC" (2013). Honors Theses, 1963-2015. 14.