Preferably 4, but we may take more depending on the SOPs.
This project aims to develop a source code plagiarism detector using Python.
The first step will be to implement a basic bag-of-words approach that parses each source file into tokens. After that, language-specific preprocessing, such as normalizing renamed variables and handling macro usage, can be integrated to improve accuracy for a specific language (like C++). Based on the results, we will then add machine learning techniques such as k-nearest neighbours (KNN) to improve the results further.
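The first two steps can be sketched in a few lines of standard-library Python. This is a minimal illustration that assumes C++ input. The token regex, the small keyword subset, the `ID` placeholder, and the functions `strip_comments`, `tokenize` and `similarity_percent` are all hypothetical names chosen for this example. The proposal does not fix a similarity metric; cosine similarity over token counts is used here as one common choice for producing a percentage.

```python
import math
import re
from collections import Counter

# Hypothetical keyword subset for illustration; a real C++ front end needs the full list.
CPP_KEYWORDS = {
    "int", "float", "double", "char", "void", "return", "if", "else",
    "for", "while", "const", "include", "define", "std", "cout", "cin",
}
# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def strip_comments(code: str) -> str:
    """Drop /* block */ and // line comments (naive: ignores string literals)."""
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", " ", code)


def tokenize(code: str, normalize_identifiers: bool = False) -> list[str]:
    """Split code into tokens; optionally replace every non-keyword identifier with 'ID'."""
    tokens = TOKEN_RE.findall(strip_comments(code))
    if normalize_identifiers:
        tokens = [
            "ID" if IDENT_RE.fullmatch(t) and t not in CPP_KEYWORDS else t
            for t in tokens
        ]
    return tokens


def similarity_percent(a: str, b: str, normalize_identifiers: bool = False) -> float:
    """Cosine similarity of the two bag-of-words count vectors, scaled to 0-100."""
    ca = Counter(tokenize(a, normalize_identifiers))
    cb = Counter(tokenize(b, normalize_identifiers))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values()) * sum(v * v for v in cb.values()))
    return 100.0 * dot / norm if norm else 0.0


original = "int total = 0; for (int i = 0; i < n; i++) total += a[i];"
copied = "int sum = 0; // renamed\nfor (int k = 0; k < n; k++) sum += a[k];"
print(similarity_percent(original, copied))                              # below 100
print(similarity_percent(original, copied, normalize_identifiers=True))  # 100.0
```

The copied snippet only renames variables and adds a comment, so the plain bag of words scores it below 100%, while the identifier-normalized version scores it at 100%. This is the kind of accuracy gain the language-specific preprocessing step is meant to bring.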
If time permits, the project can also be extended to compare source code against code available online, for example using the Google Search API.
We expect students to go through some of the references mentioned, do some research of their own, and include their own ideas related to the project topic in their proposals. More importantly, we look for enthusiasm, which we will judge by the effort students put into their proposals.
| Week | Plan |
|---|---|
| Week 1 | Learn the basics of Python and file parsing |
| Week 2 | Implement a basic bag-of-words approach |
| Week 3 | Add preprocessing specific to the language syntax |
| Week 4 | Integrate KNN |
| Week 5 | Final touches and presentation |
| Bonus | If we finish ahead of schedule, integrate the Google Search API to search code available online |
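One way the Week 4 KNN step could work is to treat each pair of files as a point described by its similarity scores and label it by a majority vote of its nearest labelled pairs. The sketch below assumes two features per pair (a raw and an identifier-normalized bag-of-words similarity, both in percent) and two labels; the function `knn_predict` and the training points are hypothetical, made-up values for illustration, not measured results. It uses only the standard library; scikit-learn's `KNeighborsClassifier` would be a natural drop-in alternative.

```python
import math
from collections import Counter

# A feature vector for one pair of files, e.g. (raw similarity %, normalized similarity %).
Feature = tuple[float, ...]


def knn_predict(train: list[tuple[Feature, str]], query: Feature, k: int = 3) -> str:
    """Return the majority label among the k training pairs closest to `query`."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training examples")
    # Sort labelled examples by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, most_common keeps first-seen order, so the closer neighbour's label wins.
    return votes.most_common(1)[0][0]


# Hypothetical labelled pairs (toy numbers, not real measurements).
train = [
    ((95.0, 99.0), "plagiarised"),
    ((60.0, 97.0), "plagiarised"),
    ((70.0, 92.0), "plagiarised"),
    ((30.0, 55.0), "original"),
    ((20.0, 40.0), "original"),
    ((45.0, 60.0), "original"),
]
print(knn_predict(train, (55.0, 95.0)))  # plagiarised
print(knn_predict(train, (35.0, 50.0)))  # original
```

Because both features share the same 0-100 scale, plain Euclidean distance is reasonable here; features on different scales would need normalizing first.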
| Milestone | Goal |
|---|---|
| 1 | (4th April) Implement bag of words with a similarity percentage |
| Rest | Same as the weekly schedule |