Seasons Of Code

PyRated    • Vibhav Aggarwal, Tulip Pandey   

WnCC - Seasons of Code

Seasons of Code is a programme launched by WnCC along the lines of the Google Summer of Code. It provides an opportunity to learn and participate in a variety of interesting projects under the mentorship of the very best in our institute.



PyRated


This project aims to develop a source code plagiarism detector using Python.

The first step is to implement a basic bag-of-words approach by parsing the source files. After that, language-specific preprocessing that accounts for disguises such as renamed variables and macro usage can be integrated to improve accuracy on a specific language (like C++). Based on the results, we will add machine learning techniques such as k-nearest neighbours to improve detection further.
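As a rough illustration of the bag-of-words step, the sketch below counts tokens in two pieces of source code and compares the counts. The tokenizer regex, the function names `bag_of_words` and `similarity_percentage`, and the choice of cosine similarity as the score are all assumptions made for this example; the project does not prescribe a particular measure.

```python
import math
import re
from collections import Counter

# Tokens are identifiers, integer literals, or any single non-space character.
# This tokenizer is an assumption for the sketch, not a full lexer.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def bag_of_words(source: str) -> Counter:
    """Count how often each token occurs, ignoring token order."""
    return Counter(TOKEN_PATTERN.findall(source))


def similarity_percentage(source_a: str, source_b: str) -> float:
    """Cosine similarity of the two token-count vectors, scaled to 0-100."""
    bag_a, bag_b = bag_of_words(source_a), bag_of_words(source_b)
    # Counter returns 0 for tokens that are missing from bag_b.
    dot = sum(count * bag_b[token] for token, count in bag_a.items())
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty file is treated as sharing nothing
    return 100.0 * dot / (norm_a * norm_b)


if __name__ == "__main__":
    original = "int total = 0; for (int i = 0; i < n; i++) total += a[i];"
    reordered = "for (int i = 0; i < n; i++) total += a[i]; int total = 0;"
    print(round(similarity_percentage(original, reordered), 1))
```

Because a bag of words ignores order, the two snippets in the usage example contain exactly the same tokens and score as essentially identical. The same property means that renaming every variable lowers the score sharply, which is why language-specific preprocessing matters.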

If time permits, this project can also be extended to compare submitted source code against code available online, for example by using the Google Search API.

We expect students to go through some of the references mentioned, do some research of their own, and include their own ideas related to the project topic in their proposals. More importantly, we look for enthusiasm, which will be judged by the effort students put into their proposals.

References:

  1. Bag of words approach:
    • https://machinelearningmastery.com/gentle-introduction-bag-words-model/
    • https://stackabuse.com/python-for-nlp-creating-bag-of-words-model-from-scratch/
  2. Basic Python tutorial: https://www.w3schools.com/python/python_intro.asp
  3. File parsing using Python: https://stackabuse.com/read-a-file-line-by-line-in-python/
  4. KNN: https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01761
  5. A research paper on this approach: http://ijmlc.org/papers/50-A243.pdf

Timeline:

Week Number   Tasks to be Completed
Week 1        Learn basics of Python and file parsing
Week 2        Implement a basic bag of words approach
Week 3        Add some preprocessing specific to language syntax
Week 4        Integrate KNN
Week 5        Final touch and presentation
Bonus         If we meet deadlines earlier than planned, we can integrate the Google Search API to search code available online.
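
The Week 3 preprocessing step could look something like the sketch below, which strips comments and replaces every non-keyword identifier with a placeholder so that renamed variables no longer change the token counts. The function name `normalize_cpp`, the placeholder `ID`, and the short keyword list are assumptions made for illustration. The sketch does not expand macros and does not treat string literals specially, both of which a real preprocessor would need to handle.

```python
import re

# A small, deliberately incomplete set of C++ keywords (assumption for this sketch).
CPP_KEYWORDS = {
    "int", "float", "double", "char", "bool", "void", "long", "short",
    "unsigned", "const", "return", "if", "else", "for", "while", "do",
    "break", "continue", "class", "struct", "namespace", "using",
    "true", "false",
}

# Matches // line comments and /* block comments */.
COMMENT_PATTERN = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
# Matches whole identifiers only; \b keeps it from matching inside literals like 0x1F.
IDENTIFIER_PATTERN = re.compile(r"\b[A-Za-z_]\w*")


def normalize_cpp(source: str) -> str:
    """Strip comments, then replace every non-keyword identifier with 'ID'."""
    without_comments = COMMENT_PATTERN.sub(" ", source)

    def replace(match: re.Match) -> str:
        word = match.group(0)
        return word if word in CPP_KEYWORDS else "ID"

    renamed = IDENTIFIER_PATTERN.sub(replace, without_comments)
    # Collapse whitespace so formatting differences do not matter.
    return " ".join(renamed.split())


if __name__ == "__main__":
    first = "int sum = 0; // running total\nsum += x;"
    second = "int acc = 0; /* renamed */ acc += y;"
    print(normalize_cpp(first) == normalize_cpp(second))
```

Both snippets in the usage example normalize to `int ID = 0; ID += ID;`, so a bag-of-words comparison run on the normalized text would treat them as the same code. Mapping every identifier to one placeholder is a blunt choice that also raises similarity between unrelated programs; a finer scheme could be explored once results from the basic approach are available.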

Checkpoints:

Checkpoint Number   Progress
1 (4th April)       Implement bag of words with similarity percentage
Rest                Same as the weekly schedule