Plagiarism Detection Using Natural Language Processing Techniques


Muhammad Ilyas
Nasreen Malik
Ahmad Bilal
Saad Razzaq
Fahad Maqbool
Qaisar Abbas


Plagiarism has become very common in many fields, particularly research and education. As plagiarists adopt more advanced techniques, existing methods struggle to detect plagiarism accurately. Plagiarism checking draws on several kinds of features: syntactic, lexical, semantic, and structural. This research explores modern plagiarism detection tasks, especially text-based plagiarism detection, including monolingual plagiarism detection. We propose a novel four-stage framework for plagiarism detection that uses Natural Language Processing (NLP) rather than traditional string-matching approaches. The framework measures text similarity by combining two metrics, Skip-Gram and the Dice Coefficient, in a corpus-based approach. Furthermore, the deeper meaning of the text is explored using deep and shallow NLP techniques. Our results show that heavy revision is easily identified through deep NLP, while shallow NLP prepares the text well so that it can be processed further easily. Word2vec results are close to those of simple deep NLP methods, but word2vec also highlights documents that may not be flagged by other techniques. Synonym and phrase changes are also captured through deep NLP.
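
The abstract does not spell out how Skip-Gram and the Dice Coefficient are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes "Skip-Gram" means word pairs that may skip a few intervening words (skip-bigrams), and that the Dice Coefficient is computed over the two documents' sets of such pairs. The function names, the `max_skip` parameter, and the simple regex tokenizer are all hypothetical choices made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed shallow preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def skip_bigrams(tokens, max_skip=2):
    """Return the set of ordered word pairs with at most `max_skip` words between them."""
    pairs = set()
    for i, left in enumerate(tokens):
        # The partner word may sit 1 .. max_skip + 1 positions to the right.
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            pairs.add((left, tokens[j]))
    return pairs


def dice_coefficient(a, b):
    """Dice coefficient of two sets: 2 * |A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty inputs
    return 2 * len(a & b) / (len(a) + len(b))


def skip_gram_dice(text_a, text_b, max_skip=2):
    """Similarity in [0, 1] between two texts based on shared skip-bigrams."""
    grams_a = skip_bigrams(tokenize(text_a), max_skip)
    grams_b = skip_bigrams(tokenize(text_b), max_skip)
    return dice_coefficient(grams_a, grams_b)


if __name__ == "__main__":
    source = "Plagiarism is very common in research and education."
    suspect = "In research and education, plagiarism is very common."
    print(skip_gram_dice(source, suspect))
```

Because skip-bigrams tolerate inserted or dropped words, a score of this kind degrades more gradually under light rewording than exact string matching does. It would not, on its own, capture the synonym substitutions that the abstract attributes to deep NLP.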
