A Novel Evaluation of Motif Detection in Protein Sequences of p53 and DNA Sequences of RHAG Gene using Big Data Analytic Techniques

Raiha Tallat
Muhammad Farhan
Muhammad Munwar Iqbal
Yasir Saleem


Big data has attracted broad attention from researchers and data scientists. Large batches of data, when processed with algorithms suited to the required output, are valuable for distilling information in business, health, mechanics, and many other domains. Presenting data through an interface in an interpretable form is key to acquiring knowledge from it. In the literature, graph visualization analysis is regarded as one of the appropriate techniques for data interpretation, especially in genomics, because genomic sequences are composed of motifs, which are best understood and analyzed in graphical form. Previous research has performed graph motif detection via graph partitioning algorithms to recommend binding sites in DNA sequences and enzyme active sites in protein sequences. Motif detection in protein and DNA sequences using a partitioned approach is intricate. This paper is based on the protein sequences of p53, known as the guardian of DNA, and the RHAG gene sequence responsible for the mutation found in the Rh-null system. Motifs in both the protein and DNA sequences are detected using the MM algorithm implemented in the Multiple EM for Motif Elicitation (MEME) tool, and matches are found using the TOMTOM comparison technique. The desired motif is searched across the thirteen available databases, such as JASPAR and Homo sapiens. The shortest-width motif was found in all databases except the DAP database. The calculated results have an E-value of 2.05e. The mixture model used by the algorithm showed different processing times for DNA and protein sequence analysis.
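To make the mixture-model idea in the abstract concrete, the following is a minimal sketch of expectation maximization for motif discovery. It is not the MEME implementation and not the authors' pipeline: it assumes exactly one motif occurrence per DNA sequence, keeps the background model fixed, and starts from a single hand-supplied seed, whereas MEME tries many starting points. The function names `em_motif` and `consensus`, the pseudocount, the seed weights, and the toy sequences are all hypothetical and chosen only for illustration.

```python
import math

ALPHABET = "ACGT"
INDEX = {c: i for i, c in enumerate(ALPHABET)}


def em_motif(seqs, width, seed, n_iter=50, pseudo=0.1):
    """EM for a one-occurrence-per-sequence motif model (illustrative only).

    Assumes every sequence is at least `width` long and uses only A, C, G, T.
    Returns the motif probability matrix and, per sequence, the posterior
    distribution over motif start positions.
    """
    # Fixed background model: overall letter frequencies with a pseudocount.
    counts = [pseudo] * 4
    for s in seqs:
        for c in s:
            counts[INDEX[c]] += 1
    total = sum(counts)
    bg = [c / total for c in counts]

    # Initial motif model derived from the seed: 0.7 on the seed letter.
    pwm = [[0.7 if a == ch else 0.1 for a in ALPHABET] for ch in seed]

    posts = []
    for _ in range(n_iter):
        # E-step: posterior probability of each start position, computed
        # from the motif-versus-background log-likelihood ratio of a window.
        posts = []
        for s in seqs:
            scores = [
                sum(math.log(pwm[p][INDEX[s[j + p]]] / bg[INDEX[s[j + p]]])
                    for p in range(width))
                for j in range(len(s) - width + 1)
            ]
            top = max(scores)
            weights = [math.exp(x - top) for x in scores]
            norm = sum(weights)
            posts.append([w / norm for w in weights])

        # M-step: re-estimate motif columns from posterior-weighted counts.
        new = [[pseudo] * 4 for _ in range(width)]
        for s, post in zip(seqs, posts):
            for j, pj in enumerate(post):
                for p in range(width):
                    new[p][INDEX[s[j + p]]] += pj
        pwm = [[v / sum(row) for v in row] for row in new]
    return pwm, posts


def consensus(pwm):
    """Most probable letter in each motif column."""
    return "".join(ALPHABET[max(range(4), key=row.__getitem__)] for row in pwm)


# Hypothetical toy data: the same 8-mer planted once in each sequence.
demo_seqs = [
    "ACCTTGACGTCAGGTA",
    "GGTGACGTCATTCCAG",
    "CATTCGGTGACGTCAA",
    "TGACGTCACCGGATTC",
    "AAGGCTTGACGTCAGC",
]
demo_pwm, demo_posts = em_motif(demo_seqs, width=8, seed="TGACGTAA")
print(consensus(demo_pwm))
```

The E-step assigns each candidate window a probability of being the motif occurrence, and the M-step rebuilds the motif columns from those probabilities. A real analysis of p53 protein or RHAG DNA sequences would rely on the MEME tool itself, which also handles protein alphabets, zero or multiple occurrences per sequence, and statistical significance.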

Article Details

How to Cite
Tallat, R., Farhan, M., Munwar Iqbal, M., & Saleem, Y. (2020). A Novel Evaluation of Motif Detection in Protein Sequences of p53 and DNA Sequences of RHAG Gene using Big Data Analytic Techniques. Technical Journal, 25(02), 110-120. Retrieved from https://tj.uettaxila.edu.pk/index.php/technical-journal/article/view/670