FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues

EL-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant

Author	EL-Manzalawy, Yasser
Author	Abbas, Mostafa
Author	Malluhi, Qutaibah
Author	Honavar, Vasant
Available date	2016-10-18T10:07:41Z
Publication Date	2016-07-06
Publication Name	PLoS ONE
Identifier	http://dx.doi.org/10.1371/journal.pone.0158445
Citation	EL-Manzalawy Y, Abbas M, Malluhi Q, Honavar V (2016) FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues. PLoS ONE 11(7): e0158445.
URI	http://hdl.handle.net/10576/4900
Abstract	A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses, are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational effort needed for generating PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that randomly sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile, the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data, and the accuracy of the machine learning classifier trained using these profiles). Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence based predictors of protein-protein and protein-DNA interfaces.
Sponsor	Edward Frymoyer Endowed Professorship in Information Sciences and Technology. The Center for Big Data Analytics and Discovery Informatics which is co-sponsored by the Institute for Cyberscience, the Huck Institutes of the Life Sciences, the Social Science Research Institute, and the College of Information Sciences and Technology at the Pennsylvania State University. NPRP grant No. 4-1454-1-233 from the Qatar National Research Fund (a member of Qatar Foundation).
Language	en
Publisher	Public Library of Science (PLoS)
Subject	Sequence database; BLAST algorithm; Database searching; Amino acid sequence analysis; Database and informatics methods
Title	FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues
Type	Article
Issue Number	7
Volume Number	11
ESSN	1932-6203
dc.accessType	Open Access
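
The abstract describes building a reduced BLAST reference database by sampling 1% of the UniRef100 sequences uniformly at random. The record does not say how the authors implemented the sampling, so the following is only a minimal sketch of one way to draw such a sample: it streams FASTA records and keeps a fixed-size uniform subset using reservoir sampling, which avoids loading the full database into memory. The function names (`read_fasta`, `reservoir_sample`), the seed, and the toy input are all hypothetical and chosen for illustration.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records: Iterable[Record], k: int, seed: int = 0) -> List[Record]:
    """Return k records chosen uniformly at random in a single pass.

    If the input holds fewer than k records, all of them are returned.
    """
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = rec
    return sample


# Toy stand-in for a sequence database: 200 short made-up sequences.
toy_lines = [
    line
    for i in range(200)
    for line in (f">seq{i}", "MKV" + "A" * (i % 5))
]
records = list(read_fasta(toy_lines))
subset = reservoir_sample(records, k=len(records) // 100)  # 1% of 200 = 2 records
```

For a real database the record count would have to be known (or counted in a first pass) to turn "1%" into a value of `k`; the sampled records could then be written back out as FASTA and formatted as a BLAST database, a step this sketch leaves out.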

Files in this item

Name:: journal.pone.0158445.PDF
Size:: 1000.Kb
Format:: PDF
Description:: Open Access Version of Record ...

Name:: 3911904.zip
Size:: 80.51Kb
Format:: Unknown
Description:: Supplementary information

This item appears in the following Collection(s)

Computer Science & Engineering [‎2427‎ items ]
Interdisciplinary & Smart Design [‎15‎ items ]
