
Deep Learning for Protein Function Prediction

Protein function prediction means taking information about a protein (such as its amino acid sequence, 2D and 3D structure, etc.) and trying to predict which functions it will exhibit. This has implications in several areas of bioinformatics and affects how drugs are created and how diseases are studied. It is typically an intensive task that requires input from biologists and computer experts alike, and annotating newly found proteins requires empirical as well as computational results.

We, here at FAST NU, recently came up with a unique method for predicting the functions of proteins using only their amino acid sequences. We dubbed it DeepSeq, since it’s based on Deep Learning and works on protein sequences! The sequence is the first piece of information we get when a new protein is found, so it is readily available. (Other pieces require a lot more effort.)
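To make "working on sequences alone" a bit more concrete, here is a minimal sketch of how an amino acid sequence might be turned into numbers that a neural network can consume. This is an illustration only, not DeepSeq's actual preprocessing: the names `AMINO_ACIDS`, `encode_sequence` and `one_hot`, the choice to reserve index 0 for padding and unknown residues, and the example sequence are all assumptions made for this post.

```python
# Illustrative sketch only; not taken from the DeepSeq code base.

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Index 0 is reserved (by assumption) for padding and unknown residues.
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(sequence, max_length):
    """Map a protein sequence to a fixed-length list of integer indices."""
    # Truncate long sequences and map unrecognised letters to 0.
    indices = [AA_TO_INDEX.get(residue, 0) for residue in sequence.upper()[:max_length]]
    # Right-pad short sequences with 0 so every input has the same length.
    return indices + [0] * (max_length - len(indices))


def one_hot(indices):
    """Turn integer indices into one-hot rows (one row per position)."""
    width = len(AMINO_ACIDS) + 1  # 20 amino acids + the reserved index 0
    return [[1 if col == idx else 0 for col in range(width)] for idx in indices]


if __name__ == "__main__":
    encoded = encode_sequence("MKTAYIAK", max_length=10)  # hypothetical sequence
    matrix = one_hot(encoded)
    print(encoded)
    print(len(matrix), "positions x", len(matrix[0]), "channels")
```

A fixed-length encoding like this is a common starting point because most deep learning layers expect inputs of a uniform shape; a real model might instead learn an embedding for each index rather than use one-hot rows.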

We have successfully applied DeepSeq to predict protein function from sequences alone, without requiring any input from domain experts. The paper isn’t peer reviewed yet, but we have made it available as a preprint and published our full code on GitHub so you can review it yourself.

We believe DeepSeq is going to be, inshaAllah, a breakthrough in the field of bioinformatics and in how function prediction is done. Let’s see if I can come up with an update about this in a year, after the paper has been read a few times by domain experts and we have a detailed peer review.