Protein function prediction means taking information about a protein (such as its amino acid sequence or its 2D and 3D structure) and trying to predict which functions it will exhibit. This has implications in several areas of bioinformatics and affects how drugs are developed and diseases are studied. It is typically an intensive task that requires input from biologists and computer experts alike, and annotating newly found proteins requires empirical as well as computational results.
We, here at FAST NU, recently came up with a unique method (dubbed DeepSeq, since it’s based on Deep Learning and works on protein sequences!) for predicting the functions of proteins using only their amino acid sequences. The sequence is the first piece of information we get when a new protein is found and is thus readily available. (Other pieces require a lot more effort.)
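To make "using only the amino acid sequences" a bit more concrete, here is a minimal sketch of how a sequence could be turned into numeric input for a neural network. This is a generic one-hot encoding written for illustration only; it is an assumption on my part and is not taken from the DeepSeq code. The names `AMINO_ACIDS` and `one_hot_encode` are hypothetical.

```python
# Illustrative only: a generic one-hot encoding, NOT DeepSeq's actual preprocessing.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids
INDEX = {residue: i for i, residue in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-element 0/1 row per residue in the sequence."""
    rows = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        # Residues outside the 20 standard codes (e.g. X) are left as all zeros.
        if residue in INDEX:
            row[INDEX[residue]] = 1
        rows.append(row)
    return rows
```

Calling `one_hot_encode("MKV")` gives three rows of length 20, each with a single 1 marking the residue at that position.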
We have successfully applied DeepSeq to predict protein function from sequences alone, without requiring any input from domain experts. The paper isn’t peer reviewed yet, but we have made it available as a preprint and put our full code on GitHub so you can review it yourself.
We believe DeepSeq is going to be a breakthrough inshaallah in the field of bioinformatics and in how function prediction is done. Let’s see if I can come up with an update about this in a year, after the paper has been read a few times by domain experts and we have a detailed peer review.
I taught an introductory Machine Learning course to BS students at FAST Peshawar in Fall 2015. The feedback was quite positive so I decided to offer another course to the MS/PhD students in the next semester. The mode of teaching was also a bit different: we tried doing the pen-tablet-augmented-multimedia-slides model. The semester is still in progress but we have the core of the basics done now.
The lectures are in Urdu, so they might be easier to follow for those who understand the language. I will be uploading future videos as they come up inshaallah. You can see the first video below and follow the complete collection on Vimeo here: https://vimeo.com/album/3770825
I started with Machine Learning a while back and had a slightly hard time getting help from the local community. This was mostly because the Machine Learning community in general is way behind the state of the art in industry and research. This is true for almost all fields nowadays, but with Machine Learning the issues are more pronounced due to the recent fast-paced developments in the industry.
On the other hand, once you know what to study, things are much easier than in many other fields such as security. Here I will outline the plan I followed to get to where I am (which isn’t too far ahead but still a little better than what most people know, IMHO).
So, here’s my guide for getting started with Machine Learning self-study.
- Start with Andrew Ng’s Coursera course, Machine Learning. That’s the advice almost everyone seems to give, and it’s great advice. The Coursera course is completely basic and eases you into the field with few pre-reqs and not much depth. Be careful though: do not think after completing the course that you are an expert in Machine Learning. It misses quite a few areas and the skills needed to be above average. Because it does get you started with practicals, you are likely to think you’re already done after finishing the course.
- So, after you complete the course in its entirety, including the assignments, I suggest you start with Prof. Nando de Freitas’ undergrad course. This is a much more detailed course and will give you a very different view of ML than traditional outlines. Of course, you might have to brush up on your Probability, Calculus and Linear Algebra. You can’t really do anything without these three.
- For the above three, I suggest the following courses:
- Probability: Probability for Life Sciences by UCLA’s Math Department. You can find videos for this easily.
- Calculus: I strongly suggest you go with Virtual University Pakistan’s Calculus-I course by Dr. Faisal Shah Khan. It’s a great course but it’s in Urdu. If you don’t know Urdu, you can find your own series. Please let me know in the comments about great resources for this.
- Linear Algebra: Of course, this can only be done with Gilbert Strang’s Linear Algebra course from OCW.
- After that, you can start with the grad course and the second grad course by Prof. Nando de Freitas. Both have very detailed video lectures.
Of course, you also need to work with tools other than Matlab. I strongly suggest the Python PyData stack. The full list would be:
That’s what I have till now. I might add more when I know more inshaallah.
I’ve just started with another Coursera course, this one about learning in general. The course is called Learning How to Learn: Powerful mental tools to help you master tough subjects. It’s actually a fairly easygoing course, as far as I can see. The assignments and quizzes are fairly straightforward for the most part, but the important bit is that the instructors share their life experiences about learning. I hope to be able to get through this course. I have enough ambition that I’ve even signed up for the paid “Signature Track” version of the course.
One important mental tool that I found really interesting is how to use the diffuse mode of thinking to get new ideas about difficult-to-solve problems. It’s best explained in the videos through Edison’s example: he would sit on his chair and let his hand hang to one side while holding a few ball bearings in it. He would then relax and let his mind wander, drifting off towards sleep. The mind would shift to diffuse thinking and would eventually find some new avenue to explore to help solve the issue at hand. This usually happens when you’re about to fall asleep, and that is where the ball bearings come into play. They would fall down and create a bit of a racket, pulling him back from sleep so that he could grasp the fledgling ideas and put them on paper. Cool trick!
Here’s a mini howto on backing up files to a remote machine using rsync. It shows progress while it does its thing and updates remote files unless the remote copy is newer, while keeping files on the remote end that were deleted from your local folder.
rsync -v -r --update --progress -e ssh /media/nam/Documents/ firstname.lastname@example.org:/media/nam/backup/documents/
/media/nam/Documents/ is the local folder and
/media/nam/backup/documents/ is the backup folder on the machine with IP
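If you want to run the same backup from a script, the command can be assembled flag by flag. The sketch below is only an illustration: the function names `build_rsync_command` and `run_backup` are hypothetical, and the extra `--dry-run` flag (a standard rsync option that lists what would be transferred without copying anything) is my addition, not part of the original command.

```python
import shlex
import subprocess


def build_rsync_command(source, destination, dry_run=False):
    """Return the argument list for the backup command described above."""
    command = [
        "rsync",
        "-v",          # verbose output
        "-r",          # recurse into directories
        "--update",    # skip files that are newer on the receiving side
        "--progress",  # show transfer progress for each file
        "-e", "ssh",   # use ssh as the remote shell
    ]
    if dry_run:
        # Assumption: added for safety; shows what would be copied, copies nothing.
        command.append("--dry-run")
    # There is deliberately no --delete flag, so files removed locally
    # remain in the backup folder on the remote end.
    command += [source, destination]
    return command


def run_backup(source, destination, dry_run=True):
    """Print the command and run it; raises CalledProcessError on failure."""
    command = build_rsync_command(source, destination, dry_run=dry_run)
    print(shlex.join(command))
    return subprocess.run(command, check=True)
```

For example, `run_backup("/media/nam/Documents/", "user@backup-host:/media/nam/backup/documents/")` would do a trial run first; `user@backup-host` is a made-up placeholder for your own login and machine. The trailing slash on the source path matters: it tells rsync to copy the contents of the folder rather than the folder itself.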
So, AdMob was acquired a while ago by Google, and it was recently announced that AdMob’s publisher reports would no longer be available through the old APIs. Instead, they now have to be retrieved through the AdSense API, which is based on OAuth 2.0 and thus a real pain for those just getting started.
Turns out, the process is quite straightforward but extremely poorly documented. You can go through the AdSense reporting docs, the Google API library and the OAuth 2.0 specs, but you would soon be lost. After spending a couple of days decoding the requirements, I found the bare-metal approach to accessing the stats. And here is how.