Robert M Flight's home on the web
I just came back from our Bioinformatics group (a rather loose association of various researchers at UofL interested in and doing bioinformatics) journal club, where we discussed this recent paper:
Besides the catchy title that makes one believe that perhaps Google is getting into cancer research (maybe they are and we don't know it yet), there were some interesting aspects to this paper.
The premise is that combining gene expression data with network data gives better associations between gene expression and a particular disease endpoint. To do this, they use the TRANSFAC database of transcription factors and their gene targets as the network, the correlation of each gene's expression with disease status as that gene's importance to the disease, and Google's PageRank algorithm as the means of transferring the network knowledge to the gene expression data. They call their method NetRank.
Note that the general idea had already been tried in this paper on GeneRank.
Rank the genes by their association with disease status (poor or good prognosis) using one of the methods (SAM, t-test, fold-change, correlation, NetRank). Pick the top n genes, and develop a predictive model using a support vector machine. Wash, rinse, repeat several times to find the best set, varying the number of top genes and the number of samples used in the training set.
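To make the "rank, then pick the top n" step concrete, here is a minimal Python sketch for the plain correlation ranking only. The helper names (`pearson`, `top_genes`), the toy expression values, and the use of the absolute Pearson correlation are assumptions made for illustration; the paper may score genes differently, and the support vector machine step is left out.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def top_genes(expression, outcome, n):
    """Rank genes by |correlation| with the outcome and return the top n.

    expression: dict mapping gene name -> list of expression values per sample
    outcome:    list of 0/1 labels per sample (e.g. good/poor prognosis)
    """
    scores = {gene: abs(pearson(values, outcome))
              for gene, values in expression.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n], scores

# Toy data (made up): four samples, three genes.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],   # tracks the outcome
    "g2": [4.0, 1.0, 3.0, 2.0],   # unrelated to the outcome
    "g3": [5.0, 5.0, 5.0, 5.0],   # constant, so no information
}
outcome = [0, 0, 1, 1]
selected, scores = top_genes(expression, outcome, n=1)
print(selected)
```

The selected genes would then be the features handed to the classifier, with n treated as a tuning parameter.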
For NetRank, the top genes were decided by a sub-optimization that varied d, the damping factor in the PageRank algorithm, which determines how much information can be transferred to other genes. The best value of d determined in this study was 0.3.
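As a rough illustration of how d controls the transfer, here is a minimal Python sketch of a PageRank-style update in the spirit of GeneRank and NetRank. The function name `network_rank`, the toy genes, the edge direction (score flows from the first gene of each pair to the second), and the division by out-degree are assumptions made for illustration, not details taken from the paper.

```python
def network_rank(scores, edges, d=0.3, n_iter=100, tol=1e-10):
    """Iterate rank[g] = (1 - d) * scores[g] + d * sum(rank[src] / out_degree[src]).

    scores: dict mapping gene -> non-negative initial score (e.g. |correlation|)
    edges:  list of (src, dst) pairs; every gene named must be a key of scores
    d:      damping factor; d = 0 ignores the network entirely
    """
    genes = list(scores)
    out_degree = {g: 0 for g in genes}
    for src, _ in edges:
        out_degree[src] += 1

    rank = dict(scores)
    for _ in range(n_iter):
        # Each gene keeps a (1 - d) share of its own initial score ...
        new_rank = {g: (1 - d) * scores[g] for g in genes}
        # ... and receives a d share of its neighbours' current rank.
        for src, dst in edges:
            new_rank[dst] += d * rank[src] / out_degree[src]
        change = max(abs(new_rank[g] - rank[g]) for g in genes)
        rank = new_rank
        if change < tol:
            break
    return rank

# Toy network (made up): two scored genes both point at a gene with score 0.
scores = {"TF": 0.0, "A": 0.8, "B": 0.6}
edges = [("A", "TF"), ("B", "TF")]

no_network = network_rank(scores, edges, d=0.0)
with_network = network_rank(scores, edges, d=0.3)
print(with_network)
```

With d = 0 the ranks are just the initial scores. With d = 0.3, the gene that started at 0 picks up a non-zero rank from its neighbours, which is the kind of transfer the damping factor is meant to tune.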
All other methods used just the 8000 genes that passed filtering, but NetRank used all the genes on the array. Those that were filtered out had their initial correlations set to 0, so that they were still in the network representation.
From the paper, it appears to have worked. Using Monte Carlo cross-validation, they were able to achieve over 70% prediction rates. This was better than any of the other methods they used to associate genes with the disease, including SAM, t-test, fold-change, and raw correlations.
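For readers who have not met Monte Carlo cross-validation, it amounts to repeatedly drawing a random train/test split and averaging the test performance. Below is a minimal Python sketch of that resampling loop. The names `monte_carlo_cv` and `toy_evaluate`, the 70/30 split, and the number of repeats are assumptions made for illustration; the paper's actual settings are not reproduced here.

```python
import random

def monte_carlo_cv(n_samples, evaluate, n_repeats=100, train_fraction=0.7, seed=0):
    """Average evaluate(train_idx, test_idx) over repeated random splits.

    evaluate is any callable that trains on the samples in train_idx and
    returns a score (e.g. accuracy) measured on the samples in test_idx.
    """
    rng = random.Random(seed)          # fixed seed so the splits are repeatable
    indices = list(range(n_samples))
    n_train = int(round(train_fraction * n_samples))
    results = []
    for _ in range(n_repeats):
        rng.shuffle(indices)
        train_idx, test_idx = indices[:n_train], indices[n_train:]
        results.append(evaluate(train_idx, test_idx))
    return sum(results) / len(results)

def toy_evaluate(train_idx, test_idx):
    """Placeholder scorer: a real one would rank genes on the training samples,
    fit the classifier, and return its accuracy on the test samples."""
    return len(test_idx) / (len(train_idx) + len(test_idx))

mean_score = monte_carlo_cv(n_samples=10, evaluate=toy_evaluate, n_repeats=5)
print(mean_score)
```

Unlike k-fold cross-validation, the test sets from different repeats can overlap, so the number of repeats and the split size can be chosen independently.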
As we discussed the article, some questions did come up.
Find this post online at: http://robertmflight.blogspot.com/2012/08/journal-club-150812.html
Authored using Markdown, and the R Markdown package. Published on 15.08.12