Posts

TL;DR Partial least squares discriminant analysis (PLS-DA) can ridiculously overfit, even on completely random data. The quality of a PLS-DA model can be assessed using cross-validation, but cross-validation is not performed in many metabolomics publications. Random forest, in contrast, automatically provides an indication of the quality of the model, because each decision tree in the forest is tested on its own out-of-bag (OOB) samples.

TL;DR Currently available methods to discover metal geometries make too many assumptions. We were able to discover novel zinc coordination geometries using a less-biased method that makes fewer assumptions. These novel geometries seem to also have specific functionality. This work was recently published under an #openaccess license in Proteins Journal: Yao, S., Flight, R. M., Rouchka, E. C. and Moseley, H. N. B. (2015), A less-biased analysis of metalloproteins reveals novel zinc coordination geometries.

TL;DR This 2014 PNAS paper by S. Lin et al. (Lin et al., PNAS, 2014), which compares transcription of tissues between species, has a flawed experimental design: species is almost perfectly confounded with the machine / lane on which the sequencing was done. Y. Gilad and O. Mizrahi-Man have published a manuscript describing the confounding and the results of removing it. This was possible because the original authors supplied the information about which publicly available files were used in the original analysis.

TL;DR Reviewed Jason McDermott’s MDRPred paper on F1000Research!, where my review is posted alongside the paper, with a DOI, completely in the open with my name attached. Was a pleasant experience, aided by the fact that Jason wrote a good paper.

F1000Research!

F1000Research! is a new publishing startup from F1000 that has a model of post-publication peer review, whereby upon submission the manuscript undergoes basic quality checks (no real editorial control), and then is published.

A blog post on the Weecology group blog by Elita Baldridge on being a PhD student with fibromyalgia, and how they are working through that, caused me to pause and reflect on my experience as a PhD student and postdoc with migraines. For those who haven’t read my blog, I do research in bioinformatics, specifically in transcriptomics and metabolomics. I spend almost all of my research hours in front of a computer writing code, generating plots, and trying to make sense of -omics-level data.

I don’t remember how I got on this, but I believe I had a recent Twitter exchange with some people (or saw it fly by) about pushing R package vignettes to the web after building and checking on travis-ci. Hadley Wickham pointed out that he uses such a scheme to push the web version of his book after each update, and pointed to the S3 deploy hooks on travis-ci. Deploying your HTML content to S3 is great, but given the availability of the gh-pages branch on GitHub, I thought it would be neat to work out how to deploy the HTML output from an R package vignette to the gh-pages branch on GitHub.

TL;DR I don’t want to be a PI because I enjoy spending time with my family, and don’t think I can handle the stress of juggling multiple grants, people, and deadlines. I want to be a staff member in a group that affords relative autonomy, while providing some security. If I’m lucky enough, my current position will enable that.

The Bad News

If you keep up with the news in academia, you know this is a horrible time to be a postdoc (postdoctoral researcher) or PI (principal investigator).

TL;DR Instead of writing an analysis as a single R script or a set of R scripts, use a package and include the analysis as a vignette of the package. Read below for the why; the how is in the next post.

Analyses and Reports

As data science or statistical researchers, we tend to do a lot of analyses, whether for our own research or as part of a collaboration, or even for supervisors depending on where we work.

Following from my last post, I am going to go step by step through the process I use to generate an analysis as a package vignette. This will be an analysis of the tweets from the 2012 and 2014 ISMB conferences (thanks to Neil and Stephen for compiling the data). I will link to individual commits so that you can see how things change as we go along.

Setup

Initialization

To start, we will initialize the package.

TL;DR Imposing a different structure than R packages for distributing R code is a bad idea, especially now that R package tools have gotten to the point where managing a package has become much easier.

ProjectTemplate ??

My last two posts (1, 2) provided an argument and an example of why one should use R packages to contain analyses. They were partly motivated by trends I had seen in other areas, including the appearance of the package ProjectTemplate.
