Posts

I’m currently working on a project where we want to know, based on a Euclidean distance measure, the probability that one value matches another; i.e., given an actual value and a theoretical value from calculation, what is the probability that they are the same? This can be calculated easily enough using a chi-square distribution with one degree of freedom, by considering how much of the chi-square CDF we are taking up.
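As a rough sketch of that idea (not necessarily the calculation used in the full post), assume the difference between the two values can be scaled by a known standard deviation. The function name `match_probability` and the `sigma` argument are hypothetical. Under the hypothesis that the two values are the same, the squared standardized distance follows a chi-square distribution with one degree of freedom, and the upper tail of the CDF gives the probability of seeing a distance at least this large.

```r
# Hypothetical helper: `match_probability` and `sigma` are illustrative
# assumptions, not names taken from the original post.
match_probability <- function(actual, theoretical, sigma = 1) {
  # Squared Euclidean distance in one dimension, in units of sigma
  d2 <- ((actual - theoretical) / sigma)^2
  # Upper tail of the chi-square CDF with one degree of freedom
  pchisq(d2, df = 1, lower.tail = FALSE)
}

# Two standard deviations apart: same value as the two-sided normal tail
match_probability(10.2, 10.0, sigma = 0.1)

# Identical values give a probability of 1
match_probability(5, 5)
```

With one degree of freedom this is equivalent to a two-sided test on a standard normal deviate, which is a handy sanity check on the result.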

ProgrammingR had an interesting post recently about keeping a set of often-used R functions as a gist on GitHub, and sourcing that file at the beginning of R analysis scripts. There is nothing inherently wrong with this, but it does end up cluttering the user workspace, there is no real documentation on the functions, and there is no good way to implement unit testing. The best way to keep sets of R functions, however, is as a package, which can then be installed and loaded by anyone.

It seems that the PostDoc committee here at UK has an interest in providing some information on alternative careers for PostDocs outside of academia. We all know that there are not enough PI slots at universities for all of the PhDs who are going on to do PostDocs, and many will end up in alternative (I use that term loosely) careers. Yesterday (2013-09-18), Scott Diamond gave a seminar to PostDocs in the Medical Center at UK on getting involved in teaching science at the K-12 and college level.

I have one Bioconductor package that I am currently responsible for. Each bi-annual release of Bioconductor requires testing and squashing errors, warnings and bugs in a given package. Doing this means being able to work with multiple versions of R and multiple versions of Bioconductor libraries on a single system (assuming that you do production work and development on the same machine, right?). I really, really like RStudio as my working R environment, as some of you have read before.

Science is built on the whole idea of being able to reproduce results, i.e. if I publish something, it should be possible for someone else to reproduce it, using the description of the methods used in the publication. As biological sciences have become increasingly reliant on computational methods, this has become a bigger and bigger issue, especially as the results of experiments become dependent on independently developed computational code, or use rather sophisticated computer packages that have a variety of settings that can affect output, and multiple versions.

Kaitlin Thaney asked on Twitter last week about using Ramnath Vaidyanathan’s new interactive R notebook for teaching. Now, to be clear up front, I am not trying to be mean to Ramnath, or to discredit his work or the effort that went into that project. I think it is really cool, and has some rather interesting potential applications, but I don’t really think it is the right interface for teaching R.

Inspired by this post, I wanted to examine the locations and density of Tim Hortons restaurants in Canada. Using Stats Canada data, each census tract is queried on Foursquare for Tims locations.

Setup

    options(stringsAsFactors=FALSE)
    require(timmysDensity)
    require(plyr)
    require(maps)
    require(ggplot2)
    require(geosphere)

Statistics Canada Census Data

The actual Statistics Canada data at the dissemination block level can be downloaded from here. You will want to download the Excel format, read it, and then save it as either tab-delimited or CSV using a non-standard delimiter; I used a semi-colon (;).
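Reading a semi-colon delimited file back into R might look something like the sketch below. The column names and values are invented placeholders standing in for the exported census file, and the guess that a comma-containing field is the reason for the non-standard delimiter is an assumption.

```r
# Hypothetical stand-in for the exported file; every column name and value
# here is made up for illustration.
census_text <- paste(
  "block_id;place;population",
  "10010001;Kingston, ON;120",
  "10010002;Halifax, NS;85",
  sep = "\n"
)

# sep = ";" keeps the commas inside the `place` field intact;
# quote = "" turns off quote handling so stray apostrophes cannot break parsing.
census <- read.table(text = census_text, sep = ";", header = TRUE,
                     quote = "", stringsAsFactors = FALSE)

str(census)
```

For a file on disk, the `text =` argument would be replaced by the path to the saved file.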

If you do R package development, sometimes you want to be able to store variables specific to your package without cluttering up the user’s workspace. One way to do this is by modifying the global options, as the grDevices and parallel packages do. Sometimes this doesn’t seem to work quite right (see this issue for example). Another way is to create an environment within your package that only package functions will be able to see, and therefore read from and modify.
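A minimal sketch of the environment approach is below. The names `pkg_env`, `set_pkg_var`, and `get_pkg_var` are hypothetical; in a real package the environment would be created in a file under `R/` and left out of the exports, so that only the package's own functions can reach it.

```r
# Hypothetical package-private environment; with an empty parent, lookups
# never fall through to the user's workspace.
pkg_env <- new.env(parent = emptyenv())

# Store a value under `name` in the package environment
set_pkg_var <- function(name, value) {
  assign(name, value, envir = pkg_env)
  invisible(value)
}

# Read a value back, returning `default` if it was never set
get_pkg_var <- function(name, default = NULL) {
  if (exists(name, envir = pkg_env, inherits = FALSE)) {
    get(name, envir = pkg_env, inherits = FALSE)
  } else {
    default
  }
}

set_pkg_var("cache_dir", tempdir())
get_pkg_var("cache_dir")
get_pkg_var("not_set", default = "fallback")
```

Because the values live in `pkg_env` rather than in `options()` or the global environment, nothing new appears in the user's `ls()` output.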

As an academic researcher, my primary purpose is to find some new insight, and subsequently communicate this insight to the general public. The process of doing this is traditionally thought to be: (1) from observations of the world, generate a hypothesis; (2) design experiments to test the hypothesis; (3) analyse the results of the experiments to determine whether the hypothesis is correct; (4) write a report to communicate the results to others (academics and / or the general public). And then repeat.

I have been watching the activity in RStudio and knitr for a while, and have even been using Rmd (R markdown) files in my own work as a way to easily provide commentary on an actual dataset analysis. Yihui has proposed writing papers in markdown and posting them to a blog as a way to host a statistics journal, and lots of people are now using knitr as a way to create reproducible blog posts that include code (including yours truly).
