development

Split - Unsplit Anti-Pattern

TL;DR If you notice yourself using split -> unsplit / rbind on two object to match items up, maybe you should be using dplyr::join_ instead. Read below for concrete examples. Motivation I have had a lot of calculations lately that involve some sort of normalization or scaling a group of related values, each group by a different factor. Lets setup an example where we will have 1e5 values in 10 groups, each group of values being normalized by their own value.

docopt & Numeric Options

TL;DR If you use the docopt package to create command line R executables that take options, there is something to know about numeric command line options: they should have as.double before using them in your script. Setup Lets set up a new docopt string, that includes both string and numeric arguments. " Usage: test_numeric.R [–string=<string_value>] [–numeric=<numeric_value>] test_numeric.R (-h | –help) test_numeric.R Description: Testing how values are passed using docopt.

Analyses as Packages

TL;DR Instead of writing an analysis as a single or set of R scripts, use a package and include the analysis as a vignette of the package. Read below for the why, the how is in the next post. Analyses and Reports As data science or statistical researchers, we tend to do a lot of analyses, whether for our own research or as part of a collaboration, or even for supervisors depending on where we work.

Creating an Analysis as a Package and Vignette

Following from my last post, I am going to go step by step through the process I use to generate an analysis as a package vignette. This will be an analysis of the tweets from the 2012 and 2014 ISMB conference (thanks to Neil and Stephen for compiling the data). I will link to individual commits so that you can see how things change as we go along. Setup Initialization To start, we will initialize the package.