Blog Posts as Email Newsletters
With the new blog2newsletter, you can have email subscribers to your blog controlled by R thanks to gmailr, googlesheets4, and tidyRSS.
{blog2newsletter} package on GitHub (Flight, n.d.).
Measuring Changes in Height Over Time
I have a citizen science project I want to try that involves individuals measuring their own height daily over a period of two months. I think I figured out how to do it.
Creating an Analysis With a targets Workflow
How I work through an -omics analysis using the targets package.
{targets} can be a little confusing without worked examples. This is my attempt to provide such an example.
Recreating Correlation Values from Another Manuscript
Documenting my journey trying to recreate some correlations calculated in another manuscript.
Dplyr in Packages and Global Variables
How to include dplyr in a package, and avoid warnings around global variables.
Reusing ggplot2 Colors
When you want to reuse ggplot2 default colors across plots.
Migrating Self-Hosted GitLab Projects to GitHub
We wanted to migrate from self-hosted GitLab projects to GitHub repos. Here is some background on how we accomplished that.
My Vacation in Barometric Pressure
What can we see from my phone’s barometric pressure readings?
Random Forest Classification Using Parsnip
How to make sure you get a classification fit and not a probability fit from a random forest model using the tidymodels framework.
Coloring Dendrogram Edges with ggraph
Here is how I got edges colored in a dendrogram with ggraph. Use “node.” in front of the node data column you want.
My Geographic Introduction
Adapting Piping Hot Data’s Geographic Introduction animation for myself.
Proportional Error in Mass Spectrometry
Demonstrating the existence of proportional error in mass spectrometry measurements.
Highlighting a Row of A ComplexHeatmap
A simple way to highlight or bring attention to a row or column in a ComplexHeatmap.
Creating a Map of Routes Weighted by Travel
I made a map of my spouse’s travel since we got Google phones for her birthday last fall. Here’s how I did it.
Packages Don’t Work Well for Analyses in Practice
I was wrong about using packages to structure statistical analyses. Also why I finally switched to {drake}.
{drake}, use {targets} now. Everything else still applies.
Things I Learned About distill
The various things I learned about the distill blog setup while converting posts over from my old blogdown site.
Using group_by Instead of Splits
How to use group_by instead of splits to summarize things.
dplyr::group_by and summarise to find items that you might want to keep or remove based on a part_of the item or group in question. I used to use spli…
Introducing Scientific Programming
How and when should we get people in academia programming? What if we had a unified front across the science labs?
Comments enabled via utterances
How I got utterances working on blogdown.
Comparisons using for loops vs split
for loops often hide much of the actual logic of your code behind the boilerplate needed to run a loop. split-ting your data can often be clearer and faster.
for loops are useful, and sometimes they…
Don’t do PCA After Statistical Testing!
You might be tempted to do PCA after a statistical test. Read more to discover why this is a bad idea.
Finding Modes Using Kernel Density Estimates
Examples of finding the mode of a univariate distribution in R and Python.
density or Scipy’s gaussian_kde to create density…
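As a rough illustration of the idea (not the code from the post), the sketch below estimates a mode by evaluating a Gaussian kernel density estimate on a grid and taking the grid point with the highest density. It hand-rolls the estimate with the Python standard library instead of calling R's density or Scipy's gaussian_kde. The function names, the fixed bandwidth, and the sample values are all assumptions made for this example.

```python
import math


def kde_gaussian(data, bandwidth):
    """Return a function that evaluates a Gaussian KDE of `data` at x.

    Hypothetical helper: a fixed, user-supplied bandwidth is assumed,
    rather than an automatic bandwidth rule.
    """
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data
        )

    return density


def find_mode(data, bandwidth, n_grid=512):
    """Approximate the mode as the grid point with the highest density."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    density = kde_gaussian(data, bandwidth)
    return max(grid, key=density)


# Made-up sample: most values cluster between 2.0 and 2.3.
values = [1.0, 2.0, 2.1, 2.2, 2.3, 3.0, 5.0]
mode = find_mode(values, bandwidth=0.3)
print(f"estimated mode: {mode:.2f}")
```

The resolution of the answer is limited by the grid spacing, and the estimate is sensitive to the bandwidth, so both would need tuning for real data.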
Split - Unsplit Anti-Pattern
Getting more speed from dplyr::join than from my more intuitive split -> unsplit pattern.
split -> unsplit / rbind on two objects to match items up, maybe you should be using dplyr::join_ instead. Read below for concrete examples.
Using IRanges for Non-Integer Overlaps
I wanted to make use of IRanges' awesome interval logic, but for non-integer data.
IRanges package implements interval algebra, and is very fast for finding overlaps of two ranges. If you have non-integer data, multiply values by a large constant factor and round them. The constant depends on how much…
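The scale-and-round trick can be sketched in a few lines. This is plain Python rather than IRanges itself, and only illustrates the arithmetic: the scale factor of 10^4 (four decimal places kept), the helper names, and the example ranges are assumptions, not values from the post.

```python
SCALE = 10**4  # assumed: keep four decimal places of precision


def scale_range(start, end, scale=SCALE):
    """Convert a non-integer (start, end) range to integer endpoints."""
    return round(start * scale), round(end * scale)


def overlaps(a, b):
    """True if two closed integer ranges share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


# Made-up non-integer ranges.
query = scale_range(100.0012, 100.0018)
subjects = [
    scale_range(100.0015, 100.0025),  # overlaps the query
    scale_range(100.0030, 100.0040),  # does not overlap
]

# Indices of the subject ranges that overlap the query.
hits = [i for i, s in enumerate(subjects) if overlaps(query, s)]
print(query, hits)
```

Any differences smaller than 1 / SCALE are lost in the rounding, which is why the choice of constant depends on the precision the data needs.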
knitrProgressBar Package
Ever wanted a progress bar output visible in a knitr document? Now you can!
dplyr progress bars, and wished you could use them everywhere, including from within Rmd documents, non-interactive shells, etc, then you should check out knitrPr…
Licensing R Packages that Include Others' Code
I wanted to include others' code in my package, and couldn’t find any good resources.
docopt & Numeric Options
Every input is a string in docopt. Every Input!!
docopt package to create command line R executables that take options, there is something to know about numeric command line options: they should have as.double before using them in your…
Linking to Manually Inserted Images in Blogdown / Hugo
This is my method to include something manually in a blogdown post.
blogdown for generating websites and blog-posts from Rmarkdown files with lots of inserted code and figures seems pretty awesome, but sometimes you want to include a figure manually, either because you want to generate something…
Differences in Posted Date vs sessionInfo()
If you see differences in the sessionInfo output and the date the post was published, this is why.
R tutorials generally include the output of Sys.time() at the…
Criticizing a Publication, and Lying About It
Critics of our last publication claimed we didn’t make our data available, which is an outright lie.
Authentication of Key Resources for Data Analysis
NIH is asking for authentication of key resources. How does this apply to data analyses?
Random Forest vs PLS on Random Data
Comparing random-forest and partial-least-squares discriminant-analysis on random data to show the problems inherent in PLS-DA.
Novel Zinc Coordination Geometries
A bit of an explainer on our lab's recent publication on finding and classifying zinc coordination geometries in protein structures.
Mouse / Human Transcriptomics and Batch Effects
A recent paper dug into some data from another paper, casting doubts on the first, all thanks to the data being available.
First Open Post-Publication Peer Review, with Credit!
A story about my first open peer-review.
Being a PhD Student and Post-Doc with Migraines
What it’s like having migraines as a PhD student and PostDoc.
Travis-CI to GitHub Pages
How I automatically push some things to GitHub Pages from a Travis CI job.
R package vignettes…
Analyses as Packages
Why I think packages make good ways to structure an analysis.
Creating an Analysis as a Package and Vignette
A walkthrough creating an analysis project as a package.
R Job Notifications Using Twitter
R to…
Researcher Discoverability
Why do we need corporate products to enhance “researcher discoverability”?
categoryCompare Paper Finally Out!
My first first-author publication since starting my PostDoc is finally out, about my meta-annotation-enrichment software package categoryCompare.
Bioconductor package categoryCompare is finally published in the Bioinformatics and Computational Biology section of Frontie…
Self-Written Function Help
Do you want to be able to read function documentation for your own functions? Make your own package.
Python’s docStrings are so great, and wouldn’t it be nice to have a similar system in R. Especially when you can have your new function tab completion available depending on your development…
Package Version Increment Pre- and Post-Commit Hooks
Two git commit hooks for incrementing the package version as part of commits.
PubmedCommons API
PubMed Commons is a new commenting system for PubMed articles.
Open vs Closed Analysis Languages
Talking about R & Python vs MATLAB as examples of open and closed data analysis languages.
R and python because they are open in the sense that anyone can obtain them, use them and modify them for free, and this has led to large, robust groups of users, making it more likely that…
Pre-Calculating Large Tables of Values
Demonstrating a way to generate a large amount of numbers that otherwise might take a long time to calculate.
Portable, Personal Packages
My take on creating simple little packages for your own commonly used functions.
R functions that are used often as a gist on GitHub, and source-ing that file at the beginning of R analysis scripts. There is nothing inherently wrong with this, but it does end up cluttering the user workspace, and there is no real documentation on the functions, and no…
R, RStudio, and Release and Dev Bioconductor
Working with the development version of Bioconductor on Linux can be a pain. This is one way to do it.
Reproducible Methods
A short missive on reproducibility, especially within computational work.
Writing Up Scientific Results and Literate Programming
My thoughts on using literate programming to investigate and report scientific results.
Writing Papers Using R Markdown
How I used RMarkdown to write a manuscript.
RStudio and knitr for a while, and have even been using Rmd (R markdown) files in my own work as a…
Creating Custom CDFs for Affymetrix Chips in Bioconductor
Examples of messing with Affymetrix CDF data in Bioconductor.