I woke up this morning to an email saying my first R package, holodeck, was on its way to CRAN! It’s a humble package, providing a framework for quickly slapping together test data with different degrees of correlation between variables and differentiation among levels of a categorical variable.
```r
# Example use of holodeck
library(holodeck)
library(dplyr)

df <-
  # make a categorical variable with 10 observations and 3 groups
  sim_cat(n_obs = 10, n_groups = 3, name = "Treatment") %>%
  # add 3 variables that covary
  sim_covar(n_vars = 3, var = 1, cov = 0.
```
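To show the idea behind covarying test data without depending on holodeck, here is a minimal base-R sketch using `MASS::mvrnorm()`. It is not holodeck's implementation: it simply builds a covariance matrix with a common variance on the diagonal and a common covariance everywhere else, then draws from a multivariate normal distribution. The sample size, the covariance value of 0.5, and the column names are assumptions chosen for illustration.

```r
# Sketch only: mirrors the *idea* of sim_cat() + sim_covar(), not holodeck's code.
library(MASS)  # provides mvrnorm(); MASS ships with R as a recommended package

set.seed(123)

n_obs    <- 30
n_groups <- 3
n_vars   <- 3
var      <- 1    # variance of each simulated variable (diagonal of Sigma)
covar    <- 0.5  # hypothetical covariance between every pair of variables

# Compound-symmetric covariance matrix: `var` on the diagonal, `covar` elsewhere
Sigma <- matrix(covar, nrow = n_vars, ncol = n_vars)
diag(Sigma) <- var

# Categorical variable with (roughly) equal group sizes
treatment <- factor(rep(letters[seq_len(n_groups)], length.out = n_obs))

# Draw the covarying variables from a multivariate normal distribution
X <- mvrnorm(n = n_obs, mu = rep(0, n_vars), Sigma = Sigma)
colnames(X) <- paste0("a_", seq_len(n_vars))

df <- data.frame(Treatment = treatment, X)
```

Calling `cov(df[-1])` should give values near `Sigma`, with the match improving as `n_obs` grows; passing `empirical = TRUE` to `mvrnorm()` would force the sample covariance to equal `Sigma` exactly.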
Have you ever pondered whether a muffin is really a breakfast food and not just an excuse to eat cake first thing in the morning? Well, you’ve come to the right blog post! In a previous post, I explained how I created a dataset of the ingredients of 269 cupcake and muffin recipes. In this installment, I’m going to use that dataset to demonstrate some of the important properties of multivariate statistics, specifically the difference between principal component analysis (PCA) and partial least squares regression (PLS).
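The core difference between the two methods can be sketched in a few lines of base R on simulated data (not the cupcake dataset). PCA looks only at the predictors and finds the direction of greatest variance; the first PLS component instead finds the direction in the predictors with the greatest covariance with the response, which for a single centered response is proportional to `X'y`. The variables below are hypothetical: `x1` has a large variance but is unrelated to `y`, while `x2` has a small variance but drives `y`. Packages such as pls offer full implementations; this only computes the first axis of each method.

```r
# Sketch: first PCA loading vs. first PLS1 weight vector on simulated data.
set.seed(42)
n <- 1000

x1 <- rnorm(n, sd = 5)              # high variance, unrelated to y
x2 <- rnorm(n, sd = 1)              # low variance, drives y
y  <- 2 * x2 + rnorm(n, sd = 0.5)

# Center only (no scaling), so the variance difference between x1 and x2 is kept
X  <- scale(cbind(x1 = x1, x2 = x2), center = TRUE, scale = FALSE)
yc <- y - mean(y)

# PCA: first loading is the leading eigenvector of the covariance of X
pca_loading <- prcomp(X, center = FALSE)$rotation[, 1]

# PLS1: first weight vector is X'y normalized to unit length
w <- crossprod(X, yc)
pls_weight <- drop(w / sqrt(sum(w^2)))

round(abs(pca_loading), 2)  # dominated by x1 (largest variance)
round(abs(pls_weight), 2)   # dominated by x2 (related to y)
```

Signs of loadings are arbitrary in both methods, which is why the comparison uses absolute values.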
This was my first time attending RStudio::conf, and I went primarily to explore my career options in data science. I mostly stuck to talks on teaching and modeling, since that’s how I already use R. Here are my major takeaways from the conference.
Shiny is the new hotness
Shiny apps are interactive web apps that run on R code, and there was a big focus on Shiny development at the conference this year.
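For readers who haven’t seen one, a Shiny app pairs a UI definition with a server function that reacts to user input. The sketch below assumes the shiny package is installed; the slider-and-scatterplot app itself is a hypothetical example, not something from the conference.

```r
# Minimal Shiny sketch: a slider controls how many random points are plotted.
library(shiny)

ui <- fluidPage(
  titlePanel("Minimal Shiny sketch"),
  sliderInput("n", "Number of points:", min = 10, max = 500, value = 100),
  plotOutput("scatter")
)

server <- function(input, output, session) {
  # renderPlot() re-runs automatically whenever input$n changes
  output$scatter <- renderPlot({
    plot(rnorm(input$n), rnorm(input$n), xlab = "x", ylab = "y")
  })
}

# Building the app object does not launch it;
# in an interactive session, call runApp(app) (or print app) to start it.
app <- shinyApp(ui = ui, server = server)
```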
I recently gave a talk on some of my work as a PhD student on experiments manipulating densities of the tea green leafhopper (Empoasca onukii) on tea plants. What the audience liked most, I think, were my methods for finding leafhopper eggs in the field and rearing them in the lab (well, a guest room at a tea farm). You see, leafhoppers (including at least the tea green leafhopper and the small green leafhopper, Empoasca vitis) lay their eggs inside plant tissues, making them impossible to find with the naked eye.
I’m currently in Hangzhou, China at the Tea Research Institute (TRI) for my fourth and last time. It’s bittersweet (like my favorite teas ;-) ), since I’m both glad to be nearing the end of my PhD and sad to say goodbye to all the friends I’ve made and a city I’ve really grown to enjoy living in.
Fieldwork
This final summer, I’ve been focusing on a few experiments on leafhoppers and their effects on tea chemistry (see the project page for more info).
My PhD has involved learning a lot more than I expected about analytical chemistry, and as I’ve been learning, I’ve been trying my best to make my life easier by writing R functions to help me out. Some of those functions have found a loving home in the webchem package, part of rOpenSci.
Papers that use gas chromatography to separate and measure chemicals often include a table of the compounds they found along with experimental retention indices and literature retention indices.
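Experimental retention indices in such tables are typically calculated from an n-alkane ladder run under the same conditions. The text doesn’t say which formula is used, so this sketch assumes the linear (van den Dool and Kratz) index for temperature-programmed runs, generalized to allow gaps in the ladder: RI = 100 × (n + (N − n)(t_x − t_n) / (t_N − t_n)), where n and N are the carbon numbers of the alkanes eluting just before and after the compound. The function name `calc_linear_ri()` and its arguments are hypothetical, not part of webchem or any other package.

```r
# Hypothetical helper: linear retention index from an n-alkane ladder.
#   rt        retention times of the compounds of interest
#   alkane_rt retention times of the n-alkane standards
#   alkane_c  carbon numbers of those alkanes (same order as alkane_rt)
calc_linear_ri <- function(rt, alkane_rt, alkane_c) {
  # Sort the ladder by retention time so findInterval() works
  ord <- order(alkane_rt)
  alkane_rt <- alkane_rt[ord]
  alkane_c  <- alkane_c[ord]

  vapply(rt, function(t) {
    # Index of the alkane eluting at or just before t
    i <- findInterval(t, alkane_rt, rightmost.closed = TRUE)
    # Compounds outside the ladder cannot be bracketed
    if (i < 1 || i >= length(alkane_rt)) return(NA_real_)
    t_n <- alkane_rt[i]
    t_N <- alkane_rt[i + 1]
    n   <- alkane_c[i]
    N   <- alkane_c[i + 1]
    100 * (n + (N - n) * (t - t_n) / (t_N - t_n))
  }, numeric(1))
}

# Hypothetical ladder: C10 at 5.0 min, C11 at 7.0 min, C12 at 9.0 min
ri <- calc_linear_ri(c(6, 8.5, 4), alkane_rt = c(5, 7, 9), alkane_c = c(10, 11, 12))
ri
```

Here the compound at 6.0 min falls halfway between C10 and C11, giving an index of 1050, and the one at 4.0 min elutes before the first alkane, so it gets `NA`.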
Last semester I took a class that used Python. It was the first time I had seriously used any programming language other than R. The students were about half engineers and half biologists. The vast majority of the biologists knew R to varying degrees but had no experience with Python, and the engineers generally seemed to have some experience with Python, or at least with languages more similar to it than R.
I know you’re all waiting on the edge of your seats for an update on the cupcakes vs. muffins data science project, but unfortunately I don’t have any answers to that age-old question* yet.
As silly as it may sound, I’m actually considering using this dataset for a paper about using PLS (partial least squares regression) for ecological data. So for now, I’m holding off on blogging about any results of analyses in case I end up wanting to use them for the publication.
One thing I’ve learned from my PhD at Tufts is that I really enjoy doing data wrangling, visualization, and statistics in R. I enjoy it so much that lately I’ve been strongly considering a career in data science after graduation. As a way to showcase my data science skills, I’ve been working on a side project that uses web scraping and multivariate statistics to answer the age-old question: Are cupcakes really that different from muffins?