Posts

The Data Science Debate

Written for STAT 503 after reading sections 2.1-2.2 of ISLR and these two blog posts by Vincent Granville. To begin, I am going to be very honest and say that I am thoroughly bored of the whole “Is data science statistics?

Divide and Recombine or Split-Apply-Combine?

Written for STAT 585X after going through the datadr tutorial by Ryan Hafen and others. The concept that datadr was built upon, divide and recombine, is essentially the same as the concept behind plyr, split-apply-combine.

The Importance of Tidying Data

Written for STAT 585X after reading Tidy Data by Hadley Wickham. Hadley Wickham, having gained a quasi-godlike amount of knowledge about data cleaning and analysis throughout his statistical lifetime thus far, wrote the “Tidy Data” paper to enlighten those of us who are mere mortals of the statistics world on one aspect of data cleaning: data tidying.

A Brief History of Statistical Computing (as told through a series of corny historical references)

Written for STAT 585X after reading “From S to R” from AT&T Labs, Inc. Before the development of S, a time period which I will now refer to as the “Computational Dark Ages,” data analysis was really just statisticians performing the most basic of computations, like regression, and about half of their efforts were applied to the brutal task of programming.