Friends & family know—I love me some R. It’s not just my tattoo of co-founders Robert Gentleman & Ross Ihaka (Fig. 1), although that probably is a contributing factor; it’s the fact that I’m constantly talking about the open source stats. language, extolling its virtues from the highest to the lowest points of OHSU (5th floor of the BICC, & when the administration realized the Supreme Court was serious about removing the tort cap, respectively). Scientific research & clinical practice move forward on a wave of statistical knowledge, & I believe the name of the frigate on which we ought to ride that wave is R.
Figure 1. My R-inspired tattoo
R is a highly divisive software package—people either love it or hate it. Most statistical tools, I’ve noticed, engender a high level of loyalty on the part of their users, & R is no exception. I’m going to go out on a controversial limb here & say that too many scientists choose their analytical methods based on tradition.
“It’s the standard of the field.”
“It’s what my mentor used to do.”
“It’s the only button I understand in SPSS.”
To a point, these are all valid arguments. Often, the standard of the field is the standard of the field for a reason–time has shown it to be the best known way of doing science. But science is continually changing: our data sets get bigger & bigger, & the effect sizes we’re attempting to measure get smaller & smaller. Maybe it’s just my seasonally induced, dangerously low levels of Vitamin D, but I like to imagine that every scientist reaches a certain point in their career that mimics the start of (literally) all natural disaster movies:
It’s late at night.
A lone be-whitecoated scientist sits at a computer, his face lit by the glow of the monitor. For the sake of visualization, we’ll say this scientist is played by Jeff Goldblum.
Jeff squints, & ever-so-slightly shakes his head.
“That’s funny,” Jeffey mutters to nobody in particular.
The camera does a slow 180-degree rotation around ‘Blumers, & the audience is able to make out the computer monitor. J-Train is analyzing a large data set using the same procedures he’s always used on very small ones. He’s just performed 45,000 uncorrected t-tests*, &, apparently, discovered that, for some questions, p can literally be equal to 0.
He raises his shaking fist to the paneled ceiling & shouts, “There must be a better way!”
& there is! But long & complex is the road leading away from cookie-cutter statistics. Many will be lost during the journey. Some to sub-standard statistical education. Some to the difficulty of learning a new statistics package. Perhaps some to gout. It is only by truly understanding our data that we can analyze it, & there is no one way this can be done–it changes with each experiment! It is partly for this reason that “analyzing” data by blindly performing procedures is so dangerous–the analyst may well be misled by their own preconceptions about their experiment. In my view, the only way to avoid this is to use a statistical package that doesn’t lock you into buttons & drop-down menus of procedures. True, there is perhaps a time & a place for those things, but understanding data can only be accomplished with a language built specifically for that purpose. For me, that language is R.
*If this seems ok to you, please read this: http://en.wikipedia.org/wiki/Multiple_comparisons
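The footnote’s point is easy to make concrete with a quick simulation (a minimal Python sketch of my own, not something from the scene above): under the null hypothesis, p-values are uniformly distributed, so running 45,000 uncorrected tests at α = 0.05 hands you roughly 2,250 “discoveries” even when nothing is real. A blunt Bonferroni correction (dividing α by the number of tests) wipes nearly all of them out.

```python
import random

random.seed(1)

N_TESTS = 45_000  # same number of t-tests as poor Jeff ran
ALPHA = 0.05

# Under the null hypothesis, p-values are uniform on (0, 1),
# so we can simulate 45,000 null tests by drawing uniforms directly.
p_values = [random.random() for _ in range(N_TESTS)]

false_positives = sum(p < ALPHA for p in p_values)
print(f"Uncorrected 'discoveries' among {N_TESTS} null tests: {false_positives}")
# Expected value is N_TESTS * ALPHA = 2,250 pure false positives.

# Bonferroni correction: shrink the per-test threshold to ALPHA / N_TESTS.
bonferroni_threshold = ALPHA / N_TESTS
survivors = sum(p < bonferroni_threshold for p in p_values)
print(f"'Discoveries' after Bonferroni correction: {survivors}")
```

Bonferroni is the crudest fix on offer; the point of the simulation is only that some correction is mandatory at this scale, not that this particular one is best.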