In this post, I go over the basics of running an ANOVA using R. The dataset I’ll be examining comes from this website, and I’ve discussed it previously (starting here and then here). I’ve not seen many examples where someone runs through the whole process, including ANOVA, post-hocs and graphs, so here we go.
This is also my first post over at wordpress.com. My old site will still exist with my old posts, it was just starting to churn a bit with so many visitors. It started to mess up images to, so I have decided to move!
One-way ANOVA
So here I’m going to ask the question: which class scored highest on DPS? I won’t be breaking this down by spec yet, that will come in future posts. The syntax for running an ANOVA is this:
aov(DV ~ IV, data=dataset)
It’s so simple! If you want to look at an interaction, or have more than one factor, go for:
aov(DV ~ IV1 * IV2, data=dataset)
Running the ANOVA
Let’s go for it:
btwn <- aov(DPS ~ class, data=full_list_dps) summary(btwn)
This shows us that there is a highly significant effect of class on the dps score. But how should we visualise it?
Visualising the Results
I’ve seen various guides offer different approaches to visualising ANOVA results. I researched quite a few before settling on ggplot2 to do this. I’ll begin by summarising the data:
graph_summary<-ddply(full_list_dps, c("class"), summarize,
AVERAGE=mean(DPS),
SE=sqrt(var(DPS)/length(DPS)))
This uses plyr which you can see details of in some previous posts I’ve made (e.g, here). For the graph that I will make, I’d like to have class ordered by dps. To do that, I run this code:
graph_summary$class <- reorder(graph_summary$class, graph_summary$AVERAGE)
It reorders the class column by the AVERAGE of each class.
Next up, we have the code for the graph:
ggplot(graph_summary)+
aes(x=class, y=AVERAGE, colour=class)+
geom_point()+
geom_errorbar(aes(ymax=AVERAGE+SE, ymin=AVERAGE-SE))+
opts(axis.text.x = theme_text(angle = 90, hjust = 0, size=11),
axis.title.x = theme_text(size=14),
axis.title.y = theme_text(angle = 90, size=14))+
scale_x_discrete("Class")+
scale_y_continuous("DPS (Mean)")
This then gives us the following:
Post-hocs
Next up are the post-hoc tests to run. You can run Tukey’s HSD tests with the simple command:
TukeyHSD(btwn)
However, if you’re more inclined towards t-tests, there’s a great function that can do all of them for you. It’s this:
with(full_list_dps, pairwise.t.test(DPS, class, p.adj="bonferroni", paired=F))
The with command just tells R to use full_list_dps so we don’t have to write full_list_dps$DPS and so on when running the t-test. The syntax for the pairwise t-test follows the following formula:
pairwire.t.test(DV, IV, ADJUSTMENT, PAIRED??)
DV and IV I assume you understand. There are lots of p-value adjustment options, and here I’ve gone for bonferroni. As these are not paired data, I’ve said paired to FALSE by writing in paired=F. What pairwise.t.test does is run every combination of t-test for you based on your factor levels. Great! You’ll get results like the following, showing you p-values from t-tests comparing all the factor levels with each other:
I’ve converted the output into a table and rounded the values to a few decimal places, but I find tables in this format a very easy way to check for significant effects.
References/Further Reading
The method I have presented here for averaging data and plotting it with error bars in ggplot2 was taken/inspired from this post over at the fantastically-named i’m a chordata! urochordata! blog. I have no idea where the blog title comes from–but it sounds cool. I’m indebted to that author/site because, well, it was the first decent example I found that did what I was after!
I’d also like to reference this for helping me work out how to convert pairwise t-test results into a table like the one pictured above.
See also the personality project pages on ANOVAs for further reading.


You may want to examine some of the graphics, including a dynamic one for 2w anova, in my package (w/ J. Helmreich) granova. (graphical analysis of variance).
We too have been impressed w/ ggplot2 so we have undertaken to convert several of our functions (granova.1w , and .2w; also granova.ds) to ggplot2 forms. Let me know if you are interested and I can send you paper (soon to be finished) that spells out the logic and documentation for these. Your work w/ ggplot2 is also interesting. Bob P.
Bob – that sounds really interesting! Getting graphics out to visualise effects and interactions is often tricky, and anything that can make it easier would be great. I’ll try out the package, and please feel free to send your paper across to me. My email is hayward [dot] godwin [at] soton [dot] ac [dot] uk. Thanks!
Hey,
Chordata and Urochordata are two names to describe the organisation of life, like Vertebrae, Mammals, etc.
A chordata is an animal with some kind of spine during any time of its life.
Wikipedia explains it much better than me : http://en.wikipedia.org/wiki/Chordate
I really liked those articles. I’m a Wow player myself, and a beginner with R and they helped me a lot to understand how some things work with R and the ggplot package too. I was struggling with some concepts.
Those data were something I could understand immediately, instead of having to understand data from a domain very far from mine (I’m not studying Wow data of course…).
Nefernet: thanks for the info, and the wikipedia page
Glad you liked the data and examples. It’s a shame there aren’t courses where people can study wow data :p I figured doing it on that would be more interesting than covering some of my own data, or something made up!
Instead of building own ggplot object (as you did) I prefer ggplot’s boxplot for quick result. On the other end, when I want a full picture I do not relay on averages+sd. In that case I try to plot all data with controlled transparency (alpha):
ggplot(data,aes(Hour,Volume,col=Hour))+geom_jitter(alpha=.65)
The result can be found here (5th chart):
http://data-analysis.eu/index.php/show-case-description
Cool, thanks for the link! I haven’t seen geom_jitter previously, but I’ll definitely play around with it. The main reason that I’ve set up the graphs in this manner is that they’re the standard for my field – and I agree with you, they aren’t always as informative as seeing every datapoint mapped out so you can get a clearer picture of the data.
Pingback: Reshape Package in R: Long Data format, to Wide, back to Long again | Psychwire