Science Open Day

This weekend, a crack team of students and postgrads gave demos of the kind of experiments that we get up to, and of the eye-tracker, at an open day run by the university. Here they all are in their glory:

From left to right: Sarah, Zoe, Leanne, Carl, Ascen, Charlotte and Karen

There was also some silliness, totally out of character:

Here you can watch it in a never-ending loop:

Migrating from SPSS/Excel to R, Part 3: Preparing your Data

In this post, I describe how to prepare your data for migrating between SPSS/Excel and R. This is the third post in a series, the first two of which can be found here and here. Don’t forget, this is primarily aimed at those working on datasets for psychology experiments, as that’s what I do.

Datasets in SPSS/Excel

One of the golden rules of working with datasets in SPSS is that you need to have one row for each participant. I know there are some exceptions to this, but it’s an important general rule for SPSS.

The main consequence of this is that, when you’re dealing with any form of within-subjects data, your dataset quickly becomes very wide indeed. Let’s look at an example below. Here, we have 10 participants, involved in two experimental sessions. For each session, we’ve measured the Reaction Time (RT).

That’s not too messy (note that I just pasted in 1200 for the values as this is just an illustration). But let’s make things worse. Let’s have 10 experimental sessions, each with three different blocks of trials, each representing a different within-subjects condition. What does it look like now?

Well, we can’t fit it all into a single screenshot, as the dataset has a large number of columns. This is an illustration of what gets referred to as a wide data format – you have a large number of columns mapping on to various factors, variables, etc.

R does things differently, for most of the statistical tests that I’ll be discussing: it uses the long data format instead.

Long Datasets in R

When you think about it, wide datasets can be a real pain. I’ve seen people spend hours running pivot tables and then having to drag columns around to get their datasets in a format that SPSS will be happy with.

With R, things are significantly easier: for many tests, such as t-tests and ANOVAs of various forms, you only need to use a single layout: the long data format. You can probably guess what this is already, but let’s do a direct comparison using the first example dataset described above.

Again, let’s say we have Reaction Times (RTs) for 10 participants involved in two sessions of experimental trials. In wide format, these data look like this:

In the long format, these data look like this:

Here you can see the difference: in the long format, the one row per participant rule does not apply. Instead, you have one row for each combination of factors under examination.
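As a rough sketch of the two layouts, the data frames below are built by hand in base R. The placeholder value of 1200 follows the illustration earlier in the post; the object names wide_example and long_example, and the column names session and rt, are made up for this example.

```r
# Wide format: one row per participant, one column per session.
wide_example <- data.frame(
  ppt      = 1:10,
  session1 = rep(1200, 10),
  session2 = rep(1200, 10)
)

# Long format: one row per participant-by-session combination.
long_example <- data.frame(
  ppt     = rep(1:10, times = 2),
  session = rep(c("session1", "session2"), each = 10),
  rt      = rep(1200, 20)
)

dim(wide_example)  # 10 rows, 3 columns
dim(long_example)  # 20 rows, 3 columns
```

Both objects hold the same 20 reaction times; only the arrangement differs.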

What if your Datasets are all in the Wide Format?

There are a number of ways to convert between the two formats. I’ve covered perhaps one of the easiest, the reshape package, in a previous post. The example below uses the melt function from its successor, reshape2, which you’ll need to install using the package installation guide I presented previously.

Just to give an example, let’s work through the dataset I’ve been describing above.

First, let’s create some data:


# Simulated reaction times for 10 participants in two sessions

session1 <- rnorm(n = 10, mean = 1500, sd = 250)

session2 <- rnorm(n = 10, mean = 1000, sd = 250)

ppt <- 1:10

wide <- data.frame(ppt, session1, session2)

That gives us a dataframe called wide. How do we reshape the dataset to the long format that we want? Simple, by using the following:


library(reshape2)

long <- melt(wide, id.vars = "ppt")

This then gives us a dataframe called long, arranged in the format we want.

In many cases, if you want to avoid having to do this, it’s best to make sure your datasets are in the long format beforehand – it’s a simple case of planning ahead and knowing that you can do things differently.
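One way to plan ahead, sketched here under assumptions, is to lay out an empty long-format table before collecting any data. The design below assumes 10 participants, 10 sessions and 3 blocks, as in the larger example earlier; the object name design and the column names are hypothetical.

```r
# Hypothetical design: 10 participants x 10 sessions x 3 blocks.
design <- expand.grid(
  ppt     = 1:10,
  session = 1:10,
  block   = 1:3
)

# Empty reaction time column, to be filled in as data are collected.
design$rt <- NA_real_

nrow(design)  # 300 rows, one per combination of factors
ncol(design)  # 4 columns, however many sessions or blocks are added
```

Adding sessions or blocks adds rows rather than columns, so the table never gets any wider.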

Summary and Next Steps

This post illustrated how to get your data organised for use in R for those who are used to using SPSS/Excel. There are many useful ways to re-organise your data, and I’ve covered one of them here (the melt function from the reshape2 package). The next steps include aggregating your data and then running statistical tests.

Using visual interruptions to explore the extent and time course of fixation planning in visual search

Here is a permanent copy of my poster for the 2011 European Conference on Eye Movements (ECEM) meeting. The full reference is:

Godwin, H., Benson, V., & Drieghe, D. (2011). Using visual interruptions to explore the extent and time course of fixation planning in visual search. Poster presented at the European Conference on Eye Movements, Marseille, France.

The poster can be downloaded via the following link:

Click here: ecem_interruption_poster_final

Summer

Wow, what a summer it’s been so far, and it’s not even over yet. I’ve not had time to catch up on the posts I’d started a while back, and it looks like I’ll be away for a while longer.

Normally, summers involve a slight easing up of the workload, thanks to there being no students around, allowing people to catch up with things…but not this time! It’s been fun, though; it’ll just be a little while before I’m back.

See you…out there…

Migrating from SPSS/Excel to R, Part 2: Working with Packages

In this post, I cover an important aspect of using R that users of SPSS/Excel won’t be familiar with: working with packages. Packages and the package system form a major difference between R and SPSS/Excel, which is why I’m devoting this entire post to them. It’s the second post in a series aimed at people wanting to migrate from SPSS/Excel to using R full-time. The previous post on this topic is available here. Again, this post is aimed primarily at psychology researchers, as that’s what I am, though it will hopefully be relevant to others as well.

Packages in R

With SPSS/Excel, you pretty much get everything you could ever want to use, and more, installed with the default installation. This leads to a simple question. How many of the many hundreds of buttons, boxes and options in these programs have you used in total?

R is different. The basic installation of R comes with a large number of packages and commands. However, with R, people have been able to share their own packages which can help out, extend, and implement other useful things to make R even more funky and powerful. This is beneficial for a number of reasons, but, for the new user, it might seem a bit strange. Why doesn’t R just come with all the packages installed right away? Well, the chances are you won’t need all of the packages in existence, so there’s little point in installing them all by default. Leaving them out also reduces the size of an R download, saves hard drive space, and so on.

People are adding new and useful packages all the time, so let’s install a couple of popular ones that I use all the time.

Installing Packages in R

To get to the list of packages you have installed, go to the packages tab using RStudio:

Packages are often updated, so you can use the Check for Updates button to update your packages.

To install a new package, you can either run the following command via the script tab or console window:

install.packages("PACKAGENAME")

Where PACKAGENAME is the name of the package. Alternatively, using RStudio, you can hit the Install Packages button in the Packages window. You’ll be greeted with something like the following:

In this window, just type the name of the package you want to install in the Packages text box. Here, I’ve gone for ggplot2 and plyr.

Once you hit the install button, the packages will be installed. It’s best to leave Install Dependencies checked because some packages need others to function. For example, ggplot2 uses plyr.
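The same installation can be done from the console. The sketch below wraps install.packages in a small helper that skips anything already installed. Note that install_if_missing and its installer argument are hypothetical names invented for this example, not part of R; only install.packages, installed.packages and setdiff are real functions.

```r
# install_if_missing() is a hypothetical helper, not part of R itself.
# The installer argument defaults to install.packages and exists only
# so the helper can be tried out without changing your library.
install_if_missing <- function(pkgs, installer = install.packages) {
  # Work out which of the requested packages are not yet installed.
  missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing_pkgs) > 0) {
    # dependencies = TRUE also installs the packages these rely on.
    installer(missing_pkgs, dependencies = TRUE)
  }
  # Return the names of the packages that needed installing.
  invisible(missing_pkgs)
}

# Example call (commented out because it downloads from CRAN):
# install_if_missing(c("ggplot2", "plyr"))
```

Running the commented-out call a second time should do nothing, as both packages would already be installed.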

Loading Packages in R

The packages you have installed won’t be loaded straight away. If R loaded all the packages you had installed, then you would often end up with packages loaded that you don’t need to use. To load your packages, you can do one of two things. First you can run the command:


library(PACKAGENAME)

Where PACKAGENAME is the name of the package that you want to load.

An alternative method is to select the package using RStudio’s Package window. To load the package(s) that you want, all you need to do is click the checkbox next to the package name. See below.

There we go, ggplot2 has now been loaded! It has also loaded plyr and reshape, as ggplot2 needs both to function.
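To check from the console which packages are currently loaded, you can look at the output of search(). The sketch below uses the tools package purely as a stand-in, because it ships with R and so needs no installation; a package such as ggplot2 should behave the same way once installed.

```r
# 'tools' ships with every R installation but is not attached by default,
# so it stands in here for a package such as ggplot2.
"package:tools" %in% search()   # usually FALSE in a fresh session

library(tools)                  # attach the package

"package:tools" %in% search()   # TRUE once library() has run
```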

Getting Help with Using a Package

Packages come with helpful documentation to get you started with using them. Again, you have two options in terms of accessing the documentation. First, you can type the command:


?PACKAGENAME

Where PACKAGENAME is the name of your package.

Alternatively, you can click the name of the package in RStudio’s Packages window, as below.

Whichever method you use, you’ll be presented with the documentation in your packages window, which you can browse to work out what you need to do to use the package.
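One caveat: ?PACKAGENAME only finds something when the package provides a help page under its own name. As a hedged alternative, the sketch below asks for a package's documentation index directly and lists the functions it exports. Again, tools is only a stand-in for PACKAGENAME, and pkg_index is a name made up for this example.

```r
# Index of a package's documentation; 'tools' stands in for PACKAGENAME.
pkg_index <- library(help = "tools")
class(pkg_index)   # "packageInfo"; printing pkg_index displays the index

# Names of the functions the package exports, as a quick overview.
head(sort(getNamespaceExports("tools")))
```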

Which Packages should you Install?

One of the daunting aspects of getting started with R is choosing how to use it, and what packages to install. I’ll cover some suggested packages in future guides, but for the eager, there’s a great list of popular packages that has been put up online by Matthew Dowle, and is available at this link. The list is also part of his unknownR package, which is worth trying out if you are new. When learning R, I used that list to inspire me in terms of which packages I should learn.

You should also keep an eye on community sites such as R-Bloggers, as you’ll often read about packages, as well as other tips and tricks, that you can use and learn from.

UPDATE: Thanks to Tal Galili’s comment, readers may also want to check out the CRAN task views, which have detailed info on a huge range of packages.

Next Steps

In the next guide, I’ll get into the interesting stuff: importing and manipulating data, and how doing so differs from SPSS/Excel.
