Migrating from SPSS/Excel to R

In this post, I give an outline for those interested in migrating from SPSS and Excel to R for data processing and analysis. This will be the first post in a small series. It’s aimed at psychology researchers, as that’s what I am, but I’m sure much of it will apply to people from other fields and disciplines. I’ll assume that you do your data manipulation (e.g., pivot tables and organising datasets) using Excel, and your stats using SPSS. I also assume you use either SPSS or Excel, or perhaps an alternative package such as SigmaPlot, to make your graphs for publications.

Psychology and Airport Security at the Royal Society

This week, a contingent of plucky individuals from my lab have been presenting at the Royal Society’s Summer Science Exhibition. Sadly I couldn’t make it as I was on the one holiday I take each year! Now that I’m back, I thought it would be worth discussing their exhibit, and encouraging anyone who hasn’t been yet to go!

The exhibit covers details of the work we’ve been doing for years on airport security screening (first publication was back in 2004). You can see an introduction to the research in the video below. Apparently they are working on upping the sound a bit.

Our experiments are still alive and kicking, and we’ll be doing more work on it for (at least) the next 4-5 years – so watch this space! Aside from the practical benefits that this type of research has on offer, it’s turned out to be a very significant and useful source of inspiration for developing current models and theories of how humans search their environments for targets of various types. I wrote some more detailed stuff on my website here a while back. 

More information is available at the Royal Society’s website: click here. By the way, the X-ray picture of a bag that they have used has nothing naughty in it. You can also play some online games developed for the exhibition here and here.

Publications that are relevant to this can be found listed here and here.

Finally, I’d like to dedicate this post to the computer used to do the eye tracking in that video above. It died on us a few days after the video was recorded. RIP.

New Site Layout attempts to Take Advantage of Your Brain

I’ve changed the layout of the site a bit – including a new image of me looking like I know what I’m doing on the left.

Looking at it more closely, I realise I’ve unintentionally done something quite sneaky. Given that humans naturally follow the gaze of other humans, when you visit the site and see me looking over towards the content of the page, you should follow my gaze and look over here as well. Some people suggested that people with autism fail to do this, but that’s not correct (sly link to some of my colleague’s research on just that topic).

Imagine if I used that picture for nefarious purposes: you could be forced to look at something evil.

Summer has Landed

So the summer has finally arrived – of sorts! It’s an eventful time at the moment, and more is going on than I’d ever have time to either remember or write down.

This week, I’ve been up to visit my other half’s family in the Midlands (technically, Coalville). She and her sister share a birthday and every year there is a massive family barbecue that we all go to. Massive is definitely the word because she has three brothers and three sisters, and many of them have had kids! It’s always great fun – though I didn’t take any photos of the barbecue, I did manage to get some snaps of the fields that we walked through the last morning I was there.

When I took the photos below, it was fantastically sunny, and was nice to see – particularly because my home city of Southampton is on week five of a six-week bin collection strike, so there is rotting rubbish, flies… the lot – all over the place. It was great to see some fields and breathe fresh air! Safe to say I didn’t enjoy returning home…

Importing and displaying a Data frame with C# and R.NET

In this post, I give a brief and basic example of how to import a file into R and then display it using a C#-based windows GUI.

Background

I’ve previously put up an introduction to getting C# and windows-based GUI programs to interact with R using R.NET. My previous example was very, very basic (just a calculator talking to R), so here’s something that’s a little bit more interesting.

The end result looks something like this, with a dataset I used in previous examples imported from a CSV file:

The ‘Import CSV file’ button calls up the function that does the work.

Connecting C# and R.NET to R

To get the basics of this working, see my previous post.

Importing a CSV file into R and Displaying it

The code is pretty simple:

private void button1_Click(object sender, EventArgs e)
        {
            REngine engine = REngine.GetInstanceFromID("RDotNet");

            try
            {
                // import csv file
                engine.EagerEvaluate("dataset<-read.table(file.choose(), header=TRUE, sep = ',')");

                // retrieve the data frame
                DataFrame dataset = engine.EagerEvaluate("dataset").AsDataFrame();

                for (int i = 0; i < dataset.ColumnCount; ++i)
                {
                    dataGridView1.ColumnCount++;
                    dataGridView1.Columns[i].Name = dataset.ColumnNames[i];
                }

                for (int i = 0; i < dataset.RowCount; ++i)
                {
                    dataGridView1.RowCount++;
                    dataGridView1.Rows[i].HeaderCell.Value = dataset.RowNames[i];

                    for (int k = 0; k < dataset.ColumnCount; ++k)
                    {
                        dataGridView1[k, i].Value = dataset[i,k];

                    }

                }

            }

            catch
            {
                MessageBox.Show("Unable to import the file.");
            }
        }

Step by Step

I’ll now go through the code step by step, providing snippets and then commenting on them.

REngine engine = REngine.GetInstanceFromID("RDotNet");

So, we begin by connecting to the REngine instance.

engine.EagerEvaluate("dataset<-read.table(file.choose(), header=TRUE, sep = ',')");

We then send a simple command to R as we would if we wanted to import a CSV file, using the read.table command. The useful thing here is that I’ve used file.choose() which is helpful because it creates the popup dialog window you’d expect from R to help you choose the file to import. Note that I’ve set it here to import a CSV, but of course you could change this to any of the other many formats your data may be in.
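To try the R half of this on its own, outside C#, here is a small sketch. It writes a throwaway CSV to a temporary file (the column names and values are made up for illustration) so that it runs without the file.choose() dialog, then reads it back with the same read.table arguments.

```r
# Create a small example CSV in a temporary location (illustrative data only)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("participant,rt,correct",
             "p1,512,1",
             "p2,634,0"), csv_path)

# Same call as in the C# string, but with an explicit path instead of file.choose()
dataset <- read.table(csv_path, header = TRUE, sep = ",")

# read.csv() is a shorthand with header = TRUE and sep = "," as defaults
dataset_csv <- read.csv(csv_path)

print(dataset)
```

For other delimiters, only the sep argument needs to change (for example, sep = "\t" for tab-delimited files).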

DataFrame dataset = engine.EagerEvaluate("dataset").AsDataFrame();

Once we have chosen our file, we need to get the dataframe back from R so we can then display it in a DataGridView box (here called dataGridView1). To work out how to do that, we just ask ourselves the question: how would we do it using R? We’d just type in the dataframe’s name – which, in this case, is dataset. To send this command to R from R.NET, you just need to use engine.EagerEvaluate. This sends the command “dataset” to R running in the background as though we were typing it into the console ourselves.

Now, we next need to tell R.NET what to do with the command we’ve just sent it. We’ve asked for a dataframe, so we want to take the result we get from our command and convert it to a dataframe for R.NET to interact with. To do that, we just append AsDataFrame() at the end of the EagerEvaluate call. We now have one dataframe in R called dataset, and one in our C# program also called dataset!

for (int i = 0; i < dataset.ColumnCount; ++i)
                {
                    dataGridView1.ColumnCount++;
                    dataGridView1.Columns[i].Name = dataset.ColumnNames[i];
                }

With our dataframe imported through from R, we can begin by building up our DataGridView box. It’s just a simple case of iterating through the dataframe we’ve created and filling the DataGridView box. R.NET has all the useful ways to interact with a dataframe that you would expect, including getting access to the number of rows and columns, as well as the names of those rows and columns. I begin by setting up the column names using the code above.

for (int i = 0; i < dataset.RowCount; ++i)
                {
                    dataGridView1.RowCount++;
                    dataGridView1.Rows[i].HeaderCell.Value = dataset.RowNames[i];
...

Next, the rows and their names get added. I think it’s all pretty straightforward so won’t go into detail here.

 for (int k = 0; k < dataset.ColumnCount; ++k)
                    {
                        dataGridView1[k, i].Value = dataset[i,k];

                    }

Finally, we fill in the actual values within each cell.
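For comparison, the same traversal written directly in R looks roughly like this. The data frame here is invented for illustration. Note that R indexes from 1, whereas the C# loops above start from 0.

```r
# Illustrative data frame standing in for the imported dataset
dataset <- data.frame(rt = c(512, 634), correct = c(1, 0),
                      row.names = c("p1", "p2"))

# Counterparts of ColumnCount, RowCount, ColumnNames and RowNames
n_cols <- ncol(dataset)
n_rows <- nrow(dataset)
col_names <- colnames(dataset)
row_names <- rownames(dataset)

# Visit every cell, row by row, as the nested C# loops do
cells <- character(0)
for (i in seq_len(n_rows)) {
  for (k in seq_len(n_cols)) {
    cells <- c(cells, paste0(row_names[i], "/", col_names[k], "=", dataset[i, k]))
  }
}
print(cells)
```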

Summary

So there we go. I’m sure the code could be cleaner, but it works as a basic example! I’m still new to C#, and any comments would be greatly appreciated as always!

The next steps will be to begin doing something meaningful with the data that have been imported.

Making GUIs using C# and R with the help of R.NET

In this post, I’ll take a look at using R.NET with the creation of an application written in C#. As this is a first, basic step, I’ll provide the code to make a calculator. Not exciting I know, but it demonstrates the basics!

I provide full code for the C# solution at the end: through the post here, I’ll highlight the bits that I assume readers will be most interested in – specifically, bits that actually get C# and R to talk to one another.

Anyway, here is what the calculator ended up looking like:

It may be simple, but it does have at least one answer!

Background

I’ve recently been tinkering with making GUI interface-type-things for R. I previously posted a guide on working with RApache – which is a webserver that can run R scripts. This time around, I’m using C# and .NET.

I began my attempts with making a GUI that can run R in some shape or form by taking a look at the various options on offer. Aside from RApache on the web front, you have Rook as well. Then there is this, but it doesn’t have an open source license (and appears to not work with the latest version of R), so I haven’t tried it. There is also Rcpp and RInside, which I believe can be used for this sort of purpose as well, but I haven’t got around to trying them out yet.

Getting Started

You’ll need to download the necessary file from the R.NET page.

You’ll also need to add a reference to the R.NET assembly into your solution.

Connecting to R

Here’s the basic code you’ll need:

using RDotNet;
...
REngine.SetDllDirectory(@"C:\Program Files\R\R-2.13.0\bin\i386");
REngine.CreateInstance("RDotNet");

Note that here it’s assumed that you’ll have R version 2.13.0 installed in the default location, just like me. The file being searched for here is R.dll.

With that setup, to get connected to R in any of your methods, all you need is this:

REngine engine = REngine.GetInstanceFromID("RDotNet");

Simple! engine can now send and receive information, just like an R console would normally. It also persists through commands. For example, if you send the command x<- 1 and then ask for x , it will remember that it’s 1. Great!

Details of the Calculator

It’s a very basic calculator. All I did was bind each button press to add the text from that button (e.g., “1”, “2”, “3” and so on) to the text box at the top (called textBox_input). When the user hits the = sign, the text in textBox_input gets sent to R for evaluation. Here’s the function that adds button presses to the textBox_input box:

private void add_input(string input)
        {
            if (textBox_output.Text!="") {
                textBox_input.Text = "";
                textBox_output.Text = "";
            }

            textBox_input.Text += input;
        }

The if statement at the start of add_input checks to see if a calculation has already been run (i.e., the output textBox has text in it). If that is the case, then it wipes the proverbial slate clean, allowing us to put in something new to evaluate.

Interacting with R from R.NET

There are a number of ways to interact with R from R.NET. R.NET gives you access to various data types from R (e.g., numeric vectors, data frames, lists, and so on). For the purposes of the little calculator being demonstrated here, it’s a case of evaluating the input text of the calculator as follows:

string input = textBox_input.Text;
NumericVector x = engine.EagerEvaluate(input).AsNumeric();

EagerEvaluate sends input (the text in the input textbox) to our R instance for evaluation. Here, the information we get back from R is converted to a numeric vector.

Next, we turn to what we know from R. If we ask R to evaluate the following:

40 + 2

It returns us a nice and simple 42.

All that’s being done with the NumericVector x is that the output of that 42 is being added to one of R.NET’s NumericVector data types. If you want R.NET to grab a whole load of numbers in this fashion, it’s very easy.
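As a rough picture of what happens on the R side, evaluating a string of calculator input is similar to parsing and evaluating text in an R session. This is only an analogy written in plain R, not a description of R.NET’s internals.

```r
# The text the calculator would send, e.g. the contents of textBox_input
input <- "40 + 2"

# Parse the string into an expression and evaluate it
result <- eval(parse(text = input))

# R has no scalars: the answer is a numeric vector of length one,
# which is why the C# code reads the first element of the vector
answer <- as.numeric(result)[1]
print(answer)
```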

Finally, it’s a case of setting up the output from the calculator to be equal to the first value in x. This is because we’re expecting x to be only one value anyway.

textBox_output.Text += x[0];

This then gives the output of 42 and adds it to the output textbox.

Next Steps

Here I’ve only shown a very simple example of how to use R.NET. I don’t think the world needs any new calculators, but in the future I’ll have a go at some more complex and interesting ways to get C# and R being friendly with one another.

Full C# Code

Here is all of my C# code. Note that I’ve included some bits and pieces to handle errors and exceptions, as well as different versions of R. Specifically, there’s a while loop in there to help the user hunt down the R.dll file if they don’t have the latest version of R, or it’s installed in something other than the default directory.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using RDotNet;

namespace RNet_Calculator
{
    public partial class Form1 : Form
    {

        // set up basics and create RDotNet instance
        // if anticipated install of R is not found, ask the user to find it.

        public Form1()
        {
            InitializeComponent();

            string dlldir = @"C:\Program Files\R\R-2.13.0\bin\i386";
            bool r_located = false;

            while (r_located == false)
            {
                try
                {
                    REngine.SetDllDirectory(dlldir);
                    REngine.CreateInstance("RDotNet");
                    r_located = true;
                }

                catch
                {
                    MessageBox.Show("Unable to find R installation's \\bin\\i386 folder. " +
                        "Press OK to attempt to locate it.");

                    if (folderBrowserDialog1.ShowDialog() == DialogResult.OK)
                    {
                        dlldir = folderBrowserDialog1.SelectedPath;
                    }
                    else
                    {
                        // the user cancelled: stop retrying rather than looping forever
                        break;
                    }

                }
            }

        }

        // This adds the input into the text box, and resets if necessary

        private void add_input(string input)
        {
            if (textBox_output.Text!="") {
                textBox_input.Text = "";
                textBox_output.Text = "";
            }

            textBox_input.Text += input;
        }

        // the equals button, which evaluates the text

        private void button_equals_Click(object sender, EventArgs e)
        {
            textBox_output.Text = "";

            REngine engine = REngine.GetInstanceFromID("RDotNet");

            String input = textBox_input.Text;

            try
            {
                NumericVector x = engine.EagerEvaluate(input).AsNumeric();
                textBox_output.Text += x[0];
            }

            catch
            {
                textBox_output.Text = "Equation Error";
            }

        }

        // Begin the button function calls - long list and not exciting

        private void button_1_Click(object sender, EventArgs e)
        {
            add_input(button_1.Text);
        }

        private void button_2_Click(object sender, EventArgs e)
        {
            add_input(button_2.Text);
        }

        private void button_3_Click(object sender, EventArgs e)
        {
            add_input(button_3.Text);
        }

        private void button_4_Click(object sender, EventArgs e)
        {
            add_input(button_4.Text);
        }

        private void button_5_Click(object sender, EventArgs e)
        {
            add_input(button_5.Text);
        }

        private void button_6_Click(object sender, EventArgs e)
        {
            add_input(button_6.Text);
        }

        private void button_7_Click(object sender, EventArgs e)
        {
            add_input(button_7.Text);
        }

        private void button_8_Click(object sender, EventArgs e)
        {
            add_input(button_8.Text);
        }

        private void button_9_Click(object sender, EventArgs e)
        {
            add_input(button_9.Text);
        }

        private void button_point_Click(object sender, EventArgs e)
        {
            add_input(button_point.Text);
        }

        private void button_0_Click(object sender, EventArgs e)
        {
            add_input(button_0.Text);
        }

        private void button_plus_Click(object sender, EventArgs e)
        {
            add_input(button_plus.Text);
        }

        private void button_minus_Click(object sender, EventArgs e)
        {
            add_input(button_minus.Text);
        }

        private void button_multiply_Click(object sender, EventArgs e)
        {
            add_input(button_multiply.Text);
        }

        private void button_divide_Click(object sender, EventArgs e)
        {
            add_input(button_divide.Text);
        }

        private void button_left_bracket_Click(object sender, EventArgs e)
        {
            add_input(button_left_bracket.Text);
        }

        private void button_right_bracket_Click(object sender, EventArgs e)
        {
            add_input(button_right_bracket.Text);
        }

        private void button_ce_Click(object sender, EventArgs e)
        {
            textBox_input.Text = "";
            textBox_output.Text = "";
        }

    }
}

Watching people watching people – Part 2

So I’ve recently become interested in photographing people who are photographing other people. It’s quite fun, and I find it pretty fascinating in some respects. I’ve previously put up a post with some photos from sunrise at Stonehenge. There’s a better explanation of my motivations for doing this on there, too.

This time around, there’s a chain of photographs. They’re not the best, but they’re a start. Some people were getting their picture taken. Sean took a photo of the person taking their photo; I took a photo of him photographing their photographer. Get it? Good.

Without further gibberance, I give you the pictures:

I’m going to try and do more of these soon. Anyone willing to lend their body and a camera is welcome.

&&& links and further items

etc etc: Plato’s Allegory of the Cave, I guess.

videographic etc.: Rear Window and Bart of Darkness.

Testing Different Methods for Merging a set of Files into a Dataframe

I previously posted a method I used for merging a set of files into a dataframe. It wasn’t long before I had some very helpful comments from the R-bloggers community suggesting better methods to achieve my goal. In this post, I compare the different methods and see which is the most efficient (i.e., fastest).

The Methods

My original method is outlined in my post. In the comments, you can see two further methods suggested. One by sayan involves the use of the do.call function and lapply. A second by dan involves the use of plyr’s ldply function. Check out the comments for the full discussion.

I will therefore compare three methods:

  • My original method
  • sayan’s lapply method
  • dan’s plyr method
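
To make the do.call/lapply pattern concrete, here is a self-contained sketch using only base R. The two small tab-delimited files are generated on the fly purely for illustration, and the plyr variant is left out so that the sketch runs without extra packages.

```r
# Write two small tab-delimited files to a temporary directory (made-up data)
tmp_dir <- file.path(tempdir(), "merge_demo")
dir.create(tmp_dir, showWarnings = FALSE)
for (j in 1:2) {
  part <- data.frame(id = (1:3) + 3 * (j - 1), score = c(10, 20, 30) * j)
  write.table(part, file.path(tmp_dir, paste0("part", j, ".txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

file_list <- list.files(tmp_dir, full.names = TRUE)

# Read every file into a list of data frames, then bind them in a single call
pieces <- lapply(file_list, read.table, header = TRUE, sep = "\t")
full_data <- do.call(rbind, pieces)

print(dim(full_data))
```

The key difference from a loop that calls rbind once per file is that all the pieces are read first and combined in one step.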

Testing

I ran each of the three methods 10 times (not hugely powerful I know, but it still took a while). For testing purposes, I merged two 16MB text files together, containing several thousand rows and several hundred columns. Having not done any real amount of timing in R before, I searched around a bit. In the end, I found two posts which I based my timings on (here and here). If I’ve done this incorrectly, let me know and I will run them again. Anyway, here are the results. The error bars are standard errors. The time taken is in seconds.
As you can see, my method is by far the slowest. Looks like I won’t be using it ever again!
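
For anyone wanting to repeat this kind of comparison, the timing approach can be sketched as follows. The task being timed here is a stand-in (sorting random numbers), not the file merge itself, and the helper name time_method is made up for this example.

```r
# Run a function n times and return the elapsed wall-clock time of each run
time_method <- function(f, n = 10) {
  replicate(n, system.time(f())[["elapsed"]])
}

# Stand-in workload; replace with the merge code being tested
timings <- time_method(function() sort(runif(1e5)), n = 10)

# Mean and standard error across runs
mean_time <- mean(timings)
se_time <- sd(timings) / sqrt(length(timings))

print(c(mean = mean_time, se = se_time))
```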

The R Code

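For maximum transparency, below is the R code I used to get these numbers.

# assumed setup, not shown in the original listing
library(plyr)              # provides ldply
N <- 10                    # number of timing runs per method
file_list <- list.files()  # assumes the working directory holds only the files to merge

# lapply method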
lap = replicate(N, system.time(
  full_data<- do.call(
  "rbind",lapply(file_list, 
  FUN=function(files){read.table(files,
  header=TRUE, sep="\t")})))[3])
lap

# plyr method
ply =
replicate(N, system.time(
  dataset <- ldply(file_list, read.table, header=TRUE, sep="\t")
)[3])
ply

# original method
orig= 
replicate(N, system.time(
for (file in file_list){
  # if the merged dataset doesn't exist, create it
  if (!exists("dataset")){
    dataset <- read.table(file, header=TRUE, sep="\t")
  }
  # if the merged dataset does exist, append to it
  if (exists("dataset")){
    temp_dataset <-read.table(file, header=TRUE, sep="\t")
    dataset<-rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }
}
)[3])
orig

Merge all files in a directory using R into a single dataframe

In this post, I provide a simple script for merging a set of files in a directory into a single, large dataset. I recently needed to do this, and it’s very straightforward.

Set the Directory

Begin by setting the current working directory to the one containing all the files that need to be merged:

setwd("target_dir/")

Getting a List of Files in a Directory

Next, it’s just a case of getting a list of the files in the directory. For this, the list.files() function can be used. As I haven’t specified any target directory to list.files(), it just lists the files in the current working directory.

file_list <- list.files()

If you want it to list the files in a different directory, just specify the path to list.files. For example, if you want the files in the folder C:/foo/, you could use the following code:

file_list <- list.files("C:/foo/")
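
One thing to watch when listing a directory other than the working directory: by default list.files() returns bare file names, which read.table() will then look for in the working directory. Here is a minimal sketch, using a temporary directory and made-up file names, of how the full.names and pattern arguments help.

```r
# Set up a temporary directory containing two data files and one unrelated file
data_dir <- file.path(tempdir(), "list_demo")
dir.create(data_dir, showWarnings = FALSE)
file.create(file.path(data_dir, c("a.txt", "b.txt", "notes.md")))

# Default: names only, relative to data_dir
list.files(data_dir)

# full.names = TRUE prepends the directory, so the paths work from anywhere;
# pattern restricts the listing to files whose names match a regular expression
file_list <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)
print(basename(file_list))
```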

Merging the Files into a Single Dataframe

The final step is to iterate through the list of files in the current working directory and put them together to form a dataframe.

When the script encounters the first file in the file_list, it creates the main dataframe to merge everything into (called dataset here). This is done using the !exists conditional:

  • If dataset already exists, then a temporary dataframe called temp_dataset is created and added to dataset. The temporary dataframe is removed when we’re done with it using the rm(temp_dataset) command.
  • If dataset doesn’t exist (!exists is true), then we create it.

Here’s the remainder of the code:

for (file in file_list){
      
  # if the merged dataset doesn't exist, create it from the first file
  if (!exists("dataset")){
    dataset <- read.table(file, header=TRUE, sep="\t")
  } else {
    # if the merged dataset does exist, append the next file to it
    temp_dataset <-read.table(file, header=TRUE, sep="\t")
    dataset<-rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }

}
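
One caveat with exists(): if an object called dataset is left over from an earlier run in the same session, the loop will append to it rather than starting afresh. A possible alternative, sketched here with generated files so that it runs on its own, is to start from NULL and let rbind() build up the result.

```r
# Generate two small tab-delimited files for the demonstration (made-up data)
demo_dir <- file.path(tempdir(), "loop_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (j in 1:2) {
  write.table(data.frame(id = j, value = j * 1.5),
              file.path(demo_dir, paste0("file", j, ".txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}
file_list <- list.files(demo_dir, full.names = TRUE)

# Start from NULL so the result never depends on objects from earlier runs
dataset <- NULL
for (file in file_list) {
  temp_dataset <- read.table(file, header = TRUE, sep = "\t")
  # rbind() ignores a NULL first argument, so the first file simply becomes dataset
  dataset <- rbind(dataset, temp_dataset)
}
print(dataset)
```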

The Full Code

Here’s the code in its entirety, put together for ease of pasting. I assume there are more efficient ways to do this, but it hasn’t taken long to merge 45 text files totalling about 400MB with some 300,000 rows and 300 columns.

setwd("target_dir/")

file_list <- list.files()

for (file in file_list){
      
  # if the merged dataset doesn't exist, create it from the first file
  if (!exists("dataset")){
    dataset <- read.table(file, header=TRUE, sep="\t")
  } else {
    # if the merged dataset does exist, append the next file to it
    temp_dataset <-read.table(file, header=TRUE, sep="\t")
    dataset<-rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }

}

Notes to Self: Getting Rapache working in Natty Narwhal (Ubuntu/Linux)

In this post, I cover my experiences in getting Rapache working in the latest version of Ubuntu (11.04, Natty Narwhal). I was unable to find an example that managed to get the whole thing working, so here is what I did. I’m not going to pretend to be a power user of Apache or Linux here. Most of my experience comes from playing with Apache and breaking it horribly. Some people might say that’s the best way to learn. If anyone has comments on how to improve how I did it, please share, as I’m really keen to hear!

My method for getting it working was based on a mix of two other guides. First, the official help guide. Second, this post here. I had to try different bits from each of them.

For all of the code snippets below, I’ll assume you’re logged into the terminal as root. If you’re not, don’t forget to prefix stuff with sudo to allow you to run the commands as the root user.

If you’re an R user who is interested in getting into Linux, there’s a great post here by Jeromy Anglim. Also, remember that Narwhals love donuts.

Installing R

This is the easy part. Add a CRAN source to your sources list:

[CRAN-SERVER]/R/bin/linux/ubuntu natty/

Naturally, replace [CRAN-SERVER] with your CRAN server of choice. For this, you’ll need to install both r-base and r-base-dev packages.

Installing Apache2

Next you need to install the server itself. This is straightforward:

apt-get install apache2

Installing Rapache and extra packages for Apache2

This one is easy again – it’s in the manual!

apt-get install r-base-dev apache2-mpm-prefork apache2-prefork-dev
wget http://biostat.mc.vanderbilt.edu/rapache/files/rapache-latest.tar.gz
rapachedir=`tar tzf rapache-latest.tar.gz | head -1`
tar xzvf rapache-latest.tar.gz
cd $rapachedir
./configure
make
make install

Configuring the Rapache Module

Now we have the fun bit. Begin by firing up a text editor of your choice:

gedit /etc/apache2/mods-available/r.conf

I added the following lines:

<Location /R>
ROutputErrors
SetHandler r-script
RHandler sys.source
</Location>

<Location /RApacheInfo>
SetHandler r-info
</Location>

But what do they do? As I understand it, the first Location directive tells Apache to treat any file in the R directory (which I created within the web root directory) as an R script and run it through R. The second Location directive, with /RApacheInfo, is again from the manual. If you use that and head to 127.0.0.1/RApacheInfo, you’ll get a testing page to show that your setup is working just fine.

Next we create the r.load file.

gedit /etc/apache2/mods-available/r.load

And then add the following:

LoadModule R_module /usr/lib/apache2/modules/mod_R.so

Great! One last step. Just tell Apache to load the module for R:

a2enmod r

And that’s it!

Now restart your Apache server:

/etc/init.d/apache2 restart

Testing Time

Assuming there are no errors, you can now test your server. If you head to 127.0.0.1/RApacheInfo, you should hopefully see a test page, which means you can do what I did and punch the air with success (embarrassing I know).

First Rapache Script

Let’s just do something dead simple for now:

x = rnorm(100, mean=5, sd=3)
print(x)

Then, if you save it as a script in the R directory (the one created within the web root) and open it in your browser through the server, you should see the output you’d expect if you had typed these commands into the terminal.

Next Steps

The next steps are the bread and butter of making interesting web apps – gets and posts, passing data to scripts, and doing interesting stuff.

I currently don’t seem to be able to get the server to load newly installed libraries – not sure why as they are installed. The mission continues…!
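
One thing that may be worth checking here, though it is only a guess at the cause, is whether the R process started by Apache searches the same library directories as an interactive session, since packages installed into a per-user library may not be visible to the web server’s user. A small diagnostic sketch follows; the package name is just an example.

```r
# Directories this R process searches for installed packages
lib_paths <- .libPaths()
print(lib_paths)

# Check whether a given package is visible from those directories
pkg <- "stats"  # example only; substitute the package that fails to load
visible <- pkg %in% rownames(installed.packages(lib.loc = lib_paths))
print(visible)
```

Running the same lines both from the terminal and from a script served by Rapache, then comparing the two sets of paths, should show whether the library locations differ.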