Mining for relations between nominal variables

The task today was to find which variables had significant relations with an important grouping variable in the big dataset I’ve been working with lately.  The grouping variable has 3 levels, each representing a different behaviour of interest.  At first I tried using the grouping variable as the dependent variable in a multinomial logistic regression, but I didn’t really trust the output, and the goal was really just to construct a bunch of graphs showing significant bivariate nominal relations in the data.

That’s when I turned to my good old friend, the chi-squared test.  All I had to do was select all the variables that I wanted to test against the grouping variable, and build a list holding, for each test, the chi-squared statistic, the name of the variable being tested, and the crosstab of the two variables for later graphing.  So that’s exactly what I did.
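A minimal sketch of how such a loop could look is given here. The data frame `dat`, its grouping column `group`, the columns `variable1` and `variable2`, and the simulated data are all made up for illustration, so treat this as one plausible version rather than the exact code I ran:

```r
# Hypothetical example data: a 3-level grouping variable plus two nominal
# variables. All names and values here are invented for illustration.
set.seed(1)
n <- 300
dat <- data.frame(
  group     = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  variable1 = factor(sample(c("yes", "no"), n, replace = TRUE)),
  variable2 = factor(sample(c("low", "mid", "high"), n, replace = TRUE))
)

# Every column except the grouping variable gets tested against it.
testvars <- setdiff(names(dat), "group")

# One list per tested variable: statistic, variable name, crosstab.
rows <- lapply(testvars, function(v) {
  xtab <- table(dat[[v]], dat$group)   # crosstab: variable by group
  test <- chisq.test(xtab)             # Pearson chi-squared test
  list(xsq = unname(test$statistic), testvar = v, xtab = xtab)
})

# rbind-ing the lists stacks them into a matrix whose cells are list elements.
resultlist <- do.call(rbind, rows)
```

If you also want to filter on significance, `chisq.test` returns the p-value as `test$p.value`, which could be stored as a fourth column in the same way.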

One really sweet thing about matrices in R is that a matrix can be built on top of a list, so some cells can hold plain numbers, some text, and others entire sub-matrices!  A typical row of the “resultlist” would look something like this:

     xsq   testvar     xtab
[1,] 200.7 "variable1" numeric,6

Then all I needed to do to see the variable name and crosstab for that variable was to call “resultlist[1,2:3]”, and that gave me the numbers to graph.
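For illustration, here is a small self-contained sketch of pulling a crosstab back out of such a matrix and plotting it. The counts are invented, and `barplot` is just one plausible choice of graph, not necessarily the one I ended up using:

```r
# Hypothetical 2 x 3 crosstab (made-up counts) stored in a one-row matrix of
# list cells, mirroring the xsq / testvar / xtab layout.
xtab <- matrix(c(30, 10, 20, 25, 5, 40), nrow = 2,
               dimnames = list(variable1 = c("no", "yes"),
                               group = c("a", "b", "c")))
resultlist <- rbind(list(xsq = unname(chisq.test(xtab)$statistic),
                         testvar = "variable1",
                         xtab = xtab))

# Single brackets return a list holding the selected cells ...
cells <- resultlist[1, 2:3]

# ... while double brackets extract the object stored in a single cell.
counts <- resultlist[[1, "xtab"]]

# Grouped bar chart: one cluster of bars per level of the grouping variable.
barplot(counts, beside = TRUE, legend.text = rownames(counts),
        main = resultlist[[1, "testvar"]], xlab = "group", ylab = "count")
```

The double-bracket form is the one to use when handing the crosstab to a plotting function, since single brackets keep it wrapped in a list.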
