Using R from Inside Statistica

I’ve spent a lot of time in the last month or so on projects at work that aren’t statistics related, hence the lack of posts!  In the interim, I had to do some serious research on handling datasets bigger than the last one I worked with (the one that kept threatening to max out my 8 GB of RAM!).  I kept practicing with R packages like bigmemory and ff (with its ffdf data frames), but nothing completely satisfied my need to handle a big dataset with different data types in different columns.  So, after reading up on different commercial stats packages, I determined that getting Statistica would be best for my supervisor and me (she’s insanely busy and wouldn’t have time for the learning curve of Revolution R, if we were to buy that).

In speaking with my supervisor about Statistica, she mentioned that it can interface with R.  So once we got our copies of Version 11 Advanced, I went ahead and learned how the interface works.

Setup/Installation: Setting up the R integration was really annoying.  There is a COM server application you have to download and install, and you have to run its installer in administrator mode.  R itself also has to be installed in administrator mode.  Finally, you need the rscproxy package in R, installed in the R home directory that sits in your Program Files folder.  It was quite a hassle.  Statistica put a white paper on their website explaining the process.

Memory Usage:  When you actively use the R integration in Statistica, keep an eye on your memory usage (I’m using a Windows 7 computer for work).  Any time you run an R function in Statistica, the R connector program takes up more and more memory, because data is being passed from Statistica to R to be processed.  The upshot is that you should be careful how much data you pass to an R procedure from Statistica, so that you don’t max out your memory.
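
One rough way to gauge how much data you are handing over is to check the size of the data frame once it is in R.  What follows is a minimal sketch using a made-up stand-in for ActiveDataSet (the columns are invented for illustration).  Note that object.size() only reports the R-side copy, not whatever memory the connector program itself holds.

```r
# Stand-in for the data frame Statistica exposes as ActiveDataSet.
# Outside Statistica only; the columns here are hypothetical.
n <- 100000
ActiveDataSet <- data.frame(
  id = seq_len(n),
  score = rnorm(n),
  group = sample(c("a", "b", "c"), n, replace = TRUE)
)

# Size of the R-side copy of the whole dataset
full_size <- object.size(ActiveDataSet)
print(full_size, units = "Mb")

# Size of each column in bytes, to see which variables dominate
col_sizes <- sapply(ActiveDataSet, object.size)
print(col_sizes)
```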

Syntax: Check out the screenshot below.  Typing R syntax into Statistica is, thankfully, pretty easy.  As you can see in the screenshot, if you want to access the active dataset, you treat it as a data frame named ActiveDataSet, and then you use the $ sign followed by the variable name from your Statistica dataset, just as you would in R.  The only catch seems to be variables with spaces in their names.  For those, it seems you have to refer to them by column number instead of by name.
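
To make the access pattern concrete, here is a minimal sketch in plain R.  Outside Statistica there is no ActiveDataSet, so the first lines build a made-up stand-in; the column names (Age, Income and the spaced "Blood Pressure") are invented for illustration.  The [[ ]] form with a quoted name is standard R, but whether it behaves the same way inside a Statistica macro is an assumption.

```r
# Stand-in for the data frame Statistica exposes as ActiveDataSet.
# Outside Statistica only: inside a Statistica macro, skip this assignment.
# check.names = FALSE keeps the space in "Blood Pressure" (hypothetical columns).
ActiveDataSet <- data.frame(
  Age = c(34, 51, 47),
  Income = c(42000, 58000, 61000),
  "Blood Pressure" = c(118, 135, 127),
  check.names = FALSE
)

# Variables without spaces: use $ and the variable name
mean_age <- mean(ActiveDataSet$Age)

# Variables with spaces: refer to the column by its number
bp_by_number <- ActiveDataSet[, 3]

# Standard R alternative (assumption: not verified inside Statistica)
bp_by_name <- ActiveDataSet[["Blood Pressure"]]

print(mean_age)
print(identical(bp_by_number, bp_by_name))
```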

Functionality: So far, it looks like data only flows one way: from the Statistica spreadsheet to R, and then back to either the Statistica report output or a new Statistica spreadsheet.  It would be nice if I could modify the data in an existing spreadsheet from R, but that seems to be out of the question.

Main advantage: Being a commercial product, Statistica doesn’t come with every statistical procedure included; the good folks at StatSoft aren’t going to give those away for free.  For example, now that I have Statistica Advanced, I can do some cool multivariate procedures, but I can’t generate random forests unless I get Statistica Data Miner.  The advantage of the R integration, then, is that it gives me advanced statistical procedures like random forests, or even graphing tools like ggplot2, without having to pay extra.  The screenshot above shows an example of a random forest procedure run in Statistica using R.
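
For readers who want something to type in, here is a rough sketch of what a random forest call can look like.  It assumes the randomForest package is installed in the R that Statistica talks to, and it uses R’s built-in iris data as a stand-in for ActiveDataSet.  It is not necessarily the code shown in the screenshot.

```r
# Stand-in for the data frame Statistica exposes as ActiveDataSet.
# Outside Statistica only: inside a Statistica macro, skip this assignment.
ActiveDataSet <- iris

if (requireNamespace("randomForest", quietly = TRUE)) {
  set.seed(42)  # make the forest reproducible
  # Predict Species from all other columns using 200 trees
  fit <- randomForest::randomForest(Species ~ ., data = ActiveDataSet, ntree = 200)
  print(fit)                            # error estimate and confusion matrix
  print(randomForest::importance(fit))  # variable importance scores
} else {
  message("randomForest is not installed; run install.packages('randomForest') first")
}
```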

7 thoughts on “Using R from Inside Statistica”

    • Hey Matthew,

      When submitting data from a Statistica spreadsheet to an analysis in R, you’re supposed to use a built-in container called “ActiveDataSet”. When I refer to the container in my R code that I’ve typed into a Statistica macro, I can see my RAM usage increasing quite quickly, indicating that Statistica is passing the spreadsheet data off to R, instead of processing it outside of RAM. Unfortunately, I would still have to refer to the container in the R code in the Statistica Macro, so I’m not sure that even the data.table package would prevent out of memory problems. This is the price I pay for venturing into the closed source world :P

      • Hi. My comment was based on the first paragraph only of the article; i.e., you seem to be using Statistica to solve the memory problem. Could you use data.table in R, and stay all-in-R, instead of using Statistica at all? Since you haven’t posted any code, who knows, perhaps the memory problem with your code running in R can be solved with standard techniques; e.g., are you calling rbind within a for loop?

      • My apologies for the misinterpretation :) I’ve definitely noticed that for loops in R aren’t very resource friendly, and so I stay away from them whenever possible. I think that the real memory hogs for me are passing my data to a GLM procedure, subsetting it, and using the sqldf package to bring columns from one data frame into another one.

        We got Statistica at my workplace here because I was told that projects EVEN BIGGER than the one I complained about in my blog were on the horizon.

        I’ve read up about the data.table package, and I see that it speeds up operations quite a lot, but I do wonder whether it makes passing data to GLMs or other procedures any less memory hoggish…

      • Furthermore, I’d love it if there were an R package that easily let me manipulate and analyze large datasets outside of RAM, and also allowed that data to be of multiple types. I think ffdf was the most promising for me, but it was just too much of a hassle to get my work to persist (i.e. having a reference to it from my workspace) from one session to the next. Maybe I didn’t understand it properly, but there comes a point where you have to choose a less time consuming option!

  1. Pingback: Processing Data from a Statistica Worksheet Using R | Data and Analysis with R, at Work
