“Big Data Analytics”? Hrm….

I do data mining and modeling quite often these days. However, the datasets that I work with wouldn’t really be considered “Big Data” (roughly 25,000 to 200,000 rows, with quite a lot of variables). I don’t know if I’ll ever be in a position to work with “Big Data”, but all the hype around it gets me thinking from time to time.

Question: If I’ve got millions upon millions of records to work with, do I really need to submit all of them to my data analysis software (R) for data mining and modeling?

Answer: Not in the least. If all I’m doing is looking for trends and building models that predict some desired behaviour, all I would have to do is draw a handful of random samples, each small enough to fit into my data analysis software. Then I could do my data mining on any one of the samples, build a model or models, and test them on the other samples. Apparently, random sampling is possible in Hadoop. This means that if I can pull these random samples from the data store (Hadoop or a DBMS), I can just use the same kinds of techniques I’ve been using all along.
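
To make the idea concrete, here is a minimal sketch of that workflow. It is written in Python rather than R purely to keep it self-contained, and everything in it is an assumption for illustration: the records are synthetic (x, y) pairs, the "model" is a plain least-squares line, and the helper names (`reservoir_sample`, `split_into_samples`, `fit_line`, `mean_squared_error`, `synthetic_records`) are made up for this post. Reservoir sampling is used because it draws a uniform random sample in a single pass without ever holding the full dataset in memory; it is one way to do this, not necessarily what Hadoop or any particular DBMS does.

```python
import random
from statistics import mean


def reservoir_sample(records, k, rng):
    """Draw k records uniformly at random from an iterable in one pass."""
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)      # fill the reservoir first
        else:
            j = rng.randint(0, i)      # inclusive on both ends
            if j < k:
                reservoir[j] = rec     # replace with probability k / (i + 1)
    return reservoir


def split_into_samples(records, n_samples, sample_size, rng):
    """Draw one pool, shuffle it, and cut it into disjoint random samples."""
    pool = reservoir_sample(records, n_samples * sample_size, rng)
    rng.shuffle(pool)
    return [pool[i * sample_size:(i + 1) * sample_size]
            for i in range(n_samples)]


def fit_line(sample):
    """Ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in sample]
    ys = [y for _, y in sample]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in sample)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_squared_error(model, sample):
    """Average squared prediction error of the fitted line on a sample."""
    slope, intercept = model
    return mean((y - (slope * x + intercept)) ** 2 for x, y in sample)


def synthetic_records(n, rng):
    """Stand-in for the big table: y = 3x + 2 plus Gaussian noise (made up)."""
    for _ in range(n):
        x = rng.uniform(0, 10)
        yield x, 3.0 * x + 2.0 + rng.gauss(0, 1)


rng = random.Random(42)
samples = split_into_samples(synthetic_records(200_000, rng),
                             n_samples=4, sample_size=5_000, rng=rng)

model = fit_line(samples[0])               # build the model on one sample
for held_out in samples[1:]:               # test it on the others
    print(round(mean_squared_error(model, held_out), 3))
```

If the error on the held-out samples is close to the error on the sample used for fitting, the model is probably not just memorising that one sample. Whether the trend being modelled actually survives sampling is a separate question that this sketch does not answer.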

Am I missing something, or is “Big Data Analytics” more of a marketing term than an actual reality?


One thought on ““Big Data Analytics”? Hrm….”

  1. We hear this “can’t I just sample” question a lot. The short answer is: no, at least not always. Our longer (but not very long) answer is at http://www.cybaea.net/Blogs/Journal/When-Big-Data-Matters.html. The first paragraph: “Big can be a qualitative as well as a quantitative difference. The gas in the ill-fated Hindenburg airship, the gas that formed our Sun, and the gas that formed the Milky Way galaxy were just lumps of hydrogen atoms (with varying impurities). The difference was in the number of atoms. But that difference in numbers made the three structures into different things. You simply cannot look at them in the same way. If you try to model the galaxy in the way you model a balloon you will fail.” PDF: http://bit.ly/GEJYRx
