Sampling and the Analysis of Big Data

After my last post, I came across a few articles supporting the opinion that if you have a good reason to take random samples from a “big” dataset, you’re not committing some kind of sin:

Big Data Blasphemy: Why Sample?

To Sample or Not to Sample… Does it Even Matter?

The moral of the story is that you can sample from “big data” as long as your analysis doesn’t depend on records that the sampling process is likely to exclude (for example, the top or bottom N records ranked by some criterion).
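To make that concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the data is synthetic (log-normally distributed values standing in for something like order amounts), and the function name `compare_sample_to_full` and its parameters are hypothetical. It compares an aggregate (the mean) and a top-k query on the full dataset against the same calculations on a 1% simple random sample.

```python
import random
import statistics


def compare_sample_to_full(n_rows=100_000, sample_size=1_000, k=10, seed=42):
    """Compare a mean and a top-k query on a full dataset vs. a random sample.

    All data here is synthetic and purely illustrative.
    """
    rng = random.Random(seed)

    # Hypothetical "big" dataset: skewed positive values (log-normal).
    population = [rng.lognormvariate(3.0, 1.0) for _ in range(n_rows)]

    # Simple random sample without replacement.
    sample = rng.sample(population, sample_size)

    # An aggregate such as the mean is usually estimated well from a sample.
    full_mean = statistics.fmean(population)
    sample_mean = statistics.fmean(sample)

    # A top-k query depends on specific extreme records,
    # most of which a small sample never sees.
    top_full = set(sorted(population, reverse=True)[:k])
    top_sample = set(sorted(sample, reverse=True)[:k])

    return {
        "full_mean": full_mean,
        "sample_mean": sample_mean,
        "top_k_recovered": len(top_full & top_sample),
        "k": k,
    }


if __name__ == "__main__":
    result = compare_sample_to_full()
    print(f"mean (full data): {result['full_mean']:.2f}")
    print(f"mean (sample):    {result['sample_mean']:.2f}")
    print(f"top-{result['k']} records found in sample: {result['top_k_recovered']}")
```

With a 1% sample, the sample mean should land close to the full-data mean, while each of the true top 10 records has only about a 1% chance of being drawn, so the sample's "top 10" will be mostly or entirely different records. That is the kind of analysis where sampling gets you into trouble.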
