My Goodness. What a Fat Dataset!

Recently at work we got sent a data file containing information on donations to a specific charitable organization, ranging all the way back to the 1980s. Usually, when we receive a dataset with a donation history in it, each row represents a specific gift from a specific person at a specific time, and each column represents some kind of information about that gift. The result is usually a dataset which is fairly long (thousands or hundreds of thousands of rows, in my recent experience) with maybe 15 columns or more.

In this case, each row represented one person, but there were 1,551 columns!! As it turned out, after the first column, which was the ID of the person donating the money, there were supposed to be just 31 more columns to describe the gift in each row. However, the person who put the data together decided that we should get 31*50 columns (plus the ID column, hence 1,551), so that each row represented a person rather than a gift, and every subsequent gift from that person was represented by an extra 31 columns to the right of the previous 31. Ridiculous!!

Anyway, I knew that I could reshape this using R by stacking all 50 copies of each variable on top of one another, so that the resulting 31 columns simply take the names of the first 31. Following is a gist that shows what eventually worked for me:
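The gist itself isn't reproduced in this text, so what follows is only a minimal sketch of the stacking idea in base R, not the original code. It assumes the layout described above (an ID column followed by equal-sized blocks of gift columns) and uses a made-up toy dataset with 2 variables and 3 gift slots instead of 31 and 50. The names `wide`, `long`, `gift_number`, `date` and `amount` are all hypothetical.

```r
# Toy wide data: one row per donor, one block of columns per gift.
# The real file had 31 variables x 50 gift slots; 2 x 3 keeps this readable.
n_vars  <- 2   # variables describing a single gift (31 in the post)
n_gifts <- 3   # gift slots per donor (50 in the post)

wide <- data.frame(
  id       = c(101, 102, 103),
  date     = c("1985-03-01", "1990-07-15", "2001-01-20"),
  amount   = c(50, 20, 100),
  date.1   = c("1986-03-01", NA, "2002-01-20"),
  amount.1 = c(75, NA, 100),
  date.2   = c(NA, NA, "2003-01-20"),
  amount.2 = c(NA, NA, 150),
  stringsAsFactors = FALSE
)

# The names of the first block become the names of every stacked block.
gift_names <- names(wide)[2:(n_vars + 1)]

# Pull out each block of n_vars columns, rename it, and attach the donor ID.
blocks <- lapply(seq_len(n_gifts), function(k) {
  cols  <- (2 + (k - 1) * n_vars):(1 + k * n_vars)
  block <- wide[, cols, drop = FALSE]
  names(block) <- gift_names
  cbind(id = wide$id, gift_number = k, block)
})

# Stack the blocks on top of each other: one row per donor per gift slot.
long <- do.call(rbind, blocks)

# Drop the empty slots (donors with fewer gifts than n_gifts).
is_empty <- rowSums(!is.na(long[, gift_names, drop = FALSE])) == 0
long <- long[!is_empty, ]

# Sort so that each donor's gifts sit together.
long <- long[order(long$id, long$gift_number), ]
rownames(long) <- NULL

print(long)
```

On the toy data this turns 3 rows by 7 columns into 6 rows by 4 columns: one row per actual gift. For the real file, setting `n_vars <- 31` and `n_gifts <- 50` should be the only change needed, provided the columns really do arrive in consistent blocks. Base R's `reshape()` is another possible route to the same result.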

In conclusion, if you need your dataset to get in shape, you need only remember one letter: R!
