Length by a Grouping Variable with NA Values Omitted in R

Today my supervisor pointed out a discrepancy between some graphs of percentages from a data frame I was working with and the table of raw numbers those percentages were taken from. That's when I realized that I was including some NAs in my length calculations.

The data were simple binary columns, with 0 being the absence of an attribute, 1 being the presence, and NA being an incalculable value.  The dependent variable here was whether or not people donated at a certain level, and the independent variable was a simple binary grouping variable.  First I got the sum of the dependent variable by the independent variable (i.e. how many people had donated at that level depending on the independent variable):

tapply(y, x, sum, na.rm=TRUE)

That worked simply enough.  Then I wanted to extract the total number of people who donated, regardless of whether they had reached the specified level:

tapply(y, x, length)

That gave me numbers, but they included the NAs.  I know that to get the number of non-NA values in a vector in R, all you have to type is sum(!is.na(y)), but I needed this by a grouping vector.  So this evening I realized what I needed to do and made a laughably small function:
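A minimal sketch of what such a function could look like. The name length_by_group and the toy data below are hypothetical, made up for illustration rather than taken from the original analysis:

```r
# Hypothetical helper: count the non-NA values of y within each level of x.
length_by_group <- function(y, x) {
  tapply(y, x, function(v) sum(!is.na(v)))
}

# Made-up toy data: a binary outcome with some NAs, and a grouping variable.
y <- c(1, 0, NA, 1, NA, 0, 1)
x <- c("a", "a", "a", "b", "b", "b", "b")

n_all   <- tapply(y, x, length)            # counts every element, NAs included
n_valid <- length_by_group(y, x)           # counts only the non-NA elements
n_yes   <- tapply(y, x, sum, na.rm = TRUE) # number of 1s per group

# Proportion of 1s among the non-missing values in each group.
n_yes / n_valid
```

Dividing the sums by n_valid rather than n_all keeps the NAs out of the denominator, which is where the mismatch between the percentages and the raw table came from.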

Even though the meat of this function is very small, it’s still nice to simplify 🙂  Live and learn I guess!

