Today my supervisor pointed out a discrepancy between some graphs of percentages I had made from a data frame and the raw numbers in the table those percentages came from. That is when I realized I had been including NAs in my length calculations.
The data were simple binary columns: 0 meant an attribute was absent, 1 meant it was present, and NA meant the value could not be determined. The dependent variable was whether or not a person donated at a certain level, and the independent variable was a simple binary grouping variable. First, I got the sum of the dependent variable by the independent variable (i.e. how many people had donated at that level in each group):
tapply(y, x, sum, na.rm=TRUE)
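To see why na.rm=TRUE matters in that call, here is a small made-up example (the vectors x and y below are hypothetical toy data, not the original data frame). Any argument given after the function in tapply() is passed on to that function, so na.rm=TRUE reaches sum(); without it, a single NA turns the whole group's sum into NA.

```r
# Hypothetical toy data: x is a binary grouping variable,
# y is the 0/1 outcome with some missing values
x <- c(0, 0, 0, 0, 1, 1, 1, 1)
y <- c(1, 0, NA, 1, 1, NA, NA, 0)

# Without na.rm, any NA in a group makes that group's sum NA
without_narm <- tapply(y, x, sum)
print(without_narm)  # NA for both groups

# Extra arguments after FUN are forwarded to sum(), so NAs are dropped
with_narm <- tapply(y, x, sum, na.rm = TRUE)
print(with_narm)     # 2 for group 0, 1 for group 1
```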
That worked simply enough. Then I wanted the total number of people who donated in each group, regardless of whether they had reached the specified level:
tapply(y, x, length)
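On the same kind of made-up data (again hypothetical, not the original), the inflation is easy to see: length() counts the NA entries right along with the 0s and 1s, so every group looks bigger than the number of usable answers it actually has.

```r
# Hypothetical toy data, as before
x <- c(0, 0, 0, 0, 1, 1, 1, 1)
y <- c(1, 0, NA, 1, 1, NA, NA, 0)

# length() counts every element in each group, NAs included
n_all <- tapply(y, x, length)
print(n_all)      # 4 for group 0, 4 for group 1

# How many of those elements are actually NA
n_missing <- tapply(is.na(y), x, sum)
print(n_missing)  # 1 for group 0, 2 for group 1
```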
That gave me numbers, but they included the NAs, because length() counts every element of a vector, missing or not. I know that to get the number of non-NA values in a vector in R, all you have to type is sum(!is.na(x)), but I needed that count for each level of a grouping vector. So this evening I wrote a laughably small function:
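A minimal sketch of such a helper, assuming it simply wraps sum(!is.na(...)) so that tapply() can apply it per group. The name count_non_na and the toy data are hypothetical. The sketch also shows how the non-NA counts feed into the percentages, and that for a 0/1 variable tapply(y, x, mean, na.rm = TRUE) gives the same proportions directly.

```r
# Hypothetical helper: number of non-missing values in a vector
count_non_na <- function(v) sum(!is.na(v))

# Hypothetical toy data
x <- c(0, 0, 0, 0, 1, 1, 1, 1)
y <- c(1, 0, NA, 1, 1, NA, NA, 0)

donated <- tapply(y, x, sum, na.rm = TRUE)  # 2 and 1
n_valid <- tapply(y, x, count_non_na)       # 3 and 2

# Percentage of non-missing values in each group that are 1
pct <- 100 * donated / n_valid
print(round(pct, 1))  # 66.7 for group 0, 50 for group 1

# For a 0/1 variable, the mean with na.rm = TRUE is the same proportion
props <- tapply(y, x, mean, na.rm = TRUE)
print(props)
```

Dividing by n_valid rather than by the length() counts is what would remove the kind of discrepancy described above: with this toy data, the length-based version would report 50% and 25% instead of roughly 66.7% and 50%.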
Even though the meat of the function is tiny, it's still nice to simplify 🙂 Live and learn, I guess!