# Functions ddply and melt make plotting summary stats in R more tolerable

The main reason I have usually used Excel to make my plots at work is that I had difficulty feeding summary stats from R into a plotting function.  This week I learned how to turn summary stats into a data frame suitable for plotting, which makes the whole process of plotting in R more tolerable for me.  Below I show the process using the ever-popular iris dataset.  I use the functions ddply (from the plyr package) and melt (from the reshape2 package) to summarize the data and restructure it into a form amenable to plotting.

```
library(plyr)      # provides ddply
library(reshape2)  # provides melt

# One row per species, one column per requested quantile
length.by.species = ddply(iris, "Species", function (x) quantile(x$Sepal.Length, c(.25,.5,.75)))
length.by.species
#      Species   25% 50% 75%
# 1     setosa 4.800 5.0 5.2
# 2 versicolor 5.600 5.9 6.3
# 3  virginica 6.225 6.5 6.9

# Reshape to long form: one row per Species/Quantile combination
length.by.species = melt(length.by.species, id.vars="Species", variable.name="Quantile", value.name="Sepal.Length")
length.by.species
#      Species Quantile Sepal.Length
# 1     setosa      25%        4.800
# 2 versicolor      25%        5.600
# 3  virginica      25%        6.225
# 4     setosa      50%        5.000
# 5 versicolor      50%        5.900
# 6  virginica      50%        6.500
# 7     setosa      75%        5.200
# 8 versicolor      75%        6.300
# 9  virginica      75%        6.900
```

One thing you can see in my call to ddply is that the grouping variable, whose values are used to subset the data frame, is referred to using quotes.  Somehow I find that a bit weird (I’m used to referring to variables without quotes, I suppose!).  Other than that, the syntax for ddply is similar enough to the apply family of functions, so no more complaints here.  You can also see that once I call the function, it gives me a nice neat data frame where the quantiles I asked for are columns, and each value of the Species variable (that is, each subset of the data frame) gets its own row.
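To make the comparison with the apply family concrete, here is a rough base R sketch of the same split, apply, and combine steps. This is only my illustration of the idea, not how ddply is implemented, and the names pieces, quartiles, and by.hand are made up for the example.

```r
# Split iris into one data frame per species (the job "Species" does in ddply)
pieces <- split(iris, iris$Species)

# Apply the same quantile function to each piece
quartiles <- lapply(pieces, function(x) quantile(x$Sepal.Length, c(.25, .5, .75)))

# Combine the per-species results into a single data frame
by.hand <- data.frame(Species = names(quartiles),
                      do.call(rbind, quartiles),
                      check.names = FALSE,  # keep "25%" etc. as column names
                      row.names = NULL)     # use plain 1, 2, 3 row numbers
by.hand
```

ddply rolls these three steps into a single call, which is most of its appeal.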

The melt command is easy enough: it simply wants to know what to call the column that will hold the former column titles (Quantile!) and what to call the column that will hold the numeric values (Sepal.Length).
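If it helps to see what that restructuring amounts to, below is a hand-rolled base R sketch of the wide-to-long step. It is an illustration under my own assumptions rather than what melt does internally: the wide table is typed in from the ddply output above, and measure.cols and long are names I made up.

```r
# A small wide table shaped like the ddply output (values copied from it)
wide <- data.frame(Species = c("setosa", "versicolor", "virginica"),
                   `25%` = c(4.800, 5.600, 6.225),
                   `50%` = c(5.0, 5.9, 6.5),
                   `75%` = c(5.2, 6.3, 6.9),
                   check.names = FALSE)

# Every column other than the id column holds measurements
measure.cols <- setdiff(names(wide), "Species")

# Stack the measurement columns; the old column titles become Quantile values
long <- data.frame(
  Species = rep(wide$Species, times = length(measure.cols)),
  Quantile = factor(rep(measure.cols, each = nrow(wide)), levels = measure.cols),
  Sepal.Length = unlist(wide[measure.cols], use.names = FALSE)
)
long
```

The result has one row per Species and Quantile pair, which is the shape ggplot wants.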

Now that the summary stats are in a “long” form data frame, with one column holding the numbers and two columns holding labels, it takes just a simple one-liner to create a graph (here done in ggplot).  Below I show one line to create a dodged bar graph, and another line to create a dot plot, both showing the 1st to 3rd quartiles of Sepal.Length by Species.

```
library(ggplot2)
# stat="identity" tells geom_bar to plot the values as given rather than count rows
ggplot(length.by.species, aes(x=Species, y=Sepal.Length, fill=Quantile)) + geom_bar(stat="identity", position="dodge")
ggplot(length.by.species, aes(x=Sepal.Length, y=Species, colour=Quantile)) + geom_point(size=4)
```

Thank you ddply and melt!

## 2 thoughts on “Functions ddply and melt make plotting summary stats in R more tolerable”

1. Thanks for sharing! I am learning ddply and melt for the first time this semester, but many of the resources I have gone to for help have been way over my head. I like that this is simple enough for a fool like me to understand! But still very useful.🙂 By the way, I noticed your dislike for the variable having to be put in quotes in ddply–I think this might solve your problem, although it might create a new one (if you are opposed to punctuation marks in general!):

length.by.species = ddply(iris, .(Species), function (x) quantile(x$Sepal.Length, c(.25,.5,.75)))

This is the standard way that we write the line of code for ddply–putting the variable in .( )

Not sure if that helps at all, but just thought I’d offer an alternative. Thanks again, and good luck with your R-scapades!

• Wow, it’s been a while since I typed this post!! I’ve since adopted the syntax that you mention in your comment, and also learned that you can split your summary stats by more than one variable by typing something like this: .(Variable1, Variable2). Keep at it with R, Lindsey! I still use it every day at work!
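For anyone curious what splitting by two variables looks like in practice, here is a minimal sketch. It assumes the plyr package is installed and uses the built-in mtcars dataset, which I picked only because iris has a single grouping variable; the column name median.mpg is made up for the example.

```r
library(plyr)

# Group by two variables at once using the .( ) syntax
# (mtcars, cyl, and gear are stand-ins chosen for this illustration)
mpg.by.cyl.gear = ddply(mtcars, .(cyl, gear), summarise, median.mpg = median(mpg))
mpg.by.cyl.gear
```

Each row of the result is one combination of cyl and gear that actually occurs in the data.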