Package sqldf eases the multivariable sorting pain

This will be a quick one.  I was trying to sort my dataframe so that it went in ascending order on one variable and descending order on another variable.  This was really REALLY bothersome to try to figure out with base R functions.  Then I remembered sqldf!

# Assuming dataframe named 'mydf' and 'V1' and 'V2' are your variables you want to sort on in opposite directions
library(sqldf)
mydf.sorted = sqldf("SELECT * FROM mydf ORDER BY V1 ASC, V2 DESC")

That's all! No pain, and lots of gain. Show me how to do this so simply with base R and I'll be impressed.

EDIT:

After many comments on this post, it's become painfully obvious to me that I wasn't using the right search terms (or looking at that negative sign on the second argument to the order function) when searching for a base R solution to this problem. Well, we all make mistakes, right? I hope this post has still been helpful to others, and it isn't a completely redundant internet resource on this topic!

About these ads

18 thoughts on “Package sqldf eases the multivariable sorting pain

  1. If V1 and V2 are numerical variables, you could multiply one (let’s say V2) with -1 and then sort both of them ascending. If you multiply after the sort V2 by -1, you will have sorted by V1 ascending and V2 descending. This is untested, and I have no idea how you could do the same with character vectors (possibly convert to factors, you might be able to do similar?).
    But using sqldf is much easier to read.

      • Yes. These two should give the same results.

        mydf <- read.table(header = TRUE, text = "
        V1 V2
        a -1
        z 1
        a 3
        d -3
        f 1
        c 7
        f 3
        d 5
        ")

        mydf

        library(sqldf)
        sqldf("SELECT * FROM mydf ORDER BY V1 ASC, V2 DESC")

        library(doBy)
        orderBy(~ +V1 -V2, data = mydf)

  2. sqldf is great and very fast! I use it especially for aggregations using GROUP BY. Even data frames / tables consisting of more than a hundred thousand rows are often done within a second.

  3. Standard R way is to use combination of order and xtfrm. Something like: iris[with(iris, order(-xtfrm(Species),Sepal.Length)),] This works even for character columns. See ?order

  4. Pingback: Package sqldf eases the multivariable sorting p...

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s