Hey Fitbit, my data belong to me!

When you go to Fitbit’s website to download your own Fitbit data, you will find your way to the Data Export utility on the settings page. Once you get there, you are greeted with what looks like a slogan: “Your data belongs to you!” Having seen the very high-level data they provide when you click download, I don’t believe them.

How Orwellian!


It’s nice that they let you download even this much data (I really like the sleep tracking feature), and I can at least imagine myself being interested in my daily stats as I become more competitive with myself, but:

A) Where’s the heart rate data? I got a Fitbit Charge HR for my birthday, and I happen to like the heart-rate monitor function!
B) What about the intraday data? I am interested in doing further analysis of my heart rate based on time of day.
C) What about the specific activities that I log by pressing the button on the side of my Charge HR and annotate/entitle on the website?

For these reasons, I’m convinced that they don’t really believe my data belong to me. That’s why it was nice to find out about Corey Nissen’s fitbitScraper package. At the time of writing, it lets you get steps, distance, active minutes, floors, and calories burned at a 15-minute interval, and heart rate at a 5-minute interval. It basically logs in to the Fitbit website and scrapes your data from the page source!
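To put those two sampling intervals in perspective, here is a minimal base-R sketch of how many readings a complete day should contain at each interval, and what a full 5-minute timestamp grid for one day looks like. The date and the UTC time zone are arbitrary choices for illustration (UTC avoids daylight-saving gaps); the scraper's own timestamps may be formatted differently.

```r
# Readings per day at each sampling interval (in minutes)
minutes_per_day <- 24 * 60
readings_15min <- minutes_per_day / 15   # steps, distance, floors, etc.
readings_5min  <- minutes_per_day / 5    # heart rate

# A complete 5-minute grid for one arbitrary day, in UTC for simplicity
day_start <- as.POSIXct("2015-08-07 00:00:00", tz = "UTC")
grid <- seq(day_start, by = "5 min", length.out = readings_5min)

print(c(readings_15min = readings_15min, readings_5min = readings_5min))
print(head(format(grid, "%H:%M"), 3))   # first three slots of the day
print(tail(format(grid, "%H:%M"), 1))   # last slot of the day
```

So a "complete" day of heart-rate data is 288 rows, which is a handy number to keep in mind when judging whether a downloaded day has gaps.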

Below is the code I cooked up to use his package to start a running dataset of what my heart rate does throughout each day, spanning as many days as I want. If you want to adapt this code to download one of the other available categories of data, any references to heart rate will have to be changed accordingly 🙂

Initializing the dataframe

library(lubridate)
library(dplyr)
library(fitbitScraper)

# Log in with the email and password of your Fitbit account
cookie = login("my@email.com", "mypassword", rememberMe = TRUE)

# Every day from the first full day with the Fitbit through today
startdate = as.Date("2015-08-07", format = "%Y-%m-%d")
enddate = today()
s = seq(startdate, enddate, by = "days")

# Start with an empty data frame and append one day of
# 5-minute heart-rate data at a time
hr_data = data.frame()

for (i in seq_along(s)) {
  df = get_intraday_data(cookie, "heart-rate", date = as.character(s[i]))
  names(df) = c("time", "hrate")
  hr_data = rbind(hr_data, df)
  rm(df)
}

This code might need some explanation (feel free to skip this part if it already makes sense to you). First, you’ll notice the login function, where you’ll need to enter the email and password you used to register your Fitbit account. Next, I’ve specified a ‘startdate’ variable with ‘2015-08-07’ as the value inside the as.Date function. This was simply my first full day with the Fitbit; change it to your own first Fitbit day. The seq function then conveniently creates a vector of date stamps starting at ‘startdate’ and ending at ‘enddate’, which is whatever date today happens to be.

Finally, the for loop walks through each element of the date sequence ‘s’: each day’s worth of 5-minute data is saved to a temporary data frame and appended to ‘hr_data’, which starts out as an empty data frame. Once that is done, you have your first volley of your own Fitbit data.
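Growing ‘hr_data’ with rbind inside the loop works, but R copies the whole data frame on every pass. A common alternative is to fetch each day into a list and bind everything once at the end. The sketch below runs offline by using a hypothetical fetch_day() stub that generates a day of placeholder 5-minute readings; in real use you would call get_intraday_data(cookie, "heart-rate", date = d) in its place. The fixed end date is also just for illustration.

```r
# Hypothetical stand-in for get_intraday_data(cookie, "heart-rate", date = d):
# returns one day of 5-minute timestamps with placeholder heart rates.
fetch_day <- function(d) {
  start <- as.POSIXct(paste(d, "00:00:00"), tz = "UTC")
  data.frame(
    time  = seq(start, by = "5 min", length.out = 288),
    hrate = rep(70L, 288)
  )
}

startdate <- as.Date("2015-08-07")
enddate   <- as.Date("2015-08-09")   # fixed here; the post uses today()
s <- seq(startdate, enddate, by = "days")

# "YYYY-MM-DD" strings, the same form the post passes as the date argument
days <- as.character(s)

# Fetch every day into a list, then bind once at the end
pieces <- lapply(days, function(d) {
  df <- fetch_day(d)
  names(df) <- c("time", "hrate")
  df
})
hr_data <- do.call(rbind, pieces)

print(days)
print(nrow(hr_data))
```

With three days of placeholder data this yields 3 × 288 rows, the same result the rbind-in-a-loop version would give, just without the repeated copying.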

But wait, there’s more! What about when you want to download more data several days from now? Do you have to run this same code again, re-downloading days that have already been processed? You could do that, or you could use the more complex code below, which only scrapes the Fitbit website for days that are incomplete or not there at all!

Code for when you want more and more and more fitbit data!!!

library(lubridate)
library(dplyr)
library(fitbitScraper)

# Assumes hr_data already exists from the initial download above
cookie = login("my@email.com", "mypassword", rememberMe = TRUE)
startdate = as.Date("2015-08-07", format = "%Y-%m-%d")
enddate = today()
s = seq(startdate, enddate, by = "days")

# Fraction of non-zero heart-rate readings for each day already downloaded
completeness = hr_data %>%
  group_by(dte = as.Date(time)) %>%
  summarise(comp = mean(hrate > 0))

# Keep these as dates (not row positions) so both sets refer to the same days
incomp.days = completeness$dte[completeness$comp < 0.9]
missing.days = s[!(s %in% completeness$dte)]
days.to.process = sort(c(incomp.days, missing.days))

# Loop over the dates as "YYYY-MM-DD" strings; a plain for loop over a
# Date vector would drop the Date class
for (day in as.character(days.to.process)) {
  df = get_intraday_data(cookie, "heart-rate", date = day)
  names(df) = c("time", "hrate")

  # If the newly downloaded data are for a day already in
  # the pre-existing data frame, then the following applies:
  if (all(df$time %in% hr_data$time)) {

    # Pct of non-zero hrate values for this calendar day in the
    # pre-existing data frame (compares full dates, so the same
    # day of year in different years is not mixed up)
    pct.complete.of.local.day = mean(hr_data$hrate[as.Date(hr_data$time) == as.Date(day)] > 0)

    # Pct of non-zero hrate values in the freshly downloaded day
    pct.complete.of.server.day = mean(df$hrate > 0)

    # If the newly downloaded data are more complete for this day
    # than what is pre-existing, overwrite the heart rate data
    # for that day, matching rows on their timestamps.
    if (pct.complete.of.local.day < pct.complete.of.server.day) {
      rows = match(df$time, hr_data$time)
      hr_data$hrate[rows] = df$hrate
    }
  } else {
    # If the newly downloaded data are for a day not already in
    # the pre-existing data frame, then use rbind to just add them!
    hr_data = rbind(hr_data, df)
  }
  rm(df)
}

# Keep the running dataset in chronological order
hr_data = arrange(hr_data, time)

The first thing I’d like to draw your attention to in this code is the block beginning with the definition of the ‘completeness’ object and ending with the ‘days.to.process’ object. What I’m trying to accomplish with these objects is:

A) To get a list of which days in my pre-existing data frame might benefit from a complete data refresh because too much data are missing (you’ll notice I’ve defined an incomplete day as one where less than 90% of the heart-rate readings are non-zero), and
B) To get a list of the days I am missing entirely because I just didn’t download data for them.

The ‘days.to.process’ object is just a vector that puts together the date stamps of any days that are incomplete, and any days that are missing in my data frame.
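Working with actual dates, rather than row positions, keeps incomplete and missing days in the same index space: a row position in the completeness table and a position in the date sequence ‘s’ only line up while no day is missing. Here is a minimal base-R sketch on made-up toy data (three days: one complete, one mostly zeros, one never downloaded); make_day() is a hypothetical helper for generating the toy data.

```r
# Hypothetical helper: one day of 5-minute readings with `nonzero`
# placeholder heart rates followed by zeros (zeros = missing readings)
make_day <- function(d, nonzero) {
  start <- as.POSIXct(paste(d, "00:00:00"), tz = "UTC")
  data.frame(
    time  = seq(start, by = "5 min", length.out = 288),
    hrate = c(rep(70L, nonzero), rep(0L, 288 - nonzero))
  )
}
hr_data <- rbind(make_day("2015-08-07", 288), make_day("2015-08-08", 100))

s <- seq(as.Date("2015-08-07"), as.Date("2015-08-09"), by = "days")

# Fraction of non-zero readings per calendar day
comp <- tapply(hr_data$hrate > 0, as.Date(hr_data$time), mean)
completeness <- data.frame(dte = as.Date(names(comp)), comp = as.numeric(comp))

# Both sets are dates, so they can simply be combined
incomp.days  <- completeness$dte[completeness$comp < 0.9]
missing.days <- s[!(s %in% completeness$dte)]
days.to.process <- sort(c(incomp.days, missing.days))

print(completeness)
print(days.to.process)
```

Here the second day (100 of 288 readings non-zero) is flagged as incomplete and the third day as missing, so both end up in ‘days.to.process’.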

In the for loop, I loop through the date stamps represented in the ‘days.to.process’ object, and proceed like I did before at first, but then a few new things happen:

I check to see if the date-time stamps in the temporary data frame are in my pre-existing data frame. If they are, I then do a comparison of the percent of data that is non-zero for that day in my pre-existing data frame (that’s the purpose of the ‘pct.complete.of.local.day’ variable) against the percent of data that is non-zero in the temporary data frame (hence the ‘pct.complete.of.server.day’ variable).
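Those two checks reduce to a couple of vector operations. A minimal sketch on toy data, assuming (as in the post) that a zero heart rate marks a missing reading; pct_nonzero() is a hypothetical helper name, not part of fitbitScraper.

```r
# Hypothetical helper: fraction of non-zero heart-rate readings on one day
pct_nonzero <- function(df, d) {
  on_day <- as.Date(df$time) == as.Date(d)
  mean(df$hrate[on_day] > 0)
}

times <- seq(as.POSIXct("2015-08-08 00:00:00", tz = "UTC"),
             by = "5 min", length.out = 288)
local  <- data.frame(time = times, hrate = c(rep(65L, 100), rep(0L, 188)))
server <- data.frame(time = times, hrate = c(rep(65L, 280), rep(0L, 8)))

# Are all downloaded timestamps already in the local data?
already_present <- all(server$time %in% local$time)

pct.complete.of.local.day  <- pct_nonzero(local,  "2015-08-08")
pct.complete.of.server.day <- pct_nonzero(server, "2015-08-08")
refresh <- already_present &&
  pct.complete.of.local.day < pct.complete.of.server.day

print(c(local = pct.complete.of.local.day, server = pct.complete.of.server.day))
print(refresh)
```

In this toy case the server copy has 280 non-zero readings against 100 locally, so the day is marked for a refresh.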

If there is more non-zero data in the temporary data frame for that day, I then find the row indices in the pre-existing data frame that hold the same timestamps and use them to update the data in place with the new values from the temporary data frame. This particular comparison of non-zero data in the pre-existing vs. the temporary data frame is probably redundant in most cases.
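Matching on timestamps, rather than assuming the rows come back in the same order, is one way to do that in-place update. A small sketch with toy values: match() returns, for each downloaded timestamp, the row of the existing data frame that holds it.

```r
times <- seq(as.POSIXct("2015-08-08 00:00:00", tz = "UTC"),
             by = "5 min", length.out = 4)

# Existing data with gaps (zeros), and a fresher download in reverse order
hr_data <- data.frame(time = times, hrate = c(60L, 0L, 0L, 64L))
df <- data.frame(time = rev(times), hrate = c(65L, 63L, 62L, 61L))

# Row of hr_data holding each downloaded timestamp
idx <- match(df$time, hr_data$time)
hr_data$hrate[idx] <- df$hrate

print(hr_data$hrate)
```

Even though the download arrived in reverse order, each value lands on its own timestamp, giving 61, 62, 63, 65.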

It would be nice to be able to easily/automatically get the physical activities that I have logged through the website (rowing machine, stationary bike, treadmill, etc.) so that I could correlate them with my heart rate at the time, but I guess I have to do that manually for now. Eventually, I’m interested in a somewhat deeper analysis of my heart rate at different times of the day.
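Until logged activities can be pulled automatically, a hand-typed table of activity start and end times can be joined onto the 5-minute data. A minimal base-R sketch, with made-up readings and a hypothetical activity log, that labels each reading with the activity it falls in and then averages heart rate by activity and by hour of day, treating zeros as missing.

```r
times <- seq(as.POSIXct("2015-08-08 06:00:00", tz = "UTC"),
             by = "5 min", length.out = 36)          # 06:00 to 08:55
hr_data <- data.frame(time = times, hrate = rep(c(60L, 0L, 70L), 12))

# Hypothetical hand-entered activity log
activities <- data.frame(
  name  = c("rowing machine", "treadmill"),
  start = as.POSIXct(c("2015-08-08 06:30:00", "2015-08-08 08:00:00"), tz = "UTC"),
  end   = as.POSIXct(c("2015-08-08 07:00:00", "2015-08-08 08:30:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Label each reading with the activity whose [start, end) window contains it
hr_data$activity <- "none"
for (k in seq_len(nrow(activities))) {
  inside <- hr_data$time >= activities$start[k] & hr_data$time < activities$end[k]
  hr_data$activity[inside] <- activities$name[k]
}

# Drop zero (missing) readings before averaging
valid <- hr_data[hr_data$hrate > 0, ]
by_activity <- tapply(valid$hrate, valid$activity, mean)
by_hour <- tapply(valid$hrate, format(valid$time, "%H"), mean)

print(by_activity)
print(by_hour)
```

The toy numbers are deliberately flat (every group averages 65 bpm); with real data the same two tapply() calls would show how heart rate differs between logged activities and across the hours of the day.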

At least now I feel more like my data belong to me, even though I had to resort to making use of someone else’s very smart coding (thank you, Corey!) to do it!


19 thoughts on “Hey Fitbit, my data belong to me!”

  1. Pingback: Hey Fitbit, my data belong to me! | Mubashir Qasim

    • Also, I’ve found that just setting up an IFTTT trigger is sufficient to capture aggregated stats at the daily level into a google spreadsheet. I’m sure there are good uses for 15-minute aggregates, so I’m sure that isn’t a complete solution.

  2. All these efforts and comments are telling us that fitbit still thinks your data belongs to them. They will let you access it with special permission and in some cases extra money. I like the old polar model in which they provided a way for one to download the data from the device to your own computer without requiring you to download your data on a proprietary website that restricts your usage of your own data. I won’t buy a fitbit (or any similar device) until I have a way to keep my own data. That means either fitbit provides software for that or somebody comes up with a good hack. I like the devices but will probably have to wait for the hack because for some reason, fitbit and similar companies think possessing individual users’ data is worth a lot.

    • I agree that at the end of the day they feel a certain ownership over your data. They use it for marketing and apparently to inform the health community about trends. But I think it’s the former purpose that brings them the most value/cash!

  3. Pingback: Hey Fitbit, my data belong to me! « Manipulate Magazine: Math 4 You By Us Group Illinois

  4. Pingback: Hey Fitbit, my data belong to me! | HR Analytic...

  5. do you know if the data presented in the 5 min interval is the average value of the 5 mins leading up to the time presented? or is it the value captured at that exact time?

  6. Pingback: Do seconds or minutes really count? | kayleasvitalstats

  7. Might be worth a revisit to this project as the Charge HR and Surge are now automatically identifying activities to see if you can detect your specialised activities.

  8. Pingback: Hey Fitbit, my data belong to me! | Open Data Aha!

  9. Pingback: Downloading heart rate data for the fitbit charge hr | Bits and pieces about my Pi

  10. Me not being very good with IT stuff; what do I do with that “code”? I stick it into a text document, rename the extension to .bat and double click on it? Where does it save the csv file?

  11. Seems the code doesn’t work for any period longer than a week (which is what the author used, for some reason). I constantly get an error when I try to scrape high-detail data for a longer period:
    Error in curl::curl_fetch_memory(url, handle = handle) :
    Timeout was reached

  12. Hi there, is there a manual available on how to use fitbitScraper? I installed R on my Windows machine and installed the fitbitScraper package inside R, but I can’t figure out how to proceed…
    Any help is appreciated!

  13. Hi All
    I’m a squash player and I’m interested to see how my games vary with the recovery in my Heart Rate when I am playing a match. I recently acquired a Fitbit, so of course I wanted to get my hands on the raw data from the device. It turns out that it’s not all available through the standard mechanisms in the App – so, I’ve built my own interface with Fitbit’s help (through their API) and I’m now able to access lots of interesting data insights and download the data so that I can play with it to my heart’s content using Excel or similar.
    The download is aimed at fellow squash players who use my free League and Ladders website, so if you are a squashplayer and/or are interested in trying this out – you are welcome to do so. The only requirement is that you register as a user of the Squash Leagues website (and you are a registered Fitbit user!).

    Try it out here: http://www.squashleagues.org/fitbit/fitbitdatadownload
