Plotting Strava data with Python

Update 21/06/2020: The script described in this blog is now running at https://nickdoornekamp.pythonanywhere.com/, where you can connect your own Strava profile to make pictures like the one above.

In my free time I like to go cycling. I easily get discouraged when the weather’s bad, though (In Dutch, people like myself are often called a ‘mooiweerfietser’: a ‘good weather cyclist’). Luckily one of my other hobbies, writing Python scripts, is a lot more suitable for rainy days.

One of the scripts I wrote can plot GPS data I gathered by tracking my cycling activities using Strava, which results in plots like the one above. In this blog, a few interesting parts of one of those script are highlighted. The full code and instructions for running it yourself can be found in my GitHub repository.

Connecting to the Strava API

First order of business is obtaining the data from Strava. They have a well-documented API, which allows you to programmatically access the data they have on you. This requires authorisation using OAuth 2.0, which takes quite some steps to set up. However, it is described quite clearly here. In a nutshell: once you’ve set up an app and obtained an authorisation code, you can exchange it for an access token. Then, if you send a request to the Strava API, your request must include the access token to be authorised.

With our access token in hand, we can start making requests to the API. For example, to get the 10 most recent rides from your Strava profile, the request looks like this:

This request will return a list of 10 activities, with around 50 attributes each. This includes covered distance (in meters), the elapsed time (in seconds), the ‘moving time’ (which I assume to be the elapsed time minus time you weren’t moving) and more in a nice JSON format:

Plotting Strava’s polyline

The activity attribute that will be most important for the rest of the script, is the attribute named “summary_polyline”:

summary_polyline = "sq_}H{dd^uFiAoCaEc@]Q[Qy@GC}DpCaFfCaIfMuEzIuAfBkH`Lk@l@gCjFiIpMeCtByE`D{@`AqAzDg@zE]~FFnBy@tI_@dJg@~Gc@`LEbCN`GE`DWfC_@vAaGjN}ApCkHzI{F...

This string turns out to be an encoded list of coordinates, encoded using Google’s Encoded Polyline Algorithm Format. It can easily be decoded by using, for example, the PyPI module polyline:

>> polyline.decode(summary_polyline)[(52.1041, 5.10558), (52.10533, 5.10595), (52.10605, 5.10692), (52.10623, 5.10707), (52.10632, 5.10721), (52.10641, 5.1075), (52.10645, 5.10752), (52.1074, 5.10679), (52.10853, 5.10611), ... ]

This already allows us to make our first plot:

Plot of the decoded summary_polyline

Adding satellite imagery as background

The next step towards the final result is adding satellite imagery as background. As you might’ve guessed, there’s a Python module for that: Matplotlib’s Basemap. (It should be noted here that Basemap is deprecated in favour of the Cartopy project, but I’ve found it still works quite well.) If you provide it with the minimum and maximum latitudes and longitudes you want an image for, Basemap’s arcgisimage() gets you the satellite image of this area. Furthermore, the Basemap object instantiated with these latitudes and longitudes can be called on any given coordinates to project them on that same satellite image. As projecting a globe on a 2-dimensional plane is an all but trivial task (obligatory xkcd), this takes a lot of complexity out of your hands (by the way, Basemap uses the Plate Carrée projection by default).

Same plot, now with a background image

And of course, if we can do it for one ride, we can do it for all of them by simply looping through them and adding them to the map:

All rides together in one plot

Separating rides into clusters

While this is nice, it would be even nicer to see a bit more detail. In my case, there seem to be eight ‘clusters’ of rides (a big one spanning the Netherlands with rides from the three places I’ve lived in the past years and seven smaller ones corresponding to cycling holidays), which could be isolated and zoomed in on.

To do so, we’ll apply the following logic:

  • The bounding box of a ride is the smallest rectangle that contains the whole ride;
  • Rides belong in the same cluster when their bounding boxes overlap. (I added a bit of margin here, so that rides that almost overlap also end up in the same cluster.)

To get our list of rides per cluster, we’ll first make a list of all rides any given ride overlaps with. This is done by comparing every bounding box to every other bounding box and check if they overlap. Simply put, a pair of bounding boxes overlap if the one is not either entirely above, below, to the right or to the left of the other. The full code for this part can be found in group_overlapping.py.

The outcome of this step is a list containing a list for every bounding box. This list per bounding box contains the indices of all other bounding boxes it overlaps with. For example, if ride 0 does not overlap with any other rides, ride 1 overlaps with ride 2, 2 overlaps with 1, 3 overlaps with 4, 4 overlaps with 3 and 5, 5 overlaps with 4 and 6 and 6 overlaps with 5, we’ll get:

[[], [2], [1], [4], [3, 5], [4, 6], [5]]

Then, we’ll merge these lists if they have any common elements (this part was heavily inspired by the answers to this StackOverflow question). Continuing the previous example, we get three clusters:

[[0], [1, 2], [3, 4, 5, 6]]

Using this, we can give each cluster their own subplot: every list corresponds to one subplot in which all rides in that list are plotted. This results in the picture at the bottom of the page, which allow you to see more detail and induces memories of some great holidays :)

All rides, separated into subplots

Data Scientist at Deloitte | Cycling & speedskating enthusiast