Python/Pandas Fun with GPS Data, GPS Python GIS

Fun with GPS Data in Python

In this post we have some fun using a number of Python packages related to Geographic Information Systems (Wikipedia) (GIS). In it we will analyze GPS data from a nice hike. The same techniques can be applied to GPS track from any other fun activities such as walking, sailing, biking, etc… Or not to not so fun activities such as commuting, etc…

If you aren’t interested in programming in Python and just want a nice place to visulize GPS data for a hike visit Open Topo Map click on the gpx icon and load your GPX file.

We’ll be using a number of Python packages as detailed below. However, the point of this post is to show how to easily see some interesting results without reading through all the documentation! The GIS Python packages can rely on some native (C, C++) libraries and hence can be a bit tricky to install. A clear set of installation instructions from the GeoPandas project will get you everything you need.

  • GeoPandas “GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by shapely. Geopandas further depends on fiona for file access and matplotlib for plotting.”
  • Fiona “Fiona reads and writes geographic data files and thereby helps Python programmers integrate geographic information systems with other computer systems.”
  • Pandas “pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.”
  • Shapely “Manipulation and analysis of geometric objects in the Cartesian plane.”

Extra files: HikingAnalysis.ipynb and Hike-2022-06-30.gpx

import geopandas as gpd
import fiona
import shapely
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

Getting GPS Data

In this post we are going to work with GPS data in GPX (Wikipedia) format. GPX stands for “GPS Exchange Format” and is a widely used format to exchange GPS information. It can contain data as routes, tracks, and way points. From Wikipedia:

Conceptually, tracks are a record of where a person has been and routes are suggestions about where they might go in the future. For example, each point in a track may have a timestamp (because someone recorded where and when they were there), but the points in a route are unlikely to have timestamps (other than estimated trip duration) because route is a suggestion which might never have been traveled.

Get a GPX file or download mine to use. Set the file name below to point to your the file of your choice.

filename = r'C:\Greg_Home\GPS\Hikes\Hike-2022-06-30.gpx'

Layers in GIS Files

The concept of map layers is ubiquitous in GIS systems and many file formats. Hence many libraries/packages utilize the concept in places where you might not expect it. Here we look at the layers of our GPX file with fiona:

['waypoints', 'routes', 'tracks', 'route_points', 'track_points']

Here we see that fiona understands that GPX files can contain routes, tracks and waypoints. We’ll see how the route_points and track_points are different from routes and tracks in a bit.

Track Layer

Let’s read in the track layer and take a look at where we have been in two lines of code!

gdf_tracks = gpd.read_file(filename, layer="tracks")


Let’s look at what the gdf_tracks object is and what it contains.

print(f"length of gdf_tracks: {len(gdf_tracks)}")
<class 'geopandas.geodataframe.GeoDataFrame'>
length of gdf_tracks: 1
<class 'pandas.core.series.Series'>
Index(['name', 'cmt', 'desc', 'src', 'link1_href', 'link1_text', 'link1_type',
       'link2_href', 'link2_text', 'link2_type', 'number', 'type', 'geometry'],
<class 'shapely.geometry.multilinestring.MultiLineString'>

Getting the Shape

Hmm, that’s a lot of stuff. But the very last item is a shapely.geometry. This has the actual coordinates of the track so lets take a look.

Fiona/GeoPandas interprets tracks as a 2-dimensional multi-line string. No elevation information, time, or other information is kept.

mygeo_latlon = gdf_tracks.iloc[0]["geometry"]


# Check the length of the multi-part geometry
print(f"Number of parts to the geometry: {len(mygeo_latlon.geoms)}")
my_coords = mygeo_latlon.geoms[0].coords
print(f"length of coordinates: {len(my_coords)}")
Number of parts to the geometry: 1
<class 'shapely.geometry.linestring.LineString'>
length of coordinates: 6583
[(-121.830436667, 37.515446667), (-121.830446667, 37.515445), (-121.830446667, 37.515443333), (-121.830461667, 37.515443333), (-121.830473333, 37.515446667)]

So the “shape” is something called a “line string” and it contains an array of the coordinates. We see that the coordinates are in (longitude, latitude) format as record from the GPS device.

So for the tracks “layer” Fiona/GeoPandas gives us a 2-dimensional multi-line string. No elevation, time, or other information is kept.

# How long a hike, assume 1 coordinate a second
hike_hours = int(len(my_coords)/3600)
hike_minutes = int(len(my_coords)/60 % 60 + 0.5)
print(f"I was hiking for {hike_hours} hours and {hike_minutes} minutes")
I was hiking for 1 hours and 50 minutes

Simplifying Tracks Part 1

That’s a lot of track points. Can we reduce the number of points and have the shape look like the original? This turns out to be a common operatin and is supported by shapely:simplify. Here we are giving a tolerance value in decimal degrees.

simp_geo = mygeo_latlon.simplify(0.00001)


print(f"simplified number of coordinates: {len(simp_geo.coords)}")
simplified number of coordinates: 479
[(-121.830436667, 37.515446667), (-121.830461667, 37.515443333), (-121.830473333, 37.515446667), (-121.830485, 37.51545), (-121.830495, 37.51545)]

Distance Computations Raw

How far did I go? How do we compute distances from latitude and longitude based coordinates? From stack overflow I got this python implementation of the Wikipedia: Haversine formula. This results are in kilometers.

from math import cos, asin, sqrt, pi
# In kilometers
def distance(lat1, lon1, lat2, lon2):
p = pi/180
a = 0.5 - cos((lat2-lat1)*p)/2 + cos(lat1*p) * cos(lat2*p) * (1-cos((lon2-lon1)*p))/2
return 12742 * asin(sqrt(a)) #2*R*asin...
meter2miles = 6.213712e-4
total = 0.0
coords = simp_geo.coords
for i in range(len(coords) - 1):
total += distance(coords[i][1], coords[i][0], coords[i+1][1], coords[i+1][0])
haver_total = total*1000*meter2miles

Distance from GeoPandas

Shouldn’t GeoPandas or Shapely be able to compute how far I hiked? Let try it.

C:\Users\gregb\AppData\Local\Temp\ipykernel_24036\ UserWarning: Geometry is in a geographic CRS. Results from 'length' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.


0    0.063
dtype: float64

Hmm, that didn’t work. Let’s check the coordinate reference system CRS (Wikipedia).
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

Oh, we are using raw latitude and longitude. What we need is a map projection to take us from angle measurements in latitute and longitude to coordinates in something like miles or kilometers. For example the Web Mercator projection (EPSG:3857) is used for the maps of the OpenStreetMap-project and commercial products such as Bing Maps or Google Maps.

Let’s try setting the projection via GeoPandas:

gdf_tracks_3857 = gdf_tracks.to_crs(3857)
meter2miles = 6.213712e-4
0    4.835374
dtype: float64

This is off by a mile from what we calculated from the Haversine formula. What happened?

Better Projection

We just hit upon one of the big issues that led to a proliferation of map projections over the last few centuries. No single map projection is good for all purposes. A general world wide projection like the Web Mercator that we used above will distort distances locally. See: Measuring Distances and Why Projections Matter.

For my hike it is better to use EPSG 32610.

Area of use: Between 126°W and 120°W, northern hemisphere between equator and 84°N, onshore and offshore. Canada - British Columbia (BC); Northwest Territories (NWT); Nunavut; Yukon. United States (USA) - Alaska (AK).

Coordinate system: Cartesian 2D CS. Axes: easting, northing (E,N). Orientations: east, north. UoM: meter

Let’s check that our lat and lon are within bounds then compute distances.

minx miny maxx maxy
0 -121.834233 37.515428 -121.82141 37.52734
gdf_tracks_32610 = gdf_tracks.to_crs(32610)
dist32610 = (gdf_tracks_32610.length*meter2miles)[0]
diff = haver_total - dist32610
print(f'Projection distance: {dist32610:.2f}, Haver distance: {haver_total:.2f}, diff: {diff:.2f}')
Projection distance: 3.83, Haver distance: 3.76, diff: -0.07

This is much better and this is why a good local street map or topo map will use an appropriate projection, i.e., so you can get reasonably accurate distances by looking at the map.

Simplifying Tracks Part 2

Another advantage of using an appropriate projection is that we can now specify the tolerance use to simplify our tracks in a intuitive units, e.g., meters. Below we set the tolerance to simplify the hiking track to 3 meters. Note huge reduction in the number of points. This could be useful in producing routes to be followed from tracks.

# How many points when tolerance set to 3 meters?


Note that it is good to use simplify when computing distances for slower moving activities since even if you are standing still GPS measurement noise will make it look as though you are moving.

# How many points do we get when tolerance set to 1 meter?
0    3.760639
dtype: float64

Finally we take a peak at the coordinates in the ESPG 32610 projection. It turns out that these are in (x, y) meter pairs and hence distance calculations with them can just use Euclidean distance formulas.

coords32610 = gdf_tracks_32610.simplify(1.0).iloc[0].coords
[(603358.3633857827, 4152697.5049648797), (603356.1585678917, 4152697.1075925604), (603355.1229669412, 4152697.464678713), (603354.0872790867, 4152697.8216529465)]

Track Points Layer

In addition to latitude and longitude coordinates GPS track points can contain other information such as elevation and time. For mountain hikes I’m really interested in elevation information! This is where the track_points layer comes in handy. It includes almost all the information available in GPX format.

Let’s look at the column names (keys) that GeoPandas gives us and look at the first four rows of data.

gdf_points = gpd.read_file(filename, layer="track_points")
Index(['track_fid', 'track_seg_id', 'track_seg_point_id', 'ele', 'time',
       'course', 'speed', 'magvar', 'geoidheight', 'name', 'cmt', 'desc',
       'src', 'url', 'urlname', 'sym', 'type', 'fix', 'sat', 'hdop', 'vdop',
       'pdop', 'ageofdgpsdata', 'dgpsid', 'geometry'],
track_fid track_seg_id track_seg_point_id ele time course speed magvar geoidheight name ... sym type fix sat hdop vdop pdop ageofdgpsdata dgpsid geometry
0 0 0 0 99.0 2022-06-30 18:03:22+00:00 209.800003 0.252078 None None None ... None None 3d 5 1.6 None None None None POINT (-121.83044 37.51545)
1 0 0 1 100.0 2022-06-30 18:03:25+00:00 315.200012 0.262367 None None None ... None None 3d 5 1.6 None None None None POINT (-121.83045 37.51544)
2 0 0 2 103.0 2022-06-30 18:04:05+00:00 0.000000 0.000000 None None None ... None None 3d 6 1.4 None None None None POINT (-121.83045 37.51544)
3 0 0 3 103.0 2022-06-30 18:04:06+00:00 291.200012 1.111200 None None None ... None None 3d 6 1.4 None None None None POINT (-121.83046 37.51544)

4 rows × 25 columns

GPX Point Data

So what is all this extra information?

  • tracks can contain multiple track segments e.g., when a GPS unit’s data collection is turned on and off. Hence the track segment related information.
  • Per GPX standard ele is the elevation in meters.
  • Per GPX standard time is the date/time referenced to UTC
  • course and speed are extensions permitted by GPX and I only see these from my GPS watch that I use for windsurfing, kitesurfing, and sometime use hiking.
  • sat is the number of GPS satelites in view
  • hdop, vdop, and pdop are related to GPS accuracy

Interactive Tracks with Elevation and Time

Let’s use this data along with GeoPandas ploting capability to get an interactive map that can show the local time and elevation for each point along our hike. To do this we’ll create new “columns” in our GeoPandas data frame with a localized time string and the height converted to feet.

Note that the map is “interactive” in the Jupyter notebook not on the web page where we just show a static image.

# Function to take a Pandas timestamp and convert it to
# a local time string.
def ts_local_time(ts):
return str(ts.tz_convert(tz='US/Pacific').time())
meters2feet = 3.28084
gdf_points["ltime"] = gdf_points["time"].apply(ts_local_time)
gdf_points["efeet"] = meters2feet*gdf_points["ele"]
gdf_points[["efeet", "ltime", "geometry"]].explore()


Check Vertical Progress

So how much vertical progress did our hike entail? By using the Pandas describe function show below we can see the minimum elevation was 275 feet and the maximum was 1325 feet.

count    6583.000000
mean      884.368538
std       315.168774
min       275.590560
25%       603.674560
50%       941.601080
75%      1141.732320
max      1325.459360
Name: efeet, dtype: float64

Elevation versus Distance

For hikes it is very nice to get a plot of elevation versus distance. Here we use GeoPandas and a the projection that gives us good distance measurements.

# Getting the bounds over all the points in the GeoPandas data frame.
array([-121.83423333,   37.51542833, -121.82141   ,   37.52734   ])
# Set a good projection for distance computations
gdf_points32610 = gdf_points.to_crs(32610)
# Create a data frame from the differences between adjacent
# points and take a look at the first few entries.
diff_frame = pd.DataFrame({"xdiff": gdf_points32610.geometry.x.diff(),
"ydiff": gdf_points32610.geometry.y.diff()})
xdiff ydiff
0 NaN NaN
1 -0.881467 -0.195939
2 0.002299 -0.184952
3 -1.325650 -0.016481
# Compute the distance increment between adjacent points
dist_incrs = diff_frame.apply(lambda row: sqrt(row["xdiff"]**2 + row["ydiff"]**2), axis=1)
# Compute the accumulated distance up to each point
acc_dist = dist_incrs.cumsum()
# Convert to miles and take a look
trk_miles = acc_dist*meter2miles
0            NaN
1       0.000561
2       0.000676
3       0.001500
4       0.002180
6578    3.828678
6579    3.829488
6580    3.830083
6581    3.830595
6582    3.830778
Length: 6583, dtype: float64
# Replace the NaN with 0.0
trk_miles.iloc[0] = 0.0
# Get an array of heights in feet
trk_height = gdf_points32610.ele*meters2feet
# Plot it!
plt.plot(trk_miles, trk_height)

Distance versus Elevation 1

Projection Independent Elevation versus Distance

Alternatively we can do this in a way that is independent of the map projection via the Haversine formula that we used at the begining of this post. Here we will go to one of the underpinnings of Pandas which is the numpy package.

# Extract coordinates into a numpy array
coords = np.array([gdf_points.geometry.x.values,gdf_points.geometry.y.values])
# Take a look
print(coords[0][0], coords[1][0])
-121.830436667 37.515446667
<class 'numpy.ndarray'>
# Get distance increments for an array of lat, lon values
# Uses def distance(lat1, lon1, lat2, lon2):
def dist_incrs(lon_lats):
size = lon_lats.shape[1]
incrs = np.zeros(size)
for i in range(1, size):
incrs[i] = 1000.0*distance(lon_lats[1][i-1], lon_lats[0][i-1],
lon_lats[1][i], lon_lats[0][i])
return incrs
# compute distance increments
incrs2 = dist_incrs(coords)
# compute accumulated distance
cum_dist = incrs2.cumsum()*meter2miles
# get elevation in feet
elev = gdf_points.ele.values*meters2feet
# Plot it nicely this time
fig, ax = plt.subplots()
ax.plot(cum_dist, elev)
ax.set(xlabel='Distance (miles)', ylabel='Elevation (feet)',
title='Flag Hill Loop')

Distance versus Elevation 2