Using Spatialite to store and represent our GPX data

Background

A quick bit of background. As people who read this blog know, we have recorded every journey we make, every day, with a GPS since 2003 (Dan) and 2007 (Sophia). As you can imagine, this practice has evolved technologically quite a lot over the decade, not so much because of changes in the way GPS data can be stored but more because of what I have learned in the mean time about storing and accessing data.

File Formats

In the beginning, back in the bad old days of using Windoze XP, when you still had to have some funky cable to connect your Garmin GPS (we’ve nearly always used the most basic eTrex) to a computer, I used Heinrich Pfeifer’s still excellent Gartrip shareware programme. In the beginning I was pretty confused about what file format to use to save my GPS traces, since there seemed to be hundreds of different formats and so I made the mistake of saving some of my early tracks only in Gartrip’s binary format, just because it made for small file sizes plus it was compete (it didn’t throw away information, like KML files for example which used to throw out elevation data as I recall).

GPX

Luckily Gartrip could also save as GPX which in few years became my standard storage format – not because it is economical – it is actually pretty redundantly verbose – but because it’s just so damn transferable, being XML and having been around for so long, most GIS programmes feature at least importers if not means of representing the file. Also it’s a good choice for writing your own scripts as there are so many XML parsers out there to adapt. And so presently, the archive of GPS data consists of folders from each year with a hodge-podge of various text and binary file formats settling down to GPX only files after about 2010.

Databases

Meeting Peter Vasil and then Mathieu Baudier transformed our understanding of how to access this stuff in order to cope with the different outcomes we present with the data. I initially asked Peter whether he would be interested in hacking together something in openFrameworks which would represent our GPS data as an animation, the lines drawing themselves in sequence. This eventually lead to (I can’t bear to look at how many hours) drawinglife, the open source release of which I talked about here.

Spatialite

We finally settled on Alessandro Furieri’s very wonderful Spatialite extension to Sqlite as the database drawinglife reads from to produce our visualisation work. Spatialite gives Sqlite the ability to represent standard geographic information like points, lines, polygons, use coordinate representation systems and perform coordinate transformations, for instance into projected coordinates which are useful for representing global spherical coordinates on something 2D like a computer screen.

Track Points and Lines

I have gone into this issue here, but the challenge we face is how to represent our data as lines, when we come to print them out or make an updated image for our archive, while making sure we retain information about each point.

Our solution was to create an importer in Python, gpx2spatialite, which has helper scripts to create and initialise a spatialite database and then import GPX files into it. This is great, because it automates and standardises the process of representing all of the data a GPX file has in it (waypoints too, although I don’t personally have a need for this). As you can see by the link, gpx2spatialite is also available as an open source script for you to use, fork, hack, whatever.

For those still interested, I thought I’d list the schema we use here as a way of perhaps prompting response. Who knows, maybe we’re not the only ones who need a way of storing a lot of data stored as GPX files in a nice spatially enabled database, ready to use with our drawinglife animation and GIS tools like QGIS.

Our Spatialite Database schema for storing GPX data

Here’s the sql file that generates a new spatialite database, ready to hold our GPX files. (This is the create_db.sql file in the gpx2spatialite/data/sql folder you get when you download gpx2spatialite)

 -- Creates tables for homebrew gps repository.
-- Tables:
-- users
-- files
-- trackpoints
-- tracklines

PRAGMA foreign_keys = ON;

BEGIN TRANSACTION;

CREATE TABLE users (
user_uid INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
username TEXT NOT NULL,
UNIQUE (username));

CREATE TABLE files (
file_uid INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
filename TEXT NOT NULL,
md5hash TEXT NOT NULL,
date_entered TEXT NOT NULL,
first_timestamp TEXT,
last_timestamp TEXT,
user_uid INTEGER NOT NULL,
FOREIGN KEY (user_uid)
REFERENCES users (user_uid),
UNIQUE (md5hash));

CREATE TABLE trackpoints (
trkpt_uid INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
trkseg_id INTEGER,
trksegpt_id INTEGER,
ele REAL NOT NULL,
utctimestamp TEXT NOT NULL,
comment TEXT,
course REAL,
speed REAL,
file_uid INTEGER NOT NULL,
user_uid INTEGER NOT NULL,
citydef_uid INTEGER,
FOREIGN KEY (file_uid)
REFERENCES files (file_uid) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (user_uid)
REFERENCES users (user_uid) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (trkseg_id)
REFERENCES tracksegments (trkseg_uid) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (citydef_uid)
REFERENCES citydefs (citydef_uid),
UNIQUE (utctimestamp, user_uid));

SELECT AddGeometryColumn('trackpoints', 'geom', 4326, 'POINT', 'XY', 1);
SELECT CreateSpatialIndex('trackpoints', 'geom');

CREATE TABLE tracklines (
trkline_uid INTEGER PRIMARY KEY AUTOINCREMENT,
trkseg_id INTEGER,
comment TEXT,
timestamp_start TEXT NOT NULL,
timestamp_end TEXT NOT NULL,
length_m REAL,
time_sec REAL,
speed_kph REAL,
file_uid INTEGER NOT NULL,
user_uid INTEGER NOT NULL,
FOREIGN KEY (file_uid)
REFERENCES files (file_uid) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (user_uid)
REFERENCES users (user_uid) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (trkseg_id)
REFERENCES tracksegments (trkseg_uid) ON DELETE CASCADE ON UPDATE CASCADE,
UNIQUE (timestamp_start, user_uid, trkseg_id)
);

SELECT AddGeometryColumn('tracklines', 'geom', 4326, 'LINESTRING', 'XY', 1);
SELECT CreateSpatialIndex('tracklines', 'geom');

CREATE TABLE citydefs (
citydef_uid INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
city TEXT NOT NULL,
country TEXT NOT NULL,
UNIQUE(city, country)
);

SELECT AddGeometryColumn('citydefs', 'geom', 4326, 'POLYGON', 'XY', 1);
SELECT CreateSpatialIndex('citydefs', 'geom');

CREATE TABLE tracksegments (
trkseg_uid INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
trkseg_uuid TEXT NOT NULL,
UNIQUE(trkseg_uuid)
);

CREATE TABLE waypoints (
wpt_uid INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
wpt_name TEXT,
ele REAL,
utctimestamp TEXT,
sym TEXT,
file_uid INTEGER,
user_uid INTEGER,
citydef_uid INTEGER,
FOREIGN KEY (file_uid)
REFERENCES files (file_uid) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (user_uid)
REFERENCES users (user_uid) ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (citydef_uid)
REFERENCES citydefs (citydef_uid));

SELECT AddGeometryColumn('waypoints', 'geom', 4326, 'POINT', 'XY', 1);
SELECT CreateSpatialIndex('waypoints', 'geom');

COMMIT;

The trackpoints table is where each individual point gets stored. In our drawinglife visualisation application, we needed an idea of where these lines were. To see what I mean, see our piece Narrating Our Lines, where you can see us try and work out where we were from the visualisation. The locations for this is pulled through from the citydef_uid column in the trackpoints table which is a foreign id to the citydefs table, where we have started to define the locations we have been to in the last years. This is a pretty custom aspect and I suspect will give us issues if our userbase grows beyond the three people I know use it at the moment. It’s also something that slows down the importing process a lot as each trackpoint needs to be queried against all the locations in citydefs.

As well as the points, we needed to represent our traces as lines which needs a different geometry type and so lives in another table, tracklines. This is the table that we normally pull through in Qgis when we produce our archive images or laser engravings or printed drawings.

As well as these obvious extractions of the GPX data, we track users in the users table, as Soph’s data and my data lives in the same database so that the animation only has to refer to one database. Also, when we make animations with multiple users, their tracks can all remain in one database.

We also track the GPX files that are imported, generating an md5 hash so that we know if we’re reimporting a file with identical content, even if the name is different. gpx2spatialite also finds the first and last date in the file and stores this.

There’s also a tracksegments file as at one stage, we thought we needed a way of tracking segments (those are equivalent to an unbroken line on the drawings or a row in the tracklines table) with uuids. I’m not sure we need this any longer and it might disappear from future versions of the schema.

So there it is, I encourage you to try it out if you have some GPX data floating around you want to visualise in Qgis or if you’re feeling brave, Peter’s fantastic drawinglife application. I’m also starting to write analyses of this data in Python which is pretty simple using pyspatialite and because it’s based on sqlite, the whole thing is really easy to script for, just being one, light file, rather than all that server client architecture the bigger databases require.

Feedback, as always, greatly appreciated.

This entry was posted in Code, Diary, GPS, Linux, Python, Software and tagged , , , , , , , , , , , , , , , , . Bookmark the permalink.

1 Response to Using Spatialite to store and represent our GPX data

Comments are closed.