Personal Data Mountain – coding strategies

While Soph and I were at last working on plan b stuff again at the HZT last week, we were thinking about the data we collect: GPS, text messages, mood reports (for 2011 only) and photographs.

We were preparing something for the try-out we did last Thursday in which we performed Narrating Our Lines live in front of an invited audience to see if this also worked as a performance, not just a video installation.

One of the things we wanted to try was a fast slideshow (actually a movie) of all the photos we took in the year we decided to play (2007). As I am unsatisfied with any photo management programme I have tried, preferring to order by location rather than date, the photos are scattered among multiple directories.

I knew I could use ffmpeg to stitch individual photos together into a movie once I’d resized them with the excellent mogrify command in ImageMagick. What I needed was something that would copy all the photos taken in 2007 to one location so that I could work on them, so I wrote a quick Python script, which you can examine/download below if you’re interested.
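
For context, the resize-and-stitch step could look roughly like the sketch below once the photos are gathered in one directory. This is not the exact set of commands used for the try-out: the function names, the 1280x720 frame size, the frame rate and the output filename are all assumptions for illustration, and it presumes an ffmpeg build with libx264 and glob pattern support. Because mogrify overwrites files in place, it should only ever be pointed at copies.

```python
import glob
import os
import subprocess


def build_commands(photo_dir, size="1280x720", fps=10, output="slideshow.mp4"):
    """Return the mogrify and ffmpeg command lines as argument lists."""
    files = sorted(glob.glob(os.path.join(photo_dir, "*.jpg")))
    # Resize to fit inside the frame, then pad with black to the exact
    # size so every frame handed to ffmpeg has the same dimensions.
    resize = ["mogrify", "-resize", size, "-background", "black",
              "-gravity", "center", "-extent", size] + files
    # Let ffmpeg expand the pattern itself via its image glob support.
    stitch = ["ffmpeg", "-framerate", str(fps),
              "-pattern_type", "glob", "-i", os.path.join(photo_dir, "*.jpg"),
              "-c:v", "libx264", "-pix_fmt", "yuv420p", output]
    return resize, stitch


def make_slideshow(photo_dir):
    """Run both commands; needs ImageMagick and ffmpeg on the PATH."""
    resize, stitch = build_commands(photo_dir)
    subprocess.check_call(resize)
    subprocess.check_call(stitch)
```

Calling make_slideshow("/media/ext3/") on the directory of copies would resize every .jpg there and then write slideshow.mp4 to the current directory.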

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
2013/01/07 19:55:57 Daniel Belasco Rogers

User points script at a root directory and script finds all images for
a certain year derived from the Exif data and copies these images into
a destination folder supplied by the user

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <>
"""

from optparse import OptionParser
from shutil import copy2
import os
import pyexiv2
import sys

def parseargs():
    usage = """
%prog YEAR SEARCHPATH DESTINATION"""
    parser = OptionParser(usage, version="%prog 0.1")
    (options, args) = parser.parse_args()
    if len(args) != 3:
        # wrong number of arguments: explain what is expected and stop
        print """
Please enter a year in the form YYYY, a directory to search for images
under and a directory to save a copy of the images to
e.g. 2007 "/nfs/photos/" "/media/ext3/"
"""
        sys.exit(2)
    year = args[0]
    searchpath = args[1]
    destination = args[2]
    return year, searchpath, destination

def getexifdate(pathname):
    """
    get creation date from exif
    """
    metadata = pyexiv2.ImageMetadata(pathname)
    try:
        metadata.read()
    except IOError:
        print "%s Unknown image type" % pathname
        return False
    try:
        tag = metadata['Exif.Photo.DateTimeOriginal']
    except KeyError:
        print '%s tag not set' % pathname
        return False
    return tag.value

def findimages(year, searchpath):
    """
    use os.walk to find images with .jpg extension
    """
    year = int(year)
    imagelist = []
    for (path, dirs, files) in os.walk(searchpath):
        for f in files:
            pathname = os.path.join(path, f)
            if os.path.splitext(pathname)[1].lower() == '.jpg':
                imagedate = getexifdate(pathname)
                if imagedate:
                    try:
                        imageyear = imagedate.year
                    except AttributeError:
                        print '%s invalid date in exif: %s' % (pathname, imagedate)
                        # skip this file rather than compare a stale year
                        continue
                    if imageyear == year:
                        imagelist.append(pathname)
    return imagelist

def copyimages(imagelist, destination):
    """
    iterate through imagelist, copying images to destination directory
    make the dir in a different way by checking if it is present first
    and making it if not, rather than catching it like this.
    """
    for image in imagelist:
        destinationpath = os.path.join(destination, os.path.split(image)[1])
        print "copying %s to %s" % (image, destinationpath)
        # try:
        copy2(image, destinationpath)
        # except IOError:
        #     os.mkdir(destination)
        #     copy2(image, destination)
        # except OSError as e:
        #     print e
        #     sys.exit(2)

def main():
    """
    call all functions within script and print progress to stdout
    """
    year, searchpath, destination = parseargs()

    print "Looking in %s for images from %s" % (searchpath, year)
    imagelist = findimages(year, searchpath)
    print "Found %d images" % len(imagelist)

    print "Copying images to %s" % destination
    copyimages(imagelist, destination)
    print "Copied %d images. Script ends here." % len(imagelist)

if __name__ == '__main__':
    sys.exit(main())
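
The note inside copyimages suggests checking for the destination directory first and creating it if it is missing, rather than catching the error from the copy. A minimal sketch of that idea follows. The name copy_with_mkdir and the returned list of copied paths are assumptions added for illustration and are not part of the script above.

```python
import os
from shutil import copy2


def copy_with_mkdir(imagelist, destination):
    """Copy each image into destination, creating the directory if needed."""
    # Check up front instead of catching IOError on the first failed copy.
    if not os.path.isdir(destination):
        os.makedirs(destination)
    copied = []
    for image in imagelist:
        destinationpath = os.path.join(destination, os.path.basename(image))
        copy2(image, destinationpath)
        copied.append(destinationpath)
    return copied
```

One thing this sketch does not handle: photos from different directories that share a filename would overwrite each other in the destination, just as they would in the script as written.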

All this made me think, however, about how used we are all becoming to the idea of having too much data to sort through. I think it’s something that lots of us can now relate to when it comes to digital photographs. Running the script above I found about 2500 photos, representing gigabytes of data. Some of the photos I hadn’t seen since I took them; they were gathering digital dust somewhere in a remote corner of my filing un-system. To make this stuff (our stuff) understandable, or even viewable, graspable, we need tools to manage it. It is no longer possible or even appropriate to browse through our photos and pull out the ones we’re interested in; we need tools to do this for us.

I have to admit to a feeling of great pride and joy that I could write my own, thanks to acquiring some basic Python skills over the past couple of years.
