Designing an Automated Visual Diff Tool

Bryan Lester, December 3, 2014

Every test automator’s dream is to slack off all day while their automation catches all the bugs. Visual bugs are one thing standing in the way of this paradise. Though they are often minor, catching them involves an enormous amount of manual testing effort because:

  • Styling is usually shared across a project, so a change to make a button on screen A look better may make it look worse on screen B. This necessitates testing every screen individually.
  • UI-driven tests cannot always catch even the major ones, since automation tools may still be able to find an element that has been moved into an area the user can’t reach.
  • The effort is multiplied by browser / OS compatibility, since you need to visually QA each environment individually.
  • You start having flashbacks to those awful “Find 5 Things Different About This Picture” newspaper puzzles.

As our team gained confidence in our continuous integration (CI) environment, we began releasing more rapidly (every week). While you want to catch as many bugs up front as possible, it’s ludicrous to ask every developer to manually, visually QA every screen on every possible browser before checking in a feature. So instead, we ended up spending hours before every release going through every possible screen, then had to work backwards to figure out which change was to blame. It takes a lot of perseverance to get someone to slog backwards and fix a minor visual bug they introduced when they’ve already moved on to coding a new feature.

We needed some way to automate catching these up front, before the feature is checked in. I began experimenting with the Python imaging library, Pillow, to generate pixel-by-pixel diffs of images. Selenium WebDriver provides the ability to take screenshots during a test, so it was easy to add some checkpoints to our existing automation. Once we had a solution specific to the Bundles project, I decided to release a more general-purpose Python library, dfrnt, for comparing two sets of screenshots.
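To make the pixel-by-pixel idea concrete, here is a minimal sketch of how two same-sized images might be compared with Pillow. It is an illustration only, not the code from the Bundles project or from dfrnt; the function name differing_pixels is made up for this example.

```python
from PIL import Image


def differing_pixels(gold, run):
    """Return the (x, y) coordinates where two same-sized images differ.

    Hypothetical helper for illustration; not part of Pillow or dfrnt.
    """
    if gold.size != run.size:
        raise ValueError("size mismatch: %s vs %s" % (gold.size, run.size))
    # Normalize both images to RGB so every pixel is a comparable 3-tuple.
    gold_px = gold.convert("RGB").load()
    run_px = run.convert("RGB").load()
    width, height = gold.size
    # Walk the image row by row and keep every coordinate that changed.
    return [
        (x, y)
        for y in range(height)
        for x in range(width)
        if gold_px[x, y] != run_px[x, y]
    ]
```

An empty list means the screenshot matches its baseline exactly. A plain Python loop like this is slow on full-size screenshots, so it is best read as a statement of the idea rather than something to run on every build.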

During your automation, you generate a directory full of screenshots from the test run (run_dir):


You also have a directory of baseline images, which are the expected “gold standard” images we want to compare against (gold_dir):


And a third, empty directory, where we will output the diffs (diff_dir):


With this setup, you can start using the tool.  Install it from pip:

This will generate a test.png image in the diff/ folder. In this image, every pixel that differs between the two images is highlighted in bright red:
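For readers who want to see what a directory-level comparison like this involves, the following is a rough stand-in written directly against Pillow. It does not reproduce dfrnt's interface or implementation; the names changed_mask and diff_directories and the status strings are assumptions made for this sketch. It compares each PNG in run_dir with the same-named file in gold_dir and writes a red-highlighted copy into diff_dir when they differ.

```python
import os

from PIL import Image, ImageChops

RED = (255, 0, 0)


def changed_mask(gold, run):
    """Return an 'L' image that is 255 wherever two RGB images differ."""
    delta = ImageChops.difference(gold, run)
    r, g, b = delta.split()
    # Take the largest per-channel difference so no channel is ignored.
    strongest = ImageChops.lighter(ImageChops.lighter(r, g), b)
    return strongest.point(lambda v: 255 if v else 0)


def diff_directories(run_dir, gold_dir, diff_dir):
    """Compare same-named PNGs and save red-highlighted diffs (sketch only).

    Returns a dict of file name -> 'same', 'changed', 'missing gold',
    or 'size mismatch'.
    """
    results = {}
    for name in sorted(os.listdir(run_dir)):
        if not name.lower().endswith(".png"):
            continue
        gold_path = os.path.join(gold_dir, name)
        if not os.path.exists(gold_path):
            results[name] = "missing gold"
            continue
        with Image.open(os.path.join(run_dir, name)) as run_file:
            run = run_file.convert("RGB")
        with Image.open(gold_path) as gold_file:
            gold = gold_file.convert("RGB")
        if run.size != gold.size:
            results[name] = "size mismatch"
            continue
        mask = changed_mask(gold, run)
        if mask.getbbox() is None:  # no non-zero pixels: images are identical
            results[name] = "same"
            continue
        # Paint red over the new screenshot wherever the mask is set.
        red = Image.new("RGB", run.size, RED)
        Image.composite(red, run, mask).save(os.path.join(diff_dir, name))
        results[name] = "changed"
    return results
```

Reporting a missing baseline or a size mismatch separately from a real visual change keeps the results readable when a new screen is added or the window size drifts.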



Some tips from integrating dfrnt into our CI  / Selenium testing environment:

  • Don’t check “gold” images directly into your repo, or it may grow large. Instead, create a text-file manifest of URLs to fetch the images from and download them with wget -i.
  • Use Selenium’s set_window_size() function before taking any screenshots, so you guarantee consistently sized images.
  • When diff’ing screenshots from mobile devices, crop off the top menu from the phone, as the time will always differ.
  • Integrate it into your git flow. We have a bot that runs as part of our build and uses the GitHub API to suggest a pull request against the current branch with the updated “gold” image manifest. That way, our design team can review the new set of images and approve them by clicking the “Merge” button.
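Two of these tips, fixing the window size and cropping the phone's top bar, are easy to sketch in Python. The helper names take_checkpoint and crop_status_bar, the default window size, and the bar_height value are all assumptions for illustration; only set_window_size() and save_screenshot() are Selenium WebDriver calls, and crop() is Pillow's.

```python
import os

from PIL import Image


def take_checkpoint(driver, name, run_dir, size=(1280, 800)):
    """Resize the browser window, then save a screenshot as run_dir/name.png.

    `driver` is any Selenium WebDriver instance; the default size is an
    arbitrary choice for this sketch.
    """
    driver.set_window_size(*size)
    path = os.path.join(run_dir, name + ".png")
    # save_screenshot() returns False if the file could not be written.
    if not driver.save_screenshot(path):
        raise IOError("could not save screenshot to " + path)
    return path


def crop_status_bar(image, bar_height):
    """Return a copy of `image` without its top `bar_height` rows.

    The right bar_height depends on the device, so measure it per phone.
    """
    width, height = image.size
    return image.crop((0, bar_height, width, height))
```

Calling take_checkpoint(driver, "checkout", "run") at each point of interest in an existing test would fill run_dir with consistently sized images. Remember to crop the "gold" image and the new screenshot by the same amount, or the two will no longer line up.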

dfrnt has some other capabilities under development, such as allowing a small amount of variation (“fuzziness”) and providing an image containing areas to ignore (“mask”).  It’s available for review and comment at:

Bryan Lester is a test automation engineer on the BitTorrent Bundle team. He is an accomplished pinball athlete and explorer of California's finest motorcycle roads.