Hunch Aggregator

Aug 14, 2008

I’ve never written about this piece of software before, but have gotten a few questions about it recently, so I thought I’d shed some light on things.

The Hunch Aggregator is a pretty simple thing – it keeps a central state of many addressable online services. Once upon a time, I created and managed most services myself: image hosting, blogging, chat and so on. Then services better than the ones I had written popped up – Flickr, Jaiku/Twitter, Facebook, Wordpress, Google Reader and so on. So, like any pragmatic tech-savvy user would, I distributed the tasks and outsourced the pain in the ass of keeping things up to date and working. Then a new problem arose: I am one person, but to the outside world the myriad of services presented several different users, different “persons” or identities. All I want is for my friends to be able to hear me, not to spend all their time surfing around this myriad of specialized websites.

So I came up with the simple idea of presenting my stuff as a single stream of events occurring over time. The first versions of the aggregation software were clumsy and hard to extend. They were unable to synchronize (they could only add new things) and the presentation was not very sexy. A few years later I blew the dust off the idea and started over from scratch. The result is what now runs on hunch.se.

The concept is simple

  • There are several sources
  • Each source has items
  • Each item has
    • A class – picture, text, sound, recommendation etcetera.
    • A globally unique identifier
    • A title and possibly a body with content
    • A URL
    • Tags
    • Information about which source produced it
  • State saved in an RDBMS
  • Synchronization and updates scheduled at a fixed time interval
  • Front/UI orthogonally independent from the back-end
  • Robust – withstands state loss and high concurrency
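
The list above can be sketched as a small data model. This is a minimal illustration and not the actual Hunch code: the names `Item`, `open_store` and `save_item`, as well as the table and column names, are assumptions made for the example. Using the globally unique identifier as the primary key is what makes saving the same item twice harmless.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Item:
    guid: str          # globally unique identifier
    source: str        # which source produced the item
    item_class: str    # "picture", "text", "sound", ...
    title: str
    url: str
    body: str = ""
    tags: list = field(default_factory=list)


# Hypothetical schema: one row per item, one row per (item, tag) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS items (
    guid       TEXT PRIMARY KEY,
    source     TEXT NOT NULL,
    item_class TEXT NOT NULL,
    title      TEXT NOT NULL,
    body       TEXT NOT NULL,
    url        TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS item_tags (
    guid TEXT NOT NULL REFERENCES items(guid),
    tag  TEXT NOT NULL,
    PRIMARY KEY (guid, tag)
);
"""


def open_store(path=":memory:"):
    """Open (or create) the SQLite store and make sure the tables exist."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def save_item(db, item):
    """Insert or update one item; safe to call repeatedly for the same guid."""
    with db:  # one transaction per item: committed on success, rolled back on error
        db.execute(
            "INSERT OR REPLACE INTO items "
            "(guid, source, item_class, title, body, url) VALUES (?, ?, ?, ?, ?, ?)",
            (item.guid, item.source, item.item_class, item.title, item.body, item.url),
        )
        # Replace the item's tags wholesale so removed tags disappear too.
        db.execute("DELETE FROM item_tags WHERE guid = ?", (item.guid,))
        db.executemany(
            "INSERT INTO item_tags (guid, tag) VALUES (?, ?)",
            [(item.guid, tag) for tag in set(item.tags)],
        )
```

Because every write is keyed on the identifier, the whole state can be rebuilt by simply running a sync again after a loss.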

Back-end

(Image: Hunch Aggregator back-end tool used for debugging)

The back-end is written in Python and is easy to extend with plugin-like source controllers. SQLite 3 is used for storage. The entry point is a program called sync.py, which has some debugging features.
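
To give an idea of what a plugin-like source controller could look like, here is a rough sketch. It is not taken from sync.py: `SourceController`, `register`, `REGISTRY`, `StaticSource` and `sync` are all hypothetical names, and the state is kept in a plain dict instead of SQLite to keep the example short. The point is that a sync compares what the source reports now with what is stored, so vanished items are removed rather than only new ones being added.

```python
class SourceController:
    """Base class for a source plugin (hypothetical interface)."""

    name = None

    def fetch(self):
        """Return an iterable of item dicts currently available from the source."""
        raise NotImplementedError


REGISTRY = {}


def register(cls):
    """Class decorator that adds a controller class to the registry by name."""
    REGISTRY[cls.name] = cls
    return cls


@register
class StaticSource(SourceController):
    """Stand-in source that returns canned items instead of calling a web API."""

    name = "static"

    def fetch(self):
        return [{"guid": "static:1", "title": "Hello", "url": "http://example.com/1"}]


def sync(state, controller):
    """Bring state[controller.name] in line with what the source reports now.

    `state` maps a source name to a dict of {guid: item}.
    Returns the sorted guids that were added and removed.
    """
    old = state.get(controller.name, {})
    fresh = {item["guid"]: item for item in controller.fetch()}
    added = sorted(fresh.keys() - old.keys())
    removed = sorted(old.keys() - fresh.keys())
    state[controller.name] = fresh
    return added, removed
```

Adding a new source in this sketch means writing one more class with a `name` and a `fetch` method and decorating it with `register`.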

Front-end

The front-end, or the user web interface, is written in PHP and presents the current state of sources and items. Filters are used for grouping items, like photos from Flickr or Facebook. There is also a URL-based interface for performing arbitrary queries, which is explained further down in this article.

Item query language

Because of the nature of the content, one may want to display, or subscribe to, only a subset of items. This problem was solved by adding a simple query language and interface.

http://hunch.se/tags/source:flickr+color+autumn

This means “Give me things from the flickr source tagged with color and autumn”. The query is made up of three criteria: one limits the results to items from a particular source (flickr), and the other two require the items to be labeled, or tagged, with the free-text tags color and autumn.
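
Splitting such a URL into its criteria can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the parser the site uses: the function name `parse_query` and the returned dict layout are made up, and the only prefixes it recognizes are `source:` and `type:`, with everything else treated as a free-text tag.

```python
from urllib.parse import urlparse

# Prefixes that mark a fragment as something other than a free-text tag.
KINDS = ("source", "type")


def parse_query(url):
    """Split a query URL into its endpoint and criteria grouped by kind."""
    path = urlparse(url).path               # "/tags/source:flickr+color+autumn"
    _, endpoint, query = path.split("/", 2)
    criteria = {"source": [], "type": [], "tag": []}
    for fragment in query.split("+"):
        if not fragment:
            continue                        # tolerate "a++b" or a trailing "+"
        kind, sep, value = fragment.partition(":")
        if sep and kind in KINDS:
            criteria[kind].append(value)
        else:
            criteria["tag"].append(fragment)
    return endpoint, criteria
```

Calling `parse_query("http://hunch.se/tags/source:flickr+color+autumn")` gives the endpoint `tags`, one source criterion (`flickr`) and two tag criteria (`color`, `autumn`).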

At the time of writing, the presentation of queries is almost unusable (because I’m lazy and have little time for stuff like this), but on the other hand you can subscribe to a feed of items matching your criteria:

http://hunch.se/feed/source:flickr+color+autumn

Query fragments, or criteria, of the same kind (source, free-text tags, type) are grouped and AND-ed. The borders between criteria groups are OR-ed. There is also a third URL endpoint, which explains how a query is interpreted:

http://hunch.se/explain/source:visualizeus+source:stuff+type:picture+color

The explain URL presents the compiled SQL that is executed in the RDBMS. In other words, the item query language is a pre-processor for SQL. The HunchItemSQL source goes into more detail.
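
As a rough idea of what such a pre-processor does, the sketch below turns grouped criteria into a parameterized SQL statement. It is not HunchItemSQL: the function `compile_query`, the `items` and `item_tags` tables and their columns are assumptions for the example. The two joining operators are parameters whose defaults follow the grouping rule described above, so the rule is easy to change.

```python
import sqlite3

# Hypothetical, trimmed-down schema used only by this example.
SCHEMA = """
CREATE TABLE items (guid TEXT PRIMARY KEY, source TEXT, item_class TEXT);
CREATE TABLE item_tags (guid TEXT, tag TEXT);
"""

# One SQL condition per kind of criterion; each takes exactly one parameter.
CONDITIONS = {
    "source": "items.source = ?",
    "type": "items.item_class = ?",
    "tag": "EXISTS (SELECT 1 FROM item_tags t "
           "WHERE t.guid = items.guid AND t.tag = ?)",
}


def compile_query(criteria, within="AND", between="OR"):
    """Compile {kind: [values]} into (sql, params).

    `within` joins criteria of the same kind, `between` joins the groups.
    """
    for op in (within, between):
        if op not in ("AND", "OR"):
            raise ValueError("operator must be 'AND' or 'OR'")
    groups, params = [], []
    for kind in ("source", "type", "tag"):
        values = criteria.get(kind, [])
        if not values:
            continue
        clause = f" {within} ".join(CONDITIONS[kind] for _ in values)
        groups.append(f"({clause})")
        params.extend(values)           # same order as the placeholders
    where = f" {between} ".join(groups) or "1"   # no criteria: match everything
    sql = f"SELECT items.guid FROM items WHERE {where} ORDER BY items.guid"
    return sql, params
```

The values never end up in the SQL text itself; they are passed separately as parameters, e.g. `db.execute(*compile_query(criteria))` on a `sqlite3` connection.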

Source and license

I rarely see any point in keeping software closed-source. The Hunch Aggregator is no exception: it’s licensed under the MIT license and freely available from my Subversion repository.