Friday, 21 December 2012

Technology choices

It's a great moment when, after all the hard work, you win some funding for your idea. So, having won the Geovation Challenge, we were faced with the very welcome problem of "Where do we go from here?" To try to answer that from a technology viewpoint, here are some thoughts:

  • You can create a good solution if you can leverage existing technology and experience, as long as you don't fall into the "everything looks like a nail" trap.
  • It's better to be able to produce something quickly, especially if you're doing service design, than to go through a big software cycle only to find you've built a "dinosaur".
  • It's good if you can "loosely couple" the solution. If you make it out of blocks, you can replace them if you have scale issues, find a better way, need additional abilities etc. without destroying everything.
Now as a Trust, although things like developing on mobile devices are a new area for us, we do know about data, and that's pretty much the place to start: at the back end.

Databases - where does stuff go? 

In the sphere of databases there are a few clear trends: a) SQL systems like Oracle or MySQL, and b) non-SQL systems like Hadoop and CouchDB. In fact there is a third: native XML databases. These are a form of NoSQL, but are also structured.
Now we use this technology a fair bit at work, as Offender information is too loosely structured to hold easily in SQL, but it is nicely modelled in XML, which is also pretty flexible about changes. We use the eXist Open Source XML database at work, which has XQuery built in, and there is no reason not to keep using that: it fits the problem and we have experience. So, the first decision is easy: we hold each project as a separate piece of XML, like so:
<document use="geovation" version="1.0">
  <title>Queen St, Sandwell, West Bromwich.</title>
  <created iso="2012-11-21Z">Wed, 21 Nov 2012</created>
  <position>
    <address>Queen St, Sandwell, West Bromwich, West Midlands B70, UK</address>
  </position>
  <description>Fly tipping near the canal.</description>
  <comment>This looks like a great project - let's do it!</comment>
</document>

This is the first cut, with fields for position, description, status and somewhere to put images. For efficiency, we've put the image uploaded by the author on the file system, rather than in the database.
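To give a feel for how little is needed to work with a document like this, here is a minimal Python sketch that reads the main fields of one project. It is only an illustration, not our production code (which is XQuery). The embedded sample is a well-formed copy of the project above, and the `<position>` wrapper around the address is an assumption about the intended structure.

```python
import xml.etree.ElementTree as ET

# A well-formed copy of the sample project.
# Assumption: the address sits inside a <position> element.
SAMPLE = """<document use="geovation" version="1.0">
  <title>Queen St, Sandwell, West Bromwich.</title>
  <created iso="2012-11-21Z">Wed, 21 Nov 2012</created>
  <position>
    <address>Queen St, Sandwell, West Bromwich, West Midlands B70, UK</address>
  </position>
  <description>Fly tipping near the canal.</description>
  <comment>This looks like a great project - let's do it!</comment>
</document>"""


def read_project(xml_text):
    """Return the main fields of one project document as a plain dict."""
    root = ET.fromstring(xml_text)
    created = root.find("created")
    return {
        "title": root.findtext("title", default="").strip(),
        "created": created.get("iso") if created is not None else None,
        "address": root.findtext("position/address", default="").strip(),
        "description": root.findtext("description", default="").strip(),
    }


project = read_project(SAMPLE)
print(project["title"])
print(project["created"])
```

Because each project is a single self-contained document, adding a new field later only means adding an element; nothing else has to change to keep old documents readable.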

We manipulate this data with a language called XQuery. This can not only query the data, but is also a complete language in its own right. This means we can write an entire application in just XQuery, which saves a lot of time over traditional languages that rely on embedded SQL to talk to the database! As if that isn't enough, we also get XForms thrown in for free via XSLTForms: a nice declarative way to create the web interface for the administrators to work with the projects.
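As a rough illustration of what an XQuery over the project documents might look like, the sketch below builds a URL for eXist's REST interface, which accepts an XQuery in the `_query` parameter and a result limit in `_howmany`. The host, port and collection path are hypothetical, and the query itself is an example rather than one taken from our application.

```python
from urllib.parse import urlencode

# Hypothetical values: host, port and collection path are illustrative only.
EXIST_REST = "http://localhost:8080/exist/rest"
COLLECTION = "/db/geovation/projects"

# An example XQuery that returns the title of every Geovation project document.
TITLES_QUERY = (
    'for $d in collection("/db/geovation/projects")/document[@use = "geovation"] '
    "return $d/title"
)


def query_url(xquery, how_many=10):
    """Build a GET URL asking eXist to run an XQuery against the collection."""
    params = urlencode({"_query": xquery, "_howmany": how_many})
    return f"{EXIST_REST}{COLLECTION}?{params}"


url = query_url(TITLES_QUERY)
print(url)
```

Fetching that URL from a running eXist instance would return the matching `<title>` elements wrapped in a result envelope; the sketch stops at building the request so that it runs without a server.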

So, we have a database, and we can create, save and query projects. Aren't we done?

Search - set the data free!

We have a database and code, but at the end of the day this isn't a closed project: not only our code but, as far as possible, our data is Open. What we want from the solution is a system that allows anyone to include the project data in a web-site, mashup, application or whatever. That gives us a few issues:
  1. Scale. We don't know how successful this will be.
  2. Security. We need to separate the internal project, which may contain private text, from whatever data we expose to the outside world. We want to avoid log-ins and API keys if possible.
  3. Geo-search, e.g. "I'm here, what projects are nearby?", is a key requirement. This is possible, but isn't really well supported in eXist at the moment - so let's not "treat it like a nail".
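To make the geo-search requirement concrete, here is a small Python sketch of what "what projects are nearby" has to compute: the great-circle distance from the user to each project, using the haversine formula. The project ids and coordinates are made up for illustration, and a real deployment would leave this to a search engine with a spatial index rather than scanning every project.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points using the haversine formula."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearby(projects, lat, lon, radius_km):
    """Return (distance, id) pairs for projects within radius_km, nearest first."""
    hits = []
    for project_id, (p_lat, p_lon) in projects.items():
        d = distance_km(lat, lon, p_lat, p_lon)
        if d <= radius_km:
            hits.append((d, project_id))
    return sorted(hits)


# Made-up project ids and approximate coordinates, for illustration only.
PROJECTS = {
    "P0121": (52.52, -1.99),  # roughly West Bromwich
    "P0200": (51.51, -0.13),  # roughly central London
}

found = nearby(PROJECTS, 52.48, -1.90, radius_km=15)
print([project_id for _, project_id in found])
```

The linear scan is fine for a handful of projects, but it is exactly the kind of work that gets slow as the data grows, which is why geo-search is listed as a requirement in its own right.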
We don't really want to construct a whole Search API ourselves if we can use someone else's to do the job, so that leads us to Apache SOLR. SOLR is a very fast search engine built on top of Lucene. It has good scalability, a well-written REST API and good geo-search, and can spit out results in XML, JSON and a few other formats at will. So, what we'll do is this:
  1. Projects get created in eXist as XML
  2. When they go live, all the fields we want to expose get copied to SOLR - easy.
  3. If the project alters, we keep SOLR in step
Now we have a fast search engine with a well-documented API and all the data we want to expose. No need for extra security, logins etc.
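The copy step might look something like the Python sketch below: take the project XML, keep only a whitelist of public fields, and produce a flat document to send to the search engine. The choice of public fields, the `location` field name and the project id are assumptions made for illustration; the `{!geofilt ...}` filter is SOLR's standard radius filter, but it only works if the schema defines a spatial field and the export populates it.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: only these fields are public; <comment> stays internal.
PUBLIC_FIELDS = ("title", "description")


def to_search_doc(project_id, xml_text):
    """Copy the whitelisted fields of a project into a flat search document."""
    root = ET.fromstring(xml_text)
    doc = {"id": project_id}
    for name in PUBLIC_FIELDS:
        value = root.findtext(name)
        if value is not None:
            doc[name] = value.strip()
    return doc


def nearby_params(lat, lon, radius_km):
    """Query parameters for a radius search using SOLR's geofilt filter.

    Assumes a spatial field named "location" exists in the schema.
    """
    return {
        "q": "*:*",
        "fq": f"{{!geofilt sfield=location pt={lat},{lon} d={radius_km}}}",
        "wt": "json",
    }


PROJECT_XML = """<document use="geovation" version="1.0">
  <title>Queen St, Sandwell, West Bromwich.</title>
  <description>Fly tipping near the canal.</description>
  <comment>This looks like a great project - let's do it!</comment>
</document>"""

doc = to_search_doc("P0121", PROJECT_XML)
payload = json.dumps([doc])  # a JSON array of documents, as the update handler accepts
print(payload)
print(nearby_params(52.48, -1.9, 5)["fq"])
```

Because the whitelist decides what leaves the database, private text never reaches the public index. Re-sending a document with the same `id` replaces the earlier copy, which is one simple way to keep the two stores in step when a project changes.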

Pub/Sub - getting the message out.

The last piece of the puzzle we need is how to keep app users up to date with project changes. Admittedly, we could just keep a list on each phone and scan the database occasionally. That's a bit problematic as a solution though and doesn't scale well at all: imagine going to the post office to get a letter and waiting whilst they scoured the place for it. Now imagine a thousand people doing it at the same time. What you need is a pigeon-hole with your letter in it.
That's what pub/sub does. It consists of a Broker that acts as the post office. The apps on the phone all subscribe to a topic for each project the user has created and any they are following, e.g. /projects/P0121. When something changes in the project, the server publishes the change to the broker.
The next time the app connects, it gets any messages published for those topics - simple.
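The pigeon-hole idea can be sketched in a few lines of Python. This is a toy in-memory broker for illustration only, not MQTT and not our implementation; the subscriber ids and the second topic are made up. A real broker also has to deal with connections, delivery guarantees and persistence.

```python
from collections import defaultdict


class Broker:
    """A toy in-memory broker: one pigeon-hole of pending messages per subscriber."""

    def __init__(self):
        self._subscribers = defaultdict(set)    # topic -> subscriber ids
        self._pigeon_holes = defaultdict(list)  # subscriber id -> pending messages

    def subscribe(self, subscriber_id, topic):
        """Register interest in a topic."""
        self._subscribers[topic].add(subscriber_id)

    def publish(self, topic, message):
        """Drop a copy of the message into the pigeon-hole of every subscriber."""
        for subscriber_id in self._subscribers.get(topic, ()):
            self._pigeon_holes[subscriber_id].append((topic, message))

    def collect(self, subscriber_id):
        """Hand over everything waiting for this subscriber and empty the hole."""
        return self._pigeon_holes.pop(subscriber_id, [])


broker = Broker()
broker.subscribe("phone-a", "/projects/P0121")
broker.subscribe("phone-b", "/projects/P0999")

# The server publishes a change; nobody has to scan anything.
broker.publish("/projects/P0121", "status changed")

first = broker.collect("phone-a")   # the message waiting for phone-a
again = broker.collect("phone-a")   # nothing left the second time
other = broker.collect("phone-b")   # phone-b follows a different project
print(first, again, other)
```

The work happens once, at publish time, and each phone only ever looks in its own pigeon-hole, so a thousand phones connecting at once cost a thousand cheap lookups rather than a thousand database scans.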

We're using a nice light protocol for this called MQTT, though we haven't decided on a broker yet. Having said that, Mosquitto seems very good!

Where we are now.

So, this is where we are at the moment, an application in three bits:

  • a database
  • a search engine
  • a status messenger
all working together.

This is probably not the last iteration of this system, and once we've built it, we'll probably tweak it quite a bit. But, hopefully, the ideas of loose coupling, quick development and leveraging our strengths will prove themselves in the real world.
