Thursday, 20 June 2013

Riding the Camel

When I originally thought about the technical design for the CP Visibility application, I had in mind a series of loosely coupled blocks, for maximum flexibility:

  • An xml database (eXist)
  • A public search engine (SOLR)
  • A publication mechanism to push updates to the phones (MQTT)

However, at that time I hadn't considered exactly how these blocks would talk to each other; I'd assumed an API, library or REST service of some sort would present itself (optimism is an integral part of professional IT).

A few months on, it's clear that although there are well-documented ways to do this in Java (the Paho client library for MQTT and the SOLRJ library for SOLR), each library introduces its own layer of complexity:
  • The library needs to be understood.
  • Extra Java code needs to be written and tested.
  • It has to work inside eXist as a module.
Hold on, though: each new use of the data, like a Twitter or a websocket feed, would again require its own mini 'solution', created from scratch each time. That doesn't sound like a good idea.

Enter Apache Camel, which I'd used on another project at work in the meantime and which seemed ideal for this task. Basically, Camel knows how to connect to a load of technologies, and has a ruleset that tells it what to route where, and when. It's a 'mediation engine'.
[Have a look at their web site; it does a much better job of describing what Camel does.]


Now for this project the important aspects of Camel are:
  • It can be deployed as a servlet, so it slots right into our existing stack.
  • It understands both MQTT and SOLR.
  • It has XQuery/XPath etc. built in, so it can understand our project files.
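The post doesn't say exactly how Camel was wired into the web application, so the following is only one common approach, assuming Spring is used to boot Camel inside Tomcat: a `web.xml` whose listener loads a Spring file holding the routes. The file name `camel-config.xml` is hypothetical, and the camel-spring and spring-web jars would need to be in `WEB-INF/lib`.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of WEB-INF/web.xml: Spring's ContextLoaderListener starts the
     Camel context defined in a (hypothetical) camel-config.xml file -->
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
                             http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
         version="3.0">

    <!-- Where Spring should look for the bean and route definitions -->
    <context-param>
        <param-name>contextConfigLocation</param-name>
        <param-value>/WEB-INF/camel-config.xml</param-value>
    </context-param>

    <!-- Loads the Spring context (and with it the camelContext) when Tomcat starts the app -->
    <listener>
        <listener-class>org.springframework.web.context.ContextLoaderListener</listener-class>
    </listener>
</web-app>
```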
What it doesn't have is any understanding of eXist. However, that turns out not to be insurmountable: eXist has a 'file' module that lets us write an xml project file out to a folder on the server. It's not pretty, but it also solves the problem of queuing up updates, since the files simply wait in the folder until Camel picks them up. Sometimes old-school is the way to go.

We start, then, by writing an xml file to a folder on the server called camel-in each time a project is created or updated. Next comes the Camel magic.
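For illustration, a project file dropped into camel-in might look something like the one below. The route only relies on `/document/id`; the other element names (`title`, `lat`, `lng`) are hypothetical placeholders, and the values are borrowed from the SOLR example later in the post.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical project file written by eXist to /opt/tomcat/temp/camel-in.
     Only /document/id is used by the route; the other elements are illustrative. -->
<document>
    <id>538</id>
    <title>Example project</title>
    <lat>51.6812</lat>
    <lng>-2.23541</lng>
</document>
```

With a file like this, the route's `/document/id/text()` XPath expression picks out `538`.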

A Camel setup consists of <route>s, which we can configure in a simple xml file on the server.
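The routes don't float on their own: in a Spring-style XML file they sit inside a `<camelContext>` element. A minimal skeleton, assuming Spring XML is how the routes are loaded (the post doesn't say), might look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a Spring XML file holding the Camel routes -->
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://camel.apache.org/schema/spring
           http://camel.apache.org/schema/spring/camel-spring.xsd">

    <camelContext xmlns="http://camel.apache.org/schema/spring">
        <!-- The <route> elements go here: the file pick-up route plus
             the direct:solr and direct:mqtt sub-routes -->
    </camelContext>
</beans>
```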
The first part of the route, the <from> tag, picks up any files that appear in camel-in and deletes them after successful processing; the xml content becomes the 'body', or payload, for the rest of the route:

<route>

    <from uri="file:/opt/tomcat/temp/camel-in?delete=true"/>

    <setHeader headerName="myid">
        <xpath>/document/id/text()</xpath>
    </setHeader>

    <multicast stopOnException="true">
        <to uri="direct:solr"/>
        <to uri="direct:mqtt"/>
    </multicast>

</route>

Next, we set a header variable, myid, holding the id of the project, since we need it for mqtt later. Notice that <xpath> is built in, so we can read the id straight out of the body. Lastly, we 'multicast' to separate routes for solr and mqtt, a bit like calling a sub-routine.
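For contrast, here's a hypothetical version that calls the two sub-routes one after the other, without `<multicast>`. Because a `direct:` call hands the message back to the calling route as the sub-route left it, `direct:mqtt` would receive whatever `direct:solr` left in the body (by then the transformed SOLR request), not the original project file.

```xml
<!-- Hypothetical, for contrast only: without <multicast> the two calls
     form a pipeline, so direct:mqtt sees the body as direct:solr left it -->
<route>
    <from uri="file:/opt/tomcat/temp/camel-in?delete=true"/>

    <setHeader headerName="myid">
        <xpath>/document/id/text()</xpath>
    </setHeader>

    <to uri="direct:solr"/>  <!-- body is replaced by the SOLR <add> request -->
    <to uri="direct:mqtt"/>  <!-- receives that request, not the project file -->
</route>
```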
Multicasting is worth explaining. In Camel, a normal route with a couple of steps acts like a pipeline, pouring the body from the output of one step into the input of the next. Usually this is fine, unless a step alters the body: the altered body, not the original, is what gets poured into the next step. We need very different body data for mqtt (text) and solr (xml), so we use <multicast>, which makes sure a separate copy of the original message is sent to each route. By default the routes are called one after the other, and stopOnException="true" means that if the SOLR update throws an exception, the MQTT message isn't sent. First up is SOLR:

<route>
     <from uri="direct:solr"/>

     <to uri="xslt:file:/opt/tomcat/temp/camel-xsl/proj2solr.xsl"/>

     <log message="SOLR Update"/>

     <convertBodyTo type="java.lang.String"/>

     <setHeader headerName="SolrOperation">
        <constant>INSERT</constant>
     </setHeader>

     <to uri="solr://localhost/solr/cpsv0"/>
</route>

This route takes a project and uses xslt to put together the right xml format for a solr insert:

<add>
   <doc>
      <field name="id">538</field>
      <!-- more fields...... -->
      <field name="location">51.6812,-2.23541</field>
   </doc>
</add>
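The stylesheet itself isn't shown in the post, so here is a minimal sketch of what `proj2solr.xsl` could look like. Apart from `/document/id`, the source element names (`title`, `lat`, `lng`) and the `title` SOLR field are hypothetical; it assumes the `location` field is a LatLonType-style field, which takes "lat,lon" as a single string, matching the example above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of proj2solr.xsl: turns a project file into a SOLR <add> request.
     Element names other than /document/id are hypothetical. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/document">
        <add>
            <doc>
                <field name="id"><xsl:value-of select="id"/></field>
                <!-- hypothetical title element and field -->
                <field name="title"><xsl:value-of select="title"/></field>
                <!-- combine latitude and longitude into "lat,lon" -->
                <field name="location">
                    <xsl:value-of select="concat(lat, ',', lng)"/>
                </field>
            </doc>
        </add>
    </xsl:template>
</xsl:stylesheet>
```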

Next, the body is converted to a string (rather than xml) and sent to the solr endpoint, with a SolrOperation header telling it what to do, i.e. insert the data.
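Depending on how the SOLR core is configured, an INSERT on its own may not make the new document searchable until SOLR next commits. Assuming the core has no autoCommit set up in solrconfig.xml (the post doesn't say either way), one option is to finish the `direct:solr` route with an explicit commit using the camel-solr COMMIT operation. This is a sketch of an addition, not part of the original route:

```xml
<!-- Hypothetical addition at the end of the direct:solr route:
     ask SOLR to commit so the inserted document becomes visible -->
<setHeader headerName="SolrOperation">
    <constant>COMMIT</constant>
</setHeader>
<to uri="solr://localhost/solr/cpsv0"/>
```

The trade-off is an extra request per update; with a busy feed, letting SOLR's own autoCommit batch things up may be the better choice.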

Now, this might seem a little complex, but look at the advantages: no-one needs to know SOLRJ, there's no code for us to write and maintain, and it's dead easy to alter. Next up is MQTT:

<route>
    <from uri="direct:mqtt"/>

    <to uri="xslt:file:/opt/tomcat/temp/camel-xsl/proj2mqtt.xsl"/>

    <log message="MQTT Update"/>

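    <!-- keep the endpoint URI on one line: a line break inside the
         <simple> expression would end up in the URI itself -->
    <recipientList ignoreInvalidEndpoints="false">
        <simple>mqtt:camel?host=tcp://localhost:1883&amp;publishTopicName=projects/${header.myid}</simple>
    </recipientList>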
</route>

This route is a little more complex, but not by much. For mqtt, we're publishing a message to the projects/N topic, where N is the project number taken from the myid header. The message content is created the same way as before, since xslt can also output plain text.
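As with SOLR, the stylesheet isn't shown in the post. A minimal sketch of `proj2mqtt.xsl` might use `method="text"` so the transform emits plain text rather than xml; the element names and the pipe-separated payload layout are hypothetical choices, not the project's actual message format.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of proj2mqtt.xsl: produces a plain-text MQTT payload.
     Element names and payload layout are hypothetical. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- method="text" makes the transform emit plain text, not xml -->
    <xsl:output method="text" encoding="UTF-8"/>

    <xsl:template match="/document">
        <xsl:value-of select="id"/>
        <xsl:text>|</xsl:text>
        <xsl:value-of select="title"/>
        <xsl:text>|</xsl:text>
        <xsl:value-of select="concat(lat, ',', lng)"/>
    </xsl:template>
</xsl:stylesheet>
```

A compact text payload like this keeps messages small for the phones; the same stylesheet approach would work just as well for JSON if that suits the app better.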
Curiously, one thing Camel doesn't make easy is dropping a variable into an endpoint url. Instead, you use a <recipientList>, which lets route endpoint strings be calculated at run-time. The <simple> tag names the expression language we're using to build that string; it could just as easily be a scripting language such as JavaScript, or <xpath>, and there are a few others you can use.
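As an aside, later Camel releases (2.16 onwards, so after this was written) added a dynamic `<toD>` endpoint that accepts simple expressions directly in its URI. Assuming a newer Camel version, the `<recipientList>` block could be written more compactly like this:

```xml
<!-- Requires Camel 2.16+: the ${header.myid} placeholder is evaluated
     with the simple language each time a message passes through -->
<toD uri="mqtt:camel?host=tcp://localhost:1883&amp;publishTopicName=projects/${header.myid}"/>
```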

That's it: in a few lines of xml, the updates we need are done. Camel is a very handy bit of kit; we can adapt as the project progresses without having to invest in APIs or worry about one bit of code working with another. Best of all, it maintains the loose coupling and flexibility we need.