Sunday, November 02, 2008

Listen to Twitter

Finding myself working but wanting to know how Lewis Hamilton was getting on, I wondered if Twitter could let me know. I was looking for interesting feeds for the students, so I knocked up a bit of XQuery to fetch the Atom feed for a Twitter search and turn it into XHTML+Voice for use with Opera. It works pretty well, even if it is rather unsophisticated and uses page refresh rather than AJAX. The script uses the md5 hash of the last tweet spoken to work out which tweets are new. I plan to have this running in Tuesday's lecture. One problem is that it only works when the Opera window is active, so I can't have it running in the background. The main problem, however, is that tweets don't indicate their language, so a lot of very poor, and probably disappointed, Portuguese is currently being spoken from the Hamilton stream.
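The new-tweet detection is simple to sketch. This Python fragment (the post's actual implementation is XQuery) shows the idea of using the md5 hash of the last tweet spoken as a bookmark into the feed:

```python
import hashlib

def md5_of(text):
    """md5 hex digest of a tweet's text, used as a bookmark."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def new_tweets(tweets, last_hash):
    """Return the tweets newer than the last one spoken.

    tweets is newest-first, as in the Atom feed; we collect entries
    until we hit the one whose hash matches the saved bookmark.
    """
    fresh = []
    for t in tweets:
        if md5_of(t) == last_hash:
            break
        fresh.append(t)
    return fresh
```

On each page refresh the script would speak `new_tweets(feed, saved_hash)` and then save the hash of the newest entry.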

Sunday, October 26, 2008

Wikipedia Categories for Posters

I'd planned to extend the Alphabet maker into a site that assisted Charlie to find appropriate names by inducing the category of terms and then either warning about names not in the category, or correcting spelling based on names in that category, or even suggesting a name for a missing letter.

First I thought I should understand the categories available in dbPedia, and started with the Wikipedia categories, which use the SKOS vocabulary. I wrote a small SKOS-based browser:

This has two pages: a category page, showing the category, its broader and narrower categories and the list of resources in that category; and a resource page, showing the English abstract and the Wikipedia thumbnail if there is one.

From a category, you can link to a gallery of all thumbnails for resources in that category, and hence to a random Alphabet poster based on that category. There is a significant proportion of dead links among the thumbnails, however, and I need to look ahead to exclude them.

One feature of this application which I haven't seen elsewhere (I live a sheltered life!) is the use of key-bindings to perform common searches on selected text. Text selected in the abstract can, with one keystroke, be looked up in Wikipedia, Google, Google Maps or Google Images. I like the idea of giving the user more control over what is linked, and I have implemented this in my prototype presentation software, which I'm trialling on a couple of courses to see if students find it useful.

Browsing around dbPedia using Wikipedia categories and foaf:depiction is not without its problems. For example the category Amphibians includes:
  • common names of amphibians - Cave Salamander
  • species of amphibians - Gerobactrus
  • groups, families and orders of Amphibians - Oreolalax
  • parts of amphibians - Vocal Sac
  • lists of amphibians - List of all Texas amphibians
  • lists of related subjects - Federal Inventory of Amphibian Spawning Areas
This puts me in mind of Borges' invention of a Chinese classification of animals. Aren't categories like "suckling pigs" and "those that from a long way off look like flies" just delicious? Perhaps a subject's other categories might help, but there is no "List" category, for example, so there is no way to disambiguate the various usages of a category.

foaf:depiction has a similar problem. The Modern Painters category shows an equal mixture of depictions of the painter and depictions of works by the painter, with a few depictions of where the artist lived. This is particularly confusing when the image is a portrait! Still, these categories are much cleaner than others, if somewhat incomplete.

It has often been observed that tools based on dbPedia should help to improve Wikipedia. For example, it is clear that the Painters by Nationality category should not contain any individual Painter resources, so it would be nice to edit the categories of the two errant painters directly from an interface like this.

Wednesday, October 22, 2008

Alphabet Poster

Grandson Charlie (age nearly 6) rang the other night to tell me the animals he had found for the animal alphabet we had discussed the previous night. I thought it would be a neat present to write a program to create a poster by fetching images from the web for each of his words and laying them out. I like the idea of writing programs as gifts, but Charlie would prefer something real, like a climbing wall!

I thought of using Flickr, or Google images, then settled on using Wikipedia, searched via dbpedia.

There are generally two images included in the dbpedia data: foaf:img, a full-size JPEG image, and foaf:depiction, a GIF thumbnail. The thumbnails are fine for this job.

The SPARQL query to get the thumbnail for a resource is rather simple:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX : <http://dbpedia.org/resource/>

SELECT ?img WHERE {
  :Hedgehog foaf:depiction ?img .
}
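Running such a query against dbpedia amounts to a single HTTP GET. A Python sketch of building the request URL (the endpoint address and result format here are assumptions about the dbpedia service, and the actual script is XQuery):

```python
from urllib.parse import urlencode

# Assumed public endpoint for the dbpedia SPARQL service
ENDPOINT = "http://dbpedia.org/sparql"

def sparql_url(query, fmt="application/sparql-results+xml"):
    """Build the GET URL for a SPARQL query, requesting XML results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": fmt})
```

The XML result document that comes back can then be picked apart with ordinary XPath.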

The XQuery script parses the list of words and, for each word, uses this query to get the URI of the Wikipedia image. The trickiest part was laying out the poster. I struggled to do the gallery layout in CSS alone but could not get this to work with an image plus caption. In the end I reverted to a table layout with a width parameter.

The functional XQuery requires the layout to be done in two stages: first generate the table cells in the right, sorted order; then compute the number of rows required for the given number of columns and generate the table, indexing into the cell sequence to lay out the cells in order. In an imperative language, or a language which did not require that every constructed element be well-formed, the two tasks could be merged. The approach necessitated by the functional language feels cleaner, but I'd prefer to write this as a pipeline - sort the words > generate the image cells > lay out the table - without the need to resolve the structure clash (a Jackson Structured Programming term) between the image order and the table order via random access into a sequence. Python's co-routines would make a better solution, I feel. XML pipelines might help, but they feel too heavyweight for this minor task.
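The second stage - computing the rows and indexing into the ordered cell sequence - can be sketched like this (in Python rather than the XQuery the poster actually uses):

```python
def layout(cells, columns):
    """Chunk an ordered sequence of cells into table rows.

    The number of rows is the ceiling of len(cells) / columns;
    each row is a slice of the cell sequence, so the last row
    may be short.
    """
    rows = (len(cells) + columns - 1) // columns
    return [cells[r * columns:(r + 1) * columns] for r in range(rows)]
```

Each inner list then becomes one table row of image-plus-caption cells.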

Charlie's animals so far.

The XQuery script is in the Wikibook.

Monday, September 22, 2008

RDF Vocab work

I'm off to Oxford to learn about RDF Vocabularies at the Oxford Vocamp.

My own meanderings in this field have been limited to a rather hacked Vocabulary Browser written in XQuery:

and my rather limited attempts to provide an RDF extract from the FOLD Information System.

with a current dump of the RDF

Saturday, March 01, 2008

SPARQLing Country Calling Codes

Stimulated by Henry Story's blog entry, I wrote the equivalent in XQuery, and in doing so bumped into some issues with the dbpedia data. In particular, there is no category I could find that identifies a country - but then what constitutes a country depends on the purpose for which the geographical entity is being classified, so this is to be expected.

In the end I resorted to scraping the wikipedia page which lists the codes directly.

Wikibook module

Thursday, February 28, 2008

XQuery SMS service

I've recently resurrected our two-way SMS service for use by my students in their current coursework, a site to gather and report results for their chosen team sport. I require an exotic interface to the data - for example a speech interface with Opera, or an SMS interface. In my SMS installation, the first word of an incoming message determines the service to which the message is routed via HTTP, and the reply, if any, is then sent via our outbound service to the originating phone. The framework was originally implemented in PHP, but individual services can be written in any language. There are a number of mainly demonstration services implemented. XQuery is used to implement a decoder for UK vehicle licence numbers, which is also a nice example of the use of regular expressions. By comparison with the original PHP script, the XQuery version is both cleaner and more general. However, there is no regexp function in XQuery which returns the matched groups in an expression, so this has to be bodged with a wrapper around the XSLT2 analyze-string function.
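As an illustration of the group extraction that such a decoder relies on, here is a Python sketch. The pattern is a simplified, hypothetical one covering only current-style registration marks, not the full decoder (which is in XQuery):

```python
import re

# Hypothetical simplified pattern for current-style UK registrations,
# e.g. "AB51 XYZ": area code, age identifier, three random letters.
PLATE = re.compile(r"^([A-Z]{2})(\d{2})\s*([A-Z]{3})$")

def decode(plate):
    """Split a registration mark into its named parts, or None."""
    m = PLATE.match(plate.upper())
    if not m:
        return None
    area, age, rand = m.groups()
    return {"area": area, "age": age, "random": rand}
```

The point of the example is `m.groups()`: standard XQuery 1.0 has no direct equivalent, hence the analyze-string wrapper.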

Wednesday, February 13, 2008

RDF/Sparql with XQuery

As part of my learning about RDF, Sparql and the semantic web, I thought I would take the familiar employee/department/salary grade example which I used in the XQuery/SQL comparison as a case study. To this end I wrote two XQuery scripts:
  • XML to RDF - a script using a generic function, guided by a map, to translate flat XML tables to RDF and RDFS
  • Sparql query interface - an XQuery interface to a Joseki Sparql service to allow the user to execute Sparql queries against the emp-dept RDF
This is documented in an article in the wikibook.
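The map-guided translation might be sketched like this in Python (the element names and predicate URIs here are made-up illustrations, not the wikibook's actual map, and the real script is XQuery):

```python
import xml.etree.ElementTree as ET

# Hypothetical map from flat-table element names to predicate URIs
PREDICATES = {
    "ename": "http://example.org/emp#name",
    "sal":   "http://example.org/emp#salary",
}

def row_to_triples(row_xml, base="http://example.org/emp/"):
    """Translate one flat XML row into N-Triples, guided by the map.

    The row's empno element supplies the subject URI; every other
    child with an entry in the map becomes one triple.
    """
    row = ET.fromstring(row_xml)
    subject = "<%s%s>" % (base, row.findtext("empno"))
    triples = []
    for child in row:
        pred = PREDICATES.get(child.tag)
        if pred:
            triples.append('%s <%s> "%s" .' % (subject, pred, child.text))
    return triples
```

A generic function like this only needs a different map to handle the dept table as well.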

Monday, January 14, 2008

AJAX, AHAH and XQuery

Today [well, some days ago now - this item got stuck in draft], I came across the abbreviation AHAH, which refers to the style of using AJAX to request XHTML fragments for insertion into an HTML page. The example of XQuery and AJAX to search employee data in the wikibook used this pattern - like the gentleman in Molière's play, I had been speaking AHAH all these years without realising it.

I also happened on an item in Mark McLaren's blog in which he describes the use of this pattern to provide an incremental search of the chemical elements. He advocates using a JavaScript library, but I'm not sure a library is warranted for a simple task like this (tempting fate here, I fear). For teaching purposes, minimal code is best, I feel. So I implemented a version using XQuery and minimal JavaScript.
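The server side of such an AHAH incremental search is tiny. A Python sketch of the handler (the element list is a made-up subset; the real version is an XQuery script):

```python
from html import escape

# Illustrative subset of the data being searched
ELEMENTS = ["Hydrogen", "Helium", "Lithium"]

def fragment(prefix):
    """Return an XHTML list fragment of matching elements.

    This is the whole AHAH contract: the response is a fragment
    ready to drop into the page's innerHTML, not a full document.
    """
    hits = [e for e in ELEMENTS if e.lower().startswith(prefix.lower())]
    items = "".join("<li>%s</li>" % escape(e) for e in hits)
    return "<ul>%s</ul>" % items
```

The browser side then needs only a keyup handler that fetches `fragment(field.value)` and assigns it to a div.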
XQuery and AHAH make a pretty good pair I think.

Saturday, January 12, 2008

GoogleChart API and sparklines

As a long-time fan of Edward Tufte's work, I've often wanted to make use of his sparkline idea, but haven't come across a suitable tool to make them. Now the GoogleChart API can generate these and a plethora of other chart types via a web service.

Here is an XQuery script to demo the interface, using the character-based simple encoding of the data:

I have one small problem - I don't know how to get rid of the axes.


I've just discovered the undocumented chart type lfi, which lets the sparkline be shown without the axes - I found this out from Brian Suda's blog.
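The simple encoding itself is easy to reproduce. A Python sketch of encoding a data series and building a sparkline URL with the lfi type (the scaling of the data onto the 0-61 range is one choice among several, and the real demo is an XQuery script):

```python
# The simple-encoding alphabet: A-Z = 0-25, a-z = 26-51, 0-9 = 52-61
SIMPLE = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

def simple_encode(values):
    """Scale values onto 0..61 and encode each as one character."""
    top = max(values)
    return "s:" + "".join(SIMPLE[round(v * 61 / top)] for v in values)

def sparkline_url(values, width=100, height=30):
    # cht=lfi is the undocumented axis-free line chart type
    return ("http://chart.apis.google.com/chart?cht=lfi&chs=%dx%d&chd=%s"
            % (width, height, simple_encode(values)))
```

One character per data point keeps the URL short enough for long series, which is exactly what sparklines need.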

Thursday, January 10, 2008


As we start to think about the equipment we need aboard Aremiti, the Westerly ketch we are currently re-fitting, one new item that is on our shopping list is AIS.

All vessels over 300 tons, and passenger vessels over 100 tons, are required to carry an AIS transmitter. This broadcasts vessel data such as identification, location, speed and course on a VHF frequency. The broadcast is picked up by shore- or vessel-based receivers and decoded into NMEA sentences. The data can then be used to map the vessel on an electronic chart or radar display, or combined with the receiving vessel's own location and course for collision avoidance. AIS data may also be broadcast by, or on behalf of, static navigational aids like lighthouses and buoys.
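The NMEA side of this is an AIVDM sentence whose payload packs the vessel data into 6-bit characters. A Python sketch of just the first decoding step - the standard character-to-6-bit mapping; full message parsing is much more involved:

```python
def sixbit(payload):
    """Decode an AIVDM payload string into 6-bit integer values.

    Each payload character maps to a value 0..63: subtract 48 from
    its ASCII code, and subtract a further 8 if the result exceeds 40.
    The resulting bit groups are then concatenated and sliced into
    the message fields (MMSI, position, speed, course, ...).
    """
    values = []
    for ch in payload:
        v = ord(ch) - 48
        if v > 40:
            v -= 8
        values.append(v)
    return values
```

An AIS 'engine' does this, plus the field slicing, in firmware and hands the application plain NMEA.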

There are a number of manufacturers of AIS 'engines' (receiver/decoders) : NASA (misleadingly called a 'radar' system) and KATAS; and software such as Shiplotter.

Since the setup cost for an amateur shore station is minimal, anyone with line of sight of a busy stretch of water can set up their own. Some publish the results on the web. A site which I came across tonight is a wonderful example of what an enthusiastic web engineer can do with this data. No longer is that ship in the distance a grey blob - it's a vessel with a name, a speed, a destination, a close-up when mashed up with ship photographs, and possibly a story, a history of visits and voyages. In a small boat, that data, broadcast to all and sundry, could be life-or-death information to you. That distant blob on an apparent collision course is no longer anonymous, routeless and inhuman. If you are still uncertain about the ship's intentions, it's so much less confusing to call up a vessel by name than by some vague lat/long and bearing.

All this depends on the globally unique, stable IMO number, introduced to improve the safety of shipping. On the web, it is this identifier which is the basis of any semantic web data and tools to bring this information together.
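Part of what makes the IMO number dependable as an identifier is its built-in check digit: the first six digits, weighted 7 down to 2, must sum to a number whose last digit equals the seventh. A Python sketch:

```python
def imo_valid(imo):
    """Validate an IMO number's check digit.

    Multiply the first six digits by 7, 6, 5, 4, 3, 2 respectively;
    the sum modulo 10 must equal the seventh digit.
    """
    digits = [int(d) for d in str(imo)]
    if len(digits) != 7:
        return False
    total = sum(d * w for d, w in zip(digits, (7, 6, 5, 4, 3, 2)))
    return total % 10 == digits[6]
```

A mashup keyed on IMO numbers can therefore reject most mistyped identifiers before ever hitting the web.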

The problem for both the above sites is to garner a modicum of funds to support the engineer's passion. One key question for the semantic web is how to reward them for making their deep pot of information available as RDF. It would seem so wrong to scrape their pages, tempting though it is.

Monday, January 07, 2008

More XQuery and Semantic web mashups.

Somewhat rested after a short, breezy holiday in Falmouth, with the server now working, I completed my two case studies of XQuery/DBpedia mashups. Both are described in the XQuery Wikibook. The implementation is still a bit hairy, but now makes use of the SPARQL Query XML Result format, although I still find it useful to transform to tuples with named elements.

The first is the mapping of the birth places of football players by club. [Wikibook]

The starting page is an index of clubs in the top English and Scottish leagues:
The second shows the discography of rock artists and groups, shown as an HTML table and using SIMILE timeline. [Wikibook].

The starting page is an index of artists in a selected Wikipedia category, by default the Rock and Roll Hall of Fame: