Sunday, October 26, 2008

Wikipedia Categories for Posters

I'd planned to extend the Alphabet maker into a site that helps Charlie find appropriate names by inducing the category of the terms and then warning about names not in the category, correcting spelling based on names in that category, or even suggesting a name for a missing letter.
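As a rough illustration of that plan, here is a minimal Python sketch. It assumes the names in a category have already been fetched into a plain list. The function names (check_name, candidates_for_letter, missing_letters) and the 0.6 similarity cutoff are hypothetical choices for this example, not part of the Alphabet maker.

```python
import difflib


def check_name(name, category_names, cutoff=0.6):
    """Classify a name against the known names of a category.

    Returns ("ok", canonical_name), ("suggest", closest_name)
    or ("unknown", None).
    """
    lookup = {n.lower(): n for n in category_names}
    key = name.strip().lower()
    if key in lookup:
        return ("ok", lookup[key])
    # Fuzzy match against the category to propose a spelling correction.
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    if close:
        return ("suggest", lookup[close[0]])
    return ("unknown", None)


def missing_letters(words):
    """Letters of the alphabet that no word in the list starts with."""
    have = {w.strip()[0].lower() for w in words if w.strip()}
    return [c for c in "abcdefghijklmnopqrstuvwxyz" if c not in have]


def candidates_for_letter(letter, category_names):
    """Names in the category that could fill a missing letter."""
    return sorted(n for n in category_names
                  if n.lower().startswith(letter.lower()))
```

A site built this way would call check_name on each word Charlie types, and candidates_for_letter on each entry returned by missing_letters.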

First I thought I should understand the categories available in dbPedia, and started with the Wikipedia categories using the SKOS vocabulary. I wrote a small SKOS-based browser:

http://www.cems.uwe.ac.uk/xmlwiki/RDF/skosbrowse.xq?action=help

This has two pages: a category page, showing the category, its broader and narrower categories and the list of resources in that category; and a resource page, showing the English abstract and the Wikipedia thumbnail if there is one.

From a category, you can link to a gallery of all thumbnails for resources in that category, and hence to a random Alphabet poster based on that category. There is a significant proportion of dead links among the thumbnails, however, and I need to look ahead to exclude them.
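One way the look-ahead could work is sketched below in Python: issue an HTTP HEAD request for each thumbnail and keep only those that answer with a success status. The function names are hypothetical, and the browser itself is written in XQuery, so this only illustrates the idea.

```python
import urllib.error
import urllib.request


def link_is_alive(url, timeout=5.0):
    """Return True if a HEAD request for url gets a 2xx response."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, ValueError, OSError):
        # HTTP errors, malformed URLs and timeouts all count as dead.
        return False


def live_thumbnails(thumbnails, is_alive=link_is_alive):
    """Keep only the (name, url) pairs whose url passes the check."""
    return [(name, url) for name, url in thumbnails if is_alive(url)]
```

Passing the checker in as a parameter keeps the filter testable without network access. Checking every thumbnail adds one request per image, so caching the results would probably be worthwhile for large categories.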

One feature of this application which I haven't seen elsewhere (I live a sheltered life!) is the use of key-bindings to perform common searches on selected text. Text selected in the abstract can, with one keystroke, link to Wikipedia, Google, Google Maps or Google Images. I like the idea of giving the user more control over what is linked, and I have implemented this in my prototype presentation software, which I'm trialling on a couple of courses to see if students find it useful.
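The mapping from keystroke to search can be sketched as a small lookup table. The Python below only illustrates that mapping: the key letters and the URL patterns are assumptions made for this example, and the browser's actual bindings may differ.

```python
from urllib.parse import quote_plus

# Hypothetical key bindings; each maps one key to a search URL prefix.
SEARCH_TARGETS = {
    "w": "https://en.wikipedia.org/wiki/Special:Search?search=",
    "g": "https://www.google.com/search?q=",
    "m": "https://www.google.com/maps/search/?api=1&query=",
    "i": "https://www.google.com/search?tbm=isch&q=",
}


def search_url(key, selected_text):
    """Return the search URL for a key press, or None if the key is unbound."""
    prefix = SEARCH_TARGETS.get(key.lower())
    text = selected_text.strip()
    if prefix is None or not text:
        return None
    return prefix + quote_plus(text)
```

In a browser the same table would sit behind a keypress handler that reads the current text selection.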

Browsing around dbPedia using Wikipedia categories and foaf:depiction is not without its problems. For example the category Amphibians includes:
  • common names of amphibians - Cave Salamander
  • species of amphibians - Gerobatrachus
  • groups, families and orders of amphibians - Oreolalax
  • parts of amphibians - Vocal Sac
  • lists of amphibians - List of all Texas amphibians
  • lists of related subjects - Federal Inventory of Amphibian Spawning Areas
This puts me in mind of Borges' invention of a Chinese classification of animals. Aren't categories like "suckling pigs" and "those that from a long way off look like flies" just delicious? Perhaps a subject's other categories might help, but there is no "List" category, for example, so there is no way to disambiguate the various usages of a category.

foaf:depiction has a similar problem. The Modern Painters category shows an equal mixture of depictions of the painter and depictions of works by the painter, with a few depictions of where the artist lived. This is particularly confusing when the image is a portrait! However, these categories are much cleaner than others, if somewhat incomplete.

It has often been observed that tools based on dbPedia should help to improve Wikipedia. For example, it is clear that the Painters by Nationality category should not have any Painter resources, so it would be nice to edit the categories of the two errant painters directly from an interface like this.

Wednesday, October 22, 2008

Alphabet Poster

Grandson Charlie (age nearly 6) rang the other night to tell me the animals he had found for the animal alphabet we had discussed the previous night. I thought it would be a neat present to write a program that creates a poster by fetching images from the web for each of his words and laying them out. I like the idea of writing programs as gifts, but Charlie would prefer something real, like a climbing wall!

I thought of using Flickr, or Google images, then settled on using Wikipedia, searched via dbpedia.

There are generally two images included in the dbpedia data: foaf:img, a full-size JPEG image, and foaf:depiction, a GIF thumbnail. The thumbnails are fine for this job.

The SPARQL query to get the thumbnail for a resource, here Hedgehog, is rather simple:

PREFIX : <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT * WHERE {
:Hedgehog foaf:depiction ?img.
}
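
To show how the same query could be generated for any word in the list, here is a minimal Python sketch. It is not the XQuery script itself. The endpoint address and the format parameter are assumptions about the public dbpedia SPARQL service, and the word-to-resource mapping (capitalise the first letter, replace spaces with underscores) is a guess that will not suit every name.

```python
from urllib.parse import quote, urlencode

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
SPARQL_ENDPOINT = "http://dbpedia.org/sparql"  # assumed endpoint address


def thumbnail_query(word):
    """Build the foaf:depiction query for one word."""
    title = word.strip().replace(" ", "_")
    title = title[:1].upper() + title[1:]  # resource names start upper-case
    resource = DBPEDIA_RESOURCE + quote(title)
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT ?img WHERE {\n"
        f"  <{resource}> foaf:depiction ?img .\n"
        "}"
    )


def query_url(word):
    """Build a GET URL that submits the query to the endpoint."""
    params = {
        "query": thumbnail_query(word),
        "format": "application/sparql-results+json",  # assumed parameter
    }
    return SPARQL_ENDPOINT + "?" + urlencode(params)
```

The full resource URI is written out in angle brackets rather than using the empty prefix, so that names containing characters not allowed in a prefixed name still give a valid query.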

The XQuery script parses the list of words and, for each word, uses this query to get the URI of the Wikipedia image. The trickiest part was laying out the poster. I struggled to do the gallery layout in CSS alone but could not get this to work with an image plus caption. In the end I reverted to a table layout with a width parameter.

The functional XQuery requires the layout to be done in two stages: first generate the table cells in the right, sorted order; then compute the number of rows required for the given number of columns and generate the table, indexing into the cell sequence to lay out the cells in order. In an imperative language, or a language which did not require every constructed element to be well-formed, the two tasks could be merged. The approach necessitated by the functional language feels cleaner, but I'd prefer to write this as a pipeline (sort the words > generate the image cells > lay out the table) without the need to resolve the structure clash (a Jackson Structured Programming term) between the image order and the table order via random access into a sequence. Co-routines in Python would make a better solution, I feel. XML Pipelines might help, but they feel too heavyweight for this minor task.
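The two approaches can be compared in a short Python sketch. It uses plain generators rather than true co-routines, which is enough for a one-way pipeline. The function names and the cell markup are made up for this illustration and are not taken from the XQuery script.

```python
import math
from html import escape


def make_cells(images):
    """Yield one table cell per (word, image_url) pair, sorted by word."""
    for word, url in sorted(images, key=lambda pair: pair[0].lower()):
        yield (f'<td><img src="{escape(url)}" alt="{escape(word)}"/>'
               f'<br/>{escape(word)}</td>')


def chunk(items, size):
    """Group a stream of items into lists of at most size items."""
    row = []
    for item in items:
        row.append(item)
        if len(row) == size:
            yield row
            row = []
    if row:  # last, possibly short, row
        yield row


def layout_pipeline(images, columns):
    """Pipeline version: sort > generate cells > group into rows."""
    rows = ("<tr>" + "".join(row) + "</tr>"
            for row in chunk(make_cells(images), columns))
    return "<table>" + "".join(rows) + "</table>"


def layout_by_index(images, columns):
    """Two-stage version: build all cells, then index into the sequence."""
    cells = list(make_cells(images))
    n_rows = math.ceil(len(cells) / columns)
    rows = []
    for r in range(n_rows):
        row_cells = cells[r * columns:(r + 1) * columns]
        rows.append("<tr>" + "".join(row_cells) + "</tr>")
    return "<table>" + "".join(rows) + "</table>"
```

Both functions should produce the same table. The pipeline version never needs to know the total number of cells, which is the structure clash being avoided.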

Charlie's Animals so far.

The XQuery Script is in the Wikibook