Saturday, November 24, 2007

Topological sorting in XQuery

In the course of my attempts to implement an XQuery engine for the XML pipeline language XPL, I realized that I would need to sort the processes so that each process has its inputs available, i.e. a topological sort.

Given a sequence of nodes and references with the Relax-NG schema:

element node {
attribute id { xs:string },
element ref {
attribute id { xs:string }

The post-condition after sorting can be defined as:

declare function local:topological-sorted($nodes) as xs:boolean {
every $n in $nodes satisfies
every $id in $n/ref/@id
satisfies $id = $n/preceding::node/@id

and the recursive function to order the nodes is:

declare function local:topological-sort($unordered, $ordered ) {
if (empty($unordered))
then $ordered
let $nodes := $unordered [ every $id in ref/@id satisfies $id = $ordered/@id]
return local:topological-sort( $unordered except $nodes, ($ordered, $nodes ))

Sweet, eh? Even if this implementation is not the most efficient, it has the advantage of being obviously correct.

See (and improve!) the XQuery WikiBook article.


David said...

couple of minor comments:

in local:topological-sorted I think you'd need to use explict subsequencing to access all the earlier items in the sequence rather than using $n/preceding (unless you have an unstated assumption that the initial sequence of nodes are in the same document, in document order)

In the sorting function
let $nodes as element()+:=
would check that $nodes isn't empty
(which stops it looping on cycles for example)

chris wallace said...

Hi David,
Thanks for the hints.
Taking the second first, the unstated assumption was that the graph was a DAG - I've changed the algorithm to say

if ($nodes) then local:topological-sort( $unordered except $nodes, ($ordered, $nodes ))
else ()

to make it safe (I think). Really I should write pre-condition.

I don't quite follow your first point. The test is that nodes in the sequence are or are not in topological order - works with my limited testing at least but maybe I'm missing something ?

chris wallace said...

David, I think I see what you mean - I should use preceding-sibling axis - my limited testing doesn't include a document which may contain other preceding graphs which include nodes - I wonder how often I've forgotten about the context of a node like this .