Wednesday, May 06, 2009

Matching sequences in XQuery

Collation is a core algorithm in processing sequences. In XQuery, the straight-forward expression of the algorithm is as a recursive function:



declare function local:merge($a, $b as item()*)
as item()* {
if (empty($a) and empty($b))
then ()
else if (empty ($b) or $a[1] lt $b[1])
then ($a[1], local:merge(subsequence($a, 2), $b))
else if (empty($a) or $a[1] gt $b[1])
then ($b[1], local:merge($a, subsequence($b,2)))
else (: matched :)
($a[1], $b[1],
local:merge(subsequence($a,2),
subsequence($b,2)))
};



Coincidently, Dan McCreary was writing an article in the XQuery wikibook on matching sequences using iteration over one sequence and indexing into the second. The task is to locate missing items. Collation is one approach to this task, albeit requiring that the sequences are in order.

Here is a test suite comparing three methods of sequence comparison.

I also did some volume tests with two sequences differing by a single, central value. Here are the tests on a sequence of 500 items. In summary, the timings are :

* Iteration with lookup: 6984 ms - not repeatable - average is 2600
* Iteration with qualified expression: 1399 ms
* Recursive collate: 166 ms

The collate result is surprising and rather impressive. Well done eXist!

No comments: