Friday, January 23, 2009

Some years ago I wrote an SQL workbook which was used on a couple of courses, but although it had an interactive interface so that that a student could test their SQL statements against the example database, the results were not checked. I planned to create an interactive site which would present tasks to the student, accept the SQL statement input, execute the statement and compare the result with that of the model answer. Since I was teaching PHP/MySQL at the time, in good dog-fooding tradition, I started on an implementation using this platform, but it got sticky and stalled. Then I discovered XQuery and the other day, wrote an XQuery/ eXist implementation.

http://www.cems.uwe.ac.uk/xmlwiki/SQLTutor/help.html


[I'm really supposed to be marking but I find my most creative streak when marking's about - the root perhaps of the love/hate relationship I have with my job.]

The relative ease with which this version was created well illustrates the power of the XQuery/ Native XML database development approach. This application lies in the sweet spot for this technology and here is why I think that is:

XML handles Composition

Each worksheet is represented by an XML document. The document describes properties of the worksheet - database used, title, tags and each of the steps in the worksheet. It is too weak to say that steps are part of the worksheet, they -are- the worksheet. A normalised relational implementation requires one table for the worksheet properties , another for the steps and a foreign key to link these two types together. This artifical separation into master and child records complicates the model and its processing. A symptom of the need for a composition can be found in the naming problem - what do you call the master table - worksheet perhaps? But that's not right - in domain terms a worksheet is the master record AND its children as a single entity. worksheetBody? - yeech.

XML handles order
The Steps in a worksheet are ordered. To represent this in SQL requires an additional sequence number . Then editing to insert and delete steps requires re-numbering. In XML, order is implicit in the document order of the steps.

XML handles heterogeneity
Steps in a worksheet are different types. Some are simple explanations, some are demonstrations of SQL constructs, many are exercises and others have yet to be designed. A relational approach would either coerce all types into a common structure, with lots of null fields, or use multiple tables, one for each type, and a super-type table. [Object-relational databases support a more transparent implementation but who uses those?]. In XML, different types of Step can be freely interleaved in the step sequence.

XML handles multi-valued attributes

As with most of my applications these days, I use tags to support searching and classifying resources. In a normalised relational database, I should break these out into a separate table with a foreign key, but would probably fudge the problem by putting all tags into an arbitrarily concatenated field. In XML tags are simply repeated elements with no temptation to step outside the data model.

XML supports documents

Worksheets are standalone entities, which are authored, edited, deployed, removed as units. In a relational approach, all worksheets would be stored in common database tables, and the domain concept of worksheet as a unit is lost. This is particularly a problem in development: in the SQL tutor, worksheets may either be located in the XML database but may also be accessed from anywhere on the web - it makes little difference to the script providing the interactive interface. So new worksheets can be tested before installation with ease. They can also be exchanged, encouraging reuse and standardisation.

XML Documents are human-editable
My PHP/MySQL prototype stalled on the need to develop an interactive editor for worksheets in additon to the interactive tutor. In XML this component is not required, provided of course that the worksheet author is happy to author XML. If not, there are generic editors available to assist.

XQuery handles XML natively.
Inputs - the script, the SQL table returned by the SQL interface and the XHTML conent of some step properties - and the output - XHTML pages - are all XML structures. A language which handles XML natively does not have perform the re-structuring required in , say, PHP before it can be processed [PHP- XML interfaces have made this somewhat less difficult].

XQuery handles comples processing.
XQuery is much more than the XML equivalent of SQL .The SQL tutor has the task of testing whether two results are functionally the same. Two tables are the same if the column headings are different, but the data the same, or if columns are in different orders, or if rows in an SELECT statement with no ORDER BY clause are in a different order. The sequence and element structures of the XML data model combined with the expressiveness of XQuery make this computation straightforward to code.

A Native XML database supports development
I struggle with the design of the Worksheet schema. Minor additions occur all the time as my ideas develop, but since eXist is schema-independent, I can make these changes bottom-up before I need to formalise them as a schema. In a relational database I'm forced to work top-down, schema first. Moreover I feel safe that if I have a major redesign, an XSLT transformation will do the necessary schema evolution of existing Worksheets.

Conclusion
This case study demonstrates that a relational database is a poor technology for implementing a tutorial about SQL. This asymetry is not the case for XQuery, and the next project is to modify the tutor to do the same for XQuery expressions.

The tutor goes into use with 200 students next week although the worksheets themselves could do with more revision. Since the tutor logs all input, it will be interesting to be able to see what kinds of error are made, and then be able to adjust the hinting accordingly. Wish me luck!