Monday, February 23, 2009

Data Normalization

My lastest teaching program is for model inference from un-normalized data. This had its inception in a PHP tool written some years ago. The new version uses my ER XML schema to integrate the output with the other data modelling tools.

Normalisation is usually taught on data base courses via the abstract concepts of first, second third normal and higher normal forms. In my introductory module I just want to get over the basic idea of reduction in data duplication through the actual factorisation of a first normal-form table (atomic values) into a set of related third normal-form tables.

Here is the tool, written of course in XQuery:

I get the students to take a nominated data set, factorise it, generate the SQL table declaration and INSERT statements, load them into a MySQL database and then reconstruct the original table using a select statement joining all the tables. This allows the student to check that the factorisation is loss-less but of course it does not check that it is optimal. At present the tool allows the student to explore different possibilities and create any factorisation they like.

The state of the factorisation is the current dataset URI and the current factorisation, defined by an ER model. Currently these are passed between client and server in the URL. This limits the size of the model and i guess I should change to POST but the interface behavior will not be as clean (the back button works as a simple Undo) and I can't have simple generated links on buttons. I guess I need help in interface programming here.

For the record, the code is 600 lines split into 27 functions and using two functions in the larger er module to transform the XML model to DDL and an ER diagram. Code by request until I find a suitable home.

Friday, February 06, 2009

Data Modelling Tutor

The SQL tutor is now in use and seems to be finding favour with students and other tutors. There is a long list of things to add, like the ability to discuss an exercise but the course moves on and now I want to apply the same ideas to the teaching of data modelling. Students often find this rather difficult.

For the past few years we have used an excellent case tool called QSEE, developed by Mark Dixon at Leeds Metropolitan University. We have mainly used this multi-diagram tool for the ER diagrams. QSEE supports conceptual ER models and handles foreign keys, link tables and week entities when generating the SQL DDL. I have a running battle with some other tutors over the use of conceptual ER diagrams versus Relational ER Diagrams, complete with foreign keys and link tables. In my multi-paradigm teaching, conceptual models which treat the later as artefacts of a relational data model makes more sense. Of course I'd like to see a few improvements but sadly development of this tool seems to have ceased. Pity that it hasn't been open sourced.

My teaching emphases the difference between a model and its various representations as diagrams, as text and as code. Since we have already studied XML, it is natural to think of representing the conceptual data model as an XML document and writing the transformations in XSLT or XQuery. Having used Graphviz for a number of years the XML can be transformed to the dot notation to create diagrams in different diagraming conventions. Moreover the goal of providing an interactive data modeling tutor seems more easily realised by processing textual descriptions.

So this weekend, snowed in from work on the boat, I've been working on this tutor and data modelling tool. The really hard part has been to write the model comparator so that differences between a student model and the 'master' model can be detected and explained. This has to take account of variations in the names the student may use as well as differences in order of definition, so a straight XML diff isn't enough. What I have now is not quite right but it will have to be good enough if I want to get this tutor out to students this week.

So here is the link to the index of worksheets so far written:

The transformations currently supported include Chen diagrams, ER diagrams with or without attributes, SQL DDL, SQL DDL with foreign keys and many-many resolution, and a rather clunky English text representation.

One feature which was unplanned and just emerged as an obvious addition, was the ability to provide a default model template so the student could solve initial problems by filling in the blanks rather than starting with a blank page.

There is still a lot to do, but I'm pleased to have got the prototype of the ground - a long-held idea finally coming to fruition - made possible by the power of XQuery and the eXist XML database, for which I give thanks to Wolfgang and and the guys every day.