The Wallace Line: Data Normalization

My lastest teaching program is for model inference from un-normalized data. This had its inception in a PHP tool written some years ago. The new version uses my ER XML schema to integrate the output with the other data modelling tools.

Normalisation is usually taught on data base courses via the abstract concepts of first, second third normal and higher normal forms. In my introductory module I just want to get over the basic idea of reduction in data duplication through the actual factorisation of a first normal-form table (atomic values) into a set of related third normal-form tables.

Here is the tool, written of course in XQuery:

http://www.cems.uwe.ac.uk/xmlwiki/ER/normalizer.xq

I get the students to take a nominated data set, factorise it, generate the SQL table declaration and INSERT statements, load them into a MySQL database and then reconstruct the original table using a select statement joining all the tables. This allows the student to check that the factorisation is loss-less but of course it does not check that it is optimal. At present the tool allows the student to explore different possibilities and create any factorisation they like.

The state of the factorisation is the current dataset URI and the current factorisation, defined by an ER model. Currently these are passed between client and server in the URL. This limits the size of the model and i guess I should change to POST but the interface behavior will not be as clean (the back button works as a simple Undo) and I can't have simple generated links on buttons. I guess I need help in interface programming here.

For the record, the code is 600 lines split into 27 functions and using two functions in the larger er module to transform the XML model to DDL and an ER diagram. Code by request until I find a suitable home.

The Wallace Line

Monday, February 23, 2009

Data Normalization

No comments:

Blog Archive

Links

About Me