Wednesday, August 29, 2007

Follow up on LOAD XML

A few weeks ago (time flies..) i posted a short note about my LOAD XML contribution. It has actually been available for some time, veven if it is not so easy to find. It is included in MySql 5.2.5:
documentation - quite good I think!

It is a bit sad that it is included in 5.2 (still in alfa) and not in 5.1. Since it is a separate command it will not affect existing installations and would not be risky. That faster it gets included the faster we will get feedback on it, which is what I believe is needed right now. Is this something that MySql users need and will use? Or will they use other tools to read XML data into their MySql databases?

LOAD XML reads data into a MySql table in much the same way as LOAD DATA does and in fact the implementation is based on LOAD DATA. This means that it does not parse the entire XML document ( I belive MySql xml function ExtractValue() does that) but instead reads one row at a time from the xml file and inserts it into the database. This makes reading of large xml files possible at quite a decent speed, althoug not as fast as LOAD DATA.

The mapping between XML and the table is based on two assumptions:
- there should be an XML tag that corresponds to a record in the table. You name this tag in the 'ROWS IDENTIFIED BY' clause
- both tags and attributes are matched automatically with column names and values
inserted into the tables. Tags and attributes that do not match columns are simply ignored

When I implemented this the scenario it was designed for is an integration scenario, where you receive data in an XML file periodically and dump it into your database.

There are of course limitations you should be aware of:
- CDATA is not supported and binary fields can not be handled
- it can only load data from a file
- the XML is not verfied, an illegal XML could very well be processed, no validation with xml schema or DTD is made
- there is no namespace support

All in all I think this could be quite a useful function. Arnold Daniels, who worked with me on this contribution initially, is planning some improvements. Among other things to make this work with XML data that is not read from a file.

Personally I will perhaps make some minor fixes, but probably start on something entierly different. Don't know what yet, your suggestions are welcome....


pabloj said...

The current Enterprise vs Community vs Code Contributions policy makes me think that MySQL has lost it's touch with the community or better, I think they are not really looking for a community but for an ecosystem with sustainable economical growth and going into some sort of "freeware" rather than opensource software.

stephane said...

A suggestion as a follow-up to LOAD XML :
mySQL *needs* a query-equivalent of the command line option '--xml'.
Be it 'SAVE XML' or 'OUTPUT AS XML' as you've been discussing in a previous post but imho something more generic like a toggle for output format.Think at it like "SET NAMES....".This way, someone can choose what his output will look like, in terms of structure.