Friday, November 21, 2008

Introducing Schematron

Schematron is a rule-based validation language for making assertions about patterns found in XML documents. It is a simple language that is itself written in XML and uses standard XPath to express its assertions. Schematron definitions (a.k.a. schemas) can be processed with standard XSL templates, which makes Schematron applicable in a variety of scenarios.

Although a Schematron definition is referred to as a schema, Schematron differs in its basic concept from other schema languages: it is not based on grammars but on finding tree patterns in the parsed document. This approach allows many kinds of structures to be represented that would be difficult to express in grammar-based schema languages. For instance, imagine what a typical schema would look like for the following XML document -

<?xml version="1.0" encoding="UTF-8"?>
<instance>
    <person>
        <fname/>
        <lname/>
    </person>
</instance>

I guess it's a no-brainer! It would be something like this -

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="instance">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="person">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="fname" type="xs:string"/>
                            <xs:element name="lname" type="xs:string"/>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

You must already be wondering how the same schema would look in the "Schematron" world, right? Well, here is the answer -

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.ascc.net/xml/schematron" >
    <pattern name="assert validity">
        <rule context="instance">
            <assert test="person">person element is missing.</assert>
        </rule>
        <rule context="person">
            <assert test="fname">fname element is missing.</assert>
            <assert test="lname">lname element is missing.</assert>
        </rule>
    </pattern>
</schema>

Now, isn't that a much better and more understandable version of a schema?

A closer look at the above document reveals how Schematron differs in the fundamentals of validating a document: the crux is not to define the structure of the document (that is what traditional schema languages do) but to assert it. Think of it as something like JUnit or NUnit for the XML world, where one writes assert statements to check the validity of an object's state. And just as in JUnit or NUnit, Schematron lets one attach custom messages to assert conditions.

Schematron can render custom messages in two cases, viz. -

1. "Report" the presence of a pattern (the message is rendered when the test succeeds)
2. "Assert" the absence of a pattern (the message is rendered when the test fails)

The following Schematron document illustrates both of these features -


<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.ascc.net/xml/schematron" >

    <!-- Render messages in case the elements are found. -->
    <pattern name="report validity">
        <rule context="instance">
            <report test="person">person element is present.</report>
        </rule>
        <rule context="person">
            <report test="fname">fname element is present.</report>
            <report test="lname">lname element is present.</report>
        </rule>
    </pattern>

    <!-- Render messages in case the elements are not found. -->
    <pattern name="assert validity">
        <rule context="instance">
            <assert test="person">person element is missing.</assert>
        </rule>
        <rule context="person">
            <assert test="fname">fname element is missing.</assert>
            <assert test="lname">lname element is missing.</assert>
        </rule>
    </pattern>

</schema>
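
To make the report/assert distinction concrete, here is a minimal Java sketch that mimics the two patterns above with the plain XPath support that ships with the JDK (javax.xml.xpath). It only illustrates the semantics and is not how a Schematron processor is actually implemented. The class name AssertReportSketch, the check and run helpers, and the use of "//instance" and "//person" as stand-ins for the rule contexts are all assumptions of mine.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not part of any Schematron distribution.
class AssertReportSketch {

    // Evaluates one XPath test against every node selected by contextPath.
    // fireWhenTrue = true behaves like <report>, false behaves like <assert>.
    static void check(Document doc, String contextPath, String test,
                      boolean fireWhenTrue, String message, List<String> out)
            throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList contexts =
                (NodeList) xpath.evaluate(contextPath, doc, XPathConstants.NODESET);
        for (int i = 0; i < contexts.getLength(); i++) {
            // A node-set test is true when it selects at least one node.
            boolean holds = (Boolean) xpath.evaluate(
                    test, contexts.item(i), XPathConstants.BOOLEAN);
            if (holds == fireWhenTrue) {
                out.add(message);
            }
        }
    }

    // Applies the six checks from the schema above to an XML string.
    static List<String> run(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            // "report validity" pattern: messages for what is present.
            check(doc, "//instance", "person", true, "person element is present.", out);
            check(doc, "//person", "fname", true, "fname element is present.", out);
            check(doc, "//person", "lname", true, "lname element is present.", out);
            // "assert validity" pattern: messages for what is missing.
            check(doc, "//instance", "person", false, "person element is missing.", out);
            check(doc, "//person", "fname", false, "fname element is missing.", out);
            check(doc, "//person", "lname", false, "lname element is missing.", out);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate checks", e);
        }
    }

    public static void main(String[] args) {
        // A person with an fname but no lname.
        run("<instance><person><fname/></person></instance>")
                .forEach(System.out::println);
    }
}
```

For the sample input in main, the sketch prints the two "present" messages for person and fname, followed by "lname element is missing.". The checks are hardcoded here purely for brevity; a real processor reads them from the schema.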


Now that we know what Schematron-based schemas look like, the next obvious question is: "How do I validate a document against a Schematron schema?" Well, I already answered that in the first paragraph: Schematron schemas can be processed with standard XSL templates.

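The following commands illustrate the basic processing involved in validating a document with Schematron. The first compiles the Schematron rules into an XSL stylesheet, and the second applies that stylesheet to the document under test -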

xslt -stylesheet schematron-message.xsl SchematronRules.xml > compiled-SchematronRules.xsl
xslt -stylesheet compiled-SchematronRules.xsl TestData.xml

If that looks a little complex, here is a simple Schematron document validator (implemented in Java) that I developed to ease this complexity -

- Schematronize.java

The validator listed above requires the following XSL templates to be available locally -

- skeleton1-5.xsl
- schematron-message.xsl
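
For readers who would like to see the two commands above as code, here is a rough sketch of the same two-step idea using the standard JAXP API (javax.xml.transform). This is not the Schematronize.java listed above. The class name TwoStepValidationSketch and its helper methods are hypothetical, and the sketch assumes that schematron-message.xsl is passed as the first argument, with skeleton1-5.xsl sitting next to it.

```java
import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration of the two-step processing; names are assumptions.
class TwoStepValidationSketch {

    // Applies one stylesheet to one input and returns the serialized result.
    static String transform(Source stylesheet, Source input) {
        try {
            Transformer transformer =
                    TransformerFactory.newInstance().newTransformer(stylesheet);
            StringWriter out = new StringWriter();
            transformer.transform(input, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT step failed", e);
        }
    }

    // Wraps an in-memory XML string as a JAXP Source.
    static Source text(String xml) {
        return new StreamSource(new StringReader(xml));
    }

    // Step 1: compile the rules into a stylesheet.
    // Step 2: run that compiled stylesheet over the document under test.
    static String validate(Source metaStylesheet, Source rules, Source data) {
        String compiled = transform(metaStylesheet, rules);
        return transform(text(compiled), data);
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println(
                    "usage: TwoStepValidationSketch <meta.xsl> <rules.xml> <data.xml>");
            return;
        }
        // File-based sources carry a system id, so relative imports in the
        // first stylesheet can be resolved against its location.
        System.out.println(validate(
                new StreamSource(new File(args[0])),
                new StreamSource(new File(args[1])),
                new StreamSource(new File(args[2]))));
    }
}
```

Under those assumptions, it could be invoked with schematron-message.xsl, SchematronRules.xml and TestData.xml as its three arguments, mirroring the two commands. Keeping the compiled stylesheet in memory simply avoids the intermediate compiled-SchematronRules.xsl file.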

That's all for now... I'll leave it here for you to play with. I hope you enjoyed reading this article. I look forward to reading your feedback and implementation details around Schematron, so please do take a moment and drop in a note.

Adieu.

Some References -

- The Schematron Website
- The Academia Sinica Computing Centre's Schematron Home Page