Wednesday, August 20, 2008

Part I : XRX Architecture - Unleashed

During my tenure at Software AG, I was introduced to the power of XML. To be candid, I was very impressed by the wide range of problems that XML promised to solve, especially the performance power-punch that XML databases unleash when combined with XQuery and REST.

A couple of years later, I moved into the consultancy world. Being back in the consultancy world and seeing business systems with my renewed, XML-ized eyes (all credit to the Software AG Tamino API and tools laboratory), I was dismayed to find that business-oriented systems seldom use the real power of XML. Their use of XML has mostly been confined to defining application configuration and to bare-minimum transformations that render interface elements or reports. Admittedly, there have been a few exceptions.

After spending quite a few years railing against the general underutilization of XML, I finally found my preferred architecture: XRX (XForms/REST/XQuery).

What is so unique about XRX architecture...?

Generally, web application architectures accept input from HTML forms, which deliver the user's input as flat key-value pairs. These data structures are then converted to middle-tier objects (in Java or .NET, for example) and transformed into tabular data streams so that they can be stored in a relational database. To get the data back out of the relational database, it must be re-serialized with SELECT statements and converted into objects, and the objects must then be converted back into HTML forms. This is a four-step translation architecture.
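The four translations can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python and an in-memory SQLite table in place of a Java or .NET middle tier and a production RDBMS, and the `Customer` class, the `customer` table and all function names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical middle-tier object; the field names are assumptions.
    name: str
    city: str


def form_to_object(form: dict) -> Customer:
    # Translation 1: flat key-value pairs from the HTML form -> object
    return Customer(name=form["name"], city=form["city"])


def object_to_row(conn: sqlite3.Connection, customer: Customer) -> int:
    # Translation 2: object -> tabular row in the relational database
    cur = conn.execute(
        "INSERT INTO customer (name, city) VALUES (?, ?)",
        (customer.name, customer.city),
    )
    return cur.lastrowid


def row_to_object(conn: sqlite3.Connection, customer_id: int) -> Customer:
    # Translation 3: SELECT result -> object
    name, city = conn.execute(
        "SELECT name, city FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    return Customer(name=name, city=city)


def object_to_form(customer: Customer) -> dict:
    # Translation 4: object -> key-value pairs for the HTML form
    return {"name": customer.name, "city": customer.city}


def round_trip(form: dict) -> dict:
    # Push one form submission through all four translations.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    customer_id = object_to_row(conn, form_to_object(form))
    result = object_to_form(row_to_object(conn, customer_id))
    conn.close()
    return result
```

Every field has to be named in each of the four functions, so adding one field to the form means touching all of them.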

In contrast to the conventional approach discussed above, XRX uses a zero-translation architecture. Zero translation implies that XML is stored in the web client, transmitted as XML to a middle-tier validation rules engine and then stored in its entirety in an XML database. Storing the data as a single XML object is also known as a zero-shredding process, since the data is not separated into Third Normal Form (3NF) data structures. The key here is to break the myth and realize that 3NF shredding does not add any business value to the system.
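A minimal sketch of the zero-shredding idea follows. It rests on several assumptions: a plain Python dictionary stands in for the XML database, a well-formedness check stands in for the validation rules engine, and the ElementTree path syntax (a small XPath subset) stands in for XQuery. The `store`, `save`, `load` and `query` names and the sample order document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for an XML database collection.
store = {}


def save(doc_id: str, xml_text: str) -> None:
    # Stand-in for the middle-tier validation step: only checks that the
    # document is well-formed (raises ET.ParseError otherwise).
    ET.fromstring(xml_text)
    # Zero shredding: the document is stored whole, not split into tables.
    store[doc_id] = xml_text


def load(doc_id: str) -> str:
    # The document comes back exactly as the client submitted it.
    return store[doc_id]


def query(doc_id: str, path: str) -> list:
    # Query the stored document in place using ElementTree's path syntax.
    root = ET.fromstring(store[doc_id])
    return [element.text for element in root.findall(path)]


order_xml = (
    "<order><customer>Ada</customer>"
    "<item><sku>A1</sku></item><item><sku>B2</sku></item></order>"
)
save("order-1", order_xml)
```

Here `load("order-1")` returns the submitted text unchanged, and `query("order-1", "./item/sku")` reads the nested items without any join or object mapping.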


Over the years, I have preached XForms as a very solid architectural component, as it offers an order-of-magnitude improvement over other web front-end development architectures. Now XForms, when combined with the traditional XML performance power-punch of REST and XQuery, defines a radical architecture that works wonders.

I'm sure you are already wondering how to build this magical combination of XForms, REST and XQuery (XRX). The following architecture view depicts the approach at an abstract level, to help you start your quest:


The following are some of the distinct advantages offered by the XRX architecture:
  • A web development architecture with a 10x productivity improvement over traditional JavaScript/object-oriented/RDBMS methods
  • A development architecture based on international standards that is designed to minimize the probability of vendor lock-in
  • An architecture that gives a rich user experience without creating mountains of spaghetti procedural code on either the client or the server
  • A system that leverages the REST architecture to take advantage of high-performance and simple interfaces using web standards
  • Portability on both the client and the server using a variety of forms players and XQuery databases
  • The option of avoiding costly shredding (and reconstitution) of complex XML documents into RDBMS tables
  • A community of standards/tools and a "complete solution" ecosystem that can give you a proven ROI on IT investment

The forthcoming post in this series, "Part II : XRX Architecture - Illustrated", will walk you through a detailed tutorial on building a sample application that implements the XRX architecture.

Adieu for now...

Monday, August 4, 2008

NLBean - Make your database understand English

Natural language processing (NLP) is a branch of artificial intelligence and computational linguistics. In theory, it is the most attractive method of human-computer interaction. However, because natural language recognition seems to require extensive knowledge about the outside world and the ability to manipulate it, implementing natural language processing has in fact been one of the most sought-after challenges in the computing world. This article presents a brief introduction to natural language processing and then discusses applying it to query databases.

What is Natural Language Processing (NLP)…?

Natural language processing is the collection of techniques employed to enable computers to understand the languages spoken by humans. The concept of linguistic analysis and processing originated with efforts in the United States in the 1950s, where the intent was to use computers to automatically translate texts from foreign languages into English. Since computers had proven their ability to do arithmetic much faster and more accurately than humans, it was thought to be only a short matter of time before computers demonstrated the capacity to process human languages. When computer-based translation failed to yield accurate translations even after repeated efforts, automated processing of human languages was concluded to be far more complex than originally assumed. Thereafter, natural language processing was recognized as a new field of study, devoted to developing algorithms and software for intelligently processing language data. Over the past 50 years, the field of natural language processing has advanced considerably, and several algorithms have been developed that process language grammar and syntax.

What is Natural Language Database Query (NLDQ)…?

Thinking a little innovatively about the uses of natural language processing, one can imagine a plethora of applications, including a natural language processor to query databases. Natural language database query (NLDQ) is a subset of natural language processing (NLP) that deals with natural language inquiries against structured databases. The quintessence of NLDQ is to transform natural language requests into SQL or some other database query language, which can then be used to extract data from standard databases. As of today, there are quite a few implementations that transform regular English sentences into well-formed queries. Following are some of the viable options in this segment:
Commercial
  • Semantra
  • ELF English Query
Educational
  • Nchiql - a Chinese natural language database querying system
  • TELL-ME - a VAX/VMS based prototype natural language database querying system

Another workable option, and one of my favorite open source projects in the arena of natural language database querying (NLDQ), is NLBean. Although the code is rather crude and experimental, it works fairly well. The implementation could be extended and customized to recognize an organization's domain terms, and used to provide an easy-to-use interface for business users who struggle to understand standard database query languages. The following screenshot depicts the standard interface rendered by NLBean v5.0:

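To make the NLDQ idea of turning an English request into SQL more concrete, here is a deliberately tiny, rule-based sketch. It is not how NLBean or any of the products listed above work. It assumes one fixed sentence pattern and a hand-written vocabulary, and the `employee` table, its columns and the `to_sql` function are hypothetical.

```python
import re

# Hypothetical vocabulary mapping English words to schema names.
TABLES = {"employee": "employee", "employees": "employee"}
COLUMNS = {
    "name": "name", "names": "name",
    "salary": "salary", "salaries": "salary",
    "city": "city",
}

# One supported sentence shape:
#   show|list|get [the] <column> of [all] <table> [where|whose <column> is <value>]
PATTERN = re.compile(
    r"^(?:show|list|get)\s+(?:the\s+)?(?P<col>\w+)\s+of\s+(?:all\s+)?(?P<table>\w+)"
    r"(?:\s+(?:where|whose)\s+(?P<fcol>\w+)\s+is\s+(?P<val>\w+))?$",
    re.IGNORECASE,
)


def lookup(vocabulary: dict, word: str) -> str:
    # Map an English word to a schema name, rejecting unknown words.
    try:
        return vocabulary[word.lower()]
    except KeyError:
        raise ValueError(f"unknown term: {word}") from None


def to_sql(request: str):
    # Translate one English request into (sql, params).
    match = PATTERN.match(request.strip())
    if match is None:
        raise ValueError(f"unsupported request: {request}")
    column = lookup(COLUMNS, match.group("col"))
    table = lookup(TABLES, match.group("table"))
    sql = f"SELECT {column} FROM {table}"
    params = ()
    if match.group("fcol"):
        # The filter value is passed as a bound parameter, never
        # spliced into the SQL text.
        sql += f" WHERE {lookup(COLUMNS, match.group('fcol'))} = ?"
        params = (match.group("val"),)
    return sql, params
```

For example, `to_sql("show the names of all employees whose city is Pune")` yields the statement `SELECT name FROM employee WHERE city = ?` with `("Pune",)` as its parameters. A real NLDQ system needs far more than pattern matching, such as synonyms, joins and disambiguation, which is where customizing for an organization's domain terms comes in.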

References
  • Download the latest version of NLBean here.
  • Further details on NLBean can be found here.