In my July 30th posting, Thoughts on Full Text Retrieval (a KM and litigation support topic), I discussed lingering questions I have about the value of advanced full-text retrieval. My questions notwithstanding, I do believe that more sophisticated tools offer value, at least when used appropriately. And of course, as tools grow in sophistication, the answer may change.

The current issue of eWeek magazine (8/11/03), in the cover story IBM Takes Search to New Heights, describes new search technology IBM is developing. It appears that the new software, dubbed Unstructured Information Management Architecture (UIMA), combines multiple approaches to searching, including statistical algorithms, rule-based reasoning, symbolic reasoning, and artificial intelligence. An IBM spokesperson is quoted as saying the software understands text and tells you what’s in it.

An IEEE publication provides a bit more information: “The Combination Hypothesis states that using a variety of techniques – such as natural language processing, statistical analysis, and syntactical and grammatical rules-based intelligence – together may result in significant data analysis improvements.” More detailed information is available in an IBM white paper on Architecting Knowledge Middleware, presented in May 2002 at a conference.

IBM has developed many interesting technologies, so at a minimum this initiative bears watching. Some readers may remember the early days of Optical Character Recognition (OCR). There were “voting engine” systems that ran multiple brands/approaches of OCR on the same documents and let the majority result rule. Perhaps this analogy is overly simplistic, but it sounds conceptually similar. The key, of course, is how the voting algorithms work (and I could not find detail on that).
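
For readers who like to see the idea in code, here is a minimal sketch of the OCR-style majority vote described above. It is purely illustrative: the function names and sample strings are my own inventions, it assumes every engine returns the same number of words so they can be compared position by position, and it says nothing about how UIMA or any commercial OCR product actually combines results.

```python
from collections import Counter


def majority_vote(readings):
    """Return the most common reading among the engines.

    Ties fall to whichever reading Counter reports first, an arbitrary
    choice made for this sketch.
    """
    return Counter(readings).most_common(1)[0][0]


def vote_on_words(engine_outputs):
    """Combine several engines' outputs with a per-word majority vote.

    Assumes (hypothetically) that the outputs are already word-aligned,
    i.e. each engine produced the same number of words.
    """
    token_lists = [text.split() for text in engine_outputs]
    if len({len(tokens) for tokens in token_lists}) != 1:
        raise ValueError("outputs must have the same number of words")
    # zip(*...) groups the first word from every engine, then the second, etc.
    return " ".join(majority_vote(column) for column in zip(*token_lists))


# Made-up readings of one line from three hypothetical OCR engines.
outputs = [
    "The quick brown fox",
    "The quick brovvn fox",
    "Tne quick brown fox",
]
print(vote_on_words(outputs))  # prints "The quick brown fox"
```

Each engine makes a different mistake, yet the vote recovers the right text. The sketch sidesteps the hard parts, namely aligning outputs that differ in length and weighting engines by reliability, which is exactly the detail that matters.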

Law firm technology managers interested in KM and searching document stores should stay tuned for more information on this promising approach. A conceptual breakthrough in search – or even an incremental step forward – could be important.