Most knowledge managers say that KM is 80% process/culture and 20% technology. I agree and usually focus on the 80%. One of the interesting 20% issues is the appropriate role for and expectations of full-text retrieval systems in KM. (This issue also applies to managing documents in discovery.)

Broadly speaking, search tools fall into two classes: simple and advanced. By simple, I mean software that allows Boolean and proximity searches, which means using “connectors” such as AND, OR, NOT, NEAR, WITHIN, etc. By advanced, I mean software that finds related words (and therefore documents that do not contain the search terms), distinguishes among related meanings of individual words, and applies advanced methods to rank how relevant documents are. The latter use pattern-matching techniques, neural networks, state space vector analysis, and other approaches.
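To make the "simple" class concrete, here is a minimal sketch of an AND search and a NEAR (proximity) search in Python. The three memos, the function names, and the three-word window are all hypothetical and chosen for illustration; real products index their collections and handle word variants, which this sketch does not attempt.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_and(tokens, *terms):
    """Boolean AND: every term must appear somewhere in the document."""
    return all(term in tokens for term in terms)

def matches_near(tokens, a, b, distance):
    """Proximity: a and b must occur within `distance` words of each other."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

# Hypothetical documents, invented for this example.
docs = {
    "memo1": "The tenant claimed breach of contract after the lease ended.",
    "memo2": "The contract was signed in March. Months later a breach of warranty was alleged.",
    "memo3": "The landlord ended the rental agreement early.",
}

and_hits = [name for name, text in docs.items()
            if matches_and(tokenize(text), "contract", "breach")]
near_hits = [name for name, text in docs.items()
             if matches_near(tokenize(text), "contract", "breach", 3)]

print("contract AND breach:", and_hits)
print("contract NEAR/3 breach:", near_hits)
```

Neither query returns memo3, which discusses an "agreement" but never uses the word "contract." Finding that kind of document is what the advanced class of tools claims to do.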

Although I have worked with full-text software for over a dozen years, I have two lingering questions:
1. What is the incremental value of sophisticated search over simple search?
2. How much upfront investment is required to get the sophisticated search to provide that incremental benefit? The “upfront investment” includes the cost of software, set-up and integration, user training, and, perhaps most important, the need (in some systems) to build taxonomies or provide training documents that are already categorized.

Answering these questions, which in my opinion are empirical, not theoretical, matters, is expensive. Ideally, one would create test data sets containing large collections of documents, each of which was well known to a few individuals. One would then run different search engines against each collection, letting the knowledgeable people “drive.” A statistician would help set up the test and measure the results.
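One way such a test might score the results is with precision and recall, two standard retrieval measures. That choice is my assumption for illustration, not something any particular test has used. The sketch below compares what each engine returned with the set of documents the knowledgeable reviewers judged relevant. The document IDs, engine names, and results are invented.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgments: documents the reviewers marked relevant for one query.
relevant_docs = {"d1", "d2", "d3", "d4"}

# Hypothetical result lists from two engines for the same query.
engine_results = {
    "simple": ["d1", "d2", "d7"],
    "advanced": ["d1", "d2", "d3", "d8", "d9"],
}

scores = {name: precision_recall(found, relevant_docs)
          for name, found in engine_results.items()}

for name, (precision, recall) in scores.items():
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case the advanced engine finds more of the relevant documents but also returns more irrelevant ones. A real test would repeat this over many queries and searchers before drawing any conclusion.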

Some law firms have tested advanced engines and tell me that they were underwhelmed. And at a recent trade show, the rep for a fancy search product said that it usually does not work much better than plain Boolean, and that his company no longer pushes the search feature and instead focuses on other features. Sobering.

All this having been said, I do believe that there is probably value in using sophisticated search tools. It depends on the nature of the collection and the level of training of the folks doing searches. Over a decade ago at Wilmer Cutler & Pickering, we developed one of the first integrated systems combining scanning, OCR, full-text search, and a structured database. We found that, in the right hands, using a sophisticated search tool was better than a simple one. “In the right hands” was key, though: without knowledge and/or training, the advanced engine was not that useful. Given that some recent reports show that most users of most search engines don’t do more than one- or two-word searches, it may be that the power of advanced engines needs more support than we think.

What we need as a profession is a mechanism to perform real-world tests, both of how the search tools perform under the most favorable conditions and of how they work when actual users operate them. Unfortunately, such testing is costly, and the incentives and structures to do it just do not exist.