This is a two-part, joint blog post. I recently spent some time looking at Xerox’s new CategoriX EDD tool and writing a post about it. After reading it, I realized it would be helpful to set my discussion in a broader context. So I turned to my friend Tom O’Connor, an e-discovery expert and author of the docNative Paradigm Blog. What follows is a combined post; we each wrote our own section and are cross-posting the result.

Xerox CategoriX and Musings on the Best Approach to EDD Search
by Ron Friedmann

In early October, Xerox Litigation Services released a new e-discovery search and review tool called CategoriX. How should EDD professionals think about this and other new search technologies?

A Xerox PR firm offered me phone time with the CategoriX product manager, Svetlana Godjevac. Always curious about new litigation document review tools, I accepted. I also read the CategoriX product sheet and a statistics-heavy CategoriX white paper explaining how Xerox tested the product.

The CategoriX approach sounds interesting and useful. Xerox R&D in Grenoble developed the product, and the company appears to offer it beyond the litigation market (see the page Text Categorization and Clustering housed under Xerox Technology and Brand Licensing). The product combines ‘probabilistic latent semantic analysis’ (document clustering) with iterative machine learning.
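Xerox has not published implementation details, so what follows is only my rough sketch of what topic-model clustering of documents looks like in general terms. PLSA itself is not in the common open-source toolkits; scikit-learn’s NMF with a KL-divergence loss is an often-cited stand-in, and the sample documents below are invented.

```python
# A minimal, illustrative sketch only -- not Xerox's implementation.
# PLSA is not in scikit-learn; NMF with a KL-divergence loss is a common
# stand-in, and these sample "documents" are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = [
    "memo regarding merger negotiations and due diligence",
    "email scheduling the quarterly sales meeting",
    "draft agreement covering merger terms and closing conditions",
    "sales forecast spreadsheet for the eastern region",
]

# Step 1: turn the documents into a term-weight matrix.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Step 2: factor the matrix into latent "topics" (clusters).
model = NMF(n_components=2, beta_loss="kullback-leibler", solver="mu",
            max_iter=500, random_state=0)
doc_topic = model.fit_transform(X)

# Each document's strongest topic serves as its cluster assignment.
for doc, weights in zip(docs, doc_topic):
    print(weights.argmax(), doc)
```

In a tool like CategoriX, I assume the “iterative machine learning” piece then trains on reviewer decisions for a seed set and refines the categories over successive passes, but that is my inference, not a description of Xerox’s method.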

It sounds powerful but I can’t evaluate its effectiveness. This is by no means a criticism. Both search approaches have been around for years, so it’s hard for me to assess how well they work in CategoriX specifically. Learning more about CategoriX confirms what I’ve suggested before: mere mortals can no longer evaluate EDD platforms, at least not by assessing the underlying algorithms.

I lament that I don’t know enough statistics to fully comprehend the white paper, but Xerox does appear to have tested the product (though the nature of the two document sets studied and the human reviewer groups is not described). One finding I did focus on is that Xerox used the tool to quantify inter-reviewer variability. Not surprisingly, humans are not all that consistent, a fact that lawyers routinely overlook. In our conversation, Ms. Godjevac reported that Xerox does explain the statistics to lawyers and works with them to understand the problems of human review.
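The white paper does not say which statistic Xerox uses to measure that variability, so treat this only as an illustration of the general idea: a chance-corrected agreement measure such as Cohen’s kappa, computed here on invented coding decisions for ten documents.

```python
# Hypothetical illustration of quantifying inter-reviewer variability.
# Cohen's kappa is simply one common choice of metric; these responsiveness
# calls are invented, not drawn from the Xerox study.
from sklearn.metrics import cohen_kappa_score

# 1 = responsive, 0 = not responsive, for the same ten documents.
reviewer_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
reviewer_b = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

agreement = sum(a == b for a, b in zip(reviewer_a, reviewer_b)) / len(reviewer_a)
kappa = cohen_kappa_score(reviewer_a, reviewer_b)

print(f"Raw agreement: {agreement:.0%}")   # 70% on this invented sample
print(f"Cohen's kappa: {kappa:.2f}")       # agreement corrected for chance
```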

How a litigation team should choose among the available advanced tools is a real quandary. The investment to run a “bake off” among competing choices is enormous; moreover, the outcome may well depend on the nature of the documents. What does this say about defensibility in general? Would it be defensible to use product A if an objective study showed that product B was 20% better? And what exactly does 20% better mean anyway?
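To make the ambiguity concrete, here is a deliberately invented comparison; “Product A” and “Product B” are hypothetical, and the only point is that a single “X% better” claim hides the trade-off between precision (how clean the result set is) and recall (how much of the responsive material it finds).

```python
# Hypothetical numbers, purely to show why "20% better" is ambiguous.
# Suppose 1,000 of 100,000 documents are truly responsive.

def precision_recall(retrieved, relevant_retrieved, total_relevant):
    precision = relevant_retrieved / retrieved
    recall = relevant_retrieved / total_relevant
    return precision, recall

# Product A: returns a small, clean set.
p_a, r_a = precision_recall(retrieved=800, relevant_retrieved=600,
                            total_relevant=1_000)
# Product B: returns a much larger, noisier set.
p_b, r_b = precision_recall(retrieved=3_000, relevant_retrieved=720,
                            total_relevant=1_000)

print(f"A: precision {p_a:.0%}, recall {r_a:.0%}")  # 75%, 60%
print(f"B: precision {p_b:.0%}, recall {r_b:.0%}")  # 24%, 72%

# B's recall is 20% higher than A's (0.72 vs 0.60), yet A's precision is
# roughly three times B's -- "20% better" depends entirely on the metric.
```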

Courts seem a long way off from considering this question but the leap from the current standard to one that requires comparing tools seems more a matter of degree than of kind. Are litigation support professionals obliged constantly to evaluate new tools to make sure what they now use is adequate?

Of course, I may be way off base here, which is why I am surprised and dismayed that I haven’t found much commentary on this tool. Many other bloggers comment on EDD, but I did not find much blogging (or Tweeting) about CategoriX. I would like to see more discussion of products, comparisons among them, and the future standard of what courts will rule is defensible.

[I felt this did not stop at quite the right spot so am glad Tom stepped in….]

The Challenges of Evaluating EDD Search Tools
by Tom O’Connor

Ron, your comments about the problems facing anyone attempting to evaluate ED applications are right on target. First, of course, is the fact that one needs an engineering degree to even read some of the white papers in this field. But it seems to me the difficulty starts even before that, with several fundamental problems.

The first, as you mention, is that there is never enough detail given about the document sets being studied. Understanding the documents is a crucial part of any automated litigation process, and evaluating products that don’t sufficiently describe the universe of documents they are working with is simply impossible. This is not a failing of Xerox alone but of nearly all the reviews I have seen. It is nearly impossible to cross-compare applications if they are “tested” on widely divergent data sets.

In addition, some search engines use a standardized thesaurus such as the publicly available WordNet Lexical database, an open source thesaurus from Princeton University. It has over 100,000 English words and associations. As an open source resource, the WordNet database is available for download and examination if needed for litigation validation purposes. If, however, the comparison is between one program using this database and another one that uses an internal or closed database, does that really help us?
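For readers who have never poked at it, here is a small sketch of what thesaurus-style query expansion against WordNet can look like, using the NLTK interface to the database. It illustrates the general idea only, not how any particular e-discovery product uses the thesaurus.

```python
# A small sketch of thesaurus-based query expansion with WordNet via NLTK.
# Illustration of the general idea only, not any product's implementation.
from nltk.corpus import wordnet  # requires a one-time nltk.download("wordnet")

def expand_term(term):
    """Collect WordNet synonyms for a search term."""
    synonyms = set()
    for synset in wordnet.synsets(term):
        for lemma in synset.lemmas():
            synonyms.add(lemma.name().replace("_", " "))
    return sorted(synonyms)

print(expand_term("agreement"))
# A query for "agreement" might be expanded to also match related terms such
# as "accord" or "understanding", depending on the synsets WordNet returns.
```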

Even the widely touted TREC (Text Retrieval Conference) study suffers from this failing in my opinion. The TREC study used a test set of 7 million documents available to the public pursuant to a Master Settlement Agreement between tobacco companies and several state attorneys general. Attorneys assisting in the study drafted five test complaints and 43 sample document requests (referred to as topics). The topic creator and a TREC coordinator then took on the roles of the requesting and responding counsel and negotiated over the form of a Boolean search to be run for each document request.

The problem is those documents were not in native format and did not include attachments. Given that typical collections today consist largely of massive volumes of e-mail, many with attachments (and attachments to attachments), this is a huge issue when evaluating search capability for email.

A second problem I see concerns what type of search is best. We all agree that computer searching is more accurate than human review. The Sedona Conference Commentary on the Use of Search and Information Retrieval Methods in E-Discovery, released in August 2007, states that “Human review of documents in discovery is expensive, time consuming, and error-prone. There is growing consensus that the application of linguistic and mathematic-based content analysis, embodied in new forms of search and retrieval technologies, tools, techniques and process in support of the review function can effectively reduce litigation cost, time, and error rates.” Note, though, that the Commentary endorses automated search over human review generally; it does not say that one automated method beats another. So the assumption that concept search is better than Boolean searching, although widespread, may be wrong.

In Disability Rights Council of Greater Wash. v. Wash. Metro. Area Transit Auth., 2007 WL 1585452 (D.D.C. June 1, 2007), Federal Judge Facciola stated that “concept searching, as opposed to keyword searching, is more efficient and more likely to produce the most comprehensive results.” Judge Grimm made a similar statement in Victor Stanley, Inc. v. Creative Pipe, Inc., Civil Action No. MJG-06-2662 (D. Md. May 29, 2008).

The TREC study results, however, don’t seem to support these judicial positions. In that study, computer scientists from academia and other institutions attempted to locate responsive documents for a number of topics using 31 different automated search methodologies, including concept searching. The result? Boolean searches located 57 percent of the known relevant documents. None of the alternative search methodologies had better results.

In fact, a Boolean search generally equaled or outperformed any individual alternative search method, but the alternative searches also captured at least some responsive documents that the Boolean search had missed. The lesson? Manual review misses many documents, but so do keyword, Boolean, and concept searching; they all just miss different documents. The best approach is to use multiple applications to run iterative searches that winnow down to the best possible results.
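The numbers below are invented (they only loosely echo TREC’s 57 percent figure), but they show the simple arithmetic of why layering methods raises recall: each method misses documents, and they miss different ones.

```python
# Toy illustration with invented document IDs: each method misses documents,
# but they miss *different* ones, so the union recovers more of the set.
relevant = set(range(1, 101))             # 100 truly responsive documents

boolean_hits = set(range(1, 58))          # Boolean finds 57 of them
concept_hits = set(range(30, 80))         # concept search finds 50, overlapping but different

def recall(hits):
    return len(hits & relevant) / len(relevant)

print(f"Boolean alone:  {recall(boolean_hits):.0%}")                 # 57%
print(f"Concept alone:  {recall(concept_hits):.0%}")                 # 50%
print(f"Combined union: {recall(boolean_hits | concept_hits):.0%}")  # 79%
```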

This isn’t late-breaking news. Ron, you started a discussion in May 2008 in Concept Searching in E-Discovery. Some of the information above I gleaned from web sites and from reports by people like Herb Roitblat or Gene Eames, who know a whole heck of a lot more about this than I do. But the point is, one product isn’t going to do the job, no matter how good the product or how convoluted its documentation. And irrespective of the tool, the “operator” had better be well trained or who knows what the results will be.

I share Ron’s concern about emerging standards of defensibility. Given the technical complexities and the lack of statistical certainty, I don’t see how a clear, stable defensibility standard will emerge other than what we’ve seen, namely, have a plan, apply some smarts, and document what you do. As we’ve seen in other arenas, developing standards by judicial opinions is a long and messy process. Well, I suppose the upside is that consultants will stay busy!