A new e-discovery study by Equivio statistically compares human versus software performance in designating responsive documents. The results are worth reading. 

I’ve argued that litigators and judges should rely on statistical analysis to determine the most reliable and accurate approach to reviewing documents. In my March 2007 blog post, The Gold Standard for E-Discovery Document Review, I argued that lawyers’ belief in the accuracy of human review is likely misplaced. In that post, I described an empirical study that H5 conducted.

The Equivio>Relevance (TM) study is available by registration. It compares human review results to Equivio>Relevance results and finds that

“Out of the 4,107 documents in which the review analyses differed, the Oracle [“Topic Authority” in TREC lingo, aka “subject matter expert”] reviewed a statistically drawn sample of 190 documents, or slightly less than 5% of the documents in dispute. Of those 190 documents, the Oracle determined that Equivio>Relevance was correct in 147 of the cases. The human review team, by comparison, prevailed in only 43 of the disputed sample documents.”
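Since the whole point here is relying on statistics, it is worth asking how lopsided that 147-to-43 split really is. Below is a minimal Python sketch of my own (not part of the Equivio study), assuming the 190 adjudicated disagreements can be treated as independent samples and that, if software and humans were equally reliable, each would be equally likely to prevail on any disputed document.

# Back-of-the-envelope check of the sampled disagreements reported above:
# of 190 disputed documents adjudicated by the Oracle, the software was
# judged correct on 147 and the human team on 43.
# Illustrative calculation only, not part of the Equivio study.
from scipy.stats import binomtest

software_wins = 147
total_adjudicated = 190

# Null hypothesis: software and humans are equally likely to be right on
# any disputed document, so wins should split roughly 50/50.
result = binomtest(software_wins, total_adjudicated, p=0.5,
                   alternative='greater')

print(f"Software win rate: {software_wins / total_adjudicated:.1%}")
print(f"One-sided binomial p-value: {result.pvalue:.2e}")

Under those simplifying assumptions, a 147-to-43 split (a roughly 77% win rate for the software) is far too lopsided to be explained by chance.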

In my view, that means the computer performed better than humans. Equivio is, however, rightly more cautious and draws this conclusion:

“computer-assisted review can dramatically increase the efficiency and accuracy of a document review team’s work”.

Instead, the study suggests litigators can use the tool for earlier case understanding and greater consistency, among other benefits. The study also notes that “Equivio>Relevance ultimately identified more than 1,000 additional responsive documents that had been overlooked or mis-categorized by the human review team.” (The H5 study written up in my ‘Gold Standard’ post also reported that software found more relevant docs than the humans.)

I think about the Equivio analysis by way of analogy. Let’s say the corpus of documents were a bunch of cells in your body. And let’s say the responsive docs represented cancerous cells. Further, the more cancerous cells you can find and segregate, the higher your chances of survival. If you, as the patient, had to choose the diagnostic test you wanted, which one would you choose? I’d go with the computer because it finds more cancer cells. And if the FDA were evaluating whether to approve human search or computer search, the statistics would strongly favor the latter.

Of course, we are lawyers and not doctors. And lawyers must appear before judges and defend their discovery methods. And we know judges don’t like “black box” approaches to discovery. That, however, may have to change. We can learn from medical clinical trials and how the FDA approves new drugs or devices. It’s all about statistical outcomes (technically, the “mechanism of action” for a treatment does not have to be understood to obtain approval).

Anyone care to do the stats to compare smart keyword searching against concept searching?
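For anyone inclined to take that up, here is one hypothetical way the comparison might be framed: score each method’s hits against a sampled “gold standard” of responsiveness calls and compare precision and recall. The sketch below is my own illustration in Python; the document IDs and retrieval sets are invented placeholders, not data from any study.

# Hypothetical comparison of two search methods against a gold standard.
# All document IDs and sets below are placeholders for illustration.

def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Precision and recall of a retrieved set against a gold-standard set."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Placeholder gold standard (responsive docs) and each method's hits.
gold_standard = {"doc1", "doc3", "doc4", "doc7"}
keyword_hits = {"doc1", "doc2", "doc3"}
concept_hits = {"doc1", "doc3", "doc4", "doc6"}

for name, hits in [("keyword", keyword_hits), ("concept", concept_hits)]:
    p, r = precision_recall(hits, gold_standard)
    print(f"{name} search: precision={p:.2f}, recall={r:.2f}")

With a large enough adjudicated sample, the same kind of binomial or proportion test used above could then tell you whether one method’s advantage is real or just noise.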