Live from Legal Tech NYC, a session on empirical research on e-discovery, specifically the reliability and value of using computers to review documents.

The session: “The Electronic Discovery Institute is a 501(c)(3) non-profit corporation dedicated to resolving the legal community’s electronic discovery challenges. The Institute’s study compares the time, cost and accuracy of traditional, manual document review processes with computer assisted categorization tools.”

The panelists:
The Honorable David Waxse, Federal Magistrate Judge, District of Kansas
Craig Ball, Esq., Attorney & Computer Forensic Examiner
Julia Brickell, Esq., Associate General Counsel, Altria
Peter Gronvall, Esq., Managing Director, AdamsGrayson
Anne Kershaw, Esq., EDI President & founder of A.Kershaw PC/Attorneys & Consultants
Laura Kibbe, Esq., Senior Corporate Counsel & Managing Director, Pfizer, Inc.
Jonathan Nystrom, EDI Study Participant & Vice President, Cataphora
Patrick Oot, Esq., EDI Vice President & Director of Electronic Discovery, Senior Counsel, Verizon [MODERATOR]
Herb Roitblat, Ph.D., EDI Chairman & Principal, OrcaTec LLC
Rich Tobey, CPA, EDI Study Participant & Managing Partner, Vmax Consulting

Oot opens by pointing out that the real goal in e-discovery is justice. Rule 1 of the FRCP references securing “the just, speedy, and inexpensive determination of every action and proceeding.”

Start with the notion that assessing relevancy is difficult. Oot references his involvement in Verizon’s acquisition of MCI. They used a traditional 2nd request review process with much manual review: 83 custodians, 2.3 million documents, and 2 law firms, one deploying 115 lawyers and the other 110 lawyers to conduct privilege and relevance review. It took four months of long days. The cost of document review was just shy of $13.5 million. Note that this matter was not big by today’s standards. The FTC would not allow the parties to use key word searches to narrow the document review. “There’s got to be a better way to do this than all the human review.”

Oot and Kershaw started the Electronic Discovery Institute (EDI) to study whether there is a better way to conduct document review. Kershaw now summarizes the Institute: The idea started a few years ago with a private review Kershaw did comparing two approaches to document review. Judges and others wanted more data to compare approaches. Work to date has just scratched the surface – much remains to be done. The Institute is a not-for-profit and is set up to do additional studies to ease the pain of conducting litigation.

EDI’s first study compared traditional doc review with an electronically assisted process. EDI will publish a white paper in early 2008; it will be peer-reviewed and freely available. Kershaw views EDI as a unique organization for providing factual information (Sedona focuses on principles). Pfizer and Verizon are current sponsors but EDI seeks additional sponsors. EDI will not be a vendor or process certification organization – it will report on factual findings.

– Should a party consider alternative methods to brute force review?
– Is computer assisted relevancy assessment reasonable under the Rules?
– Is any process reasonable?

The study dataset: The MCI-Verizon acquisition data for antitrust 2nd request – 83 custodians in 10 states, 13. terabytes, over 2 million documents.

Roitblat describes the study: Quantitative measurement is key. References the seminal Blair-Maron 1985 study that found that researchers are only 20% accurate in finding docs but thought they were 80% accurate. The way to measure accuracy is to measure actual performance against “the truth.” You have to approximate the truth. [Editor: in medicine, this might be called the gold standard.] How do you define the “baseline,” that is, the objectively or widely accepted determination of each document’s relevance? Must consider both false positives and false negatives. Precision is the percent of docs selected that are truly relevant. Recall is the percent of relevant docs actually retrieved. Elusion is the percent of docs not retrieved that are relevant.
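[Editor: The three measures can be made concrete with a small calculation. What follows is a minimal sketch, not anything shown in the session: the function name review_metrics, the document IDs, and the toy numbers are all assumptions made up for illustration, and the "relevant" set stands in for whatever baseline is treated as the truth.]

```python
def review_metrics(retrieved, relevant, collection_size):
    """Compute precision, recall, and elusion for one review.

    retrieved: IDs of docs the review selected as responsive
    relevant: IDs of docs the baseline ("truth") marks as relevant
    collection_size: total number of docs in the collection
    All names here are hypothetical, chosen for illustration only.
    """
    retrieved = set(retrieved)
    relevant = set(relevant)

    true_positives = len(retrieved & relevant)   # selected and relevant
    false_negatives = len(relevant - retrieved)  # relevant but missed
    not_retrieved = collection_size - len(retrieved)

    # Guard against empty denominators rather than dividing by zero.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    elusion = false_negatives / not_retrieved if not_retrieved else 0.0
    return {"precision": precision, "recall": recall, "elusion": elusion}


# Toy collection of 10 docs: 4 are truly relevant, the review selected 3.
metrics = review_metrics(retrieved={1, 2, 5}, relevant={1, 2, 3, 4},
                         collection_size=10)
print(metrics)
```

In this toy case 2 of the 3 selected docs are relevant (precision 2/3), 2 of the 4 relevant docs were found (recall 1/2), and 2 of the 7 unselected docs are relevant (elusion 2/7). False positives lower precision; false negatives lower recall and raise elusion.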

Key question is what we can actually measure? What are the appropriate “power tools” for e-discovery (versus manual review)? To answer, start by looking at ESI review process: training, case background, examples combined with experience lead to judgments of whether a document is responsive or not. In a 2nd tier review, typically reviewers only look at what first round designated as responsive. So two tier review has problem that relevance calls on first round are not necessarily carefully reviewed.

How does a computer get experience to separate responsive from non-responsive docs? It’s all just mathematics. The competition among vendors is who has the better math. The process with computers is based on rules, text, and math applied to docs. Computer approach may “recurse,” that is, adjust its process based on feedback from human reviewers. For study, “true” designation of document is based on original work of MCI-Verizon team.

Roitblat describes the famous Turing test for artificial intelligence: can a human tell the difference between a computer and a human in an interactive, text-only conversation? By extension, a computer aided review should be comparable to a human review.

PROVISIONAL RESULTS OF STUDY: Agreement between each of 4 computer systems and the original attorney review ranged from 72% to 88%. (For this comparison, the original review is considered as the “truth.”) Note that in human reviews, where there are multiple human reviewers, the rate of agreement among the humans is typically lower.
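[Editor: The session did not spell out how agreement was calculated. One plausible reading is simple percent agreement: the share of documents on which two reviews made the same responsive/non-responsive call. The sketch below assumes that reading; the function name agreement_rate and the ten made-up calls are hypothetical and are not study data.]

```python
def agreement_rate(calls_a, calls_b):
    """Share of documents on which two reviews made the same call.

    calls_a, calls_b: equal-length sequences of booleans
    (True = responsive), one entry per document, in the same order.
    This is plain percent agreement, an assumed measure for illustration.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("both reviews must cover the same documents")
    if not calls_a:
        raise ValueError("need at least one document to compare")
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return matches / len(calls_a)


# Made-up calls on 10 documents; the two reviews differ on 2 of them.
original_review = [True, True, False, False, True,
                   False, False, True, False, False]
system_review = [True, False, False, False, True,
                 True, False, True, False, False]
print(agreement_rate(original_review, system_review))
```

Here the two reviews match on 8 of 10 documents, so the rate is 0.8. The same function could compare one human reviewer against another, which is the comparison the panel suggests typically comes out lower.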

Question to Judge: what happens when issues of best method come to the court? The Judge says it’s better for the parties to collaborate to come to a shared view on this topic. Disagreement should be aired at the Rule 16(b) conference. The legal system requires reasonableness, not precision. Plus it requires reasonable cost. [Editor: this begs the question of how precise is precise enough to be reasonable.]

Brickell: it is no longer reasonable to presume humans should review all the documents.
Craig Ball: If parties cooperate, they can agree on a reasonable method. Cautions that studies show that human review, by some measures, is only 40% accurate. Should computers be designed to make the same errors that humans make? Compares issues here to Google ranking, where links, which are made by humans (at least in theory), are a form of group voting.
Kershaw: Many discovery requests presume way too much is relevant. These studies may help narrow scope of what we generally consider as relevant. Low accuracy of human review reflects inconsistent judgment and fatigue.

Panel discussion continues at 11:40 am but other Legal Tech events beckon…

[Editor’s note: In Thoughts on Full Text Retrieval (a KM and litigation support topic) (July 2003), I noted that “What we need as a profession is a mechanism to perform real-world tests, both on how the search tools perform under the most favorable conditions and how they work when actual users operate them. Unfortunately, this is costly and the incentives and structures to do so just do not exist.” It’s great to see that this is finally beginning to happen.]