Newly developed software shows promise for addressing the problems of duplicate and near-duplicate documents in discovery.

Why are duplicates a problem? First, lawyers may tag duplicates or near-duplicates inconsistently, so that some versions are designated responsive and others non-responsive, some privileged and some not privileged. This can lead to confusion, credibility issues, and possibly even sanctions, especially in portfolio litigation where multiple cases turn on the same set of documents. Second, numerous duplicates can significantly increase the time, and therefore the cost, of reviewing documents.

Existing approaches to detecting duplicates have limitations. One approach is to compute a “hash,” a mathematical fingerprint of a document’s contents. Hashing determines only whether documents are completely identical: a difference of a single character, or even a different file path, produces a different hash, so the two documents are treated as distinct. Another approach is to compare metadata to detect possible duplicates.
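To make that limitation concrete, here is a short Python sketch (purely illustrative; the sample documents and the choice of SHA-256 are my own, not drawn from any particular review platform). It shows that two drafts differing by a single character produce entirely different hashes, which is why hashing can flag only exact duplicates:

```python
import hashlib

# Two drafts that differ by a single character ("4%" vs. "5%").
doc_a = b"Quarterly report: revenue rose 4% in Q3."
doc_b = b"Quarterly report: revenue rose 5% in Q3."

hash_a = hashlib.sha256(doc_a).hexdigest()
hash_b = hashlib.sha256(doc_b).hexdigest()

# Identical bytes always produce the identical digest, but any change,
# however small, produces a completely different one.
print(hash_a == hash_b)                                  # False
print(hashlib.sha256(doc_a).hexdigest() == hash_a)       # True
```

Because the digest changes unpredictably with any edit, there is no notion of “almost the same hash”; near-duplicate detection has to work on the document text itself.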

Start-up Equivio offers software that, on first evaluation, allows litigators to identify near duplicates and to adjust what counts as “near.” For example, drafts of the same document prepared by different authors on different days, under different file names, can be identified as potential duplicates. (Hashes and metadata comparisons cannot do this.) Such differences may be relevant to the case, but often they are not. Clustering near duplicates and reviewing them together can be a great advantage, both in helping to ensure consistent responsiveness and privilege designations and in saving review time.
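Equivio’s actual technique is proprietary, but the general idea of an adjustable “near” can be sketched with a standard text-similarity approach: break each document into overlapping word sequences (“shingles”), measure the overlap between the two sets (Jaccard similarity), and let the reviewer tune the threshold. Everything below, including the sample drafts and the threshold value, is invented for illustration:

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def similarity(a, b):
    """Jaccard similarity of the two documents' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

# Two drafts of the same clause, edited by different authors.
draft1 = "The parties agree to settle all claims arising from the 2006 contract."
draft2 = "The parties agree to settle all claims arising from the 2006 agreement."

# The threshold is the tunable notion of "how near is near."
THRESHOLD = 0.6
print(similarity(draft1, draft2) >= THRESHOLD)   # True: flagged as near duplicates
```

A stricter reviewer could raise the threshold toward 1.0 to cluster only the closest drafts; lowering it sweeps in looser variants, at the cost of more false matches.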

This is yet another example of software that can help address the new challenges created by e-discovery. It seems increasingly clear that over the next few years, automation using highly sophisticated semantic techniques will ease the burdens of reviewing and managing digital discovery documents.