Predictive coding is set to revolutionise large-scale discovery

09/04/2014 | Discovery | predictive coding

A quiet revolution called predictive coding is occurring in the e-discovery space. In the United States there is growing judicial support for this new method of discovery, which promises to substantially reduce the scale of manual discovery exercises and hence the cost of discovery.


  • Predictive coding is a technology-assisted review process that uses a machine learning algorithm to distinguish relevant from non-relevant documents, based on coding by a “subject matter expert” of a “training set” of documents.[fn1]
  • A “subject matter expert” is a person (typically but not necessarily a lawyer) who is familiar with the information being sought and can render an authoritative determination as to whether a document is relevant or not.[fn2]
  • Predictive coding works as follows: one or more subject matter experts code a “training set” of documents as relevant or non-relevant, and a machine learning algorithm then infers from that coding how to distinguish between relevant and non-relevant documents beyond those in the training set.[fn3]
  • Documents can be coded for relevance as required, e.g. for specific issues and privilege.
  • The process is repeated and refined until the output is measured as being statistically reliable.
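
The training and measurement steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a simple naive Bayes classifier stands in for whatever algorithm a given review platform actually uses, and the documents, labels and function names (train, is_relevant, recall_and_precision) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic assumption: lower-case the text and split on whitespace.
    return text.lower().split()


def train(training_set):
    """Learn word frequencies from (text, is_relevant) pairs coded by an expert."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in training_set:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def is_relevant(model, text):
    """Naive Bayes with add-one smoothing; words never seen in training are ignored.

    Assumes the training set held at least one relevant and one non-relevant document.
    """
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return scores[True] > scores[False]


def recall_and_precision(model, validation_set):
    """Compare predictions with expert coding on a separately coded sample."""
    tp = fp = fn = 0
    for text, label in validation_set:
        predicted = is_relevant(model, text)
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif label and not predicted:
            fn += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Hypothetical training set coded by a subject matter expert.
training_set = [
    ("loan default mortgage securities prospectus", True),
    ("mortgage underwriting guidelines breach", True),
    ("office party catering lunch menu", False),
    ("car park access lunch roster", False),
]

# Hypothetical validation sample, also coded by the expert.
validation_set = [
    ("mortgage securities due diligence report", True),
    ("lunch menu for friday", False),
]

model = train(training_set)
recall, precision = recall_and_precision(model, validation_set)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Recall is the share of truly relevant documents that the model finds, and precision is the share of documents flagged as relevant that really are. In a real exercise, if these measures fell short of the level agreed between the parties or ordered by the court, the expert would code further documents, the model would be retrained, and the measurement would be repeated on a statistically meaningful sample.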

Predictive coding avoids the need for human review of significant proportions of large document collections that are deemed non-relevant, leading to substantial cost savings. The process itself is not inexpensive but becomes more cost-effective with greater volumes of documents.

In the United States there is a small but growing number of authorities concerning predictive coding.[fn4]

One of the latest decisions is Federal Housing Finance Agency v HSBC North America Holdings Inc & Ors, in which US District Court Judge Denise Cote denied a motion to re-open and challenge the completeness of the plaintiff’s predictive coding discovery (ordered earlier in the proceeding). In doing so, the Judge noted a seminal article by Grossman and Cormack that indicated that:

“predictive coding had a better track record in the production of responsive documents than human review, but that both processes fell well short of identifying for production all of the documents the parties in litigation might wish to see”.[fn5]

However, speaking generally, Judge Cote observed that:

“no one could or should expect perfection from [a large scale discovery] process. All that can be legitimately expected is a good faith, diligent commitment to produce all responsive documents uncovered when following the protocols to which the parties have agreed, or which a court has ordered.”

It appears to be only a matter of time before predictive coding and other technology-assisted discovery review processes enter the vocabulary of judicial decision-making in Australia.

Disputes will inevitably arise, as they have in the United States, over the negotiation of e-discovery protocols and the completeness of other parties’ e-discovery undertaken using increasingly sophisticated techniques such as predictive coding.

Litigators who themselves possess expertise in e-discovery will be in demand for such disputes, as well as for the negotiation, management and implementation of appropriate e-discovery processes.



1. Maura R. Grossman and Gordon V. Cormack, ‘The Grossman-Cormack Glossary of Technology-Assisted Review’ (2013) Federal Courts Law Review 7(1), 26.

2. Ibid 31.

3. Ibid 32-33.

4. See, e.g., EORHB, Inc. v. HOA Holdings, Civ. Ac. No. 7409-VCL (Del. Ch. Oct. 19, 2012); Kleen Prods. LLC v. Packaging Corp., Civ. No. 10C 5711, 2012 WL 4498465 at *84-85 (N.D. Ill. Sept. 28, 2012); In re Actos (Pioglitazone) Prods. Liab. Litig., MDL No. 6:11-md-2299 (W.D. La. July 27, 2012); Global Aerospace Inc. v. Landow Aviation, L.P., No. CL 61040 (Va. Cir. Ct. Apr. 23, 2012); Moore v. Publicis Groupe & MSL Group, No. 11 Civ. 1279 (ALC) (AJP), 2012 WL 607412 (S.D.N.Y. Feb. 24, 2012).

5. Maura R. Grossman and Gordon V. Cormack, ‘Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review’ (2011) Richmond Journal of Law and Technology XVII(3). This article is often cited in support of the view that technology-assisted review processes are more accurate and efficient than exhaustive manual review.