Litigation Support

Demystifying the Alphabet Soup of Legal AI

By Catherine Casey posted 29 days ago


What is legal AI?

There is quite a bit of confusion surrounding legal AI and, frankly, AI in general. For the purposes of this article, let’s use the original definition proffered in 1956 by the man who coined the term AI, John McCarthy: "the science and engineering of making intelligent machines." A more elaborate definition characterizes AI as “a system’s ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation.”[1]

Understanding the machine learning at play in legal AI

In the context of the practice of law, and ediscovery in particular, the deployment of AI has generally fallen into the category of machine learning. As the name implies, machine learning is a subset of AI characterized by the use of algorithms and statistical methods to enable machines to improve or learn through experience. Legal AI has focused on supervised and semi-supervised types of machine learning, with more recent iterations delving into reinforcement models.

Broadly speaking, most legal AI use cases today rely on algorithms trained with human input to identify similar or dissimilar categories of documents to reduce the amount of time it takes to surface key concepts or evidence. Understanding the capabilities and limitations of your specific legal AI is important in framing expectations and workflows to maximize its effectiveness.

Supervised machine learning

The first type of AI to hit the legal (ediscovery) stage was a version of supervised machine learning that was trademarked with the unfortunate name “Predictive Coding™”. (If you have somehow escaped Monique da Silva Moore, et al., v. Publicis Groupe SA & MSL Group (S.D.N.Y. Feb. 24, 2012), it is worth the read, if only to compare it to the more advanced approaches we have today.)

Now commonly categorized as technology assisted review (TAR) 1.0, this version of machine learning entailed a small group of experts coding a subset of data to “train” an algorithm about what to select as relevant, nonrelevant, or related to certain issues. This was repeated for several “rounds” until the suggestions made by the algorithm met a certain statistical level of precision (the share of documents flagged as relevant that truly are relevant) and recall (the share of all truly relevant documents that the algorithm finds). The algorithm would then run against the full data universe, and the results would be pushed out to a larger group of reviewers who could confirm or reject the coding suggestions. Ultimately, the practice of allowing the algorithm to auto-code at certain levels of precision and recall was statistically validated.
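
To make the two measures concrete, here is a minimal sketch in Python of how precision and recall might be computed for a single training round, by comparing the algorithm's suggestions with the experts' coding on a sample of documents. The function names, the ten-document sample, and the 0.75 and 0.80 thresholds are all hypothetical, chosen only for illustration. They are not drawn from any particular TAR product or court order.

```python
def precision_recall(predicted, actual):
    """Compare algorithm suggestions with expert coding on a sample.

    Both arguments are equal-length lists of booleans, where True
    means the document was coded "relevant".
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    pairs = list(zip(predicted, actual))
    true_pos = sum(p and a for p, a in pairs)       # flagged and truly relevant
    false_pos = sum(p and not a for p, a in pairs)  # flagged but not relevant
    false_neg = sum(a and not p for p, a in pairs)  # relevant but missed
    flagged = true_pos + false_pos
    relevant = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / relevant if relevant else 0.0
    return precision, recall


def training_is_done(precision, recall, min_precision=0.75, min_recall=0.80):
    """Hypothetical stopping rule: both measures must meet their thresholds."""
    return precision >= min_precision and recall >= min_recall


# Hypothetical sample of ten documents from one training round.
algorithm = [True, True, True, True, False, False, False, True, False, False]
experts = [True, True, True, False, True, False, False, True, False, False]

precision, recall = precision_recall(algorithm, experts)
print(f"precision={precision:.2f} recall={recall:.2f}")
print("stop training:", training_is_done(precision, recall))
```

In this sample the algorithm flags five documents, four of them correctly, and misses one of the five documents the experts marked relevant, so both precision and recall come out to 0.80. In practice the thresholds would be set case by case, and the sample would be large enough to support a statistically meaningful estimate.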