Please enjoy this blog post tip co-authored by Matt Maslow, Senior Manager, Deloitte Canada, and Peter Sanford, Director, Deloitte Canada.
Those of us of a certain age grew up expecting that by now we would have the Jetsons' robot maid, Rosie, simplifying our lives. Yet so far, all we have is the Roomba vacuum cleaner.
— Juan Enriquez
Well, Mr. Enriquez, we are here to respectfully disagree.
In the context of eDiscovery, the advent of Artificial Intelligence has changed the way that legal teams approach document review, by teaching the machine what we are looking for and what we can deprioritize. In the simplest terms, Machine Learning takes each coding decision that a reviewer makes and tries to apply it to the rest of the population — sorting documents into likely relevant and not relevant populations. In this context, Machine Learning is supervised learning, meaning that human input is required to teach the machine. Like Netflix, Machine Learning examines your choices in the background, and applies those choices to a library of information, predicting what you want to look at next.
But if deployed correctly, it can do much more — creating efficiencies beyond the traditional, linear approach to document review. In fact, machine learning can be our Rosie, tidying up as we go and making sure the house is in order.
Let’s discuss some other ways that Machine Learning can increase productivity, give your team insight into document populations, and save your clients some money.
Machine Learning as a QA tool — After human coding is done, Machine Learning ranks documents (both coded and uncoded) based on the likelihood of relevancy. This is perfect for staging quality control (“QC”) of your linear review.
At the beginning of a review, the machine doesn’t know anything about your documents. As you teach Machine Learning, it begins to separate the relevant and not relevant documents into two, largely discrete, populations, separated by a valley in the middle. The image below represents a nearly ideal end state, where we have clearly defined relevant (on the right) and not relevant (on the left) populations:
At some point, in almost all linear reviews, coding needs to be validated through QC. On larger reviews this has traditionally been done by sampling. The issue with this approach is that samples are (i) almost by definition random; (ii) targeted based on what we already know about the matter, and therefore likely to confirm our biases about the documents; and/or (iii) time-consuming to set up. By targeting documents for QC with Machine Learning, we can address all of these problems.
First, imagine a scenario where someone coded a document not relevant, but Machine Learning has determined it is very likely to be relevant (or vice versa). Someone — either human or machine — is wrong. By targeting this discrepancy, we can prioritize QC by selecting the highest value documents. This creates an incredible efficiency — this QC can determine what’s been overlooked during review and direct your limited resources to the likeliest misses by the review team. Similarly, there are probably not going to be many overturns where human reviewers have judged something as relevant and the machine agrees. There’s probably little value in revisiting these documents in most matters.
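To make the idea concrete, here is a minimal sketch of how a QC queue could be built from disagreements between reviewer and machine. It is an illustration only, not how any particular review platform works: the `ReviewedDoc` fields, the 0-to-1 `model_score` scale, and the 0.5 cut-off are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ReviewedDoc:
    doc_id: str
    human_relevant: bool  # the reviewer's coding decision
    model_score: float    # assumed scale: 0.0 (not relevant) to 1.0 (relevant)


def disagreement(doc):
    """Distance between the reviewer's call and the model's score (0 = agree, 1 = opposed)."""
    human_value = 1.0 if doc.human_relevant else 0.0
    return abs(human_value - doc.model_score)


def qc_queue(docs, min_disagreement=0.5):
    """Return the documents worth a second look, largest disagreement first."""
    flagged = [d for d in docs if disagreement(d) >= min_disagreement]
    return sorted(flagged, key=disagreement, reverse=True)


# Made-up documents for illustration.
docs = [
    ReviewedDoc("DOC-001", human_relevant=False, model_score=0.95),  # human says no, machine says yes
    ReviewedDoc("DOC-002", human_relevant=True, model_score=0.90),   # both agree: relevant
    ReviewedDoc("DOC-003", human_relevant=True, model_score=0.10),   # human says yes, machine says no
    ReviewedDoc("DOC-004", human_relevant=False, model_score=0.05),  # both agree: not relevant
]

queue_ids = [d.doc_id for d in qc_queue(docs)]
print(queue_ids)
```

In this toy set, only DOC-001 and DOC-003 land in the queue, because those are the documents where human and machine disagree. The two documents where they agree are left alone, which is where the time savings come from.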
CAL (Continuous Active Learning) to fill in Knowledge Gaps — Second, since time immemorial we have selected our review population using search terms based on what we know about our matter at the earliest stages. Documents without search terms are set aside and addressed piecemeal as we learn more about our matter, or new search terms are crafted and we repeat the process. But Machine Learning provides an opportunity here as well.
A Machine Learning model including documents without search term hits can be built to run in the background. As the main review progresses, keep an eye on the documents that Machine Learning thinks are relevant. If they all contain the existing search terms, wonderful! The initial search terms have located everything. More often than not, however, this process will uncover new terms that are highly relevant, but weren’t captured by the original terms. As an example, the search term “robot” wouldn’t have turned up “android”, but these are conceptually linked and we would expect machine learning to group them together. This leads to additional documents that require review. The real value here is that your reviewers don’t need to constantly track new ideas and terms that arise — Machine Learning is doing that for you.
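As a rough sketch of the background check described above, the snippet below looks at documents the model scores highly that contain none of the existing search terms, and counts the words they use. Everything here is an assumption for illustration: the `(text, score)` pairs, the 0.8 threshold, the tiny stopword list, and the simple whole-word matching (real search terms often include phrases and wildcards, which this does not handle).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def candidate_terms(docs, search_terms, score_threshold=0.8, stopwords=frozenset()):
    """Count words in high-scoring documents that hit none of the existing search terms."""
    terms = {t.lower() for t in search_terms}
    counts = Counter()
    for text, score in docs:
        words = tokenize(text)
        if score >= score_threshold and not (words & terms):
            counts.update(words - stopwords)
    return counts


# Made-up documents paired with hypothetical model scores.
docs = [
    ("The android cleaned the kitchen", 0.92),
    ("Invoice for the android maintenance contract", 0.88),
    ("The robot vacuumed the hallway", 0.95),  # already caught by the search term
    ("Lunch menu for Friday", 0.05),           # low score, ignored
]
STOPWORDS = frozenset({"the", "for"})

counts = candidate_terms(docs, ["robot"], stopwords=STOPWORDS)
top_term, top_count = counts.most_common(1)[0]
print(top_term, top_count)
```

Here "android" surfaces as the most common word among the likely relevant documents that the term "robot" missed, which makes it a candidate for a new search term. A person still needs to decide whether the suggestion is meaningful.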
Machine Learning on New Custodians or Other Parties’ Documents — We’ve all been there. You completed your review, packed your bags, booked your tickets, put on your sunscreen, when suddenly you receive an email stating that there are more documents to review… lots of them! Luckily, Machine Learning allows us to use the existing document coding to target files that come to us late in the game. This includes, for example, an opposing party’s production. After identifying duplicates of what you have already reviewed, there is likely to be net new material. These documents can easily be assessed with the existing Machine Learning model, which will identify what’s likely to be relevant.
Often we find this is even more effective when we build a new model with a subset of key documents that contain information we are hoping to find in the opposing party’s production. By narrowing the conceptual focus of the new model, we can efficiently leverage our previous review work. The key here is upfront organization — you want to set up your database so that all the noise is filtered out and you’re left with a discrete reviewable set of potentially interesting documents.
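The workflow above (set aside duplicates, then rank what is left against a handful of key documents) can be sketched in a few lines. This is a simplified stand-in, not a trained model: it uses a text hash to spot exact duplicates and plain cosine similarity on word counts in place of real Machine Learning scoring. The function names, the sample documents, and the "PROD-" identifiers are hypothetical.

```python
import hashlib
import math
import re
from collections import Counter


def fingerprint(text):
    """Hash of lowercased, whitespace-normalised text, used to spot exact duplicates."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def term_counts(text):
    """Bag-of-words representation of a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_new_documents(reviewed, key_docs, incoming):
    """Drop duplicates of reviewed documents, then rank the rest by similarity to the key documents."""
    seen = {fingerprint(text) for text in reviewed}
    profile = Counter()
    for text in key_docs:
        profile.update(term_counts(text))
    net_new = {doc_id: text for doc_id, text in incoming.items() if fingerprint(text) not in seen}
    scored = [(doc_id, cosine(term_counts(text), profile)) for doc_id, text in net_new.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up example data.
reviewed = ["Robot maintenance schedule for March", "Lunch menu for Friday"]
key_docs = ["Robot maintenance schedule for March"]
incoming = {
    "PROD-001": "robot maintenance  schedule for march",  # duplicate once normalised
    "PROD-002": "Updated robot maintenance invoice",
    "PROD-003": "Parking passes for visitors",
}

ranked = rank_new_documents(reviewed, key_docs, incoming)
ranked_ids = [doc_id for doc_id, _ in ranked]
print(ranked_ids)
```

In this toy example the duplicate drops out and the invoice about robot maintenance rises to the top of the late-arriving set. Real matters usually also need near-duplicate and email-thread handling, which this sketch leaves out.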
Great. So What? Like Rosie the robot, legal teams are all being asked to do more with less, to find efficiencies, and to continually improve productivity. While assisted review is certainly not an “Easy Button” that will allow us to lounge on the couch while the robots “cook and clean” our review set, applying the technology thoughtfully will allow you and your practice to get a clearer picture of your case. We advocate using assisted review as a backstop for your review process, assessing new data, reviewing other party documents, and filling in the blanks on your review set.
If you have a matter where you think the above use cases may be the right fit, consider hiring your own Rosie to help you work smarter and faster.