Over the last decade, there has been a steady increase in the volume, variety, and velocity of data and formats that are in scope for ediscovery matters. When ediscovery first hit the legal stage, a big case might be 100 GB — stored at a cost of $2,500/GB — and data sources might include Word, Excel, email, and possibly network shares. Though data volumes have increased over the last 15 years, the main data sources remained fairly constant — even the initial focus on cell phone data was limited primarily to call logs, voicemail, and text messages. Today, however, a big case might contain terabytes of data, in dozens of different formats, all of which must be sifted through to find key pieces of evidence.
As our digital ecosystem has so drastically shifted, legal practitioners have had to rapidly adapt how and where they look for key evidence. So what exactly are the new data sources driving this change?
Once squarely in the domain of Hollywood’s international men (and women) of mystery, self-destructing messages are now available to businesses and individuals through a whole host of applications whose messages disappear or are automatically encrypted after reading. Ephemeral messaging, as the name implies, is a form of digital communication that lasts a very short time. Generally, the genre consists of mobile applications for short-form communication in which a message disappears from the recipient's screen after it has been viewed. Platforms in this category include Snapchat, Telegram, Signal, and Wickr, and even more mainstream applications like Gmail are adopting ephemeral capabilities. Because the content may not persist, these applications require timely data preservation and collection early in the ediscovery process.
Few things are as emblematic of the shift to work-from-home as Zoom video conferencing technology. Zoom went from 10 million daily active meeting attendees in Q4 2019 to over 300 million at the beginning of Q2 2020, and the number of active minutes leapt from 100 billion to 2 trillion in the first quarter of 2020 alone. Zoom was not alone in this surge either; similar platforms like Microsoft Teams, Skype, GoToMeeting, and Webex all saw massive increases in usage and adoption.
Video conferencing tools add the complexity of audio and video, as well as real-time chat functionality, to the mix. Teams, Slack, and Zoom chats do not have the same boundaries as a traditional document. Parties can jump in or out of a conversation at any time, topics may extend across multiple channels and varied time periods, and attachments, emojis, videos, or GIFs may all be relevant. Practitioners should ensure their technology partner can support the variety of content formats contained in these platforms.
Short-form communication and real-time collaboration are rapidly replacing email, and the sudden onset of the work-from-home revolution in response to the lockdowns has only accelerated this. The clear collaboration leader, Slack, has been adopted by more than 43% of Fortune 500 companies and boasts over 10 million active users per day, and this growth shows no signs of abating. The informal, rapid-fire, short-format nature of Slack communication makes it a treasure trove of potentially highly relevant data in ediscovery.
Collaboration tools like Slack pose unique challenges that practitioners should be aware of. Data exported directly from Slack and similar tools arrives as JSON, a machine-readable file format that is nearly indecipherable to a human reviewer in its raw form. Extracting all the relevant information from a multi-person stream with links, reactions, graphics, and shared files, and then presenting a cohesive picture of the data, is complicated.
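To make the challenge concrete, here is a minimal Python sketch of the kind of transformation a review tool has to perform. It assumes the message layout commonly seen in a standard Slack workspace export (a list of message objects with `user`, `ts`, `text`, and optional `files` and `reactions` fields, plus a separate user list). The sample data, the user names, and the `render_transcript` helper are hypothetical and purely illustrative; real exports carry many more fields, threads, edits, and deletions that a production tool would need to handle.

```python
import json
from datetime import datetime, timezone

# Hypothetical sample data, shaped like a small slice of a Slack export.
SAMPLE_USERS = json.loads(
    '[{"id": "U01", "name": "akhan"}, {"id": "U02", "name": "bdiaz"}]'
)
SAMPLE_MESSAGES = json.loads("""
[
  {"user": "U02", "ts": "1585751460.000200",
   "text": "Sending it now <@U01>",
   "files": [{"name": "forecast_v2.xlsx"}]},
  {"user": "U01", "ts": "1585751400.000100",
   "text": "Can you share the Q2 forecast?",
   "reactions": [{"name": "eyes", "users": ["U02"], "count": 1}]}
]
""")


def render_transcript(messages, users):
    """Turn raw message objects into time-ordered, human-readable lines."""
    names = {u["id"]: u["name"] for u in users}
    lines = []
    # Raw exports are not guaranteed to be in reading order, so sort by timestamp.
    for msg in sorted(messages, key=lambda m: float(m["ts"])):
        when = datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc)
        uid = msg.get("user", "unknown")
        author = names.get(uid, uid)

        # Replace raw mentions such as <@U01> with readable user names.
        text = msg.get("text", "")
        for user_id, name in names.items():
            text = text.replace(f"<@{user_id}>", f"@{name}")

        line = f"[{when.strftime('%Y-%m-%d %H:%M')} UTC] {author}: {text}"

        # Surface shared files and reactions, which may themselves be relevant.
        files = [f.get("name", "unnamed file") for f in msg.get("files", [])]
        if files:
            line += " [files: " + ", ".join(files) + "]"
        reactions = [
            f":{r['name']}: x{r.get('count', 1)}" for r in msg.get("reactions", [])
        ]
        if reactions:
            line += " [reactions: " + ", ".join(reactions) + "]"
        lines.append(line)
    return lines


if __name__ == "__main__":
    for transcript_line in render_transcript(SAMPLE_MESSAGES, SAMPLE_USERS):
        print(transcript_line)
```

Even this toy version has to reorder messages, resolve user IDs, and pull attachments and reactions back into context, which hints at why raw exports are rarely reviewed directly.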
The Internet of Things (IoT) is an ecosystem of over 30 billion web-enabled things (from smartwatches to doorbell cameras to smart refrigerators) that are constantly performing tasks, collecting data, and, in some cases, sharing it. Each of these devices possesses a wealth of information about human behavior and movement, and many could contain information dispositive to a case or investigation. There is also a subcategory known as the Internet of Bodies, which includes wearable, implantable, and ingestible web-enabled devices. This data type raises a whole host of discovery and privacy concerns.
Social media use extends far beyond bragging about your most recent meal or vacation and sharing videos of a lawyer who “is not a cat.” Platforms like Reddit and Twitter are essential vehicles for people to share breaking news; Instagram and Facebook are advertising behemoths; and LinkedIn has become crucial for sharing business insights. The social giants have some new competition these days from platforms like TikTok and international messaging apps like WeChat and WhatsApp, with user bases in the billions. The variety of ways people engage with social media platforms and their messaging functions creates a wealth of potentially relevant electronically stored information (ESI) that ediscovery practitioners increasingly rely on.
Every minute, over 500 hours of user- and enterprise-generated content are uploaded to YouTube, and the volume is only increasing. With the ever-present video functionality on smartphones and the increasing clarity of the video they produce, YouTube, Vimeo, and a myriad of other video sharing and social media platforms are now rife with real-time livestreams and videos documenting everything from the mundane to the catastrophic. At two billion monthly active users and counting, YouTube alone holds a wealth of potentially relevant information for matters involving criminal conduct, war crimes, copyright infringement, defamation, and more.
Mobile devices are hardly the new kid on the block when it comes to inclusion in ESI requests and creating discovery headaches. In fact, even my earliest cases involved BlackBerrys, Nokias, and Palm Pilots! When the smartphone hit the stage, the nuance and complexity around mobile data in the ediscovery process were substantially amplified because the devices themselves went from a discrete and somewhat self-contained data source to a huge repository of potentially relevant information. Key information related to business transactions, product development, and nefarious behavior is often found buried in mobile data, and the work-from-home revolution of the last several months has only amplified this.
Cloudy with a chance of ESI
And finally, even more traditional data types like email and spreadsheets may reside in cloud-enabled repositories such as O365, Dropbox, or Google suite. The physical location of these data types has implications for data privacy and for the ability to collect the data. Practitioners should determine whether their data source is on premises or hosted in the cloud early in the ESI scoping process.
Stay afloat in the data deluge
The volume, variety, and velocity of data potentially relevant to a case are simply exploding. Practitioners should ensure they are asking the right questions to uncover evidence across the full data universe and not simply relying on the tried-and-true approach of email and documents. Relying on the tools, methods, and people that got us here is not enough to navigate a rapidly evolving world where self-destructing text messages, emojis, GIFs, and Slack messages may contain the smoking gun.