Concept-Based Search Engines: Tools for Records Management

Undoubtedly, keyword search has had a huge impact on litigation support, but it is considerably limited in the precision and recall it can attain. As litigators oversee document coding projects and manual reviews for issues, relevance and privilege, they need high recall (few relevant documents missed) with little sacrifice of precision (few irrelevant documents collected).

One problem with keyword search of unstructured data is that adding keywords to improve recall unavoidably sacrifices precision. No one wants a litigation support search to work like a Web search, with the relevant documents buried among thousands of irrelevant hits.
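
To make the trade-off concrete, here is a minimal Python sketch that scores a Boolean OR keyword search. The six documents, the reviewer's relevance judgments and the helper names `keyword_search` and `precision_recall` are all invented for illustration, and tokenization is a plain whitespace split rather than anything a production engine would use.

```python
# Illustrative sketch: invented documents and relevance judgments,
# simple whitespace tokenization, Boolean OR across keywords.

def keyword_search(docs, terms):
    """Return the ids of documents containing any of the terms (Boolean OR)."""
    wanted = {t.lower() for t in terms}
    return {doc_id for doc_id, text in docs.items()
            if wanted & set(text.lower().split())}

def precision_recall(retrieved, relevant):
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

docs = {
    1: "notice of termination of the supply contract",
    2: "the agreement was terminated for breach",
    3: "breach of warranty claim",
    4: "agreement on catering for the office party",
    5: "draft agreement for the holiday schedule",
    6: "termination letter sent to the supplier",
}
relevant = {1, 2, 3, 6}  # hypothetical reviewer judgments

for terms in (["termination"], ["termination", "breach", "agreement"]):
    p, r = precision_recall(keyword_search(docs, terms), relevant)
    print(f"{terms}: precision={p:.2f} recall={r:.2f}")
```

Run as written, the single-term query prints precision 1.00 and recall 0.50 (it misses "terminated" and the warranty claim). Adding "breach" and "agreement" raises recall to 1.00 but drops precision to 0.67, because the broader query also sweeps in the catering and holiday-schedule memos.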

Enter Concept-Based Search Engines
A growing number of tools let lawyers search collections not only with the usual Boolean logic but also by concept. Even without a well-developed list of search terms, these tools give a quick “first look” into new collections. They make it possible to research unfamiliar client or opposing-party collections in ways not previously available, and they are being used to improve precision and recall in litigation support and records management.

Why Use Concept-Based Search Engines?
Concept-based search engines can also help classify information. They apply rules, and they “learn” by adjusting their weighting based on user input. Like litigation support, records management requires a highly accurate system of classification. While no concept-based search engine can automate every classification task, a growing number of tools built on these engines are making records classification much easier.
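
The article does not say how any particular product adjusts its weighting. As one hedged illustration, the Python sketch below uses a perceptron-style update (a standard textbook technique, swapped in here as an assumption): a seed weight stands in for a hand-written rule, and each reviewer correction shifts the weights of the terms in the misclassified document. The "financial record" category, the seed weight and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical category: "financial record". The seed weight plays the
# role of a hand-written rule ("invoices are financial records").
weights = {"invoice": 1.0}

def features(text):
    """Bag-of-words term counts (no stop-word filtering in this sketch)."""
    return Counter(text.lower().split())

def classify(weights, text, threshold=0.0):
    """File the document under the category if its weighted score exceeds the threshold."""
    score = sum(weights.get(term, 0.0) * n for term, n in features(text).items())
    return score > threshold

def correct(weights, text, is_record, rate=1.0):
    """Perceptron-style update: if the reviewer's label disagrees with the
    engine's decision, move each term's weight toward the reviewer's label.
    Decisions the reviewer agrees with leave the weights unchanged."""
    if classify(weights, text) == is_record:
        return
    direction = 1.0 if is_record else -1.0
    for term, n in features(text).items():
        weights[term] = weights.get(term, 0.0) + direction * rate * n

doc = "purchase order for office chairs"
print(classify(weights, doc))          # False: none of its terms has a weight yet
correct(weights, doc, is_record=True)  # reviewer overrides the decision
print(classify(weights, doc))          # True
print(classify(weights, "purchase order number 42"))  # True: learned terms carry over
```

One reviewer override is enough for the engine to file the purchase order, and a new document sharing its terms, as a financial record. A real engine would also filter common words such as "for", which this sketch lets pick up weight.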

The strategies and algorithms used by concept- or context-based search engines differ. Virtually all of them return documents that a traditional keyword search would not find, so they very probably achieve greater recall.

What About Precision?
Context-based search engines rely on algorithms built on neural networks, complex rules, manual tagging (indexing) or immense, specialized thesauri. Some depend on weighting, while others support weighting but do not require it. And some require “teaching”: users manually correct the engine's automated decisions.
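
Of the approaches listed above, thesaurus expansion is the simplest to sketch. In the Python example below, a tiny hand-built thesaurus (hypothetical; commercial thesauri are vastly larger and domain-specific) maps each concept to the terms that express it, and a document is tagged with every concept whose terms it contains. The names `THESAURUS`, `concepts_in` and `concept_search` are illustrative only.

```python
# Hypothetical, hand-built thesaurus mapping concepts to the surface terms
# that express them.
THESAURUS = {
    "contract termination": {"termination", "terminate", "terminated",
                             "cancellation", "cancelled", "rescind", "rescinded"},
    "breach": {"breach", "breached", "default", "noncompliance"},
}

def concepts_in(text, thesaurus=THESAURUS):
    """Return the set of concepts whose terms appear in the document."""
    words = set(text.lower().split())
    return {concept for concept, terms in thesaurus.items() if words & terms}

def concept_search(docs, concept, thesaurus=THESAURUS):
    """Return the ids of documents tagged with the given concept."""
    return {doc_id for doc_id, text in docs.items()
            if concept in concepts_in(text, thesaurus)}

docs = {
    1: "notice of termination of the supply contract",
    2: "the agreement was terminated for breach",
    3: "the lease was rescinded after default",
    4: "agreement on catering for the office party",
}
print(sorted(concept_search(docs, "contract termination")))  # [1, 2, 3]
print(sorted(concept_search(docs, "breach")))                # [2, 3]
```

A keyword search for "termination" would return only document 1. The same expansion that raises recall is also where precision suffers: under this thesaurus, a memo about "default printer settings" would be tagged as a breach.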

The more promising tools use algorithms based on Bayesian inference and Shannon's information theory, which identify patterns that occur naturally in text from the usage and frequency of terms that correspond to specific concepts. The engine infers the probability that a particular document is about a specific subject from the preponderance of one pattern over another in a piece of unstructured information. That inference about subject matter supports the classification that records management requires.
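
The Bayesian half of that description can be illustrated with a textbook multinomial naive Bayes classifier. This Python sketch is an assumption, not any vendor's algorithm, and it makes no attempt to model the Shannon-based part of commercial engines: it learns per-subject term frequencies from a handful of labeled records (invented here), then returns the posterior probability that a new document is about each subject.

```python
import math
from collections import Counter

def train(labeled_docs):
    """labeled_docs: list of (text, subject) pairs.
    Returns per-subject term counts, per-subject document counts and the vocabulary."""
    term_counts, doc_counts, vocab = {}, Counter(), set()
    for text, subject in labeled_docs:
        words = text.lower().split()
        term_counts.setdefault(subject, Counter()).update(words)
        doc_counts[subject] += 1
        vocab.update(words)
    return term_counts, doc_counts, vocab

def posterior(model, text):
    """Return P(subject | document) for each subject, using multinomial
    naive Bayes with add-one (Laplace) smoothing."""
    term_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for subject, counts in term_counts.items():
        total_terms = sum(counts.values())
        log_p = math.log(doc_counts[subject] / total_docs)  # prior P(subject)
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # add-one smoothing keeps a missing word from zeroing the product
            log_p += math.log((counts[word] + 1) / (total_terms + len(vocab)))
        log_scores[subject] = log_p
    # normalize in log space to avoid underflow on long documents
    top = max(log_scores.values())
    exp_scores = {s: math.exp(v - top) for s, v in log_scores.items()}
    z = sum(exp_scores.values())
    return {s: v / z for s, v in exp_scores.items()}

training = [  # hypothetical labeled records
    ("invoice payment due net thirty", "finance"),
    ("payment received for invoice", "finance"),
    ("employee leave request approved", "personnel"),
    ("performance review for employee", "personnel"),
]
model = train(training)
probs = posterior(model, "overdue invoice payment")
print(max(probs, key=probs.get))  # finance
```

With equal priors, the two finance terms tilt the posterior to about 0.89 for "finance"; "overdue" never appeared in training, so it is skipped rather than allowed to penalize either subject.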

Law firms and law departments fortunate enough to have these tools for litigation support or knowledge management may be the first in our industry to use them for records classification. Others are sure to follow.

About our author

Chuck Kellner is a Senior Consultant at Daticon Inc. In addition to working with the sales and technology groups there, he assists clients in developing their strategies for electronic discovery and electronic records retention.  He can be contacted at ckellner@daticon.com.