From Lake Houses to Literacy: Making Sense of the New Data Landscape

By Jordan Galvin posted 09-27-2023 10:28


I remember the first time I read the definition of Data Management: “the development, execution, and supervision of plans, policies, programs, and practices that deliver, control, protect, and enhance the value of data and information assets throughout their life cycles.” Wow—that’s heavy. With such an overstuffed definition, it isn’t surprising that many folks are either intimidated or bored by the mention of data management.
Simply put, data management is the recognition that data is an asset. If I asked you to think about all the things that allow your organization to exist and grow in market share, social leadership, and profitability, a few things would probably spring to mind: people, money, brand, knowledge, even software and equipment. In short, you would think of your organization’s assets. And I am willing to bet that, regardless of the business you are in, your organization invests significantly in maintaining and growing these assets once they’ve been acquired.

Take a common example: finances. There are reasons businesses don’t keep half of their money in a bank and half under a mattress: not only would this present a lot of risk, but it would also foreclose the opportunity to optimize the organization’s financial position. With a sub-optimal financial position, the organization could not pursue all the things it cares about. 
Data is another of an organization’s most valuable assets. But it is a unique asset. It’s not fungible. It’s not tangible. And a single data point carries no value on its own. But, like traditional assets, if data is mismanaged, it can be a huge liability. On the other hand, if it is well governed, it can lead to tremendous value–but only if it is accessible, reliable, and secure. And that is the goal of data management: to govern data as we would our other assets so that we can use it to advance our strategic objectives.
In this article, I will introduce you to both the technical and non-technical components of data management. I will define key parts of a great Data Governance program; demystify common Data Engineering phrases; and touch on important topics related to data consumption. I hope this will help you engage more confidently in discussions about data at your organization.

First Stop: Data Governance
Data Governance is the center and scaffolding of any successful data program. The person or team in this role builds robust, scalable policies and procedures and an engaged community of stakeholders, which together define the organization’s interactions with its data. Through these activities, Data Governance drives the organization’s most valued objectives by managing the quality, availability, usability, and understandability of its strategic and critical data. In this section, I am going to define what I like to call “The 3 Ss of Data Governance.” Just remember: every piece of important data needs a standard, a steward, and a system of record.

Standard: The key is to introduce uniformity to your critical data, whether that is an enterprise-wide definition, format, or use case. Reducing variability in this way increases the usability of the data while lowering the costs of maintenance.
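To make this concrete, a format standard can be enforced programmatically. The sketch below is a minimal illustration in Python, assuming a hypothetical enterprise-wide standard that matter IDs look like “AAA-00000” (the field name and format are invented for the example):

```python
import re

# Hypothetical enterprise standard: matter IDs follow the format "AAA-00000".
MATTER_ID_PATTERN = re.compile(r"^[A-Z]{3}-\d{5}$")

def conforms_to_standard(matter_id: str) -> bool:
    """Return True if the value matches the enterprise-wide format standard."""
    return bool(MATTER_ID_PATTERN.match(matter_id))

# Flag non-conforming values so the data steward can remediate them.
incoming = ["LIT-00042", "lit-42", "TAX-00007"]
violations = [m for m in incoming if not conforms_to_standard(m)]
```

Even a check this simple lowers maintenance costs: non-conforming values are caught at the door rather than discovered downstream in a report.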

Steward: A data steward is the person responsible for a piece of data on a day-to-day basis. They help the Data Governance team define the standards and system of record for the data, and they ensure usage of the data adheres to these standards going forward.

System of Record: Also commonly referred to as a “source of truth,” a system of record is a storage location designated as housing the most accurate version of specified data. As an example, an organization might designate its HRIS as the system of record for all its people data. Thus, if the spelling of my name in the finance system doesn’t match the spelling in the HRIS, the finance system has spelled my name wrong.
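A reconciliation check against the system of record can be sketched in a few lines. The record shapes and field names below are hypothetical; the point is only that when two systems disagree, the designated system of record wins:

```python
# Hypothetical records for the same employee in two systems.
hris = {"emp_id": 1138, "name": "Jordan Galvin"}     # designated system of record
finance = {"emp_id": 1138, "name": "Jordon Galvin"}  # downstream system

def reconcile(source_of_truth: dict, downstream: dict, field: str):
    """If a downstream value disagrees with the system of record,
    the downstream value is, by definition, the one that is wrong."""
    if downstream[field] != source_of_truth[field]:
        return {"field": field,
                "correct": source_of_truth[field],
                "incorrect": downstream[field]}
    return None

discrepancy = reconcile(hris, finance, "name")
```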

Next Stop: Data Engineering
As data has gotten bigger and broader over the past twenty years, a new engineering discipline has emerged to handle the unique challenges and opportunities storing, transforming, and moving data presents. Data Engineers build and connect the technologies that deliver data when, where, and how it is needed in the organization. In this section, I will cut through the jargon and define the important concepts you need to know. 
Warehouses, Lakes, and Lake Houses: Data warehouses, data lakes, and data lake houses are all storage technologies that fall under a class of database systems called online analytical processing (OLAP). That is, each of these stores data in a way that is optimized for aggregation and analysis. This differentiates warehouses and their kin from online transaction processing (OLTP) systems, which are optimized for storing large numbers of transactions (e.g., calls to a call center).
Using our terminology from above, OLTP systems are typically your systems of record, and they feed the OLAP systems (warehouses, etc.) that your data engineers build. Analysts can then pull from the OLAP systems and save a lot of time and effort in data wrangling. Whether your data engineer builds a data warehouse, lake, or lake house will depend largely on the type of data existing in your underlying systems of record.
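That flow can be illustrated with a toy example: an OLTP-style log that appends one row per call-center transaction, and an OLAP-style aggregate that an analyst would pull instead of wrangling raw rows. The schema here is invented purely for illustration:

```python
from collections import defaultdict

# OLTP side: the call-center system of record appends one row per transaction.
calls = []

def log_call(agent: str, duration_min: int) -> None:
    calls.append({"agent": agent, "duration_min": duration_min})

log_call("Ada", 5)
log_call("Ada", 7)
log_call("Grace", 3)

# OLAP side: the warehouse serves an aggregated, analysis-ready view.
def total_minutes_by_agent(rows):
    totals = defaultdict(int)
    for row in rows:
        totals[row["agent"]] += row["duration_min"]
    return dict(totals)
```

The OLTP side is optimized for fast, row-at-a-time writes; the OLAP side is optimized for reading across many rows at once, which is exactly the split the architectures above institutionalize.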

Data warehouses are the oldest of these architectures and are well suited for structured data (data that is typically housed in rows and columns). Given that most of the work done by the legal industry is structured, data warehouses are still a highly effective mode of storing data for analytics for law firms and departments. However, semi- and unstructured data (e.g., text, videos, images, documents) are quickly proliferating–even in legal. Data lakes were developed in response to this need for greater storage flexibility, as they can handle structured, semi-structured, and unstructured data. Finally, data lake houses were developed recently to combine the best features of both warehouses and lakes, while adding governance enforcement features.

Regardless of the architecture you choose, the goal of each is to provide analytical data to business intelligence, data science, and AI platforms.

Final Stop: Consumption
The data governance and engineering activities outlined above are all in service of the ultimate consumption of data by the business. While you may be familiar with longstanding or headline-making ways we utilize data (e.g., dashboards or generative AI, respectively), data management enables consumption in myriad other ways. To end this article, I will focus on just two terms you may have heard but still aren’t quite sure what they mean. 
Metadata: In its most reductive sense, metadata is data about data. In context, metadata comprises those all-important features of a piece of data that tell us how we can use it, where we can find it, and how much we can trust it. Metadata could be a definition, an effective date, or a system of record for a piece of data. When managed and shared with the organization, metadata becomes both a powerful governance tool and an enabler of enterprise analytics, data science, and AI.
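As a small illustration, a governed data element’s metadata might be recorded as a simple key-value structure that consumers check before using the data itself. The field names below are hypothetical examples of the kinds of metadata mentioned above:

```python
# Hypothetical metadata record for one governed data element.
metadata = {
    "element": "employee_name",
    "definition": "Legal name as recorded at hire",
    "system_of_record": "HRIS",
    "steward": "People Operations",
    "effective_date": "2023-01-01",
}

def can_trust(meta: dict) -> bool:
    """A consumer checks the metadata before using the data itself:
    an element without a definition, system of record, and steward
    hasn't been governed yet."""
    required = {"definition", "system_of_record", "steward"}
    return required.issubset(meta)
```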

Data Literacy: This term describes the ability to understand and communicate with data. As businesses and their employees generate and consume ever increasing amounts of data, data literacy has become as important today as computer literacy was 20 years ago. Helpful data literacy topics include: making sound inferences from data, interpreting charts and graphs, and spotting data quality issues. 
Data is one of the most important assets of this century, not just to organizations but to individuals as well. We all deserve to be a part of the conversation around data. I hope this article has helped you see beyond the buzzwords to the holistic, and exciting, discipline of data management, and prepared you to take part in what’s next.