Data lineage may seem like an obscure topic, but it is actually central to the value your organization derives from its data. Data lineage is nothing less than the story of how data gets into your organization, who uses it, and how it is transformed.
Having an up-to-date understanding of your data lineage helps your organization in several ways.
The first area of impact is existential, because data is vital to your organization’s ability to survive. Data is the fuel that keeps the individual functions within your organization working, whether it’s your CEO looking for data about sales growth, your marketing department looking for demographic and buying-behavior data to set sales forecasts, or any other function. Without the data, each of these functions would become irrelevant. So it makes sense to develop a very clear understanding of where your data is coming from, who uses it, and how they transform it.
A second, more technical reason is that the specific sources of data can have major implications for project schedules and budgets. For example, every time one of your IT teams begins a new software development process, one of their first steps is to gather requirements. Among other things, they need to know what data sources they will be using. If you’re like many organizations, it may be very difficult for your team to locate the best source of data. In such cases, the only viable alternative may be to create new data, which requires additional time and expense that may not have been part of the original plan. But if you have your data lineage established and documented beforehand, such delays are far less likely.
The third reason why data lineage matters is that, if you’re like most organizations, your data changes from year to year. One way it might change is that you’ve begun collecting entirely new types of data, either customer or product data you haven’t previously collected or data that you’ve purchased from external sources. Another possibility is that your internal data analysts have developed a way to derive new insights from existing data. This innovation could inform management decisions, or even open a new revenue stream, but in either case it’s also a new data element that must be managed. Gaining insight into data lineage helps you keep up to date with the changing data environment on which your organization depends.
In our experience, the best way to gather the metadata needed to establish data lineage is to conduct collaborative, face-to-face workshops with the individuals in your organization who are best positioned to shed light on the source and use of the data. Be wary of claims that all you need to do is buy a new metadata or data lineage tool. With most data that resides in legacy systems, as well as data created in certain newer technologies, the only way to determine lineage is to extract the information from the minds of people within the organization — something no product can do.
Most of all, data lineage matters because data is, in a sense, a living thing: it has a lifecycle that encompasses its creation, use, and modification by various users. By understanding this dynamic and ensuring that the people who need the insights have access to them, your organization can gain far greater value from the data it owns.