    Business Process Data Lineage

    Most modern businesses have complex processes. These processes can be difficult to map out and track because the work of the individual can be practically invisible. Instead of working on an assembly line, attaching a single part to a Model T over and over, employees in our era of ‘knowledge workers’ create relationships with potential customers, execute financial transactions and build revenue forecasts. Each of these jobs adds value to the business, but its visible footprint can be minimal. At the same time, these processes can be intricate: they may draw data together from several business systems located around the world, and they may depend on precise timing sequences. As a result, business processes are difficult to visualize and hard to grasp, which makes them difficult to manage and to measure with the proper KPIs and other metrics.

    Fortunately, data can tell the story if you look at it in the right way. The flow of data through an organization reveals how people, process and technology work together. With comprehensive metadata (data about data), you can track exactly what occurs at each step in a business process. You can see where data is created, touched or lost, and so locate bottlenecks and other process inefficiencies. The problem is that most organizations don’t have comprehensive metadata. Worse yet, they think that they do. Usually, what they have is metadata from machines: the computers, databases and other IT infrastructure that hold data. While this metadata is vital, its value is maximized only when it is combined with metadata about how people think of, manage and use data, and, crucially, how that data supports the business model in detail. To get a complete picture of how data lives and works in your organization, you need to add what we call conceptual metadata. We have found that most data problems are the direct result of business processes, not technical issues. Acquiring conceptual metadata can be tricky, but with the right method, you can use comprehensive metadata to visualize your business processes as never before. This gives you powerful insights that can improve your bottom line.

    Foundational Concepts

    Before turning to the process itself, we need to set forth a few fundamental concepts:

    Metadata is data about data. If we think of a library book as consisting primarily of the story or information inside its covers, the metadata is the information on the cover and the first page or two that tells us about the book: the title, author, publication date, publisher, etc. These metatags allow librarians to catalog the book systematically and allow library patrons to search for and find the information they are looking for.

    There are three primary types of metadata: physical, logical and conceptual. Physical metadata describes when and where a data value resides within a data store, and includes the logs produced by technical code as it touches, updates, moves and deletes data values. Logical metadata is concerned with the schematics that map out how data systems are organized and, in theory, how data should move through them. A competent technician or data architect can extract physical and logical metadata from machines or diagrams with relative ease. The third category of metadata is conceptual. This type of metadata consists of what data creators, consumers and analysts think of the data. Their knowledge reveals an important aspect of how data actually lives and works in the data ecosystem and, more importantly, within the business model.
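
    One way to make the three categories concrete is to model them as separate record types attached to a single data element. The Python sketch below is a minimal illustration, not a standard metadata schema: the field names (`data_store`, `intended_flow`, `business_definition` and so on) are hypothetical, and the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical record types; field names are illustrative, not a standard schema.
@dataclass
class PhysicalMetadata:
    """When and where a value resides, plus log entries from code that touched it."""
    data_store: str
    location: str                                   # e.g. "table.column"
    last_updated: str                               # ISO-8601 timestamp
    access_log: List[str] = field(default_factory=list)


@dataclass
class LogicalMetadata:
    """How systems are organized and how data *should* move between them."""
    source_system: str
    target_system: str
    intended_flow: str


@dataclass
class ConceptualMetadata:
    """What data creators, consumers and analysts say about the data."""
    business_definition: str
    contributed_by: List[str] = field(default_factory=list)
    perceived_reliability: str = "unknown"


@dataclass
class DataElementProfile:
    """All three layers of metadata for one business data element."""
    name: str
    physical: List[PhysicalMetadata] = field(default_factory=list)
    logical: List[LogicalMetadata] = field(default_factory=list)
    conceptual: List[ConceptualMetadata] = field(default_factory=list)

    def missing_layers(self) -> List[str]:
        """Names of the metadata layers that have not been captured yet."""
        layers = {
            "physical": self.physical,
            "logical": self.logical,
            "conceptual": self.conceptual,
        }
        return [name for name, records in layers.items() if not records]


# Usage with invented values: machine-derived metadata is easy to harvest,
# so a typical profile starts out with only the physical and logical layers.
zip_code = DataElementProfile(
    name="Customer Zip Code",
    physical=[PhysicalMetadata("crm_db", "customer.zip", "2024-01-01T00:00:00Z")],
    logical=[LogicalMetadata("crm_db", "warehouse", "nightly batch load")],
)
print(zip_code.missing_layers())  # ['conceptual']
```

    Keeping the layers as separate lists makes the gap the text describes visible: the conceptual layer stays empty until someone collects it from people.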

    Conceptual metadata is the key to improving data governance plans, metadata management, data quality, advanced analytics or data science, and other aspects of enterprise data management, because it reveals not how data could or should work, but how it actually does work. Conceptual metadata documents how data flows (or does not flow) through the organization. It shows how aware people are of the available data and its sources, and it indicates how reliable the data is from a business perspective. In short, conceptual metadata is integral to treating data as an asset and is the key to maximizing the value of data.


    The method for implementing Business Process Data Lineage involves the following four steps:

    1. Build around your existing Data Supply Chain and metadata solution.
    2. Acquire conceptual metadata connected to business processes.
    3. Visualize and analyze business lineage.
    4. Take action for continuous improvement.

    Step 1: Build around your existing Data Supply Chain and metadata solution

    This method assumes that you already have a functional Data Supply Chain up and running. The Data Supply Chain is the collection of systems, people and processes that move data through your organization, from initial creation to storage to end use. Its complexity will vary considerably with the size of the organization. You will also need a metadata solution in place that collects physical and logical metadata. Because you will be collecting conceptual metadata, you will need to store it either in an existing central metadata repository or in a parallel metadata repository connected to the data ecosystem. Ideally, all the metadata would be housed together, but since many existing solutions do not provide adequate functionality for conceptual metadata, you can append a system that does.
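
    When the existing metadata solution cannot hold conceptual metadata, the parallel repository mainly has to share identifiers with it so the two can be joined. The Python sketch below assumes the existing catalog can be exported as a mapping from a data element identifier to its technical attributes; the class name `ConceptualRepository`, the identifiers and the sample records are all hypothetical.

```python
from typing import Dict, List


class ConceptualRepository:
    """Hypothetical parallel store for conceptual metadata.

    It reuses the element identifiers of the existing technical metadata
    catalog so that both kinds of metadata can be joined into one view.
    """

    def __init__(self, technical_catalog: Dict[str, dict]):
        # Stand-in for the existing physical/logical metadata solution:
        # data element id -> technical attributes.
        self.technical_catalog = technical_catalog
        self.conceptual: Dict[str, List[dict]] = {}

    def add(self, element_id: str, record: dict) -> None:
        """Attach a conceptual record to an element the ecosystem already knows."""
        if element_id not in self.technical_catalog:
            raise KeyError(f"unknown data element: {element_id}")
        self.conceptual.setdefault(element_id, []).append(record)

    def combined_view(self, element_id: str) -> dict:
        """Technical and conceptual metadata for one element, side by side."""
        return {
            "technical": self.technical_catalog.get(element_id),
            "conceptual": self.conceptual.get(element_id, []),
        }

    def missing_conceptual(self) -> List[str]:
        """Elements that so far have only machine-derived metadata."""
        return sorted(e for e in self.technical_catalog if e not in self.conceptual)


# Invented identifiers and attributes, for illustration only.
catalog = {
    "customer_status": {"store": "crm_db", "column": "customer.status"},
    "marketing_channel": {"store": "mktg_db", "column": "campaign.channel"},
}
repo = ConceptualRepository(catalog)
repo.add("customer_status", {"definition": "Current stage of the customer relationship"})
print(repo.missing_conceptual())  # ['marketing_channel']
```

    Rejecting records for unknown identifiers is one way to keep the parallel repository connected to the data ecosystem instead of letting it drift into a separate silo.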

    Step 2: Acquire conceptual metadata connected to business processes

    To acquire conceptual metadata, you will need a method for working effectively with the most complex and quirky technology in most organizations: the human mind! The essential steps are to identify stakeholders and to engage them in a fast-paced, collaborative process that produces useful, agreed-upon metadata about which data elements are created, changed or consumed within a given business process. It is essential to do this quickly so as not to lose momentum or bog down the process with tangential quarrels or interminable email chains. We have pioneered a method called Controlled Chaos that leverages a refined process and real-time communication tools to extract vital conceptual metadata quickly.

    When done correctly, this process results not only in a valuable collection of conceptual metadata that describes the relationship of data elements to business processes, but also in the identification of the most likely data stewards and custodians for a given data element. The social aspect of the process effectively nominates and endorses data experts based on their working knowledge, and thus establishes the personnel structure for an effective data governance plan.

    For this particular use case, you will need to capture the conceptual metadata tied directly to business processes. Specifically, you will need to capture the metatags pertaining to Process Step(s), Sequence of Step(s), and the Relationship of Data Elements to Process Step(s). If you collect these conceptual metatags, you will have the ingredients to construct a clear picture of how data is produced, where its dependencies are and where it supports business processes.
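
    The three metatags named above map naturally onto a small data model: a process step with a sequence number, and a link recording whether that step creates, changes or consumes a data element. The Python sketch below is one possible shape for that model, not a prescribed schema; the class and function names are hypothetical, the example links are invented, and the sequence check is just one example of a dependency question these tags can answer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Relationship(Enum):
    """How a process step relates to a data element."""
    CREATES = "created"
    CHANGES = "changed"
    USES = "consumed"


@dataclass(frozen=True)
class ProcessStep:
    sequence: int   # position of the step in the business process
    name: str


@dataclass(frozen=True)
class ElementLink:
    step: ProcessStep
    data_element: str
    relationship: Relationship


def steps_in_sequence(links: List[ElementLink]) -> List[ProcessStep]:
    """Distinct process steps, ordered by their captured sequence number."""
    return sorted({link.step for link in links}, key=lambda step: step.sequence)


def external_or_early_dependencies(links: List[ElementLink]) -> List[str]:
    """Elements consumed by a step that precedes every step creating them.

    Elements that no captured step creates are also returned, because their
    values must come from outside the mapped process.
    """
    first_created: Dict[str, int] = {}
    for link in links:
        if link.relationship is Relationship.CREATES:
            seq = link.step.sequence
            first_created[link.data_element] = min(seq, first_created.get(link.data_element, seq))

    flagged = set()
    for link in links:
        if link.relationship is Relationship.USES:
            created_at = first_created.get(link.data_element)
            if created_at is None or link.step.sequence < created_at:
                flagged.add(link.data_element)
    return sorted(flagged)


# Hypothetical links; they are not taken from the figures in this article.
qualification = ProcessStep(5, "Lead Qualification")
positioning = ProcessStep(6, "Positioning")
links = [
    ElementLink(positioning, "Prospect-Age", Relationship.USES),
    ElementLink(qualification, "Prospect-Age", Relationship.CREATES),
    ElementLink(qualification, "Marketing Channel", Relationship.USES),
]
print([step.name for step in steps_in_sequence(links)])  # ['Lead Qualification', 'Positioning']
print(external_or_early_dependencies(links))             # ['Marketing Channel']
```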

    Step 3: Visualize and analyze business lineage

    When you treat the individual steps in a business process as a form of conceptual metadata, you can build a dynamic model that provides ongoing, real-time feedback. Mapping this model effectively lets you see the impact of different types of data and makes it easier to continuously re-evaluate and refresh your actual business processes. These potent visualizations help you define, understand and explore the relationships between your data, your people and your technology. They illustrate not only how data is an asset, but how data is the glue that keeps people, process and technology in sync.

    These illustrations show, in detail, how a given data element (such as ‘Customer Zip Code’ or ‘Marketing Channel’) is connected to a given business process.

    In the first visual, enumerated business processes are listed on the left side and individual data elements on the right. The connecting lines indicate the set of data elements that are touched (created, changed or merely used) to accomplish a particular business process. For instance, #6 Positioning draws on a wide swath of data elements, ranging from ‘Prospect-Age’ at the top to ‘Salesperson’ toward the bottom. In contrast, #5 Lead Qualification draws on a relatively narrow range of data elements.

    Business Process Data Lineage - Visualization #1
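
    Underneath the first visual is a bipartite mapping: processes on one side, data elements on the other, and one line per relationship of any kind. A minimal Python sketch of that mapping, assuming relationships are stored as simple `(process, data element, relationship)` triples, can also rank processes by how many elements they touch. The triples below are hypothetical and only reuse element names mentioned in the text; the real diagram contains many more.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (business process, data element, relationship) triples.
Link = Tuple[str, str, str]


def elements_by_process(links: List[Link]) -> Dict[str, Set[str]]:
    """Every data element a process touches, whatever the relationship."""
    touched: Dict[str, Set[str]] = defaultdict(set)
    for process, element, _relationship in links:
        touched[process].add(element)
    return dict(touched)


def breadth_ranking(links: List[Link]) -> List[Tuple[str, int]]:
    """Processes ordered from the widest to the narrowest range of elements."""
    touched = elements_by_process(links)
    return sorted(
        ((process, len(elements)) for process, elements in touched.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Illustrative links only.
links = [
    ("#6 Positioning", "Prospect-Age", "uses"),
    ("#6 Positioning", "Customer Zip Code", "uses"),
    ("#6 Positioning", "Customer Status", "changes"),
    ("#6 Positioning", "Salesperson", "uses"),
    ("#5 Lead Qualification", "Marketing Channel", "uses"),
]
print(breadth_ranking(links))  # [('#6 Positioning', 4), ('#5 Lead Qualification', 1)]
```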

    In this next visual, relationship lines are drawn only for process steps that create or change business data elements. (Unlike in the visual above, if a process simply uses a data element, no relationship line is drawn.)

    Business Process Data Lineage - Visualization #2
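
    The second visual is the same kind of mapping with the "uses" relationships filtered out. Assuming relationships are stored as `(process, data element, relationship)` triples with the hypothetical labels `creates`, `changes` and `uses`, a Python sketch of that filter, plus a helper listing the elements that drop out of the diagram, might look like this (the sample triples are invented):

```python
from typing import List, Set, Tuple

# (business process, data element, relationship) triples.
Link = Tuple[str, str, str]

# Relationships that alter data; "uses" is deliberately excluded.
WRITE_RELATIONSHIPS = {"creates", "changes"}


def write_links(links: List[Link]) -> List[Link]:
    """Keep only the lines drawn in the second visual: creates or changes."""
    return [link for link in links if link[2] in WRITE_RELATIONSHIPS]


def read_only_elements(links: List[Link]) -> Set[str]:
    """Elements used by the mapped processes but never created or changed by them."""
    written = {element for _, element, _ in write_links(links)}
    return {element for _, element, _ in links} - written


links = [
    ("#6 Positioning", "Prospect-Age", "uses"),
    ("#6 Positioning", "Customer Status", "changes"),
    ("#7 Account Admin", "Customer Status", "creates"),
    ("#5 Lead Qualification", "Marketing Channel", "uses"),
]
print(write_links(links))
# [('#6 Positioning', 'Customer Status', 'changes'), ('#7 Account Admin', 'Customer Status', 'creates')]
print(sorted(read_only_elements(links)))  # ['Marketing Channel', 'Prospect-Age']
```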

    Step 4: Take action for continuous improvement

    This visualization makes it possible to focus on the constraints that data places on the business process. In this simple example, we see a classic data governance challenge: a data element called ‘Customer Status’ is populated or changed in multiple business processes (specifically, #7 Account Admin, #6 Positioning and #8 Account-Retention). On the surface, this may seem like a benign finding; however, over the years we’ve seen that this pattern is quite problematic when each of these process steps has different organizational stakeholders who interpret status differently and thus use it for conflicting purposes.
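
    Once the relationships are captured, the ‘Customer Status’ pattern can be detected mechanically. The Python sketch below assumes relationships are stored as hypothetical `(process, data element, relationship)` triples and that each process has a single owning stakeholder. It first lists elements with more than one writing process, then keeps only those whose writers belong to different stakeholders, the case the text calls problematic. The three writers of ‘Customer Status’ come from the example above; whether each one creates or changes the value, and the team names, are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (business process, data element, relationship) triples.
Link = Tuple[str, str, str]
WRITE_RELATIONSHIPS = {"creates", "changes"}


def shared_writers(links: List[Link], min_writers: int = 2) -> Dict[str, List[str]]:
    """Data elements populated or changed by at least `min_writers` processes."""
    writers: Dict[str, Set[str]] = defaultdict(set)
    for process, element, relationship in links:
        if relationship in WRITE_RELATIONSHIPS:
            writers[element].add(process)
    return {e: sorted(p) for e, p in writers.items() if len(p) >= min_writers}


def conflicting_ownership(links: List[Link], owner_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Shared-writer elements whose writing processes belong to different stakeholders."""
    flagged = {}
    for element, processes in shared_writers(links).items():
        owners = {owner_of.get(process, "unassigned") for process in processes}
        if len(owners) > 1:
            flagged[element] = sorted(owners)
    return flagged


links = [
    ("#7 Account Admin", "Customer Status", "creates"),
    ("#6 Positioning", "Customer Status", "changes"),
    ("#8 Account-Retention", "Customer Status", "changes"),
    ("#5 Lead Qualification", "Marketing Channel", "creates"),
]
owner_of = {  # hypothetical stakeholders
    "#7 Account Admin": "Finance",
    "#6 Positioning": "Sales",
    "#8 Account-Retention": "Customer Success",
}
print(shared_writers(links))
# {'Customer Status': ['#6 Positioning', '#7 Account Admin', '#8 Account-Retention']}
print(conflicting_ownership(links, owner_of))
# {'Customer Status': ['Customer Success', 'Finance', 'Sales']}
```

    If all three processes belonged to the same stakeholder, the second function would return an empty result, reflecting the text's point that multiple writers are mainly a problem when their interpretations can diverge.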

    Being able to visualize this abstract problem is a powerful way to diagnose conflicts and drive real improvement to your organization’s most valuable data assets. Adding data stores or applications to these visualizations (see below) makes it possible to identify other conflicts and remediate them quickly.

    Business Process Data Lineage - Visualization #3

    With the added layer of applications and data stores, one can now begin to visualize the organization’s entire Data Supply Chain. By filtering the relationships to isolate ‘Created By’, ‘Changed By’ and ‘Consumed By’ relationships, we can view the supply and demand sides of data and the processes and technologies that interact with them.
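
    The filtering described above can be sketched by splitting the actors around one data element into a supply side (‘Created By’, ‘Changed By’) and a demand side (‘Consumed By’). In this Python sketch the `Edge` record and the `actor_type` labels are assumptions, and the application and data store names are invented.

```python
from typing import Dict, List, NamedTuple, Tuple


class Edge(NamedTuple):
    element: str        # business data element
    relationship: str   # 'Created By', 'Changed By' or 'Consumed By'
    actor: str          # a business process, application or data store
    actor_type: str     # 'process', 'application' or 'data store'


SUPPLY = {"Created By", "Changed By"}
DEMAND = {"Consumed By"}


def supply_and_demand(edges: List[Edge], element: str) -> Dict[str, List[Tuple[str, str]]]:
    """Split the actors around one data element into supply and demand sides."""
    view: Dict[str, List[Tuple[str, str]]] = {"supply": [], "demand": []}
    for edge in edges:
        if edge.element != element:
            continue
        if edge.relationship in SUPPLY:
            view["supply"].append((edge.actor_type, edge.actor))
        elif edge.relationship in DEMAND:
            view["demand"].append((edge.actor_type, edge.actor))
    return {side: sorted(actors) for side, actors in view.items()}


# Hypothetical edges.
edges = [
    Edge("Customer Status", "Created By", "#7 Account Admin", "process"),
    Edge("Customer Status", "Changed By", "CRM", "application"),
    Edge("Customer Status", "Consumed By", "Reporting mart", "data store"),
    Edge("Marketing Channel", "Consumed By", "#5 Lead Qualification", "process"),
]
view = supply_and_demand(edges, "Customer Status")
print(view["supply"])  # [('application', 'CRM'), ('process', '#7 Account Admin')]
print(view["demand"])  # [('data store', 'Reporting mart')]
```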

    This thought process can be extended to other macro-level organizational variables such as people, process, technology and data. The result is an unprecedented and powerful tool for organizational analysis. Using metadata to model organizational constraints, or “choke points”, can reveal immediate optimization opportunities such as increased throughput, reduced costs, improved nimbleness and more.
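
    The text does not specify how choke points are identified, so the following Python sketch is only one simple heuristic, offered as an assumption: treat a data element as a potential choke point when a single process, application or data store supplies it while several others consume it. The triple format, function name and sample edges are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (data element, relationship, actor) triples; the actor may be a process,
# an application or a data store.
Edge = Tuple[str, str, str]


def single_supplier_choke_points(edges: List[Edge], min_consumers: int = 2) -> List[Tuple[str, str, int]]:
    """Elements fed by exactly one supplier but consumed by several actors.

    Returns (element, sole supplier, number of consumers) tuples.
    """
    suppliers: Dict[str, Set[str]] = defaultdict(set)
    consumers: Dict[str, Set[str]] = defaultdict(set)
    for element, relationship, actor in edges:
        if relationship in ("Created By", "Changed By"):
            suppliers[element].add(actor)
        elif relationship == "Consumed By":
            consumers[element].add(actor)
    return sorted(
        (element, next(iter(suppliers[element])), len(actors))
        for element, actors in consumers.items()
        if len(suppliers.get(element, ())) == 1 and len(actors) >= min_consumers
    )


edges = [
    ("Salesperson", "Created By", "#7 Account Admin"),
    ("Salesperson", "Consumed By", "#6 Positioning"),
    ("Salesperson", "Consumed By", "#8 Account-Retention"),
    ("Marketing Channel", "Created By", "Campaign app"),
    ("Marketing Channel", "Consumed By", "#5 Lead Qualification"),
]
print(single_supplier_choke_points(edges))  # [('Salesperson', '#7 Account Admin', 2)]
```

    A single supplier is not automatically a problem; a heuristic like this only narrows down where an analyst should look first.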

    With some additional integration with a business glossary and a data quality solution, it is possible to visualize individual community members’ accountabilities, the meaning and purpose of data, and data quality across business data elements, business processes and data stores. Putting all of this together in an integrated, dynamic analytical exploration is an electrifying way to drive improvements to data and to realize its value.


    The method of collecting and visualizing metadata outlined above demonstrates the link between data and the business model. Like business itself, the relationship between data and the business model is always changing and evolving. A stagnant formula will not account for these dynamics.

    Our method requires an information architecture that will house and operationalize the vital metadata (the ‘data about data’) that fuels our analysis. Some of the most important metadata for this process resides not in computers and servers, but in the heads of the people who create and consume the data. We have developed Controlled Chaos, a highly structured, repeatable and reliable method for pulling this conceptual metadata out of users’ heads. Because it requires as much art as science, the extraction of conceptual metadata is often overlooked, even though it provides very rich metadata.

    After the right metadata has been collected, advanced visualization techniques show precisely which data elements relate to each business process. These interactive ‘maps’ let you focus on individual data elements or groups of them, and they show how lines of responsibility for data can get tangled. Finally, we suggest that this method’s power to analyze and visually connect data, people and processes can be applied across the organization to continually improve process efficiency and management.

