Handled correctly, metadata management can have an enormous positive impact on your organization. In fact, there are companies — Google comes immediately to mind — that have built their entire business models on the management and publication of metadata.
But even if you’re not the next Google, the discipline of metadata management can still deliver important benefits — such as reducing redundancies in data, helping management make more profitable decisions, and streamlining the process of finding specific data when it is needed.
Unfortunately, most organizations will never benefit from metadata management, because only a fraction make the sustained effort needed to improve their data in the first place. Of the perhaps 40% that do, only some begin with a focus on metadata. Of that small sliver, almost all assign the work to an individual or small team in IT, who approach it as a purely technical project. In almost every case, such efforts end up producing little more than a confusing array of data that no one can really do anything with.
But if you have made the decision to dedicate the resources to effectively take on metadata management, here are the two essential steps in the process.
1. Create a meta-repository configured around the metadata you need.
Although there are some repository tools on the market that claim to help with this step, none are attuned to your business. Most are designed for technologists and architects, and tend to focus on one particular aspect of metadata — the underlying technology. Instead, you need to gather an array of metadata about each business data element (BDE) relevant to your organization in six categories:
Business Metadata. This is essential information about a given BDE, including the acronyms and synonyms used to describe it, and how it is used on a daily basis.
Technical & Core Metadata. We talk about these two types of metadata as a unit, because either one without the other is basically useless. The technical category is the only one enabled by the types of repository tools mentioned above. Most of these tools can go out to a database or mainframe and ingest metadata such as the structure of the database or file, as well as the platform (for example, whether it is a mainframe or a database, Oracle or SQL Server, and so on). This type of metadata consists of little more than a series of codes and numbers, and is unfortunately not readily interpretable by human beings.
This is why we believe that if you’re going to gather technical metadata, at the same time you should also acquire core metadata, which contains the administrative aspects of the technical information. To get this, you need to interview the database administrator overseeing each data element, and get all the terms they use to describe that database, including names, aliases, and any technical projects that were used to create or enrich it. In addition, you need to learn about the schema underlying the relationships. The key point to understand is that this metadata must be pulled from the minds of individuals who know the most about it, and then used to populate a series of metatags about the asset.
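To make the pairing of technical and core metadata concrete, here is a minimal sketch in Python. It uses the standard sqlite3 module as a stand-in for whatever platform you actually run; the table cust_mst, its columns, and the interview notes are all hypothetical, and a commercial repository tool would do the ingestion step very differently.

```python
import sqlite3


def ingest_technical_metadata(conn, table):
    """Read column-level structure for one table from SQLite's catalog."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [
        {
            "column": name,
            "type": col_type,
            "nullable": not notnull,
            "primary_key": bool(pk),
        }
        for _cid, name, col_type, notnull, _default, pk in rows
    ]


def attach_core_metadata(technical, core_by_column):
    """Pair each technical entry with tags gathered from interviews."""
    return [
        {**entry, "core": core_by_column.get(entry["column"], {})}
        for entry in technical
    ]


# Hypothetical table standing in for a production database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cust_mst ("
    "cust_id INTEGER PRIMARY KEY, cust_nm TEXT NOT NULL, rgn_cd TEXT)"
)

technical = ingest_technical_metadata(conn, "cust_mst")

# Hypothetical notes from an interview with the database administrator.
core = {
    "cust_nm": {"aliases": ["Customer name"], "source_project": "CRM migration"},
    "rgn_cd": {"aliases": ["Sales region code"]},
}

merged = attach_core_metadata(technical, core)
for entry in merged:
    print(entry)
```

The first function yields exactly the kind of codes and types a tool can harvest on its own. Only the second step, fed by what the administrator tells you, makes a column like rgn_cd readable to a business user.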
Data Quality Metadata. This is a series of metrics derived by running a three-step qualitative and quantitative process: profiling, assessment, and creating rules. This process produces scores that tell you about the quality of the technical metadata.
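As a rough illustration of how profiling, rules, and a resulting score might fit together, consider the sketch below. The sample column, the two rules, and the scoring formula (the share of values that pass every rule) are assumptions chosen for the example, not a standard method.

```python
def profile(values):
    """Profiling: basic descriptive statistics for one column."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    return {
        "count": total,
        "null_rate": (total - len(non_null)) / total if total else 0.0,
        "distinct": len(set(non_null)),
    }


def is_present(value):
    """Rule: the value must not be missing."""
    return value is not None


def is_two_letter_upper(value):
    """Rule: the value must be a two-letter uppercase code."""
    return isinstance(value, str) and len(value) == 2 and value.isupper()


def score(values, rules):
    """Assessment: share of values passing every rule, from 0.0 to 1.0."""
    if not values:
        return 0.0
    passed = sum(1 for v in values if all(rule(v) for rule in rules))
    return passed / len(values)


# Hypothetical sample of a region-code column.
region_codes = ["NE", "SW", None, "ne", "SW"]

stats = profile(region_codes)
quality = score(region_codes, [is_present, is_two_letter_upper])

print(stats)
print(f"quality score: {quality:.2f}")
```

In a real repository, the statistics and the score would be stored as data quality metadata against the column, alongside the rules that produced them.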
People Metadata. Whether you’re analyzing a BDE, database, table or column, there are people in your organization who actually care passionately about that particular instance of the data. At the column or table level, it might be the database administrator; at the asset level where the BDE is being consumed, it might be a marketing analyst. Regardless, you need to capture the names, insights and knowledge of each of these individuals, and associate them with the appropriate BDEs in your metadata repository.
To gather this insight, you need to look for the individuals we refer to as believable experts, who fall into three categories. The first includes data stewards on the business or operational side, as well as data custodians who manage the physical or technical assets. The second includes the technical experts: the data architects, developers, and administrators. The third includes the consumers of the data. The goal is to capture the names of the different people who care about each data element, in all its physical instances.
Search Metadata. Just like Google (though on a slightly smaller scale), you can use certain metatags to raise or lower where an item ranks in search results, helping users to find certain types of data more easily. Search metadata can be developed based on the other five categories listed above, to reveal where users actually go to find the specific types of data they need.
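The six categories above could be held together in a single record per BDE. The sketch below is one possible shape, assuming plain dictionaries for each category and a hypothetical boost tag in the search metadata; the class, field names, and ranking logic are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessDataElement:
    """One repository record, with a slot for each metadata category."""
    name: str
    business: dict = field(default_factory=dict)
    technical: dict = field(default_factory=dict)
    core: dict = field(default_factory=dict)
    data_quality: dict = field(default_factory=dict)
    people: dict = field(default_factory=dict)
    search: dict = field(default_factory=dict)


def search(repository, term):
    """Return names of matching BDEs, highest search boost first."""
    term = term.lower()
    hits = []
    for bde in repository:
        labels = [bde.name, *bde.business.get("synonyms", [])]
        if any(term in label.lower() for label in labels):
            hits.append((bde.search.get("boost", 1.0), bde.name))
    # Sort by boost descending, then by name for a stable order.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [name for _boost, name in hits]


# Hypothetical repository contents.
repository = [
    BusinessDataElement(
        name="Sales Region",
        people={"steward": "Regional operations lead"},
        search={"boost": 0.5},
    ),
    BusinessDataElement(
        name="Customer Revenue",
        business={"synonyms": ["Sales", "Turnover"]},
        people={"consumer": "Marketing analyst"},
        search={"boost": 2.0},
    ),
]

results = search(repository, "sales")
print(results)
```

Here the synonym captured as business metadata is what lets a search for "sales" find Customer Revenue at all, and the boost tag decides that it is listed ahead of Sales Region.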
2. Acquire the metadata.
Once you’ve established your foundational capabilities, you need to implement the process of metadata acquisition. Because most of the categories of metadata listed above reside only in people’s minds, you need to develop processes for acquiring it from the people who work in different functions throughout your organization, with processes tailored to each function. This is not easy, and it’s where the art and science of metadata management come into play. In fact, it’s a broad enough topic that I hope to address it in more detail in the future.
Revisiting the point made earlier, to the extent an organization takes on metadata management at all, it will most likely end up acquiring only the technical metadata, and never really gather what it needs to know about the other five categories — and therefore, the results will be of little business value. This is why it’s highly recommended to engage with a firm that specializes in the metadata management discipline.