I work with clients every day, designing, building & delivering machines to improve the state of data. Almost every engagement, regardless of whether it is “business sponsored” or “technology sponsored”, follows the same pattern. A business case is developed for “why we should be investing in data”, then the technology teams step up, ask for the check to be handed over, and immediately begin managing the “investments” like SDLC IT projects. The hard reality is that if data truly is an asset, there is no logical reason to ever manage it like an SDLC project. Dropping in an MDM hub is not the solution to data redundancy; there is no such thing as a silver bullet. I’m certainly not picking on MDM, because in the grand scheme of the data lifecycle MDM hubs are absolutely a required part of the architecture. To realize the full promise of an MDM hub, however, other aspects of the data ecosystem MUST be established first!
Data needs to be inventoried, just as an accountant would inventory and account for cash assets. In that model, a CFO always knows the balance of assets by class & category. The same should be true for data: there should be an inventory of unique Business Data Elements & KPIs (key performance indicators), each linked to its physical versions (i.e. columns, tables, ETL jobs, repositories, etc.). Below is a simple graphic that depicts a data cluster, a grouping of physical data elements under a business data element.
[Insert a graphic]
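For readers who think in code, the data cluster described above can be sketched as a minimal data model. The class and field names here are illustrative assumptions on my part, not a prescribed schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PhysicalDataElement:
    """One physical manifestation of a business data element."""
    repository: str  # e.g. a database, ETL job, or file store
    table: str
    column: str

@dataclass
class DataCluster:
    """A Business Data Element linked to its physical versions."""
    business_data_element: str  # e.g. "Customer Lifetime Value"
    physical_elements: List[PhysicalDataElement] = field(default_factory=list)
```

The point of the structure is the linkage itself: once every business data element knows its physical versions, the inventory can be counted and valued like any other asset register.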
Having this view, and only this view, enables you to make better investment decisions toward improving the state of data. Returning to our CFO comparison, a key metric he or she is accountable for is the debt-to-asset ratio, calculated by aggregating all short- and long-term debts and dividing by total assets (both tangible and intangible). That is only possible because there is a complete inventory & valuation for each of those variables. There is a very similar opportunity with data. With all business data elements & their physical versions inventoried, you can run an automated Data Certification algorithm that aggregates both the amount of metadata associated with each cluster of business data element to physical data elements and the overall standard deviation of data quality outcomes from their control limits: a total statistical aggregation & evaluation of all things important around data. If you are not calculating Data Certification, you could instead aggregate a simple data quality metric per data cluster.
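As a concrete illustration only (the actual Data Certification algorithm is not published here), a 0–6 score per cluster could be sketched like this, with up to three points for metadata completeness and up to three for data quality outcomes landing within their control limits. The function name, inputs, and equal weighting are all assumptions for the sketch:

```python
def certification_score(metadata_fields_present, metadata_fields_expected,
                        dq_scores, dq_lower_limit, dq_upper_limit):
    """Return an illustrative 0-6 certification score for one data cluster.

    Up to 3 points for metadata completeness, up to 3 points for the
    fraction of data quality outcomes that fall within control limits.
    """
    # Metadata completeness contributes up to 3 points.
    completeness = metadata_fields_present / metadata_fields_expected
    metadata_points = 3 * min(completeness, 1.0)

    # Data quality contributes up to 3 points: the share of DQ outcomes
    # inside the control limits.
    in_control = sum(dq_lower_limit <= s <= dq_upper_limit for s in dq_scores)
    quality_points = 3 * in_control / len(dq_scores)

    return metadata_points + quality_points
```

A cluster with 8 of 10 expected metadata fields filled and two of three data quality checks in control would score around 4.4, comfortably above a certification threshold of three.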
To continue our example, we (Data Clairvoyance) have created a Data Certification aggregation algorithm with a scale of 0–6 and a limit that says any data cluster (the package of a Business Data Element tied to its physical versions) scoring less than three is considered ‘Non-Certified’, and conversely any cluster scoring three or higher is ‘Certified’.
Data clusters that are ‘Non-Certified’ are not being managed well and therefore carry a lot of cost, burden & risk. Clusters that are ‘Certified’ are being effectively managed & governed, and are therefore more valuable to the organization. If you take the total number of Non-Certified data clusters and divide it by the total number of Certified clusters, you get a ratio analogous to ‘Debt-to-Asset’.
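The ratio itself is simple arithmetic. A minimal sketch, assuming each cluster already carries a 0–6 certification score and the threshold of three described above:

```python
def non_certified_to_certified_ratio(cluster_scores, threshold=3.0):
    """Count Non-Certified vs. Certified clusters and return the ratio.

    cluster_scores: iterable of 0-6 Data Certification scores, one per
    data cluster; scores at or above the threshold count as Certified.
    """
    certified = sum(score >= threshold for score in cluster_scores)
    non_certified = len(cluster_scores) - certified
    if certified == 0:
        raise ValueError("no Certified clusters; ratio is undefined")
    return non_certified / certified
```

For example, six clusters scoring [1, 2, 4, 5, 6, 2] give three Certified and three Non-Certified clusters, a ratio of 1.0; like debt-to-asset, lower is healthier.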
[Insert image that compares Debt to asset ratio to Non-Certified to Certified Data ratio]
The reason this comparison is so valuable, and so applicable, is that the debt-to-asset ratio is a KPI that gauges an organization’s burden on cash flow. As this ratio increases, the monthly service fees on that debt become higher. If the service fees reach the point where they represent a large portion of monthly net income, the organization’s cash flow becomes constrained. Cash flow is critical in almost every type of business; those that have it have a competitive advantage.
The Non-Certified to Certified Data ratio provides a similar indication of burden. As this ratio increases, so does the organizational burden of simply managing, controlling & maintaining data (or, in some cases, hiding the skeletons in the data). Much like the financials, if an organization can’t do the management, control & maintenance work efficiently, it faces extreme material risk by way of redundancy, unintended use, decisions made on bad data, and serious cost-structure issues: storage, people costs & litigation.

The final, and probably most potent, point in our comparison: organizations with a high debt-to-asset ratio, and consequently constrained cash flow, have a very difficult time investing in the future and seeking competitive advantage. The same is true of the Non-Certified to Certified Data ratio, in that an organization with a high ratio can’t leverage data in all of the new & awesome capabilities that are becoming reachable. In the ’90s, a team of analysts mining through data with statistics to look for patterns was reserved for elite firms & certain government agencies. Today you can hire a team of Data Scientists who are trained & versed on most big data platforms and eager to build machine learning algorithms to LEARN more about revenue streams, customers, product buying patterns, arbitrage opportunities, competitive intelligence, and the list goes on. If your organization is bogged down with data, you can’t FULLY leverage these types of resources, teams & capabilities. To FULLY leverage them, as much trusted & valuable data as conceivably possible should be pumped into these teams; if you are too busy managing, controlling & maintaining, you aren’t processing & pushing data into the groups that can turn it into dollars.
[Create a unique call to action — push to 1 time phone consulting sessions to discuss how to employ Data Certification & Data-as-an-Asset KPIs]