Advanced metadata and data analytics are two topics with which one can easily clear a room — or capture the attention of a CEO interested in leveraging data for better results across the organization. A first step is to get a working understanding of how the two concepts can interconnect, and to find some extremely practical applications of their combined capabilities.
Call me an optimist, but I really believe we’re on the verge of a new innovation cycle when it comes to metadata and data analytics. I feel that the last decade’s advances in these two disciplines have set the stage for truly game-changing developments in the near future.
Searching for the perfect analogy
As someone formally trained in microbiology, I tend to look to nature for analogies to explain the metadata and data governance services my company provides. One of my favorite analogies is that of data as proteins or molecules, coursing through the corporate body and sustaining its interrelated functions. This analogy has a special relevance to the topic of using metadata to detect data leakage and minimize information risk — but more about that in a minute.
To continue with my analogy, the pioneering work that the company Ayasdi is doing in data analytics has opened up a new level of insight. Ayasdi is using topological mathematics to create multi-dimensional models of data. They currently offer one of the most advanced analytical tools on the market, allowing them to take a data set and convert it to a unique, multi-dimensional shape to quickly uncover correlations that aren’t immediately obvious using standard statistical methods. This is a key step in being able to conduct advanced data analytics and optimize insight discovery. This insight into data occurs at what one might call the atomic level, which in turn can allow data scientists to aggregate separate data elements into successively larger “structures” for increasingly complex analysis.
The next paradigm
The truly exciting news is that this concept is ripe for being developed to enable an even deeper type of data analytics. By taking the ‘Shape of Data’ concept and applying it to a single character of data, and then capturing that shape as metadata, one could gain the ability to analyze data at an atomic level, revealing a new and unexplored frontier. Doing so could bring advanced predictive analytics to cybersecurity, data valuation, and counter- and anti-terrorism efforts, and I see this area of data analytics as having enormous implications in other areas as well.
A recent article on CNBC quotes projections that 2016 will bring the biggest cybersecurity attacks that we’ve seen. One key driver is that with a 30% increase in the number of connected devices we collectively have, we are generating more sensitive data than ever before. The same article points out that the techniques we’ve used in the past to protect data assets will not work against the new hacking techniques that are emerging. We firmly believe that old techniques of both detection and prevention will not cut it, and that to succeed against cyber attacks of all kinds, we need to leverage advanced, proactive and predictive analytics.
A good time to innovate
One of the interesting points to note is how important timing is to this capability. We generally recommend creating approximately 170 metatags for every data element in an organization; if an organization has 1,000 distinct business data elements, that works out to roughly 170,000 metatags, a large but still manageable number. Once we take the metadata down to a data character level, however, the number of metatags will increase dramatically. What’s more, depending on how often an organization wants to monitor for things like predicted breaches, it could introduce yet another complex variable into metadata: that of time. Fortunately, advances in the field of big data have come along at just the right time to make this huge leap in the sheer size of the data easily manageable.
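To get a feel for the scale involved, here is a rough back-of-the-envelope sketch. The 170 metatags per element and the 1,000 elements come from the figures above; the average of 20 characters per element, the reuse of 170 metatags for each character, and the one-capture-per-day schedule are purely hypothetical assumptions chosen for illustration, not recommendations.

```python
def metatag_count(units: int, tags_per_unit: int = 170, snapshots: int = 1) -> int:
    """Total metatags for `units` tagged items, each captured `snapshots` times."""
    return units * tags_per_unit * snapshots


ELEMENTS = 1_000             # distinct business data elements (figure from the text)
AVG_CHARS_PER_ELEMENT = 20   # hypothetical: average characters per element
DAILY_SNAPSHOTS = 365        # hypothetical: one capture per day for a year

# Element-level tagging: 170 metatags per data element.
element_level = metatag_count(ELEMENTS)

# Character-level tagging, assuming each character also gets 170 metatags.
character_level = metatag_count(ELEMENTS * AVG_CHARS_PER_ELEMENT)

# Adding time as a variable: the character-level tags captured daily for a year.
with_time = metatag_count(ELEMENTS * AVG_CHARS_PER_ELEMENT, snapshots=DAILY_SNAPSHOTS)

print(f"element level:   {element_level:,}")
print(f"character level: {character_level:,}")
print(f"with daily time: {with_time:,}")
```

Under these assumed numbers, the count goes from 170,000 at the element level to 3.4 million at the character level, and to about 1.24 billion once a year of daily captures is included. Each new dimension multiplies the total, which is why big data tooling matters here.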
Returning to my earlier analogy, we’re taking the ability to analyze data down to an atomic or genomic level, thus gaining the ability to more effectively manage its entire ‘data genomic structure.’ But we can also roll up the individual data elements into successively larger structures — creating the data equivalents of molecules, macromolecules, simple organisms, complex organisms … and ultimately entire ecosystems.
Needless to say, it’s important to not get swept up in hyperbole. Still, I truly believe that a breakthrough analogous to Watson and Crick’s identification of the DNA structure may be imminent in the field of data analytics. All of which makes metadata and data analytics highly relevant topics for anyone interested in data and data-centric organizations.