For CEOs of data-intensive organizations, one of the most vexing challenges is the “dual mandate” of how they use and protect their data.
On the one hand, there are the many possibilities of using big data and data intelligence to gain transformative insights into markets, organizational performance and other critical issues. The company with the better data and data analysis should be able to make better decisions, and thus outmaneuver the competition (at least in theory).
On the other hand, there is the intense but understandable fear that hackers will get through one’s firewall and steal data or create other mischief. The negative impact of such a breach is terrifying — including everything from a public relations disaster, to a plummeting market valuation, to customers running for the door.
On the horns of a dilemma
Generally, organizations follow one of two approaches to solving this conundrum.
Some focus on only one of the two goals. When they do, they tend to err on the side of following database security best practices to limit and control access, and it’s easy to understand why. However, once you’ve locked down, encrypted and otherwise protected your data, you may have a new problem: by making your databases so secure, all that data you worked so hard to collect is now used by only a small handful of its potential users in your organization. In fact, the database you originally intended for bigger things now feels more and more like a sunk cost.
Other organizations recognize that both data access and data security are important goals, and decide to launch twin initiatives to address them. However, splitting the focus in this way can make the inherent conflict between the two competing interests even more problematic.
Whichever approach you take, you may end up irritating some legitimate data users in your organization by limiting their access to data — so much so that they look for other data sources. It’s not unheard of for data users to go so far as to stand up their own Hadoop cluster with the data they need. That might work in the short term, but it too can lead to bigger problems. You now have two sets of data describing the same reality: one very structured and protected, the other not at all.
The better solution: a window into data access
Personally, I don’t really advocate or talk about adhering to any particular database security best practices. One can always do a Google search for best practices for any of the popular databases, and find many recommendations.
I think a more appropriate path starts by recognizing that the two problems are actually directly related. A better way to address the dual mandate, I believe, is to implement an operational model underneath both of them: the Office of Data. Once you have that in place, you can start to truly manage your supply chain of data, viewing and managing your data through all three lenses — physical, logical, and conceptual.
In short, you create a metadata solution that wraps your physical data in relevant metadata. You can then create a data intelligence layer on top of that, allowing you to start doing machine learning and advanced analytics on the metadata now captured. In fact, this approach can provide you with the ability to introduce a form of artificial intelligence that can recognize when your data is most likely being misused — and respond appropriately.
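To make the idea concrete, here is a minimal sketch in Python of what “recognizing when your data is most likely being misused” could look like at its simplest. The access records, user names and thresholds are all hypothetical, and a real metadata layer would use far richer signals than row counts; this just illustrates flagging reads that deviate sharply from a user’s own historical baseline.

```python
from statistics import mean, pstdev

# Hypothetical per-user history of rows read per query,
# as captured by the metadata layer. Illustrative data only.
history = {
    "alice": [120, 150, 130, 140, 125],
    "bob": [100, 90, 110, 95, 105],
}

def is_anomalous(user, rows_read, threshold=3.0):
    """Flag a read that deviates sharply from the user's baseline.

    Uses a simple z-score against the user's past reads; a real
    system would consider time of day, tables touched, and more.
    """
    past = history[user]
    mu, sigma = mean(past), pstdev(past)
    return sigma > 0 and abs(rows_read - mu) / sigma > threshold
```

A read of 5,000 rows by a user who normally reads around 100 would be flagged, while a read within the user’s normal range would not. The point is not the statistics, which here are deliberately naive, but that once access metadata is captured, this kind of check becomes possible at all.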
This solution also exposes the business glossary part of your meta repository to your whole company, and gives users a lever they can use to dial up access and availability. When data scientists want to consume data, they can click a button, gain access to exactly the kind of data they’re looking for in the physical database, and start using it.
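One way to picture the glossary-driven access described above is as a mapping from business terms to the physical assets behind them. The sketch below is purely illustrative — the terms, table names and steward address are invented — but it shows the shape of a self-service lookup: the user asks in business language, and the metadata layer resolves that to physical tables.

```python
# Hypothetical business glossary mapping terms to physical assets.
glossary = {
    "customer churn": {
        "definition": "Customers who cancelled in the last 90 days",
        "physical_tables": ["crm.cancellations", "billing.subscriptions"],
        "steward": "data-governance@example.com",
    },
}

def request_access(term, user):
    """Resolve a business term to the physical tables behind it.

    In a real system, this step would also open a governance
    ticket or grant a database role; here it simply returns the
    assets the requester may query, or None for an unknown term.
    """
    entry = glossary.get(term.lower())
    if entry is None:
        return None
    return {"user": user, "tables": entry["physical_tables"]}
```

The design point is that the requester never needs to know physical schema names up front; the glossary translates intent into access.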
Each time this happens, the organization has a record of who’s consuming what data. The solution also allows you to enforce your data governance plan, by having your team follow up with an email to each user, asking for their contextual use case — that is, why they’re consuming that particular data. Over time, these insights will allow you to develop a detailed picture of who the legitimate consumers of your data are, and how they consume it.
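The audit trail behind “a record of who’s consuming what data” can be as simple as an append-only log plus an aggregation over it. The sketch below uses invented user and dataset names; a production system would write to a durable store rather than an in-memory list, but the who-consumes-what picture comes from the same kind of aggregation.

```python
from collections import Counter
from datetime import datetime, timezone

# Append-only log of consumption events (illustrative only).
audit_log = []

def record_access(user, dataset):
    """Record one consumption event with a UTC timestamp."""
    audit_log.append({
        "user": user,
        "dataset": dataset,
        "at": datetime.now(timezone.utc).isoformat(),
    })

def consumption_report():
    """Count how often each (user, dataset) pair appears."""
    return Counter((e["user"], e["dataset"]) for e in audit_log)

record_access("alice", "sales.orders")
record_access("alice", "sales.orders")
record_access("bob", "hr.salaries")
```

Over time, this is the raw material for the “detailed picture of who the legitimate consumers of your data are” — and the trigger list for those follow-up emails asking for each user’s contextual use case.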
The big takeaway: start leveraging your metadata
If you’ve read some of my other blog posts, you already know how essential I believe metadata is in protecting data throughout an organization. The alternative approach to database security best practices that I’ve described here is another good example. It can help you provide your data scientists with maximum access to your valuable data, and at the same time, create an automated database sentry, on the lookout for inappropriate use 24/7/365.