It’s natural to assume that information risk is primarily about hackers trying to get into an organization to steal data. That’s certainly a key part of information risk, but there are other less obvious aspects that are equally threatening. These blind spots are sometimes referred to as the risk of the “known unknown.” What are some examples?
Let’s start with the bad guys. Hackers have been around almost as long as organizations have stored data. But now they have a new tool — advanced persistent malware (APM) — that is causing great concern among CEOs and CIOs. Essentially, it’s a piece of code that can be uploaded into an organization’s technical ecosystem, where it can do damage in a variety of ways. For example, it could begin to replicate in various places throughout the enterprise. Or it could wait an indeterminate amount of time before it starts to destroy, encrypt or steal data. Or it could simply serve as a “foot in the door” that allows hackers to get in later.
Many organizations are highly vulnerable to this threat, because each of their applications typically rests on a large, complex code base. APM can find its way into this foundational code, then sit dormant and unnoticed until its programming tells it to go into action.
That means an organization doesn’t know when or where the threat will occur — or even if it will occur at all. Finding and rooting out APM is equally challenging: the developers of the malware can change its profile at any time, so basic solutions that look for particular lines of code may work only for a short while, requiring new solutions to be created and deployed.
A second area of the known unknown involves your own people. Organizations may have dozens, hundreds or even thousands of people using their data on a daily basis, including data scientists, analysts, software developers, marketing professionals, finance professionals, customer service agents and others. Because all of these individuals are human and resourceful, they naturally find ways to make the organization’s data easier to use and more effective in meeting their needs. These uses can range from innocent interpretation and manipulation of data to support a particular goal, to outright changing data values without appropriate permission or in violation of a process control.
A more common problem arises when well-meaning people modify critical data values. For example, one department might decide to tweak a customer spreadsheet by collapsing date of birth and Social Security Number into a single column, separated by a comma. To shield this new combined data element from prying eyes, they might obfuscate it by renaming the column to a less precise heading — such as “CUST INFO.” Despite their good intentions, important organizational data is now hidden away where it won’t be easy to find, and it may not be obvious from the data values alone what the column contains. This means that if the organization ever needs to account for all its references to customers’ Social Security Numbers, it may overlook this data set and thus produce an incomplete report. It may sound like a trivial error, but keep in mind that this example involves just two business data elements among the 1,000 or more in a typical organization.
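The spreadsheet scenario above can be sketched in a few lines of Python. All of the field names here are hypothetical, as is the name-based compliance scan — it simply illustrates how the renamed column slips past a search for Social Security Number fields:

```python
# Original customer records with clearly labeled fields (hypothetical names).
customers = [
    {"NAME": "A. Smith", "DOB": "1980-01-02", "SSN": "123-45-6789"},
    {"NAME": "B. Jones", "DOB": "1975-06-30", "SSN": "987-65-4321"},
]

# A well-meaning department collapses DOB and SSN into one opaque column,
# separated by a comma, and drops the originals to "mask" the sensitive data.
for row in customers:
    row["CUST INFO"] = row.pop("DOB") + "," + row.pop("SSN")

# A naive compliance scan that finds SSN fields by column name now comes up
# empty, even though every record still contains a Social Security Number.
ssn_columns = [c for c in customers[0] if "SSN" in c.upper()]
print(ssn_columns)  # -> []
```

A scan that inspected the data values themselves (for example, matching the digit pattern of a Social Security Number) would still catch this, which is one reason value-based data discovery matters.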
This is one of the ways employees’ behavior can lead to different views of the data and the truth — sometimes referred to as the phenomenon of reflexivity. It generally starts small, with different individuals using the data in slightly different ways, making their own inferences and calculations. Over time, these views can stray further and further from each other, until they’re quite different. But all the while, the CIO sees that data rules are being followed, and reports to the CEO that everything is fine. It’s essentially a “data bubble” that eventually will burst, creating a large amount of chaos and confusion until the discrepant views of reality can be reconciled.
Another element of the known unknown involves data lineage. Data lineage is the path that data takes as it moves through an organization’s technical and data architecture.
To understand why data lineage can be a significant known unknown, imagine your organization were a town that got all its drinking water from a nearby river. It would be very important to know who’s using the water upstream and how they treat it before putting it back into the river — as residents of Flint, MI may appreciate all too well. Even if your town measures water quality as you pump it out of the river, if you don’t know the water’s lineage, you can’t really know where or when contaminants could be getting into the water, or how to prevent them from coming downstream in the first place.
Data is similar as it flows through an organization. It has sources, consumers, and points of transformation (which could include both contamination and purification). That’s what data lineage is all about: understanding and mapping the data’s flow so that you have a better understanding of how to safeguard its quality and value. Unfortunately, in my experience working with organizations in a variety of industries, most companies don’t give this matter sufficient attention — and end up relying on little more than faith that their data is effectively managed during its lifecycle.
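To make the idea concrete, data lineage can be modeled as a directed graph of sources, transformations, and consumers. The sketch below is purely illustrative — the system names are invented — but it shows how, once the flow is mapped, you can walk upstream from any consumer and enumerate every point where contamination could enter:

```python
# A toy lineage map: each system lists the upstream systems it reads from.
# All system names are hypothetical.
lineage = {
    "crm": [],                             # source system
    "web_orders": [],                      # source system
    "etl_cleanse": ["crm", "web_orders"],  # transformation (purification) step
    "warehouse": ["etl_cleanse"],          # storage layer
    "exec_dashboard": ["warehouse"],       # consumer
}

def upstream(system, lineage):
    """Return every system whose data can flow into `system`."""
    seen = set()
    stack = list(lineage.get(system, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(lineage.get(node, []))
    return seen

print(sorted(upstream("exec_dashboard", lineage)))
# -> ['crm', 'etl_cleanse', 'warehouse', 'web_orders']
```

In the river analogy, this is the map of everyone upstream of the town’s intake pipe: measuring quality at the pump tells you something is wrong, but only the lineage map tells you where to look.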
Seeing risk for what it is
The big lesson here is that your information risk is only partly caused by bad people. The element of risk that may be a bigger surprise is what tends to happen in the course of routine, day-to-day operations — both in how individuals use and manage data and also in the data itself as it moves through the organization. Understanding these areas of vulnerability is the first step an organization must take before it can identify, manage, and lessen those risks moving forward.