In 1965, a few years before the Apollo astronauts first headed for the moon, Gordon Moore, who would go on to co-found Intel, made a prediction that later became famous as Moore’s Law: that the number of transistors in an integrated circuit would double every year (in 1975 he revised it to every two years). Remarkable when it was first made, the prediction proved essentially accurate for several decades.
Today a similar pattern of rapid growth is visible in the volume of data that individuals and companies create, manage and use. We now generate, collect and use data at far greater rates than ever before. For individuals, data accumulation and use have evolved rapidly from desktop computers to laptops and handhelds, and now to smartphones and Fitbits. The coming wave of other “smart devices,” the Internet of Things, promises to bring even more data under our control. In an information economy, we’re all being forced to become better stewards of both our own and our organizations’ data.
The same trend is happening for organizations, but on a far greater scale. With a dizzying array of new data sources and pipelines, and costs dropping for data storage, businesses and other organizations are gathering unprecedented volumes of data. Financial, marketing, and customer data are just the tip of the iceberg … and new categories, sources and uses of data emerge daily. There are no signs that this trend is slowing down any time soon.
Questions and uncertainties abound
Fine, you may well say — but how does all this data help my company? The answer depends on a number of factors. For example, are you using your data safely — that is, protecting sensitive data from prying eyes, malware and other threats?
Another very complicated question is: How reliable is your data? Consider the case of an insurer whose business units maintain siloed information about policyholders. Something as simple as discrepancies in how fields are named across the enterprise can make getting an accurate tally of total policyholders next to impossible. In many cases, an organization’s data records may represent enormous liabilities, at least until such discrepancies and overlaps are resolved.
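To make the insurer example concrete, here is a minimal Python sketch of how inconsistent field names can inflate a policyholder count, and how mapping them to one shared schema can fix it. The business units, field names, IDs and the mapping itself are all hypothetical, and real reconciliation usually also has to cope with typos, missing IDs and fuzzy matches.

```python
# Hypothetical records from two business units that name the same fields differently.
AUTO_UNIT = [
    {"cust_id": "A-100", "fname": "Ana", "lname": "Ruiz"},
    {"cust_id": "A-101", "fname": "Ben", "lname": "Cho"},
]
HOME_UNIT = [
    {"PolicyholderID": "A-100", "FirstName": "Ana", "LastName": "Ruiz"},
    {"PolicyholderID": "H-200", "FirstName": "Dee", "LastName": "Okoro"},
]

# Assumed mapping from each unit's field names to one canonical schema.
FIELD_MAP = {
    "cust_id": "policyholder_id",
    "PolicyholderID": "policyholder_id",
    "fname": "first_name",
    "FirstName": "first_name",
    "lname": "last_name",
    "LastName": "last_name",
}


def normalize(record):
    """Rename a record's fields to the canonical schema."""
    return {FIELD_MAP[key]: value for key, value in record.items()}


def count_policyholders(*sources):
    """Count distinct policyholder IDs across all sources."""
    ids = set()
    for source in sources:
        for record in source:
            ids.add(normalize(record)["policyholder_id"])
    return len(ids)


# Adding up each silo's row count double-counts anyone who appears in both.
naive_total = len(AUTO_UNIT) + len(HOME_UNIT)
reconciled_total = count_policyholders(AUTO_UNIT, HOME_UNIT)
print(naive_total, reconciled_total)
```

In this toy data set the naive tally is four, while the reconciled tally is three, because one policyholder appears in both units under different field names.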
An additional complicating factor is that organizations increasingly need ways to bring together structured data (that is, any data stored in a traditional database with a column or field name, such as first name, last name, etc.) and unstructured data (basically, everything else). This is both liberating and challenging. For example, consider an organization that wants to combine its sales and marketing data with related Twitter activity to assess the impact of a regional marketing campaign. Tweets are great examples of unstructured data, because they can contain anything: common English words as well as abbreviations, slang, URLs, hashtags and more. Making sense of such an amalgam of data typically calls for text-analytics techniques, often including machine learning.
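As a rough illustration of what turning a tweet into something structured can look like, the sketch below pulls URLs, hashtags and plain words out of a made-up tweet using Python's standard `re` module. The function name, the patterns and the sample text are assumptions chosen for illustration. This is only a preprocessing step, not a machine learning solution in itself, and slang and abbreviations would need far more than regular expressions.

```python
import re

# Simplified patterns, assumed for illustration; real tweets need more careful handling.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_features(tweet):
    """Split a tweet into URLs, lowercased hashtags and lowercased plain words."""
    urls = URL_RE.findall(tweet)
    # Remove URLs first so that a '#' inside a URL is not mistaken for a hashtag.
    text = URL_RE.sub(" ", tweet)
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(text)]
    # Drop the hashtags, then keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", HASHTAG_RE.sub(" ", text).lower())
    return {"urls": urls, "hashtags": hashtags, "words": words}


# Hypothetical tweet about a regional campaign.
sample = "Loving the #SpringSale at Acme! https://example.com/sale #deals"
features = extract_features(sample)
print(features)
```

Once each tweet is reduced to fields like these, it can be stored in ordinary columns and joined to sales and marketing records, for example by campaign hashtag.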
Yet another challenge is that not all data is created equal: some is indeed an asset, while other data resources are liabilities and still others are essentially worthless. Separating data into these categories can help an organization deploy its resources wisely, but doing so requires a significant investment of time and effort.
Additional complicating factors
If all that isn’t complicated enough, consider some of the other demands and opportunities that are reshaping the data environment. To begin with, there’s the ever-increasing threat of data breaches, and the desperate need to stay a few steps ahead of the bad guys.
On a more positive note, some organizations have a growing awareness that they may be sitting on gold mines of data: the raw commodities they’ll need to succeed in the information economy. But first, they must strip out any personally identifiable information (PII) and other sensitive data, then run analyses on what’s left to turn it into something they can safely use and share. After that, they need to create ways to efficiently “package” the resulting data, either for use by internal customers or potentially for sale to external customers.
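A minimal sketch of the stripping step might look like the Python below. It assumes a hypothetical record layout, a hand-picked set of sensitive field names, and a deliberately simple e-mail pattern. Real de-identification involves much more (phone numbers, addresses, combinations of fields that identify someone indirectly), so treat this as an illustration of the idea rather than a compliant procedure.

```python
import re

# Assumed list of fields to drop outright; a real list would come from a data inventory.
PII_FIELDS = {"name", "email", "ssn"}

# Simplified e-mail pattern, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def strip_pii(record):
    """Return a copy of the record without PII fields and with e-mails masked in text."""
    cleaned = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            continue  # Drop the whole field.
        if isinstance(value, str):
            # Mask e-mail addresses that leak into free-text fields.
            value = EMAIL_RE.sub("[REDACTED]", value)
        cleaned[key] = value
    return cleaned


# Hypothetical customer record.
record = {
    "name": "Ana Ruiz",
    "email": "ana@example.com",
    "region": "West",
    "note": "Contact ana@example.com for renewal",
    "premium": 1200,
}
cleaned = strip_pii(record)
print(cleaned)
```

What remains (here the region, a masked note and the premium) is the kind of data that can then be analyzed and packaged for internal or external customers.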
Clearly, the world of data — and how organizations (and individuals) create, use and store it — is evolving rapidly, and at a pace that will only accelerate in years to come.
Fortunately, several disciplines, including data quality, metadata management and machine learning, are coming into their own to help manage the risk and optimize the gains from this complex thicket of 1s and 0s. As a result, organizations can use their ever-growing stockpiles of data to significantly improve operations, make better business forecasts, gain competitive market advantage, and even create new revenue streams in the information economy.