For most businesses, the goal of data collection is not simply to have a lot of data, but to use that data to build a predictive model that will guide future performance. Trends in data, sets of data, and the relationships between sets of data must be analyzed, interpreted, and presented in a logical, readable format. Traditionally, this work has been done manually by a team that includes business subject-matter experts (SMEs), data stewards, architects, and technologists.
Topological data analysis helps us detect categories and relationships within vast quantities of information in a more nimble and dynamic way. At its core, topological data analysis deals with the analytical problems created by the typical representation of data as points in an n-dimensional space. For example, if we were to draw a two-dimensional graph with “frequency of push notifications” on the X axis and “frequency of user response to push notifications” on the Y axis, we would wind up with a number of points arranged along a curve in a two-dimensional space. Although this graph is easy to understand, it is unlikely to paint a complete picture of the relationship between the X and Y variables. A more accurate picture of that relationship might include several additional variables, from the timing of push notifications to the color of the icon. Because of the number of variables at play, the actual “shape” of the push notification response data is likely to extend along several axes, some of which may not have been discovered yet. This is where topological data analysis comes in.
The goal is to have a computer ‘look’ at our simple graph of push notification data and ‘see’ the more complex structure it represents, much as the human brain can look at a collection of apparently random points of color and see a picnic in a park. Just as we use our brains to recognize a group of similar dots as forming the shape of an umbrella, topological data analysts use a set of advanced algorithms to discover the groups and relationships within a vast quantity of data. We value data not for its mere existence, but for its meaning. Graphical representations, however, have historically been limited in their ability to accurately represent the meaning of large quantities of data. In recent years, advances in topological data analysis have given us the ability to interpret and use data more accurately and effectively than before.
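To make the idea of a computer "seeing" groups a little more concrete, here is a minimal sketch of one of the simplest building blocks used in topological data analysis: link every pair of points that lie within a chosen distance of each other, then count how many connected groups result. The data, the third "hour sent" variable, and the function name `count_components` are all hypothetical and chosen only for illustration. The sketch also assumes the variables are on comparable scales. Real tools do considerably more than this.

```python
import math
from itertools import combinations


def count_components(points, epsilon):
    """Count connected groups when points within `epsilon` of each other are linked."""
    # Union-find: every point starts out as its own group.
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group's representative, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of any two points that are close enough together.
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= epsilon:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_i] = root_j

    return len({find(i) for i in range(len(points))})


# Hypothetical data: (notifications per week, responses per week, hour of day sent)
points = [
    (1.0, 0.9, 9.0), (1.2, 1.0, 9.5), (0.9, 0.8, 8.5),      # sent in the morning
    (5.0, 1.1, 21.0), (5.3, 1.0, 21.5), (4.8, 1.2, 20.5),   # sent in the evening
]

# Look at the same data at three different scales.
for epsilon in (0.1, 1.0, 20.0):
    print(epsilon, count_components(points, epsilon))
```

At the smallest scale every point stands alone (6 groups), at the middle scale the morning and evening groups emerge (2 groups), and at the largest scale everything merges into one. Watching which groupings persist across a range of scales, rather than committing to a single threshold, is the intuition behind much of the field.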
This new tool in our arsenal can enable us to do things in data management and data governance that have historically been monumental challenges. Now that data storage is so cheap and readily available, we can look at all data values and metadata, and even derive our own metadata. With this information we can construct a visualization of the shape of our organizational data and the relationships within it: a faster and better approach to exploring the physical and logical aspects of data.
At Data Clairvoyance, we are passionate about the scientific advances that put data to work for people and businesses; please feel free to contact us and learn more about our work.