Gartner defines “Dark Data” as the ‘information assets organisations collect, process and store during regular business activities, but generally fail to use for other purposes’. A lot of dark data is unstructured.
According to IDC, 90% of unstructured data is never analysed, and this figure shows no sign of falling as the big data revolution churns up copious amounts of structured, unstructured and semi-structured data, which organisations collect and store with no real purpose other than compliance. This can be an extremely costly and inefficient exercise, yet these same organisations may be sitting on vast untapped potential hidden within their unstructured data assets.
So, for forward-thinking Chief Data Officers (CDOs) and Chief Analytics Officers (CAOs) who deal with ‘dark data’ as part of their normal course of business, here are 7 tips to turn your unstructured data into a veritable ‘goldmine’.
1. Be relentless in acquiring executive buy-in.
One must recognise the potential treasure trove of insights which lie within an organisation’s unstructured data assets and communicate its importance to the board. These insights could bring tangible benefits to decision-making, resulting in better business outcomes and an enhanced experience for customers and colleagues alike.
Peter Laflin, Chief Data Scientist at Bloom Agency and one of the notable leaders in the Data and Analytics space, uses the analogy of ‘data as oil’: “if I try and sell a barrel of oil to the general public, I won’t get very far. The general public need oil, or at least they need the by-product of the oil which is petrol, diesel, plastics etc, but they can’t consume the raw material. The oil and chemical industries convert the raw material into something usable – and as data scientists, we are converting unstructured data into something of value which others can use (and need to use) in their daily lives.”
2. Divide and Conquer.
CDOs and CAOs must define and implement stringent Data Governance and classification processes to stop otherwise valuable unstructured data from accumulating in disparate and redundant databases. Veritas’s Databerg Report, conducted across Europe in 2015, found that Dark Data accounts for 59% of data and a further 29% is considered obsolete, which leaves only 12% identifiable as Business Critical Data. This is clearly a major obstacle for senior data leaders.
3. Bolster your technological capabilities.
To make the most of unstructured data, it must be analysed and mined. Unstructured data is plagued with noise, and most Business Intelligence platforms tend to work best with structured data. However, with the explosion in unstructured data sources, vendors are becoming more proficient in handling unconventional data formats.
The real challenge, however, appears to be aggregating data sources for analysis. Peter Laflin elaborates: “Counting doesn’t ‘cut it’ – simply counting words or the occurrences of an event isn’t enough – so we must build innovative methods for the extraction of pattern, and hence value, from the unstructured data. For example, you can use Twitter data to understand how customers behave in response to advertising. However, to extract the data points required, we must build intermediate data aggregate models to help us turn the unstructured data into something of value. Counting tweets on its own wouldn’t uncover this level of detail.”
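To illustrate the distinction Laflin draws, here is a minimal Python sketch of one possible intermediate aggregate. Rather than reporting a raw tweet total, it compares brand mentions in equal windows before and after an advert airs. The tweets, the brand name ‘Acme’, the airing time and the 30-minute window are all hypothetical, chosen purely for illustration; this is not a description of Bloom Agency’s method.

```python
from collections import Counter
from datetime import datetime, timedelta


def mentions_around_ad(tweets, brand, ad_time, window=timedelta(minutes=30)):
    """Count brand mentions in equal windows before and after an advert airs."""
    counts = Counter(before=0, after=0)
    for posted_at, text in tweets:
        if brand.lower() not in text.lower():
            continue  # ignore tweets that never mention the brand
        if ad_time - window <= posted_at < ad_time:
            counts["before"] += 1
        elif ad_time <= posted_at < ad_time + window:
            counts["after"] += 1
    return counts


def uplift(counts):
    """Relative change in mentions after the advert; None without a baseline."""
    if counts["before"] == 0:
        return None
    return (counts["after"] - counts["before"]) / counts["before"]


# Hypothetical sample data: (timestamp, tweet text)
ad_time = datetime(2016, 3, 1, 20, 0)
tweets = [
    (datetime(2016, 3, 1, 19, 45), "Thinking about trying Acme"),
    (datetime(2016, 3, 1, 20, 5), "That Acme advert was brilliant"),
    (datetime(2016, 3, 1, 20, 10), "acme again? fine, I'll look"),
    (datetime(2016, 3, 1, 20, 20), "Just ordered from Acme"),
    (datetime(2016, 3, 1, 20, 25), "What a match tonight"),
]

counts = mentions_around_ad(tweets, "Acme", ad_time)
print(len(tweets), dict(counts), uplift(counts))
```

A plain count would simply report five tweets. The aggregate instead shows brand mentions rising from one before the advert to three after it, which is much closer to the behavioural question being asked. A real model would also need to handle noise such as retweets, sarcasm and unrelated uses of the brand name.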
4. Recruit, Retain and Grow your army of data scientists.
Technology is of course a powerful enabler; however, it is also a tool which must be used by experienced and knowledgeable individuals. Peter Laflin explains the importance of Data Science: “Data Science helps businesses answer the unknown unknowns about their business – some of these can be extremely valuable in creating new value propositions within a business. For example, we may learn something about our customers’ motivations for buying our product and their behaviours when interacting with that product which leads us to create a new product.”
However, great Data Scientists combine numerous attributes: mathematical and statistical training, business knowledge and technological know-how, a blend which can be hard to come across. So constantly searching for and growing your talent is key.
5. Avoid Ambiguity. Embrace Clarity.
In order to draw out meaningful information from unstructured data, a deep understanding of the core business is needed, as well as a clearly defined vision of what the business hopes to extract from the raw data. Data Scientists are not miracle workers, and must be well equipped with specific queries in order to extract actionable insights. Minimising ambiguity in those queries yields more trustworthy and robust results.
6. Consolidate your data.
While structured data may provide a measure of what is happening by extracting correlations between varying data sets, unstructured data can provide a means of understanding why. Integrating the two gives an organisation a fuller picture of what is going on. This can enable data evangelists to better discern, discover, communicate and predict future outcomes, forming the basis of some of today’s most advanced predictive and prescriptive analytics.
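As a rough sketch of what that integration might look like, the Python below joins structured account records (the ‘what’: who churned) with free-text customer comments (the ‘why’). The field names, the sample records and the keyword-based theme tagging are all assumptions made for illustration; a production system would more likely use proper text analytics than a keyword list.

```python
from collections import Counter

# Hypothetical themes and keywords; a real taxonomy would be far richer.
THEMES = {
    "price": ("expensive", "price", "cost"),
    "support": ("support", "helpdesk", "waiting"),
}


def tag_themes(comment):
    """Return the set of themes whose keywords appear in a comment."""
    text = comment.lower()
    return {
        theme
        for theme, keywords in THEMES.items()
        if any(keyword in text for keyword in keywords)
    }


def reasons_for_churn(accounts, comments):
    """Tally comment themes, but only for customers the structured data marks as churned."""
    churned = {a["customer_id"] for a in accounts if a["churned"]}
    reasons = Counter()
    for customer_id, comment in comments:
        if customer_id in churned:
            reasons.update(tag_themes(comment))
    return reasons


# Structured data: tells us what happened.
accounts = [
    {"customer_id": "c1", "churned": True},
    {"customer_id": "c2", "churned": False},
    {"customer_id": "c3", "churned": True},
]

# Unstructured data: hints at why.
comments = [
    ("c1", "Far too expensive for what it does"),
    ("c2", "Great price, very happy"),
    ("c3", "Kept waiting for support, and the cost went up"),
]

reasons = reasons_for_churn(accounts, comments)
print(reasons.most_common())
```

Neither source answers the question alone: the account table shows two customers left, and the comments mention price three times, but only the join reveals that price features in the comments of both churned customers.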
7. Always, always, always protect your data.
Exploiting the opportunities within unstructured and dark data does not come without risk, as the data itself could be exposed in a cyber-attack. Data that is considered redundant or of little to no use could contain sensitive and confidential information such as financial information, patient records, trade secrets and much more. A collaborative effort is needed between the CDO and other stakeholders involved in information security in order to classify, encrypt and protect this information from falling into the wrong hands. A breach could have dire consequences for compliance, brand reputation and, consequently, customer relationships.
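The classification step can be sketched in a few lines of Python. The example below scans documents for two kinds of sensitive content and separates those that need protection from the rest. The two regular expressions, the labels and the sample documents are illustrative assumptions only; genuine classification needs far broader and stricter rules, and encryption itself is out of scope for this sketch.

```python
import re

# Illustrative patterns only; they will miss many cases and flag some false positives.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "payment_card": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
}


def classify(text):
    """Return the labels of every sensitive pattern found in the text."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}


def triage(documents):
    """Split documents into those flagged as sensitive and those that look clear."""
    flagged, clear = {}, []
    for name, text in documents.items():
        labels = classify(text)
        if labels:
            flagged[name] = labels
        else:
            clear.append(name)
    return flagged, clear


# Hypothetical 'forgotten' files from a shared drive.
documents = {
    "old_export.txt": "Contact jane@example.com, card 4111 1111 1111 1111",
    "picnic.txt": "The quarterly picnic is on Friday",
}

flagged, clear = triage(documents)
print(flagged, clear)
```

Even a crude pass like this makes the point: a file that looks redundant can still hold personal or financial details, so it should be classified before anyone decides it is safe to ignore.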
Andrew Odong is the Content Director US/Europe for the CDO Forum. Andrew is currently producing the CDO Forum, Europe 2016, researching with the industry the opportunities and key challenges for enterprise data leadership, and creating an interactive, discussion-led platform to bring people together to address those issues. The CDO Forum has become a global series, having been launched on five continents. For enquiries email: firstname.lastname@example.org