“Despite maturing data analytics within organisations, data governance is still a major challenge and focus.
And often I hear about how data science is being done in isolation, as a skunkworks project, behind the scenes in organisations.
Is there a case for 'data science governance' to make sure disparate business units follow the same rules and policies? Or does this choke the inherent creativity in data science?”
Two weeks ago, after meeting with a data science executive from one of South Africa’s largest banks, I posted this question on LinkedIn. It came about because he described how he was running data science projects in his own, very specific part of the bank, but those projects often weren’t recognised or funded because they didn’t meet the group’s data science policies.
It’s generally accepted that data science projects are creative endeavours in which Data Scientists apply their own view of the problem to develop a solution, and that such projects should not be boxed in by a framework.
I posted my question on LinkedIn to get a sense of what the data analytics/data science market thinks, and the responses were very interesting.
A selection of them is reproduced below for quick reference:
My feeling is that creativity around the project and governance are not mutually exclusive. You want the freedom to explore the data, but you want highly governed data and models. I think governance becomes important as projects reach maturity and move out of research phases and into production. But I see good governance as a building block to valuable analytics, not a hindrance. If done well, your governance shouldn’t get in your way; it should save you from potential disaster. It will also be the thing that gives the industry longevity in an age of snake oil. I suspect poorly governed data and models will be big bubble poppers in the near future.
Data Analytics ≠ Data Science. The former is somewhat deterministic, which makes it easier to manage expectations, as it can fit into traditional business planning cycles. The latter is inherently unpredictable, as it involves an unknown number of failed hypotheses before progress is made. This can be frustrating for project managers, as progress is harder to measure. Product owners get frustrated, and the short-term lack of results is deemed a failure. As a result, data science teams are often shielded from product and work on their own timeframes. This reinforces the silo structure that you so often see.
It is possible to have good data science governance; it becomes a question of timeframes. Organisations need to realise that Data Science is a long(er)-term R&D investment and treat it accordingly. It can still coexist with existing business practices such as Agile, but deliverables need to be set in terms of the number of iterations (in the shorter term) or on a 6+ month timescale.
An aim should be to get data to data scientists in a standardised, quality-checked (if possible) format as quickly as possible. This speeds up the Data Science lifecycle and enables fast failures or successes.
Deployed successes should be governed, and the reasons for failures should be documented so that future cycles can be optimised.
All the points made here are good in my opinion, and the difference between analytics and data science is important. Traditional or advanced analytics can fit in with traditional governance structures, processes and policies, but experimental data science will prove more challenging.
The experimental and back-testing phases of these projects require rapid access to data, and the science element needs broad scope to experiment, as that process will often find unexpected outcomes and need additional data quickly.
To support this, you need to constrain the environment and not the data: limit or control access, and make it impossible to get data out without traditional governance kicking in. This supports speed and flexibility without sacrificing control over control, privacy, ethical or policy-based risks.
Moving from science into production is the primary element of governance that needs to be determined and managed here.
Interesting thought, Craig! I agree with Alex, and as most companies are only starting to explore the opportunities of true data science, I think it’s important not to confine thinking and processes with traditional governance structures. Embarking on a data science initiative is like going down a rabbit hole: you might end up proving tremendous value or essentially wasting your time. I think failing fast and adapting quickly is key.
I prefer to call it a "data science framework". If designed well, it makes it possible to operate safely and gainfully between the paradigms of control and flexibility. And it is self-managing, which makes it all the more palatable. I'm all for independence and creativity as long as we understand the importance of focusing on thinking and outcomes.
I'd have your Data Analysts and Engineers get the foundational building blocks right, and let the Data Scientists unleash their creative best (with the caveat that you give them a problem to solve, or a question you need answered, to guide them!)
The central theme that comes through from these comments/opinions is:
Govern certain areas of data analytics, but give Data Scientists (good-quality) data to be creative with, so they can solve business problems that the business often didn’t know it had.
In most markets, mature and immature alike, the major challenge data science projects face is getting access to good-quality data. A statistic that often does the rounds says that Data Scientists spend 80% of their time remediating and preparing data and only 20% actually deriving insights from it.
If that ratio could be flipped, the share of time Data Scientists spend deriving insights would quadruple, from 20% to 80%, and their productivity would increase dramatically.
So the balance needs to be struck between governing the quality of data and leaving room for the creative insights data science projects can find.
Craig Steward is the Managing Director: MEA for Corinium Global Intelligence, a company that specialises in data analytics conferences, networking events and information.
Craig has been working closely with the South African and Middle Eastern data analytics communities since November 2015 with a specific focus on the role of the Chief Data Officer. Recently he has developed a data analytics conference for the Egyptian market that will cover key strategic issues and topics facing data analytics professionals in the region.