Big Data Talk 007: The Data Scientist

When it comes to performing big data analytics, people are quick to point to the data scientist as the integral piece of the puzzle. However, it is a role which many struggle to fill. Dominic Pollard asks why this is and what can be done to bridge the skills gap.

When people discuss the main obstacles on the road to a successful big data project, the most often cited problem is people. Indeed, when we surveyed over 250 people for the 1st Big Data Insight Group Industry Trends Report at the start of the year and asked them what they believed was the biggest barrier stopping them taking advantage of big data, the most popular answer was ‘a lack of relevant skills’, which was selected by 39 per cent of respondents.


At the heart of the issue is the fabled ‘data scientist’. The way they are talked about in big data circles, you would quickly assume that they are little more than a figment of someone’s imagination.


In an interview with Big Data Insight Group a few months ago Jesper Sparre Andersen, data innovator and manager of visualisation specialists Bloom Studios, explained what makes data scientists so hard to find. He said: “What really makes the best data scientist is so subtle it’s hard to describe, but I think it’s in being able to create a narrative from the data. It is being able to see the data and understand what the significance is within it and conveying that to someone who needs to know it but may not know anything about the technology behind it.”


This may sound straightforward but having the knowledge, skills and experience to execute such a task is far from simple. “What makes good data scientists so rare is that it is such a highly cross disciplinary role; you need to be great in several fields,” Andersen explained. “You need to understand statistics and be able to implement and then re-implement models based on them because once you get into the big data range things start to break down. The original algorithms are no longer valid. You need to be at the upper end of computer engineering as well.”


Moreover, these ‘hard skills’ in statistics and software must be aligned with a knowledge of the business world so you can determine what is and is not of value to your organisation.


As becomes clear from Andersen’s comments, the range of skills required to perform the role of a data scientist is so wide that you would think they were created in Bill Gates’ basement in a scene akin to Mary Shelley’s Frankenstein. You need a solid grounding in complex computer science, business acumen, and knowledge of data mining, algorithms, statistics, mathematics and data visualisation. Underpinning all of these, you need an in-depth appreciation of the organisation and its objectives to ensure that you are delivering relevant insights to the right people.


Deloitte has predicted that the difficulty in finding data scientists could trigger a talent shortage, with up to 190,000 skilled professionals needed to cope with demand in the US alone over the next five years. You may ask, therefore: ‘if this is a position which is now in such high demand with the rise to prominence of big data, why have we reached a shortage in talent to fill the role?’


Well, David Chan, the director at the Centre for Information Leadership, City University London, explains that “the key issue about big data and the likely shortage of data scientists is due to the lack of focus in the last two decades in the teaching of mathematics and scientific subjects”.


“In the last 20 years, it was not cool to be scientists,” he adds. “Luckily, this is being turned around with popularisers of science like Prof Brian Cox, Professor Marcus du Sautoy, and Professor Jim Al-Khalili et al becoming more prominent. There is an uptake in STEM [science, technology, engineering and mathematics] subjects in secondary schools. This will help to address the needs.”


In essence, it would appear that the talent shortage stems from an inadequate education system around computer sciences. The fundamental need to teach future 'techies' and entrepreneurs the technological knowhow has been ignored by an already lambasted, outdated ICT curriculum. Michael Gove’s plans to reform said curriculum may well be too little too late: the demand is there now, and changes to secondary school practices in the near future won’t be able to fill the talent void. Furthermore, popularising scientists on TV will not make up for a generation of students who lack the skills that organisations need to extract the maximum value from their data.


Big data seemed to arrive, for many organisations at least, from relative obscurity; one minute people hadn’t heard of the term and the next it was the latest ‘must do’ trend in the business world. The result is that academia has been left trying to catch up. There are very few courses currently available, although both Andersen and Whitehorn say that more are on the horizon and emerging all the time. Moreover, Chan’s Centre for Information Leadership at City University London offers a course for aspiring CIOs – the Masters for Information Leadership – which teaches the interdisciplinary skills needed to be able to extract value from data.


The emphasis here must be on interdisciplinary skills. As already made clear, that is what sets the data scientist apart from other job roles and makes them so hard to come by. The education system, from secondary school ICT courses to university and postgraduate degrees, must focus on creating people who possess the technical knowledge and the ability to apply the science behind data analysis to the world around them – in essence, turning numbers and data sets into information relevant and useful for their organisation and its objectives.


However, for the foreseeable future at least, it is going to be extremely difficult to find data scientists; they are the proverbial needle in the haystack. This means that it may prove more productive to try to assemble a crack squad which possesses all these skills, rather than finding all these things in just one person. As Mark Whitehorn, professor and chair of analytics at the University of Dundee’s School of Computing, says: “Such is the range of skills and experience – data, statistics, data mining and algorithm design – required to execute big data analytics, it makes sense to think in terms of assembling a team rather than seeking all of these skills in one individual.”


Ultimately, although the problem around the necessary skills and personnel to execute big data is certainly real, it should not be seen as entirely bad news. After all, experts often stress that people are the key to being successful when it comes to big data. Storage is cheap, open source tools and technologies exist to help you perform the analysis and visualisation of data, and the cloud helps enable the whole process. What makes the difference is having the right people, who know how to use the technology and, more importantly, are asking the right questions for the organisation.


The fact that organisations are preoccupied with the lack of skills and personnel for big data is an indication that their priorities are in order and that they are thinking about the task at hand from a business-first perspective, not just throwing data at a wall and seeing what sticks.