Predicting the future can be a very tricky business, especially when it comes to emerging technology. Predicting what will happen within the Big Data landscape therefore poses a question that is as compelling as it is impossible to answer.
What is possible to say is that the existing technology stack centred on Hadoop will continue to grow and mature, and that artificial intelligence (AI) and machine learning will become more prevalent. So what else? To be honest, if I add visualisation to the stack then I don't think much will change in the technology landscape itself.
This does not mean that it is all doom and gloom; there will be change, and big change, in the next couple of years. It will all be centred on Big Data platforms such as Hadoop and Google's cloud offering, and how they become more ubiquitous within business. The technology is there: stable, flexible, powerful and robust. The problem is that it has moved faster than most corporate IT teams can handle. Currently there is a massive drive to up-skill in the new technology, not just from a technological point of view but also from a management perspective. As understanding grows of not only how this technology works, but what it can do and how it can be employed, the landscape will evolve to fill the gaps that currently exist.
If you then add the emergence of machine learning and AI into the mix, the change will be even greater. Here too there is a massive catch-up effort under way, as analysts and computer scientists scramble to learn the new skills. The process of democratising this skill set has only just begun and will end with the productisation of common machine learning tasks.
Next come visualisations, which are not really a new skill; reports have contained graphs and charts for long enough. Visualisation provides a great overview of a large data set. Within the Big Data arena, however, data can and often does arrive in streams, so the visualisation has to be smart enough to handle it. This has led to the development of dashboards that can display a range of dynamically updating visualisations.
The future of Big Data is secure, with the technology maturing, becoming more democratised and more prevalent. Machine learning tools will also become more widespread, and the use of data lakes for storing large quantities of data will be on the rise. Cloud-based solutions will ensure that SMEs are not left behind in the drive to put data at the heart of the business.
What is Hadoop?
Apache Hadoop is a framework that allows for the distributed processing of large datasets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. The framework is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failure. At the centre of Hadoop is the file system, HDFS, which provides scalable, fault-tolerant, cost-efficient storage for big data. On top of that sit tools such as YARN, which schedules jobs across the cluster, and MapReduce, which can be used to process and give structure to large unstructured datasets.
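To make the MapReduce idea concrete, below is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API. It is not taken from the original post: the class and variable names (WordCount, TokenizerMapper, IntSumReducer) and the assumption of Hadoop 2.x client libraries on the classpath are illustrative choices.

// A minimal word-count sketch using Hadoop's Java MapReduce API.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in each input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // combine counts locally before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Input and output locations on HDFS are supplied as the two command-line arguments.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

In a typical deployment the job would be packaged as a jar and submitted with the hadoop jar command, with the input and output paths living on HDFS and YARN scheduling the individual map and reduce tasks across the cluster.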
Blog post by Richard Skeggs (Senior Data Development Manager for the Business and Local Government Data Research Centre); please get in touch if you have any questions about the contents of this post.
Published 10 November 2017