The assumption many people make is that ‘Big Data’ is a fairly new concept. The connotations surrounding the word ‘data’ lead many of us, including myself, to assume that it is a modern phenomenon. Growing up in a digital era, I believed that such information and data could not possibly exist without the software or technology to store and harness it. Since working in a Big Data environment, however, I have realised that these assumptions were far from correct.
When researching the history surrounding Big Data, it becomes obvious that what is considered important in that history is fairly subjective and differs from source to source. One man who is constantly referred to, however, is Erik Larson, seen by many as the ‘pioneer’ of the term Big Data. Larson did not use the term in the same sense we would today – for him it referred to size only. In an article for Harper’s Magazine, he wrote, “The keepers of big data say they are doing it for the consumer’s benefit. But data have a way of being used for purposes other than originally intended”. This is an interesting observation, and one that has continued to gain credit. So what factors do we use now to categorise Big Data? The three Vs (or four Vs, depending on who you ask) are often the deciding criteria: Velocity, Variety and Volume. The fourth, Veracity, seems a more modern addition to this categorisation.
One of the greatest changes in the history of Big Data has been in how it is stored and accessed. The problem with digital evolution is that much ‘Big Data’ becomes lost or can no longer be analysed. For example, many years ago (and when I say many, I mean approximately 2.5 million years), it was not uncommon for tribespeople to record their data using notches on bone. Such methods of recording data simply cannot survive the test of time.
The rise of the Internet of Things has allowed data, and ultimately ‘Big Data’, to expand significantly. It could be argued that although Big Data existed before, it is only during this expansion that it has truly been able to flourish. The range of ‘things’ on which we can now record data on a daily basis has meant a rapid increase in the amount of data available and being collected.
What data is considered worthy, and how has this changed over time? One of the greatest changes is that data is no longer used solely on an individual basis but, more importantly, to improve and help a collective such as a community or society. Large datasets have made it easier for analysts to cleanse and manipulate the data for their purposes. Ultimately, the largest change has been the shift towards using data proactively rather than simply reacting to an event or change. For example, John Graunt’s experiment in statistical data analysis was a move towards using data to prevent problems, in this case outbreaks of the bubonic plague. In my own opinion, one of the most important changes within Big Data is the way it is now being used to empower, enhance and cause change rather than simply being recorded.
Blog post by Holly Shea of the Business and Local Government Data Research Centre. Holly writes with a unique perspective on Big Data, having joined the BLGDRC as an intern in 2016. Please get in touch if you have any questions about the contents of this post.
Published 09 January 2017