Big Data In A Big Brave World
The ancient Mayans are often recognised for their skill in accurately charting the planets and stars of the night sky with the most primitive of tools. Ancient tablets provide evidence of their understanding of astronomy, as well as their ability to predict lunar and solar eclipses many decades ahead.
Such examples of human endeavour are testament to our curious nature and inherent penchant for foresight. Interestingly, such practices bear parallels to what we now refer to as big data. The Mayans were effectively analysing data from the observable universe, which is comparable to an unstructured database with billions of years’ worth of data.
Similarly, big data, the practice of handling unconventionally large datasets, has traditionally been the preserve of the corporate world, but it has begun to enter our everyday lives. Organisations, both public and private, are collecting vast amounts of data about their customers, products and the macroeconomic environment, which can be analysed to identify trends, problems and opportunities. Companies can then introduce new products, services, ways of working or solutions that solve real-world problems and enable radical change well beyond the corporate world.
In an enterprise environment, big data analytics is already being applied in areas such as risk analysis, cost cutting and product development. ZDLink, for example, is a service developed in Japan for Hitachi's heavy construction business that enables real-time monitoring of its vehicles: sophisticated sensors record vast amounts of data about terrain, workload and mechanical stress, which is then analysed so that these complex machines can be serviced with pinpoint accuracy.
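Hitachi has not published ZDLink's internals, but the general pattern is easy to sketch: flag a machine for service when its recent sensor readings drift past acceptable limits. The field names and thresholds below are purely illustrative assumptions, not the actual service.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical sensor reading from one construction machine.
# Field names and units are illustrative, not Hitachi's actual schema.
@dataclass
class Reading:
    machine_id: str
    workload_pct: float   # engine/hydraulic workload, 0-100
    stress_kpa: float     # structural stress on the boom, kPa

def needs_service(readings, stress_limit=850.0, workload_limit=90.0):
    """Flag a machine if its recent average readings exceed assumed limits."""
    recent = readings[-50:]                      # look only at the latest samples
    avg_stress = mean(r.stress_kpa for r in recent)
    avg_load = mean(r.workload_pct for r in recent)
    return avg_stress > stress_limit or avg_load > workload_limit

# Example: a machine whose stress readings have crept above the assumed limit.
history = [Reading("EX-200", workload_pct=75.0, stress_kpa=900.0) for _ in range(60)]
if needs_service(history):
    print("EX-200: schedule maintenance")
```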
These same principles, which are delivering cost savings and better service in the corporate world, are also delivering remarkable life-saving benefits in everyday life. Travelling at over 300km/h, the bullet train in Japan remains one of the fastest, most reliable transportation systems ever built. Behind the scenes, its railways are intricately networked, using thousands of sensors to collect petabytes' worth of data solely to update train times and communicate, in real time, any changes that may affect the tightly run, by-the-second schedules, so that customers get the service they expect. Whilst this system keeps trains running on time, it has also proven effective in saving lives, as demonstrated in 2011 when it safely brought every running train to a halt during the earthquake and tsunami that struck Japan.
Bullet trains are also equipped with a Streaming Data Platform, a system originally developed for the Tokyo Stock Exchange. Even with the train travelling at over 150mph, the system analyses sensor data from the wheels to predict with confidence when a wheel will need to be replaced or repaired. In addition, cameras attached to the train can identify, to the nearest inch, sections of track that need maintenance.
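The platform itself is proprietary, but the underlying streaming pattern, evaluating a sliding window of readings as they arrive rather than storing everything first, can be sketched roughly as follows; the window size and vibration threshold are invented for illustration.

```python
from collections import deque

class WheelMonitor:
    """Toy sliding-window check over streaming wheel-vibration readings.

    A real system would use far richer models; this only illustrates
    acting on data as it arrives rather than after the fact.
    """
    def __init__(self, window=100, threshold_g=3.5):
        self.readings = deque(maxlen=window)   # keep only the latest window
        self.threshold_g = threshold_g         # assumed vibration limit, in g

    def ingest(self, vibration_g):
        """Add one reading; return True if the wheel looks due for attention."""
        self.readings.append(vibration_g)
        if len(self.readings) < self.readings.maxlen:
            return False                       # not enough data yet
        return sum(self.readings) / len(self.readings) > self.threshold_g

monitor = WheelMonitor()
for sample in [3.6] * 150:                     # simulated stream of sensor samples
    if monitor.ingest(sample):
        print("Wheel flagged for inspection")
        break
```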
Meanwhile, in healthcare, big data analytics is being used to create what are effectively virtual humans, built from the vast data sets collected from patients and from our understanding of the human body, in order to speed up and stress-test studies of the long-term effects of drugs. Today those same drugs are often developed and released from the lab with a very limited understanding of the long-term impact they can have on the human body, and it can take many decades of trials before the right adjustments can be made, dramatically increasing the time people have to wait before cures are rolled out widely.
We are producing data at an exponential rate, and as we create more machines to generate it, the amount of information we have to store will only go in one direction. Our natural hunger for understanding, coupled with the sheer volume of data now available to us, has shifted our economic paradigm towards that of a “real-time economy.” Only two decades ago, the absence of email or a mobile phone would have gone unnoticed; today, being without them engenders much the same feeling as leaving the house without your keys.
Yet one of the fundamental conundrums lies in predicting the future needs of big data whilst meeting today's storage capabilities and privacy regulations. Consider two examples. In a decade's time, a laboratory investigating a cure for cancer may need data about today's patients, yet under data protection regulations that data should have been deleted after five years. Or a geologist 50 years from now, seeking to identify the causes of global warming, may find that large sets of critical data were deleted decades earlier because they were not deemed important at the time, and because there was not enough storage space to keep them all.
Striking the right balance between predicting what might prove important in the future and meeting today's regulatory and storage requirements is a hurdle that organisations increasingly have to tackle. Understandably, for those rightly concerned about privacy, this raises questions as to how long organisations should be entitled to hold personal data. One way organisations are tackling the issue is by stripping out identifiable personal information, allowing them to keep the underlying data on their systems indefinitely.
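In practice, stripping identifiable personal information usually means dropping, or irreversibly transforming, the fields that point back to an individual before a record is archived. Below is a minimal sketch of that idea, with entirely hypothetical field names and a salted hash standing in as a stable pseudonym.

```python
import hashlib

# Fields assumed to identify a person directly; the names are illustrative.
IDENTIFYING_FIELDS = {"name", "address", "date_of_birth", "phone"}

def anonymise(record, salt="org-secret"):
    """Drop direct identifiers, keeping a salted hash as a stable pseudonym."""
    kept = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    digest = hashlib.sha256((salt + record["name"]).encode()).hexdigest()
    kept["pseudonym"] = digest[:12]            # short, stable, non-identifying key
    return kept

patient = {
    "name": "A. Example", "address": "1 High Street", "date_of_birth": "1970-01-01",
    "phone": "01234 567890", "diagnosis": "X", "treatment_outcome": "improved",
}
print(anonymise(patient))   # identifiers removed, research-relevant fields retained
```

Whether such pseudonymised records genuinely fall outside data protection rules is itself a regulatory question, since anyone holding the salt could link a pseudonym back to the person.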
However, with the advent of technologies such as cloud computing, personal data is no longer stored on just one server: it is replicated twice, perhaps three times, in a myriad of formats including disc, tape and virtualised storage. Organisations without the right technologies to exercise data protection rules such as the “right to be forgotten” can therefore find themselves struggling through the complex web of connections associated with just a single file.
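Honouring a “right to be forgotten” request across that web is, at heart, a bookkeeping exercise: know every store that holds a copy and confirm that each one has erased it. The sketch below assumes a simple registry of replica locations; the store names and deletion callback are hypothetical.

```python
# Hypothetical registry mapping a data subject to every location holding a copy.
REPLICAS = {
    "user-4711": ["primary-db", "backup-disc", "tape-archive", "cloud-cache"],
}

def erase_everywhere(user_id, delete_from):
    """Attempt deletion in every known store and record which ones confirmed it."""
    results = {}
    for store in REPLICAS.get(user_id, []):
        results[store] = delete_from(store, user_id)   # True means erasure confirmed
    return results

# Simulated deletion callback; a real one would call each system's own API.
outcome = erase_everywhere("user-4711", lambda store, uid: store != "tape-archive")
unresolved = [store for store, done in outcome.items() if not done]
print("Still holding data:", unresolved)               # e.g. offline tape needs manual handling
```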
From a storage perspective, big data, as the name suggests, requires large data sets, yet with data being produced at astronomical rates, how can we contend with the storage requirements? In my opinion, technology development today is focused on producing and storing data. Increasingly prominent technologies such as virtualisation have been developed specifically to help organisations become smarter and store more data on their existing storage infrastructure.
Today, big data analysis is being used to create a deeper understanding of the environments and scenarios around us. It is being used to make predictions and real-time decisions that can improve, and even save, our lives. With the stars and planets as their assets, the ancient Mayans used billions of years' worth of data to shape their culture, make decisions and gain perceived insight into the world around them. In parallel, data has become our asset, and it is carving out new possibilities for our future.