The world is getting smarter. From healthcare to houses to businesses, everything is getting smarter, thanks to Big Data. Before we talk about it, let us first see a few examples that make use of Big Data. That way, it will be easier to grasp the concepts later on.
At the heart of all these innovations and analyses, which aim to make our lives more comfortable, reduce pollution by lowering energy consumption and expand our knowledge, is Big Data. The basic idea behind the phrase ‘Big Data’ is that any activity we do leaves a digital trace (data). This vast data, which is typically unstructured, can be captured, filtered and analyzed. The Foundation for Scientific and Industrial Research from Norway has reported that over 90% of the world’s recorded data has been generated in only the past 3 – 4 years. It is only now that we have begun to get meaningful information from this data. There is little doubt that Big Data is set to change our lives forever. It is already changing the way we live, exercise, play, eat, run businesses and even run cities.
Unlike traditional data warehouses, which rely on highly structured data, Big Data gainfully utilizes data in any form: structured data stored in relational databases; semi-structured data emerging from sensors, machines and applications; and unstructured data. The last category is especially important. In a broad sense, unstructured data is data that cannot easily be stored or indexed in traditional formats or databases. Examples include e-mail conversations, social media posts, photos and voice recordings.
Quantitatively, Big Data is about 1000 times larger than traditional data. Qualitatively, the forms and functions of Big Data are about 10 times more diverse than those of traditional data. Truth be told, this data was always available. It is only modern technological advances like Artificial Intelligence (AI), cloud computing and the Internet of Things (IoT) that have made it possible to harness this data and put it to good use.
While Big Data has the potential to enable new insights that can change the way we live, new algorithms, methods, infrastructures and platforms are required to make sense of this huge volume of data and provide meaningful insights. This is the work of IoT development platforms like PTC ThingWorx.
We are only scratching the surface as far as the use of Big Data is concerned. To realize its full potential, researchers and practitioners need to address several challenges and develop suitable conceptual and technological solutions to tackle them. These include life-cycle management of data, large-scale data storage, flexible processing infrastructure, data modeling, scalable machine learning and data analysis algorithms, techniques for sampling and for making trade-offs between data processing time and accuracy, and dealing with the privacy and ethical issues involved in data sensing, storage, processing and actions.
The V’s of Big Data
Big Data Volume
The quantity of data generated in the world almost doubles every one to one and a half years. While traditional data is measured in Gigabytes and Terabytes, Big Data is measured in Petabytes and Exabytes. The volume of this data is really mind-boggling; try to write the zeros in an Exabyte! It takes special analytics and computing power to find something specific in it. Cloud computing and artificial intelligence have made it possible to process this data without investing in supercomputers. And of course, it also takes efficient software like ThingWorx, which provides a platform to crunch the data.
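To put those units in perspective, here is a small Python sketch that writes out the zeros and shows what a doubling period of one to one and a half years implies. It assumes decimal (SI) units, which is a choice made for this illustration, and the helper names `zeros_in` and `growth_factor` are made up for the example.

```python
# Byte units in decimal (SI) form. Assumption: the text does not say whether
# it means decimal or binary units, so SI is used here for simplicity.
UNITS = {
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
}


def zeros_in(unit: str) -> int:
    """Count the zeros when one unit is written out in bytes."""
    return len(str(UNITS[unit])) - 1


def growth_factor(years: float, doubling_period_years: float) -> float:
    """How many times larger the data becomes if it doubles every period."""
    return 2 ** (years / doubling_period_years)


for name, size in UNITS.items():
    print(f"1 {name} = {size:,} bytes ({zeros_in(name)} zeros)")

# Growth over three years for the two ends of the doubling range.
print(growth_factor(3, 1.0))  # doubling every year -> 8.0
print(growth_factor(3, 1.5))  # doubling every 1.5 years -> 4.0
```

In other words, an Exabyte has 18 zeros, a billion times the size of a Gigabyte, and at the stated doubling rate the world's data would grow four to eight times over in just three years.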
Big Data Velocity
If traditional data is like a lake, Big Data is a rapidly flowing river. It is only thanks to high internet speeds and IoT that it is possible to process such huge volumes of data in relatively little time.
Big Data Variety
While traditional data is formatted, Big Data can take many forms. It can be text, it can be pictures, it can be audio, or it can be video. In most cases, it is a combination of such media. And while we are discussing variety, let us mention that the data can come from diverse sources like devices, mobile phones, web logs, and desktops.
Big Data Veracity
Almost anything (systems, objects or processes) can generate data, but not all of it is meaningful. One of the big challenges of Big Data analysis is to separate the signal from the noise.
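As a rough illustration of what filtering out noise can look like in practice, the sketch below drops implausible readings from a small list of sensor values. The readings are invented for the example, and the median-absolute-deviation rule used here is just one common outlier test, picked for this sketch. Real pipelines may use very different methods.

```python
from statistics import median


def filter_noise(readings, threshold=3.5):
    """Keep readings whose MAD-based modified z-score is within threshold.

    The constant 0.6745 and the 3.5 cutoff are the conventional values for
    this rule; both are assumptions of this sketch, not tuned settings.
    """
    med = median(readings)
    # Median absolute deviation: the typical distance from the median.
    mad = median(abs(x - med) for x in readings)
    if mad == 0:
        # No measurable spread, so nothing can be judged an outlier.
        return list(readings)
    return [x for x in readings if 0.6745 * abs(x - med) / mad <= threshold]


# Hypothetical temperature readings with two obviously faulty values.
readings = [21.1, 21.3, 20.9, 21.0, 98.6, 21.2, -40.0, 21.1]
print(filter_noise(readings))
```

The two faulty values (98.6 and -40.0) sit far from the median of the rest, so they are discarded, while the ordinary readings pass through unchanged.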
Big Data Science
Big Data is part of Data Science, a discipline that merges concepts from computer science (algorithms, programming, machine learning and data mining) with statistics and optimization, coupled with domain knowledge. The domain knowledge can include business logic, applications and visualization. Big Data extracts insights from data and transforms them into actions that have an impact in the particular domain of application.
Although part of the Big Data revolution is enabled by new algorithms and methods to handle large amounts of heterogeneous data in motion and at rest, all of this would be of no value if computing platforms and infrastructures had not evolved to better support Big Data. New platforms arose that provide different abstractions for programmers and enable problems to be represented in different ways.
To Summarize: