Introducing Big Data

The world is getting smarter. From healthcare to houses to businesses, everything is benefiting, thanks to Big Data. Before we discuss it, let us look at a few examples that make use of Big Data. That way, it will be easier to grasp the concepts later on.

  • Cricket is very popular in India and other countries. It is now possible to embed sensor boards in cricket balls, comprising micro-controllers, wireless radio, inertial sensors and a chip that captures all the motion data. Not only balls, but even bat handles are embedded with sensor technology that captures each impact of the ball. Coaches and players can analyze this data to improve their performance
  • In tennis, a sensor system can record a player’s performance, providing real-time statistics and comprehensive match analysis
  • There are devices today that track your steps, speed, heart rate, weight, the calories you are burning and more. You can even view your progress over time and manually log workouts. These devices are small enough to be worn on your hand
  • In homes, there are smart thermostats that monitor the home and heat or cool the appropriate areas. For example, if you are in the living room watching TV, the air conditioner in the bedroom is automatically reduced to a minimum
  • Companies are introducing smart TVs that use face recognition to make sure that your children do not watch anything that is unsuitable for their age
  • Cars of today guide you on the road, tell you how much fuel you have left, and assist you in parking the vehicle
  • In particle physics, instruments such as the Large Hadron Collider generate a huge amount of data, about 1 TB per second, that needs to be processed to enable scientific discoveries. Thanks to Big Data, this is now possible

At the heart of all these innovations and analyses, which aim to make our lives more comfortable, reduce pollution by lowering energy consumption and help us gain knowledge, is Big Data. The basic idea behind the phrase ‘Big Data’ is that any activity we do leaves a digital trace (data). This vast data, which is typically unstructured, can be captured, filtered and analyzed. The Foundation for Scientific and Industrial Research from Norway has reported that over 90% of the world’s recorded data has been generated in the past 3 – 4 years alone. It is only now that we have begun to get meaningful information from this data. There is little doubt that Big Data is set to change our lives forever. It is already changing the way we live, exercise, play, eat, run businesses and even run cities.

Unlike traditional data warehouses that rely on highly structured data, Big Data gainfully utilizes data in any form: structured data stored in relational databases; semi-structured data emerging from sensors, machines and applications; and unstructured data. The last category is especially important. In a broad sense, unstructured data is data that cannot easily be stored or indexed in traditional formats or databases. Examples include e-mail conversations, social media posts, photos and voice recordings.

Quantitatively, Big Data is about 1000 times larger than traditional data. Qualitatively, the forms and functions of Big Data are about 10 times more diverse than those of traditional data. Truth be told, this data was always available. It is only modern technological advances such as Artificial Intelligence (AI), cloud computing and the Internet of Things (IoT) that have made it possible to harness this data and put it to good use.

While Big Data has the potential to enable new insights that can change the way we live, new algorithms, methods, infrastructures and platforms are required to make sense of this huge volume of data and provide meaningful insights. This is the work of IoT development platforms like PTC ThingWorx.

We are only scratching the surface as far as the use of Big Data is concerned. To realize its full potential, researchers and practitioners need to address several challenges and develop suitable conceptual and technological solutions to tackle them. These include life-cycle management of data; large-scale data storage; flexible processing infrastructure; data modeling; scalable machine learning and data analysis algorithms; techniques for sampling and for making trade-offs between data processing time and accuracy; and dealing with the privacy and ethical issues involved in data sensing, storage, processing and actions.

The V’s of Big Data  

Big Data Volume  

The quantity of data generated in the world almost doubles every one to one and a half years. While traditional data is measured in Gigabytes and Terabytes, Big Data is measured in Petabytes and Exabytes. The volume of this data is really mind-boggling: try to write the zeros in an Exabyte! It takes special analytics and computing power to find something specific in it. Cloud computing and artificial intelligence have made it possible to process this data without investing in supercomputers, and efficient software like ThingWorx provides a platform to crunch the data.
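
To make these magnitudes concrete, here is a minimal Python sketch of the arithmetic. It assumes decimal (SI) units, where an Exabyte is 10^18 bytes, and treats the doubling period as a simple exponential. The names UNITS, zeros_in, volume_after and years_to_grow are illustrative only and do not come from any platform mentioned here.

```python
import math

# Decimal (SI) storage units, expressed in bytes (an assumption; binary
# units such as 2**60 bytes are also in common use).
UNITS = {
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
}


def zeros_in(unit: str) -> int:
    """Count the zeros when one unit is written out in bytes."""
    return len(str(UNITS[unit])) - 1


def volume_after(start_bytes: float, years: float, doubling_years: float) -> float:
    """Volume after `years` if the data doubles every `doubling_years`."""
    return start_bytes * 2 ** (years / doubling_years)


def years_to_grow(factor: float, doubling_years: float) -> float:
    """Years needed for the volume to grow by `factor`."""
    return math.log2(factor) * doubling_years


print(f"{UNITS['exabyte']:,}")  # 1,000,000,000,000,000,000
print(zeros_in("exabyte"))      # 18

# Growing from a Petabyte to an Exabyte is a factor of 1000, or roughly
# ten doublings.
print(years_to_grow(1000, doubling_years=1.0))
print(years_to_grow(1000, doubling_years=1.5))
```

Under these assumptions, a store that holds a Petabyte today would reach an Exabyte in roughly 10 to 15 years, depending on which end of the doubling range applies.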

Big Data Velocity 

If traditional data is like a lake, Big Data is a rapidly flowing river. It is only thanks to high internet speeds and IoT that such huge volumes of data can be processed in relatively little time.
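
The lake-versus-river contrast can be illustrated with a small Python sketch: instead of storing everything and analyzing it later, each reading is processed as it arrives and only a short window is kept in memory. This is one simple streaming technique (a rolling mean), chosen here for illustration; the function name rolling_mean and the sample values are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_mean(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` values, one per reading."""
    recent = deque(maxlen=window)  # holds only the latest `window` readings
    total = 0.0
    for value in stream:
        if len(recent) == window:
            total -= recent[0]     # the oldest reading is about to drop out
        recent.append(value)
        total += value
        yield total / len(recent)


# The input can be any iterable, including an endless sensor feed,
# because memory use is bounded by the window size.
print(list(rolling_mean([1, 2, 3, 4], window=2)))  # [1.0, 1.5, 2.5, 3.5]
```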

Big Data Variety  

While traditional data is formatted, Big Data can take many forms. It can be text, pictures, audio or video. In most cases, it is a combination of several such media. The data can also come from diverse sources such as devices, mobiles, web logs and desktops.

Big Data Veracity  

Almost anything, whether a system, an object or a process, can generate data, but not all of it is meaningful. One of the big challenges of Big Data analysis is to separate the signal from the noise.
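
As a rough illustration of filtering noise, the Python sketch below drops missing readings and discards values that lie far from the median. It is only one simple approach, not a description of how any particular platform does it. The function name filter_noise, the threshold and the sample heart-rate values are all hypothetical.

```python
from statistics import median
from typing import List, Optional, Sequence


def filter_noise(readings: Sequence[Optional[float]], max_deviation: float) -> List[float]:
    """Drop missing readings and those further than `max_deviation` from the median."""
    valid = [r for r in readings if r is not None]
    if not valid:
        return []
    centre = median(valid)  # the median is not skewed by a few extreme values
    return [r for r in valid if abs(r - centre) <= max_deviation]


# Hypothetical heart-rate samples with a dropped reading (None), a spike
# (250) and a sensor glitch (0).
samples = [72, 74, None, 71, 250, 73, 0]
print(filter_noise(samples, max_deviation=30))  # [72, 74, 71, 73]
```

The median is used as the reference point rather than the mean because the very outliers being removed would otherwise distort it.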

Big Data Science  

Big Data is part of Data Science, a discipline that merges concepts from computer science, such as algorithms, programming, machine learning and data mining, with statistics and optimization, coupled with domain knowledge. The domain knowledge can include business logic, applications and visualization. Big Data extracts insights from data and transforms them into actions that have an impact in the particular domain of application.

Although part of the Big Data revolution is enabled by new algorithms and methods to handle large amounts of heterogeneous data in motion and at rest, all of this would be of no value if computing platforms and infrastructures had not evolved to better support Big Data. New platforms have arisen that provide programmers with different abstractions, enabling problems to be represented in different ways.

To Summarize:  

  • Data has always existed; however, it is only in the last few years that the means of capturing it easily have become available
  • Big Data essentially makes sense of captured but mostly cluttered, unstructured data to gain meaningful insights into processes, applications or systems
  • Big Data analysis has been made possible only because of the advent of suitable platforms like ThingWorx