Big Data and Data Mining

The 21st century is the century of data. Now more than ever, we have millions and millions of terabytes of data at our disposal.

Word Cloud "Big Data"

Previously, we had to store our data on paper or some other physical medium. Now most of our data lives in digital form. This gives us a huge advantage when it comes to analyzing data and discovering trends in it.

At the heart of analyzing these large amounts of data are data algorithms. These algorithms take in large amounts of data and discover patterns or trends. They are also used to build models and predict events such as elections and sports games. A very interesting case of data analytics is the prediction of the US elections using statistics and data algorithms.


Nate Silver, the statistician who predicted the US elections in 2012.

The actual process is rather complicated (obviously)! But basically it involves relating an unobserved variable, the intended voting behavior in each state, to an observed variable, the election result.

First, we model how people would have voted for Obama on 1st January 2011, associating that voting intention with variables such as wealth and race. On its own, the model as of 1st January 2011 is not accurate, because people's opinions change over time. So we apply a function that takes into account how opinion can change between that date and election day. We can call this the clock function.
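To make the idea concrete, here is a toy sketch in Python. It is not the actual model used for the 2012 forecasts; everything in it is an assumption made up for illustration: the function names (initial_support, clock, win_probability), the weights on wealth and race, the daily drift size, and the choice of a simple random walk as the clock function. It starts from a voting intention on 1st January 2011, lets it drift day by day until election day, and counts how often the candidate ends up above 50%.

```python
import math
import random
from datetime import date

# All numbers below are invented for illustration, not fitted to real data.
START = date(2011, 1, 1)
ELECTION_DAY = date(2012, 11, 6)
DAYS_TO_ELECTION = (ELECTION_DAY - START).days


def initial_support(wealth, minority_share, weights=(0.2, -0.8, 1.5)):
    """Toy starting model: share of a state intending to vote for the candidate.

    wealth and minority_share are hypothetical covariates scaled to [0, 1].
    """
    bias, w_wealth, w_minority = weights
    score = bias + w_wealth * wealth + w_minority * minority_share
    # The logistic function squashes the score into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-score))


def clock(support, days, daily_sd=0.002, rng=None):
    """Toy 'clock function': opinion drifts as a random walk, one step per day."""
    if rng is None:
        rng = random.Random(0)
    for _ in range(days):
        support += rng.gauss(0.0, daily_sd)
        support = min(max(support, 0.0), 1.0)  # a vote share stays within [0, 1]
    return support


def win_probability(start_support, days, simulations=2000, seed=42):
    """Fraction of simulated futures in which the candidate ends above 50%."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(simulations):
        if clock(start_support, days, rng=rng) > 0.5:
            wins += 1
    return wins / simulations


if __name__ == "__main__":
    start = initial_support(wealth=0.6, minority_share=0.3)
    print("Support on", START, "=", round(start, 3))
    print("Chance of winning the state:", win_probability(start, DAYS_TO_ELECTION))
```

The further away election day is, the more room the random walk has to wander, so the same starting number gives a less certain forecast. That is the job of the clock function in this sketch.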

At the end, we observe the results of our calculation and remember: THE MATH NEVER LIES.

For those of you who remain skeptics, here is solid proof that the math indeed works. 😉


This same concept can be extended to sports and even to Wall Street, where it is used to model the economy, build financial models and make a killing. This is what hedge funds like Two Sigma and Renaissance Technologies do.


