The Science of Data

Data Science is a field of study that uses Statistics and Calculus to make predictions in a scientific manner

“Data Science” is being bandied about everywhere. Some IT departments have got it working, some are working on it, some will start soon, and others want to do it. But if you ask a few practitioners or academics to define Data Science, you will likely get multiple answers. Is it Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), Neural Networks (NN), Big Data (BD), or Reinforcement Learning (RL)? Is it a combination of these, or something more?

In fact, it is all of these. Consider a simple comparison with Physics. Physics has many sub-branches, and different people specialize in one or more of them. Similarly, Data Science has several branches, and practitioners specialize in one or more of them. It is, therefore, a field of study: the study of data, with the relevant sub-branch selected based on the business application.

Data Science is a field of study because it uses Statistics and Calculus to make predictions in a scientific manner. It looks at patterns and trends to project into the future. It also looks at outliers and analyzes whether they are actual emerging trends. Most of this work is done by identifying the most influential data points for the question at hand and focusing on them to make predictions. Another very interesting part of Data Science is that it is a self-learning science: it can experiment and adjust until its accuracy reaches as high as 80% (in some cases more). It also reports its expected accuracy.
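As a rough illustration of the trend-and-outlier idea, the Python sketch below fits a straight-line trend by ordinary least squares (the slope and intercept come from setting the derivatives of the squared error to zero, a small case of Statistics meeting Calculus), projects the line one period forward, and flags points whose residual exceeds two standard deviations. The function names, the monthly figures and the two-standard-deviation threshold are hypothetical choices for illustration only, not the method behind any of the applications described in this column.

```python
from statistics import mean, pstdev


def fit_trend(ys):
    """Fit y = intercept + slope * t by ordinary least squares, with t = 0, 1, 2, ..."""
    if len(ys) < 2:
        raise ValueError("need at least two points to fit a trend")
    ts = range(len(ys))
    t_bar, y_bar = mean(ts), mean(ys)
    # Closed-form OLS solution: slope = cov(t, y) / var(t)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys)) / sum(
        (t - t_bar) ** 2 for t in ts
    )
    intercept = y_bar - slope * t_bar
    return slope, intercept


def project(slope, intercept, t):
    """Extend the fitted line to time step t (e.g. a future period)."""
    return intercept + slope * t


def flag_outliers(ys, slope, intercept, z=2.0):
    """Return indices whose residual from the trend exceeds z standard deviations.

    The default z=2.0 is an illustrative assumption, not a standard.
    """
    residuals = [y - project(slope, intercept, t) for t, y in enumerate(ys)]
    sigma = pstdev(residuals)
    if sigma == 0:
        return []  # perfect fit: nothing stands out
    return [t for t, r in enumerate(residuals) if abs(r) > z * sigma]


# Hypothetical monthly figures with one unusual spike at index 4.
sales = [10, 12, 14, 16, 40, 20, 22]
slope, intercept = fit_trend(sales)
print("next period:", round(project(slope, intercept, len(sales)), 2))
print("outliers:", flag_outliers(sales, slope, intercept))
```

A single large outlier pulls an ordinary least-squares line toward itself, so in practice a robust fit or a median-based threshold may be a better choice; the simple version is kept here for readability. Whether a flagged point is noise or the start of a real emerging trend still needs the kind of analysis described above.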

Another way to see Data Science as a field of study is to look at its history. Data Science came into being decades ago, at universities and research labs, and it uses models from Statistics and Calculus that are even older. So it is not an “evolving” science; what is evolving is the range of new applications that can be conceived and applied. It went mainstream only recently because, until then, computing power was insufficient. In fact, there is now an overarching sub-division called Computational Data Science, which needs very high computing power, something Cloud Service Providers have made easy to access.

We will end with some examples of recent applications:

  • HR: Predict which employees are likely to quit the company, or whether a particular employee is likely to leave. A capstone project by this columnist achieved 94% accuracy. The top 3 features for resignation were: number of hours worked in a month, number of projects, and tenure with the company.
  • BFSI: Credit applications processed within seconds, without human intervention. Most of us have seen it. The few seconds are needed only to pull credit reports; the data science model then selects the relevant data points in the credit report and makes a decision.
  • Manufacturing: Predict which machine or assembly line will break down and therefore needs preventive maintenance.
  • Retail: Based on a cyclone forecast, rejig the entire supply chain for that area, because customers will need large quantities of specific items (e.g., drinking water) and almost none of others (e.g., cakes). Deployed at one of the largest retailers in the world, along with MIT. It also proved useful when the COVID-19 lockdown started and ended.
  • Logistics: Rearranging storage so that adding and removing parts is faster. Time saved is money earned.
  • Social: Multiple models predicting the COVID-19 trajectory, many of them highly accurate.


The author managed large IT organizations for global players like MasterCard and Reliance, as well as lean IT organizations for startups, with experience in financial and retail technologies.
