Imagine a prediction on which a business decision rests going awfully wrong! How to have realistic expectations from data and data-based analytics
A couple of months back, we asked Indian CIOs on Twitter which area within tech offers the maximum growth for next-generation IT managers.
Analytics was the emphatic answer. Except for security, no other specialization came anywhere close.
And they are not exactly an exception. Organizations are increasingly talking of competing on data and building platforms around data. The number of organizations hiring chief data officers is steadily on the rise.
Study after study predicts rising demand for data professionals. The latest, the annual salary guide by Robert Half Technology, says Big Data engineers will see the highest salary jump in 2017, followed by data architects and data warehouse managers.
Data is big. Data is trending. Data means money.
Yet, after the U.S. presidential election, the question that people, and not just the skeptics, are asking is: is our faith in data and analytics a little too much?
One such strong reaction came from The New York Times, which ran a report titled How Data Failed Us in Calling an Election just a couple of days after the election. The New York Times' Upshot is one of the top American sites in data journalism, as journalism built on exploring and analyzing data is known. Analytics is core to data journalism.
Upshot itself predicted Hillary Clinton’s victory by analyzing data.
But the most watched statistically modeled prediction based on data is that of FiveThirtyEight, the top data journalism site, which mostly covers sports and elections. Its high-profile founder and editor, statistician Nate Silver, has correctly predicted many similar results in the past, including the election of President Obama. Silver's site gave Clinton a 71.4% chance of winning.
Honestly, for me too, as a practitioner, trainer and champion of data journalism, this piece is as much an introspection as it is an editor's mirror to its readers, the data practitioners among them, to be more precise.
In data we trust…
The obvious question for businesses to ask is: can we trust data to take business decisions? If predictive analytics can go as wrong as it did in the US election, what will happen to the big business decisions taken based on data analytics? Can't they go terribly wrong too?
Those are valid questions, and they can only be partially answered. Some of the obvious answers are as follows.
One, most of what Nate Silver and his team (and Upshot and all the others) used to predict the outcome was poll data. And poll data, in business terms, would not really be factual data. So, that is one difference. But businesses use a lot of research data too, which is similar to poll data.
But the broader point is that the problem lies with the data, not so much with the analytics. That is not much of a solace, though. In businesses too, not everything is factual, and a lot of assumptions are made, sometimes while believing they are based on factual data.
In fact, in a valiant defense after the criticism, Nate Silver wrote a long piece called How I Acted Like a Pundit and Screwed Up on Donald Trump, in which he admitted how they had deviated from the pure path of data, of course after reminding readers how successful they had been in the past:
We could emphasize that track record; the methods of data journalism have been highly successful at forecasting elections. That includes quite a bit of success this year. The FiveThirtyEight “polls-only” model has correctly predicted the winner in 52 of 57 (91 percent) primaries and caucuses so far in 2016, and our related “polls-plus” model has gone 51-for-57 (89 percent). Furthermore, the forecasts have been well-calibrated, meaning that upsets have occurred about as often as they’re supposed to but not more often.
You can ignore the word journalism and it will still make sense for any kind of predictive data analytics.
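Calibration, as Silver uses the term above, can be checked with a few lines of code: bin past forecasts by their stated probability, then compare the average predicted probability in each bin with how often the predicted outcome actually occurred. A well-calibrated forecaster's 70% calls come true roughly 70% of the time. A minimal sketch, with made-up numbers rather than FiveThirtyEight's actual data, and a hypothetical helper name:

```python
# Hypothetical sketch of a calibration check. The forecasts and outcomes
# below are illustrative toy data, not FiveThirtyEight's record.

def calibration_table(forecasts, outcomes, bins=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Group forecasts into probability bins and compare the average
    predicted probability with the observed win rate in each bin."""
    table = []
    for lo, hi in zip(bins, bins[1:]):
        group = [(p, o) for p, o in zip(forecasts, outcomes)
                 if lo <= p < hi or (hi == 1.0 and p == 1.0)]
        if not group:
            continue  # skip empty bins
        avg_pred = sum(p for p, _ in group) / len(group)
        observed = sum(o for _, o in group) / len(group)
        table.append((lo, hi, len(group), round(avg_pred, 2), round(observed, 2)))
    return table

# Toy data: predicted win probabilities and actual results (1 = won).
preds = [0.9, 0.8, 0.85, 0.3, 0.2, 0.6, 0.7, 0.95, 0.1, 0.55]
wins  = [1,   1,   1,    0,   1,   1,   0,   1,    0,   1]
for lo, hi, n, pred, obs in calibration_table(preds, wins):
    print(f"{lo:.2f}-{hi:.2f}: n={n}, predicted={pred}, observed={obs}")
```

A large gap between the predicted and observed columns in any bin is the signal that a forecaster is systematically over- or under-confident, which is exactly the failure Silver says did not show up in the primaries.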
But here was his biggest admission:
The big mistake is a curious one for a website that focuses on statistics. Unlike virtually every other forecast we publish at FiveThirtyEight — including the primary and caucus projections I just mentioned — our early estimates of Trump's chances weren't based on a statistical model. Instead, they were what we called "subjective odds" — which is to say, educated guesses. In other words, we were basically acting like pundits, but attaching numbers to our estimates.
In other words, he clarified that it was not the data but the people behind the data who were to blame for the mistake.
I think that is no reason to relax. If it could happen to a veteran data guy like Silver, it could happen to any business manager. Data and analytics become science as soon as they get into the machine. In the hands of human beings, they are prone to all the biases that human beings have. While it sounds so obvious and so elementary, my dear Watson, err…data scientists, it remains a major challenge in the data journey of businesses.
And here, we are not talking of data science per se but its application in business.
In fact, I would strongly recommend reading the entire piece by Nate Silver here.