“Data is King.” “We need data to back up this claim/strategy/hypothesis.” Statements like these persist despite ample evidence of the dangers of relying too much on data. The concern is becoming more pressing given the current emphasis on Big Data, Analytics, Artificial Intelligence (AI) and Machine Learning (ML). The implications for ML are especially frightening, because one wonders what the machines will learn.
We will start by quoting the British economist Ronald Coase, who said in 1981, “If you torture the data long enough, it will confess.” And therein lies the danger. In fact, the danger is even bigger than that: are the right questions being asked during this ‘torture’? A few decades back, a global pizza chain conducted a feasibility study on entering India. The research firm came back and said that only 10-15% of the residents of Mumbai could afford the pizza at its price point. So the idea was abandoned. A few years later, someone else came across the same report and was shocked at the conclusion. He ran an internal campaign arguing that 10% of Mumbai’s population was up to 10 times the size of many mid-sized towns in the West. So they opened the first store in Mumbai. Within 6 months, it became the largest-selling outlet for this chain in the world, by number of units sold.
The second danger stems from bias in the how, why and what of defining data. A few years ago, the Chicago Police launched a program with much fanfare in which they had a heat map of 400 individuals who were most likely to break the law. It was based on predictive analysis of crime data, and it was meant to help them pre-empt criminal activity. Then a major local newspaper ran a story that identified at least one person on it who had no criminal record at all. Further analysis revealed that there was a serious racial bias in the analysis. Needless to say, the program was abandoned. As a matter of fact, such bias is the biggest danger of data.
Data is also agnostic to fraud, impersonation, errors and omissions. The usual argument is that the larger the data set, the smaller the impact of these. This is a fallacy. A common trick on the largest e-commerce sites is for merchants to buy their own products, write glowing reviews and then quietly book the purchases as returns. The order placement and returns can be done by bots, creating a thoroughly misleading ranking. Essentially, bots have been beating most data collection systems on a regular basis. A similar example is the well-known practice among most large search engines of rearranging search results by assigning ‘weightage’.
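The point about data set size can be illustrated with a toy simulation. This is only a sketch under made-up assumptions: genuine reviews are taken to be uniformly distributed from 1 to 5 stars (a true average of 3.0), a fixed 20% of all reviews are taken to be planted 5-star fakes, and the function name `observed_mean_rating` is purely illustrative. None of these figures come from any real site.

```python
import random

TRUE_MEAN = 3.0       # assumed average of genuine reviews (uniform 1..5 stars)
FAKE_FRACTION = 0.2   # assumed share of planted reviews; purely hypothetical


def observed_mean_rating(n_reviews, fake_fraction, rng):
    """Average star rating when a fixed share of reviews are planted 5-star fakes."""
    total = 0
    for _ in range(n_reviews):
        if rng.random() < fake_fraction:
            total += 5                  # planted review: always 5 stars (assumption)
        else:
            total += rng.randint(1, 5)  # genuine review: uniform 1..5 stars
    return total / n_reviews


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
results = {
    n: observed_mean_rating(n, FAKE_FRACTION, rng)
    for n in (100, 10_000, 100_000)
}

for n, mean in results.items():
    error = mean - TRUE_MEAN
    print(f"n={n:>7}: observed mean {mean:.3f}, error vs. true mean {error:+.3f}")
```

Under these assumptions the expected observed average is 0.8 × 3.0 + 0.2 × 5.0 = 3.4. More reviews reduce only the random scatter around 3.4; the gap of roughly 0.4 stars from the true average does not shrink, because the fakes grow in proportion to the data.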
The takeaway is that data can be King only if it is clean, examined without preconceived notions, and viewed with a clear purpose. This is true even if the purpose of the analysis is discovery (finding truths that were not imagined earlier). And the problem is at its worst when data is used at a fine level of detail for surgical strikes like micro-segmentation. All of these requirements are hard to meet together, which leads to misinformed opinions and misplaced claims, strategies and hypotheses.
Data does hold a lot of information, but finding it is not merely a technical challenge. Data and statistical analysis is not easy. The challenge is summed up in the famous quote by Prof. Aaron Levenstein: “Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital.” Addressing these challenges is how the king will get his clothes. Until then, Big Data, Analytics, AI and ML will produce suspect results.
The author has managed large IT organizations for global players like MasterCard and Reliance, as well as lean IT organizations for startups, with experience in financial and retail technologies.