Small Data: Back to Basics?
Over the last decade, everyone and their friend has been talking about the phenomenon of “Big Data”. Big data refers to datasets that are too large or too complex to be handled by traditional data processing software. Statistically speaking, having a huge amount of data increases statistical power and thereby yields more precise results, but it also has some drawbacks. First and foremost, it is too complex for an individual to grasp, and it requires specialists to analyse and interpret the outcomes. Second, the underlying behavioural aspects are lost, and greater emphasis is placed on the mathematical models used to understand the data. This can also lead to higher rates of false discoveries: findings that look statistically significant but make no sense in the real world. Third, companies need to rely more heavily on information technology and must put in place systems and processes that can capture, store and make sense of the data. This drives up data storage and acquisition costs and may skew corporate budgets away from core competencies towards IT and analytics. Finally, capturing such huge amounts of data often involves collecting it without the explicit consent of individuals, which leads to privacy issues.
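The point about false discoveries can be made concrete with a small simulation. The sketch below is not part of the original argument. It assumes a hypothetical dataset made entirely of random noise, and the function names, the 30-row sample size and the cutoff of 0.361 (roughly the two-sided 5% critical value of Pearson's r for 30 observations) are illustrative choices only. It counts how many noise variables appear "significantly" correlated with a noise target as more variables are examined.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_correlations(n_features, n_rows=30, cutoff=0.361, seed=0):
    """Count pure-noise features whose |r| with a pure-noise target exceeds the cutoff.

    Every feature is unrelated to the target by construction, so each hit
    is a false discovery. All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson_r(feature, target)) > cutoff:
            hits += 1
    return hits


if __name__ == "__main__":
    # The more variables we screen, the more spurious "findings" we collect.
    for n_features in (10, 100, 1000):
        print(n_features, count_spurious_correlations(n_features))
```

At a 5% cutoff, roughly one noise variable in twenty is expected to pass the test, so the number of spurious findings grows with the number of variables screened even though none of them carries any real information.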
In 2016, the bestselling author and management guru Martin Lindstrom published a book titled “Small Data: The tiny clues that uncover huge trends” (Lindstrom, 2016), which bucked the trend of big data. Although there is no formal definition of small data, the consensus understanding is that “any data that is small enough for human comprehension can be termed as small data”. According to Lindstrom, about 65% of the top 100 biggest innovations have been based on small data. The argument is that big data focusses on finding correlations, whereas small data focusses on finding causes. Small data concentrates on understanding the data generation process, and thereby on developing an understanding of what is actually going on in the dataset.
To understand the importance of small data we need not look far. In 1979, engineers at Sony managed to combine a small cassette player with a small pair of headphones and created the Walkman. Legend has it that Akio Morita, the Chairman of Sony, wanted to listen to music while playing tennis. One day he expressed this desire in front of the engineers, and the Walkman was born. For the next two decades it kept a vice-like grip on the worldwide portable cassette player market, with its market share never falling below 50%. But by the late 1990s cassette technology was dying and was being replaced by MP3. In 1998, South Korea’s Saehan Information Systems created the first portable digital audio player, the MPMan. It sold 50,000 players globally in its first year. By 2001, there were approximately 50 portable MP3 players available in the U.S., and no firm had achieved the dominance that the Walkman had enjoyed for the past 20 years. Apple launched the iPod in 2001, a full three years after the MPMan. According to people in the know, Steve Jobs delayed the launch because he wanted to figure out what the customer wanted. Informal discussions between Apple and customers of MP3 players revealed the missing link: a place where customers could download music without the hassle of piracy. In 2003 Apple launched the iTunes Store, and the rest, as they say, is history. If we reflect on the story, we see that big data would not have told Apple the mantra for success. It was patient listening to a few groups of customers that actually gave Apple its edge in the marketplace.
The importance of small data lies in the fact that it is small enough to be analysed without looking for grand hypotheses, yet large enough to follow the basic principles of statistics. Each and every data point can be examined quantitatively as well as qualitatively. Small data is nothing but a return to basics, to the Middle Ages in fact, when the philosopher monks and mendicants had a popular saying, commonly attributed to William of Ockham: “Frustra fit per plura quod potest fieri per pauciora”, meaning “it is futile to do with more things that which can be done with fewer”. Small data propounds the principle of parsimony, or simplicity. In simple terms: “do not overcomplicate stuff when simpler explanations exist.” Many may be under the impression that small data has evolved as a challenger to big data, but that is not the case. Small data has actually emerged to provide more meaningful insights for artificial intelligence and the Internet of Things. In an article published in Forbes, the author argued that small data is important because it can trigger events based on what is happening now. Those events can be merged with behavioural or trending information derived from machine learning algorithms run against big data datasets (Kavis, 2015). Similar sentiments were echoed in an article in Towards Data Science (Choi, 2019). An article in Harvard Business Review describes a 12-week experiment in which researchers worked closely with AI and human coders and found that AI can make smarter suggestions, provided that machine learning and human expertise are balanced and the quality of human input is improved (Wilson & Daugherty, 2020).
Over the past few years, almost everyone has been jumping on the analytics bandwagon and trying to become a data analyst. No doubt data analytics pays well and provides a lot of opportunities, but to stand out in the crowd analysts need to understand that behavioural correlates and small data will play an increasingly large role in simplifying the complexities of big data.
References:
- Choi, J. (2019). Don’t undervalue small data: Why small data is still valuable in this world of big data. Towards Data Science. https://towardsdatascience.com/dont-undervalue-small-data-5455abeb1fcb
- Kavis, M. (2015). Forget big data -- small data is driving the internet of things. Forbes. https://www.forbes.com/sites/mikekavis/2015/02/25/forget-big-data-small-data-is-driving-the-internet-of-things/?sh=25cf18d65d7e
- Lindstrom, M. (2016). Small data: The tiny clues that uncover huge trends. St. Martin's Press.
- Wilson, H. J., & Daugherty, P. R. (2020). Small data can play a big role in AI. Harvard Business Review. https://hbr.org/2020/02/small-data-can-play-a-big-role-in-ai
Written by Rohit Vishal Kumar