Anyone who has lived through the COVID-19 pandemic won’t be surprised at the results of new research from UNSW Business School – that people jump to conclusions when they read about studies with relatively small sample sizes. This doesn’t just apply to the general public either. The findings, drawn from nearly 4000 participants, held across a wide range of groups, including tertiary-level statistics students and senior business leaders.
These findings from University of New South Wales (UNSW) Business School’s Dr Siran Zhan, Senior Lecturer in the School of Management and Governance, show just how easily people jump to conclusions when reading about studies, making it critical that journalists – and the general public – communicate and digest this information with a critical eye.
In the study, Relative Insensitivity to Sample Sizes in Judgments of Frequency Distributions, Dr Zhan and her co-author, Dr Krishna Savani, Professor of Management at the Department of Management and Marketing at The Hong Kong Polytechnic University, show that people largely ignore sample sizes in their judgments, leaving them unduly confident in conclusions from studies with as few as three participants.
“What surprised us was that when we examined samples of university-level statistics students and seasoned senior executives, who are supposedly trained through their education or professional work to make judgments and decisions according to sound statistical principles, they ignored the sample size just as much as the public,” said Dr Zhan.
“It is especially appalling to think that many important business and public policy decisions might have been made based on unreliable results from small samples. The research shows that people might not have the correct intuition as to what counts as evidence, making it difficult to correctly use statistics and research evidence to guide their inferences and decisions. The good news? The researchers also tested a way to prevent the spread of misinformation,” she added.
What is a sample size, and why is it important?
Early in the COVID-19 pandemic, pharmaceutical and biotechnology company Moderna (MRNA) reported that its experimental vaccine was successful in eight volunteers. While only a small group of healthy volunteers were tested, journalists were quick to report the news, which was so well received that it drove up Moderna’s share price by 20%.
Just hours after announcing the trial’s success, Moderna sold 17.6 million shares to the public, raising US$1.3 billion. While Moderna, and several of its top executives, profited off the back of the boom, some critics say it overstated the significance of the vaccine trial and manipulated the market. Examples like these demonstrate that most people don’t question the significance of a study’s sample size when drawing conclusions from the articles they read.
“People’s general tendency to be unduly confident in conclusions from tiny samples is incommensurate with statistical principles and can lead to poor judgment and decisions,” said Dr Zhan.
So, in six experiments involving a total sample of 3914 respondents, the researchers tested whether people pay attention to variations in sample size spanning one or two orders of magnitude. The findings reveal that people pay minimal attention to sample sizes that differ by factors of 50, 100, and even 400 when making judgments and decisions based on a single sample.
“Even with a sample size of three, participants’ mean confidence level was 6.6 out of 10, indicating that people have pretty high confidence in data from incredibly small samples, consistent with prior research. As researchers, we realise that the same finding is much more believable from a sample of 3000 than from a sample of 30. However, shockingly, the general population does not appear to share this intuition,” explains Dr Zhan.
What is an appropriate sample size?
With the increasing spread of online disinformation and misinformation, making judgements about what we’re presented with in the media is becoming increasingly important.
“With the proliferation of statistics in the news media and in organisations that call for evidence-based decision-making, the current findings indicate that people might not have the correct intuition as to what counts as evidence, making it difficult for them to correctly use statistics and research evidence to guide their inferences and decisions,” explains Dr Zhan.
But is there such a thing as the right sample size? Bigger is generally better, statistically. “The mean result from any sample is pulled or biased by outliers. But when your sample size increases, your sample gets closer to the population, meaning fewer estimation errors. When the sample size is small (e.g., 30), any outlier has a much stronger effect on the mean, making your mean less reliable than when the sample size is large (e.g., 3000),” explains Dr Zhan.
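To make this intuition concrete, here is a minimal Python sketch – illustrative only, not taken from the study – that simulates repeated sampling from a population containing occasional extreme outliers. The population, outlier rate and sample sizes are assumptions chosen simply to echo Dr Zhan’s 30-versus-3000 example; the point is that the mean of a small sample swings far more between repeats than the mean of a large one.

```python
# Illustrative sketch (not from the study): how sample size affects the
# reliability of a sample mean when a population contains outliers.
import random

random.seed(42)

def sample_mean(n):
    # Hypothetical population: values near 50, with rare large outliers of 500.
    values = [random.gauss(50, 10) if random.random() > 0.02 else 500
              for _ in range(n)]
    return sum(values) / n

# Repeat the sampling many times to see how much the mean jumps around.
for n in (30, 3000):
    means = [sample_mean(n) for _ in range(1000)]
    spread = max(means) - min(means)
    print(f"n={n:>5}: average of means={sum(means)/len(means):.1f}, "
          f"range across repeats={spread:.1f}")
```

Running this shows the small-sample means scattering over a much wider range than the large-sample means, which is exactly the unreliability Dr Zhan describes.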
“The only issue is the cost of time and money to collect data from a very big sample. Put another way, when you estimate an effect from a sample (e.g., 500 clients), you are always trying to generalise your result to a population (e.g., your 13,974 existing clients), which in reality is too large for you to study thoroughly. Therefore, a trade-off must be made on sound statistical grounds so that we work with a statistically reliable yet feasible sample size.”
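One common way to reason about that trade-off is a margin-of-error calculation. The sketch below uses the article’s hypothetical numbers (500 surveyed clients out of 13,974) with a standard finite-population correction; the assumed proportion (p = 0.5) and 95% confidence level are our illustrative choices, not figures from the study.

```python
# Rough sketch of the sample-size trade-off, using the article's hypothetical
# client numbers. Margin of error for a proportion under simple random
# sampling, with a finite-population correction.
import math

def margin_of_error(n, population, p=0.5, z=1.96):
    se = math.sqrt(p * (1 - p) / n)                        # standard error
    fpc = math.sqrt((population - n) / (population - 1))   # finite-population correction
    return z * se * fpc

for n in (30, 500, 3000):
    print(f"n={n:>4}: margin of error ≈ ±{margin_of_error(n, 13974):.1%}")
```

The output makes the diminishing return visible: moving from 30 to 500 respondents shrinks the margin of error dramatically, while moving from 500 to 3000 buys comparatively little extra precision for far more data-collection cost.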
How can you design a misinformation-proof study?
Judgements and biases regarding research design and methodology don’t just affect what we read in the media; these biases affect most aspects of our lives, from public policies to workplaces.
“Organisations evaluate employee performance based on a limited time window or a small number of projects (e.g., monthly sales record or past three projects). In these cases, entrepreneurs and managers need to understand that their findings, however substantive, may not be reliable if they were drawn from small samples,” explains Dr Zhan.
Therefore, Dr Zhan’s research holds important implications for the media, journalists, policymakers, and businesses, which often use results from samples to make critical decisions.
To improve decision quality, all statistics should be accompanied by statistical inferences and ‘layperson interpretations’ of those inferences. “We recommend that more statistical advice (i.e., a layperson interpretation of strength-of-evidence statistics) be provided to aid people’s interpretation of findings from samples and, ultimately, their decision-making,” she says.
What does this look like in practice? For example, the Environmental Working Group provides a searchable online database with information on skincare product safety, built around two primary scores: the strength of an effect (i.e., the hazard score) and the strength of evidence (i.e., data availability). “The data availability information is equivalent to the strength of evidence information that we are advocating here,” explains Dr Zhan.
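As a purely illustrative sketch of that recommendation, a reported finding could be paired with strength-of-evidence information in a simple record like the one below. The field names and values are hypothetical and are not taken from the Environmental Working Group database or the study.

```python
# Illustrative only: pairing a reported effect with strength-of-evidence
# information, in the spirit of the two EWG-style scores described above.
from dataclasses import dataclass

@dataclass
class ReportedFinding:
    claim: str
    effect_score: int        # strength of the effect (e.g., a 1-10 hazard score)
    sample_size: int         # how much data underpins the claim
    data_availability: str   # strength of evidence: "limited", "fair", "good", "robust"

finding = ReportedFinding(
    claim="Ingredient X is linked to skin irritation",
    effect_score=7,
    sample_size=30,
    data_availability="limited",
)

# A layperson interpretation presented alongside the statistic.
print(f"{finding.claim}: effect score {finding.effect_score}/10, "
      f"but the evidence is {finding.data_availability} (n={finding.sample_size}).")
```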
Whose burden is it to educate consumers?
Consumers do not always read research articles, so research generally reaches consumers through product information, news, and books. “Therefore, we recommend that the strength of evidence statistics be presented alongside data availability information,” explains Dr Zhan.
Consumers should be educated to question any claims unless there is strong evidence (i.e., a large amount of research involving large sample sizes). But educating consumers is difficult; more importantly, the burden must be placed on businesses, journalists, and the media.