In this article TLF Research lead insight analyst Stephen Hampshire identifies the five main things managers need to know to understand data.
At TLF Research we know that good decisions require good information, and most businesses are awash with measurement and data.
What we see quite often is that managers don’t feel confident interpreting the data they’re faced with. Even worse, myths and misconceptions often lead people to discount data that could be valuable if used in the right way, or to place too much faith in figures that don’t mean what they think they mean.
In this article I’m going to run through 5 tips for managers who want to understand data, especially survey data, properly.
1. Survey Data Are Estimates
The root of a lot of problems comes from people not understanding that all survey data, and lots of other data as well, are estimates. You never know precisely what the NPS for all customers is, because you didn’t talk to all customers. What you have is an estimate based on a sample from the population.
This contrasts with data which it is possible to measure directly (e.g. how many visitors did you have to your website, how many complaints were logged, etc.). That’s not to say that the estimates are unreliable. The clever bit of statistics is that you know how good your estimate is, which is something we’ll come back to later.
So when you look at a dashboard, it may mix together these two types of measure. For example: you know for sure that 2,345 complaints were logged last month, and you’re confident that your NPS was in a range from 55 to 59 (but you can’t be sure that your NPS is exactly 57). You always need to be clear which type of metric you’re dealing with — direct measurement or statistical estimate.
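To make that concrete, here’s a minimal Python sketch of how a margin of error around an NPS estimate can be computed from a sample. All the counts are invented for illustration, and the 1.96 multiplier gives an approximate 95% interval:

```python
import math

# Hypothetical sample of 3,000 survey responses (counts are invented,
# not real data): promoters score 9-10, detractors 0-6.
n = 3000
promoters = 1860
detractors = 150

p_prom = promoters / n
p_det = detectors = detractors / n
nps = (p_prom - p_det) * 100  # point estimate

# Variance of the promoter-minus-detractor proportion
# (promoters and detractors are mutually exclusive categories).
var = (p_prom + p_det - (p_prom - p_det) ** 2) / n
margin = 1.96 * math.sqrt(var) * 100  # approximate 95% margin of error

print(f"NPS estimate: {nps:.0f}, 95% interval: "
      f"{nps - margin:.0f} to {nps + margin:.0f}")
```

The point estimate is the single most likely value, but the honest way to report it is as the whole interval.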
2. Bias Matters More Than Noise
Many people understand that survey data are not exact, or perhaps just feel it intuitively because scores fluctuate from month to month even when they know no underlying changes are taking place. That noise is the reason it’s important to see your scores as an estimate with a margin of error, rather than an exact figure. Once you understand them in that way, the noise no longer matters; what you’re looking for is a signal strong enough to cut through the noise.
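To see how much pure sampling noise can move a score, here’s a small simulated sketch in Python. The population mix is invented, and its true NPS never changes, yet the twelve monthly readings still bounce around:

```python
import random

random.seed(1)

# Simulated population: 40% promoters, 50% passives, 10% detractors,
# so the "true" NPS is a constant 30 every month.
def monthly_nps(n=200):
    """Draw one month's sample of n respondents and return its NPS."""
    scores = random.choices(["promoter", "passive", "detractor"],
                            weights=[0.4, 0.5, 0.1], k=n)
    return 100 * (scores.count("promoter") - scores.count("detractor")) / n

# Twelve monthly readings from a population that never changes.
readings = [monthly_nps() for _ in range(12)]
print([round(r) for r in readings])
```

Every point of movement in that output is sampling noise, which is exactly why a month-on-month wobble of a few points usually means nothing.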
Far more important is the impact of bias. Unlike noise, which adds random error to your measurement, bias creates a systematic error. Important sources of bias to think about are non-response bias (who chooses to take part in your survey), leading questions, and biased rating scales. There is no way to correct for bias with your analysis, so it’s really important to make sure that you know how the survey was done, what the response rate was, and how the questions were asked.
3. “What’s The Sample Size?” Is The Wrong Question
“What sample size do I need?” is an obvious question, but unfortunately it doesn’t have an obvious answer. It matters because your sample size controls how wide the margins of error are around every number you report. Larger samples have smaller margins, but you need to quadruple your sample size to halve your margin of error.
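That square-root relationship is easy to check. A quick sketch, using the worst-case proportion of 0.5 and the usual 95% multiplier of 1.96:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion,
    in percentage points. p=0.5 is the worst case."""
    return z * math.sqrt(p * (1 - p) / n) * 100

# Each quadrupling of the sample size halves the margin of error.
for n in (100, 400, 1600, 6400):
    print(f"n = {n:5d}: margin of error ±{margin_of_error(n):.1f} points")
```

Running this shows the margin falling from about ±9.8 points at n = 100 to about ±4.9 at n = 400, and so on: diminishing returns on every extra interview.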
There’s no single rule of thumb for what makes a reliable sample (once you get past a minimum of 30 for standard statistical tests to work). The real question is — how small a difference do you want to be able to find? If you’re hunting for very subtle effects, then you need a big sample size; if you’re looking for big effects, then a relatively small sample size will do.
A point that most people seem to miss: if your data shows a statistically significant effect, then by definition the sample size was big enough (as long as it’s an effect you were looking for to start with, not one you found by searching through the data).
4. “Significant” Is Not The Same As Important
Sometimes it probably feels like you can’t win. I’ve told you that you can’t look at survey data as if it’s an exact number; you need to use statistical tests or margins of error (which are two ways of asking the same question: is this difference real?). Now I’m going to tell you that even that isn’t always good enough.
Too many people use statistical testing as the basis for “fishing expeditions” in their data — they test everything and report the things that look significant. This is problematic because every test run at the usual 95% confidence level has a 5% chance of a false positive, so as you do more and more tests the chance that at least one of them is wrong mounts up very quickly. The answer is to save your tests for analytical questions you’re really interested in, based on changes you’ve made to the customer experience or external changes you think may have had an effect.
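The arithmetic behind that warning is simple: with a 5% false-positive rate per test, the chance that at least one of m independent tests comes back falsely “significant” is 1 - (0.95 ** m):

```python
# Chance of at least one false positive across m independent tests,
# each run at a 5% significance level.
alpha = 0.05
for m in (1, 5, 10, 20, 50):
    p_any_false = 1 - (1 - alpha) ** m
    print(f"{m:3d} tests: {p_any_false:.0%} chance of at least one false positive")
```

By 20 tests you’re more likely than not to have at least one spurious “finding”, which is why fishing expeditions reliably catch something.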
When you do find a statistically significant result, make sure you’re clear on what it means. It doesn’t mean that it’s an important or large effect, it just means that it’s almost certainly something that is really there in the population. You need to apply business knowledge to interpret whether it matters or not.
5. Correlation Is Not Causation
It’s good that people want to tell causal stories with their data; in fact, that’s essential if you’re going to act on it. If you find a difference between two groups of customers, you need to know what caused it. If your score is trending down, you need to know why.
But the tools we have available to us (driver analysis, statistical testing, decision trees, etc.) are not able to prove causation. What they show is that there is likely to be a link between the things we’re looking at (let’s say it’s customer satisfaction and retention), and also how strong that link would be if it were causal.
That’s a good start, but you need to be very sure of your thinking before you make serious investment decisions, and that’s going to take more work. Depending on the link you’ve identified, that might mean an experiment, or it might mean causal modelling using something like an instrumental variables approach.
This has been a very quick introduction to 5 fundamental concepts in understanding statistics properly. They apply to any survey data, and quite a lot of other data as well. Many of these are places where people go wrong in their use of machine learning with “big data”, for instance.
Crucially though, you don’t need a degree in statistics to understand these things. We can all get better at using and thinking with the data we’re presented with.