Data Distillation

December 7, 2016


What do premium vodka and ResearchBods data have in common?

Allow Isaac, our in-house Data Scientist, to explain…

Like most people, I end the workday with a bottle of triple-distilled vodka. When you drink this much, you start to find that the cheaper varieties just don't sit right with you, and you permanently switch to the premium brands. The reason for the difference in flavour and strength is usually plastered all over the ad campaigns and the bottles: they have been filtered or distilled more times than other varieties to produce a purer spirit. If you are in doubt about this, try pouring some of that £10-a-bottle vodka through a Brita filter and compare the taste. Incredibly, you'll notice the difference.

I personally think that data is a lot like vodka. It’s not just that I deal with both on a daily basis, or that they have a tendency to give you a nasty headache, but that if you want good quality data, you can’t just brew it and serve it. It needs attention, and it needs to be as pure and perfect as you can get it.

There is, of course, a huge difference between filtering data and removing data you don't agree with. Filtering isn't about ensuring that your hypothesis is met, or guaranteeing that people answered how you want; it's about knowing that the answers you have are honest and consistent.

One of the main problems is respondent fatigue. At the start of a survey, a respondent can be enthusiastic and willing to give their opinion in great depth, taking care to answer each question truthfully. After a while, however, it's not unusual for some people to get a bit bored. To an extent, we can keep them entertained with varied question types and stylish visuals, but a few may still decide to rebel: they answer long grid questions in fewer seconds than it would take to even read the statements, sometimes clicking the same rating for every single one. Then, when they reach a text question and can't just click a random button on the page, they hit a few random keys on their keyboard instead. Alternatively, they might skim the question and give an answer that doesn't make any sense.
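
Our actual checks aren't described here, so purely as an illustration, here is a minimal Python sketch of how the three patterns above (speeding through a grid, giving every statement the same rating, and keyboard-mashing a text box) could be flagged. The `GridAnswer` structure, the function names and both thresholds are hypothetical choices for this example, and the vowel-ratio test is a deliberately crude stand-in for real gibberish detection (it would miss something like "qwertyuiop", for instance).

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen for illustration only.
MIN_SECONDS_PER_STATEMENT = 1.0  # faster than this per statement = not reading
MIN_VOWEL_RATIO = 0.2            # fewer vowels than this = probably mashed keys


@dataclass
class GridAnswer:
    ratings: list[int]  # one rating per statement in the grid
    seconds: float      # time spent on the whole grid


def is_speeding(grid: GridAnswer) -> bool:
    """Answered the grid faster than a plausible reading speed allows."""
    return grid.seconds < MIN_SECONDS_PER_STATEMENT * len(grid.ratings)


def is_straightlining(grid: GridAnswer) -> bool:
    """Gave the same rating to every statement (needs at least two)."""
    return len(grid.ratings) > 1 and len(set(grid.ratings)) == 1


def looks_like_keyboard_mash(text: str) -> bool:
    """Crude gibberish check: too few vowels among the letters typed."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return True  # nothing readable was typed at all
    vowels = sum(c in "aeiou" for c in letters)
    return vowels / len(letters) < MIN_VOWEL_RATIO


def flag_response(grid: GridAnswer, open_text: str) -> list[str]:
    """Return the reasons (if any) a response looks unreliable."""
    reasons = []
    if is_speeding(grid):
        reasons.append("speeding")
    if is_straightlining(grid):
        reasons.append("straightlining")
    if looks_like_keyboard_mash(open_text):
        reasons.append("gibberish text")
    return reasons


if __name__ == "__main__":
    bored = flag_response(GridAnswer([3] * 8, 4.0), "sdfjkl")
    keen = flag_response(GridAnswer([4, 2, 5, 3, 4, 1, 2, 5], 40.0),
                         "The staff were friendly and helpful")
    print(bored)  # ['speeding', 'straightlining', 'gibberish text']
    print(keen)   # []
```

Returning a list of reasons rather than a single yes/no keeps the decision separate from the detection: a respondent who trips several checks at once is a far stronger candidate for removal than one who trips only one.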

We identify these bad responses using a series of algorithms and checks, which we continue to improve each year, and we make sure that none of them make it into the final data. Mixing these random and unreliable answers in with the rest of the responses can wipe out statistically significant differences between demographic groups and water down the findings. If impurities in your vodka bother you, imagine watered-down vodka. And remember, this isn't some free shot you're being offered at a party; this is a high-quality spirit that you have paid for.

With us, you don’t get mixers, you don’t get sediment, and you don’t get that awful aftertaste; we’ll give you your data neat, pure, transparent and delicious. Maybe next time you’ll order a double.

Isaac Lockett