An outlying frame of prediction

The Guardian had another interesting article; it came online on January 1st, but I only read it a moment ago. The nice part is that this is about data, or rather a little more about statistics, but I am not a statistician, I am a Data Miner. The title ‘Alarmingly for pollsters, EU referendum poll results depend heavily on methods’ gave me the jolt I needed (at http://www.theguardian.com/politics/2016/jan/01/eu-referendum-polling-results-depend-methods). From my point of view, the entire exercise is a failed event, no matter how you slice it. Before we go into the results, let’s take a quick look at the nations involved:

  1. UK, population 65,081,276
  2. France, population 67,063,000
  3. Germany, population 81,276,000
  4. Italy, population 60,963,000
  5. Spain, population 46,335,000
  6. Sweden, population 9,816,666
  7. Finland, population 5,475,000
  8. Denmark, population 5,673,000
  9. Portugal, population 10,311,000

Now look at two quotes: “It found strong support for the UK’s continuing membership, with an average of 53% of respondents favouring Britain’s continuing membership across nine other countries surveyed”, which might be fair enough, but then we get quote two: “Only in Norway, which is not a member of the European Union, would a slight plurality, of 34% to 27%, prefer to see the UK leave and join it outside the club”. This is interesting, because Norway is not one of the nine countries in the mix, which implies that additional nations were surveyed. So what happened, were the others less in favour?

Now we add the optional considerations: “ICM also investigated the appetite in all these countries to call time on their own membership, in the event that their country staged an in/out referendum”. So ICM had another reasoning entirely; the ‘in the event that their country staged a referendum’ part is central to this, because it means that the questionnaire, the hypotheses and the methodology would differ from the get-go, which is not even that central in my thinking process, but it is elemental to the entire event. Now the question becomes whether this is all part of ICM Research, a UK market research company, whether it was done as part of the umbrella called Creston Insight, or whether there is perhaps a third party and I am linking the wrong ICM to the wrong company.

These are all valid considerations, and in my case the assumption was made intentionally (and is most likely correct).

You see, the paragraph in the Guardian, “Alarmingly for the polling industry, however, the result substantially depends on the method used. Nineteen of the 21 polls were done online, and among these the average advantage for remain shrivels to a dangerously slim two points. But the two telephone surveys that have been undertaken point to far bigger pro-EU leads of 17 and 21 points”, shows the issue for me. That paragraph raises the question: were 19 nations interviewed? If so, why are they not all mentioned? Or, as another option, were two methodologies used in the nine countries, one via phone and one online? That would make perfect sense, but then an even number of polls should have been used. All the article does is wonder how reliable the approach is, and if at all, whether politicians are even interested in doing it fair and square.

You see, if the results can sway a lingering vote (which is a given fact), then we can see that the poll could be used to sway some to ‘follow’ the largest group (with a tie a much harder thing to influence), but influence is a given.

For me, the number one issue was none of these items; in my case it was the mention at the very end. The quote “ICM interviewed a representative sample of at least 1000 adults online in each of nine European countries on 15 and 30 November 2015. Interviews in each country have been weighted to the profile of adults living within it” is the issue, because a sample of 1,000 can never be representative of a population of 81 million, not even of a population of 46 million; there is no amount of weighting that can give anything but the roughest of estimations. The more representative the sample has to be of households, the larger the interviewing sample needs to be. There might have been the slightest reliability if a sample of at least 10,000 had been used per nation, and I use the word ‘slightest’ in the most liberal of ways. The moment we introduce gender, income and education, 10,000 might not cut it either. You see, yes, weighting can be applied, but then a single response could represent a group of 50,000-100,000; how reliable do you think that one voice would be regarding the other 49,999-99,999?
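To put a rough number on that weighting concern, here is a back-of-the-envelope sketch (Python is my own choice here, and the calculation is mine, not anything ICM published) that simply divides the populations listed above by a 1,000-person sample:

```python
# A rough back-of-the-envelope check, not ICM's actual weighting scheme:
# divide each population listed earlier in this post by a 1,000-person
# sample to see how many adults a single respondent has to stand in for.
populations = {
    "UK": 65_081_276,
    "France": 67_063_000,
    "Germany": 81_276_000,
    "Italy": 60_963_000,
    "Spain": 46_335_000,
    "Sweden": 9_816_666,
    "Finland": 5_475_000,
    "Denmark": 5_673_000,
    "Portugal": 10_311_000,
}
SAMPLE_SIZE = 1_000  # "at least 1000 adults online" per country, per the quote

for country, population in populations.items():
    per_respondent = population / SAMPLE_SIZE
    print(f"{country:9s}: one respondent stands in for ~{per_respondent:,.0f} people")
```

Germany alone lands at roughly 81,000 people per respondent, which is exactly the kind of ratio I mean.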

1,000 might be budget based, but that would then reflect a budgeted view of the population that holds no reliability at all.

Sampling can be a real science, but when we see frequency weighting to this extent, we can safely say that science has been replaced by educated guessing, which is not the way to go. Consider France for a moment. Consider that people in different regions feel very differently; the two regions where Le Pen is powerful will not be in favour of the EEC at all, the other regions might be (read: might be). Now consider that France has 22 administrative regions, so in fairness we get roughly 50 responses per region, 25 men and 25 women; split that further per education level and perhaps even per age group, and how much remains? How representative are those 25 people for that region? Now consider that not every region has the same population, so the 50 people representing a region of 11 million get a very different weight from those representing the 4 million in Normandy. Are you catching on how utterly unreliable those numbers have become? And how was this done for the UK? Or did ICM decide to get in quick and fast, so that the capitals make up the bulk of the votes, which in the case of Sweden makes sense as the bulk lives in Stockholm, Goteborg or Malmo? So while there is a hint of truth that it might all be about methodology, the required setting can never be met by 1,000 responses per nation as I see it; in addition there is still the unlisted Norway. So either the article made a few jumps (which could be fair enough) or the reference to ICM in all this should be answerable to a lot more questions than the article is currently giving.
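To illustrate how quickly those cells empty out, here is a small hypothetical breakdown along the lines of the France arithmetic above (the five education levels are an assumption of mine, purely for illustration):

```python
# Hypothetical cell-count arithmetic for France, following the splits named
# above; the five education levels are my own illustrative assumption.
sample = 1_000
regions = 22          # administrative regions referenced in this post
genders = 2
education_levels = 5  # assumed banding, purely for illustration

per_region = sample / regions                      # ~45 respondents
per_region_gender = per_region / genders           # ~23 respondents
per_cell = per_region_gender / education_levels    # ~4.5 respondents

print(f"per region:                        {per_region:.0f}")
print(f"per region and gender:             {per_region_gender:.1f}")
print(f"per region, gender and education:  {per_cell:.1f}")
```

At roughly four to five respondents per region, gender and education cell, calling the result representative becomes a stretch, which is my point.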

I need to end this with one final quote: “if the huge differences between online and telephone surveys persist, one method or the other can expect to face a bruising referendum, because they cannot both be right”. From the parts I responded to, there is another option altogether: neither is correct. They are not flawed, but wrong, for the simple fact of sampling size and the quote given, “in the event that their country staged an in/out referendum”, which means that there would have been a different hypothesis that needed answering, and even then the sample of 1,000 would never have been anywhere near useful.

A group of 9,000 can never be representative of a group surpassing a third of a billion; that should be massively clear to anyone from the get-go, even more so when you consider the different lifestyles and values held in Scandinavian nations versus most of Western Europe, and that is just the tip of the statistical considerations.
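For the record, a quick sum (my own check, not taken from the article) of the nine populations listed at the top of this post against the nine samples of 1,000:

```python
# A quick sum of the nine populations listed at the top of this post,
# set against 9 x 1,000 respondents; my own check, not from the article.
populations = [65_081_276, 67_063_000, 81_276_000, 60_963_000, 46_335_000,
               9_816_666, 5_475_000, 5_673_000, 10_311_000]
total = sum(populations)                      # roughly 352 million
print(f"total population : {total:,}")
print(f"total respondents: {9 * 1_000:,}")
print(f"one respondent per ~{total // 9_000:,} people")
```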
