Friday, 25 October 2013

Initial responses to comments so far on the long-range election forecasting post

Thanks for all the comments on my initial post about long-range forecasting from the historical relationship between polls and votes.

Here are some initial responses to comments and questions that have been made on Twitter and on the blog. I've grouped them according to theme and pasted them unedited. I haven't copied everything but I have tried to take examples of all points/queries. Sorry if I've inadvertently left something out but I haven't deliberately done so.

The response to the first set is an apology for something I should have sorted out and will try to resolve. The rest are attempts at explanation, clarification and hopefully helpful responses. Apart from the first three the topics are in no particular order.

Incompatible probabilities:
  • Pr(Lab majority) = 15% & Pr(Lab largest party) = 12% are not compatible.
  • How can Lab have a bigger chance of a majority than being the largest party?
Fair cop. These figures are logically incompatible. Clearly I need to revise the method to ensure they are fully consistent, and I will. For computational reasons the method does not currently estimate a full uncertainty distribution over all possible seat outcomes, which is necessary for guaranteed strictly consistent estimates. If you read the paper you'll see that, at present, the probabilities for Con majority, Lab majority and difference between Con and Lab are generated from three different assumed normal distributions from the prediction intervals (for Con seats, Lab seats and the difference respectively). These are not strictly defined to be fully compatible. Much of the time this shouldn't matter, but it did in this situation and I should have at least flagged this up and discussed it, or solved the problem, before publishing. Sorry.
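To illustrate how a probability can be read off a single assumed normal distribution, here is a minimal sketch. It is not the code behind the published figures: it assumes a majority threshold of 326 of 650 seats, takes the seat point forecasts and 95% prediction intervals from the forecast post, and backs the standard deviation out of the interval width. The function name `prob_at_least` is made up for illustration. Under those assumptions the two majority probabilities come out close to the published 57% and 15%. Because each probability comes from its own distribution, nothing forces them to agree with a third probability computed from the Con-Lab difference.

```python
from statistics import NormalDist

MAJORITY = 326                     # assumed: more than half of 650 seats
Z95 = NormalDist().inv_cdf(0.975)  # about 1.96, the 95% two-sided multiplier


def prob_at_least(point, lower, upper, threshold=MAJORITY):
    """P(seats >= threshold) under a normal centred on the point forecast.

    The standard deviation is an assumption: it is backed out of the
    width of the 95% prediction interval (upper - lower).
    """
    sd = (upper - lower) / (2 * Z95)
    return 1 - NormalDist(point, sd).cdf(threshold)


# Point forecasts and seat intervals as published in the forecast post.
p_con_majority = prob_at_least(337, 219, 471)
p_lab_majority = prob_at_least(265, 140, 376)

print(f"Pr(Con majority) is roughly {p_con_majority:.2f}")
print(f"Pr(Lab majority) is roughly {p_lab_majority:.2f}")
```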

It might take a bit of time to resolve the problem, but doing so is unlikely to make much difference to the main story: that the probabilities for Labour largest party or Labour majority are small. The most likely direction of change with a different method is that the probability of Lab largest party will be revised up, but I would be surprised if it became more than 25%.

All the probabilities come from approximate methods, so it is best just to look at them as rough estimates.
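One way to get probabilities that are consistent by construction is to classify every outcome from the same simulated draw. The following is a rough sketch of that idea only, not a revised version of the method. It makes strong simplifying assumptions: seats move only between Con and Lab, all other seats stay fixed, and the number of seats changing hands is normal with a hypothetical standard deviation (`SWING_SD`). The probabilities it prints are illustrative and should not be read as a forecast.

```python
import random

TOTAL_SEATS = 650
MAJORITY = TOTAL_SEATS // 2 + 1   # 326
CON_POINT, LAB_POINT = 337, 265   # point forecasts from the forecast post
SWING_SD = 62.0                   # hypothetical sd of seats changing hands


def simulate(n_draws=100_000, seed=0):
    """Classify every event from the same draw, so the events cannot clash."""
    rng = random.Random(seed)
    counts = {"con_maj": 0, "lab_maj": 0, "hung": 0,
              "con_largest": 0, "lab_largest": 0}
    for _ in range(n_draws):
        shift = rng.gauss(0.0, SWING_SD)  # seats moving from Lab to Con
        con = CON_POINT + shift
        lab = LAB_POINT - shift
        if con >= MAJORITY:
            counts["con_maj"] += 1
        elif lab >= MAJORITY:
            counts["lab_maj"] += 1
        else:
            counts["hung"] += 1
        if con >= lab:
            counts["con_largest"] += 1
        else:
            counts["lab_largest"] += 1
    return {event: n / n_draws for event, n in counts.items()}


probs = simulate()
for event, p in probs.items():
    print(f"{event}: {p:.3f}")
```

Because a Labour majority in any one draw implies Labour is also the largest party in that draw, Pr(Lab majority) can never exceed Pr(Lab largest party) here, whatever value is chosen for the standard deviation.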

Conflict of interest?
  • Someone got paid from CCHQ?
No - I'm not paid by any political party and have never made a political bet. No conflict of interest issues here at all.

What about UKIP?
  • Seems odd that there's no mention of UKIP. Not convinced this is best use of regression ever.
  • Does this include #UKIP causing Conservatives in marginal seats losing to Labour?
  • No Ukip MPs?
  • No serious prediction can be made without factoring in Ukip They will take Labour as well as Conservative votes
  • This model is flawed as it takes no account of UKIP's rise in popularity (even allowing for a possible tail off at the GE).
UKIP are factored in indirectly, but as they are a young party we can't model their historical experience of polls and election results in the same way as for the main parties. UKIP's current standing is reflected in the input share of the vote, which has the Others on 18. Predicted Other share in 2015 is 16.2, which you could say is likely to be mostly UKIP. Apart from Nationalists, the seats prediction assumes that the largest other party in any constituency will get the full amount of the predicted 2010-15 change in Other share. Despite this, UKIP are still not predicted to win any seats - they are just too far behind.

The model doesn't allow for UKIP to benefit disproportionately from any one party. I can see this is a concern but note that Rob Ford, Matthew Goodwin, myself and others have been pointing out that UKIP take votes from Labour as well as the Tories, so the differential impact on Tories may be much more muted than many assume. 

Prediction is implausible?
  • Forgive my scepticism. No-one has polled over 40% in an election for 12 yrs, and the Tories haven't for 21.
  • those Con + Lab vote shares would be mirror of Blair vs Hague in 2001 
  • All 3 points seem logical and v probable.. but direction of travel seems too drastic to take seriously.
I agree that the prediction is remarkable, but think it is worth seeing what the data say. The patterns in previous cycles are surprisingly consistent. The paper discusses the idea of weighting more recent elections more heavily, but that doesn't improve out-of-sample prediction substantially. I don't think it is a good idea to try to find the single previous election that seems to best characterise what you think is going on and rely on that. Better to pool information from a set of previous elections.

This time is different because of the coalition
  • History of Tory incumbency barely relevant because Coalition are incumbents not Tories.
  • Surely many historical precedents are not valid because we are in a unique situation - there will be no incumbent government defending its position. All parties will be campaigning on the basis of change.
  • Nor does it allow for the fact that a significant number of unhappy Lib-Dem voters will be registering their dis-satisfaction with the Coalition & voting Labour. As Mike Smithson says, this will be a unique election. Such generalised models based on previous voting patterns are, therefore, unlikely to be correct.
Maybe there are good reasons to believe that things are not likely to be so different. Research on coalition government electoral performance cross-nationally by myself and others suggests that the Tories are most likely to be held responsible for government actions as the largest party in the coalition. So for want of a better reason to do differently I have effectively produced the forecast in the same way as if the Conservatives were in a single-party government. Equally I haven't taken any account of minority governments in the past; they are too sporadic to model statistically. The paper discusses how the coalition effect on Lib Dem and Labour support is factored in because it is already reflected in the polls.

Are past polls a good guide to the future?
  • Interesting data, but why assume that past polls are a good guide?
  • My preference wld be for "fundamentals" forecasts at this stage (economy, maybe PM approval)
I'm not assuming history of polls is definitely a good guide. I'm partly trying to work out how good a guide it is and the confidence intervals reflect estimates of how poor a guide current polls are. There is a meta level question as to whether the past is a good guide to future levels of uncertainty. The out-of-sample prediction section in the paper suggests that it has helped a bit in the past and so it may do again in the future. It may not work out but in trying to say something about the future I think it is better than not considering past experience.

The question of whether there are better bases for forecasting is a good one. My impression is that the "fundamentals" might be good to also consider later on, but at this stage you would be effectively predicting the future trajectory of the economy over quite a long time period; arguably more difficult.

Is this the best forecast we can do?
  • is your view that this model provides the most accurate available forecast, or just that these factors produce it?
I'm not aware of any others this far out apart from assuming that current polls are the best guide we have and projecting them forward assuming no change between now and the election. Part of the point of the paper is to consider the most likely direction and magnitude of change between now and the election. Past experience suggests that the method I'm talking about improves prediction by about 30% over just assuming the polls will stay steady. (See paper.)

Regional and constituency factors
  • How's that going to happen without any Tory seats in a city north of Birmingham?
  • Also doubt past precedent is helpful either & believe Tories are losing the North as they lost Scotland
  • You results suggest a Tory lead of less than 10% resulting in comfortable majority. Which seems unlikely
  • boundaries used to favour Tories now favour Labour.
The seats prediction is off the back of 2010 constituency results and so respects the geographical performance of the parties then.  Uniform swing and similar haven't been perfect in the past but they have been sufficiently good so as to think that the seats prediction part is a small problem relative to prediction of the GB share of the vote. Bias in the system is clearly visible in simulations with other results.
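For readers unfamiliar with the idea, classic uniform change can be sketched as follows. This is not the probabilistic method used for the seats forecast; it is only the simpler textbook version, in which each party's national change in share is added to its 2010 share in every constituency. The three constituencies are invented for illustration, and the 2010 national shares are approximate GB figures entered as an assumption that should be checked against official results.

```python
# National shares in per cent. The 2010 values are approximate GB shares
# (an assumption to verify); the 2015 values are the forecast shares.
SHARES_2010 = {"Con": 36.9, "Lab": 29.7, "LD": 23.6}
FORECAST_2015 = {"Con": 40.2, "Lab": 31.8, "LD": 11.8}

# Hypothetical constituencies with made-up 2010 results, for illustration only.
SEATS_2010 = {
    "Seat A": {"Con": 38.0, "Lab": 40.0, "LD": 18.0},
    "Seat B": {"Con": 35.0, "Lab": 20.0, "LD": 42.0},
    "Seat C": {"Con": 25.0, "Lab": 50.0, "LD": 15.0},
}


def uniform_change(seat_shares, base, forecast):
    """Add each party's national change in share to its local 2010 share."""
    projected = {}
    for party, share in seat_shares.items():
        change = forecast[party] - base[party]
        projected[party] = max(0.0, share + change)  # shares cannot go negative
    return projected


def winner(shares):
    """Party with the largest projected share in a constituency."""
    return max(shares, key=shares.get)


projected_winners = {
    name: winner(uniform_change(shares, SHARES_2010, FORECAST_2015))
    for name, shares in SEATS_2010.items()
}
print(projected_winners)
```

Because the starting point is each seat's own 2010 result, the projection keeps the geography of party support from that election, which is the sense in which the seats prediction respects regional performance.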

Switching between parties
  • Secondly what is going to happen to the 2010 LD>CON switchers. In your model they seem to evaporate.
  • Third point the detail from almost all polls shows that very little of current LAB support comes from CON converts.
  • To what degree does the model take into account the point frequently made by Mike Smithson, that the loss of half the Lib Dem vote from 2010 has mostly gone to Labour, and there appears to be little sign that it will either go back to the Lib Dems, and very little sign that it will migrate to the Conservatives?
The model doesn't look at individual-level switching at all, just overall levels of change. The predictions say nothing about who switches which way, just the net effect. Switching up to now is reflected in current poll levels, which influence the forecast, but only through current levels of support, not origins or voting histories.

Taking previous election result into account
  • Historical tendencies? Well you forgot the historical tendency that incumbent parties have only once increased their share of the vote since the 50s.. 40% for the Tories from a 2010 starting point of of 35%? What a crock of shite!
  • You are not taking into account that the GE to Mid-term fall this parliament is less than historical data

The model doesn't directly take the prior election result into account. It should influence things indirectly by affecting the levels of support in the polls in the mid-term. I will explore further whether it adds value beyond current polls. What is being suggested in the first comment is actually a model of forecasting without any polls at all. It seems odd to ignore the most current information. I'm just arguing that we should look at current polls from the perspective of how polls in the past have corresponded with future election results.

A long-range forecast for a 2015 British General Election based on current polls and historical polls and votes

I've just started publishing forecasts from a poll-based method for forecasting the next general election here. I'm hoping to update the forecast weekly.

As I say on the website: "The methodology is described in a working paper. The approach is broadly to predict the next election based on current opinion polls and the track record of polls in previous electoral cycles allowing for change in opinion in the run up to the election. The method allows for three main phenomena: historical tendencies for the Conservatives to over perform and Labour to under perform their vote intention figures in the polls when it comes to election day; governments being more likely to recover and oppositions fall back in the run up to an election; and a tendency for parties to move back towards their long-run average level of support. All three suggest a Conservative recovery and a Labour set back from autumn 2013. The statistical regression methodology generates estimates of uncertainty and so prediction intervals (range of likely outcomes) and probabilities for key events are also provided below. The forecast represents a way to think about the implications of current opinion polls for the outcome of the next general election in light of the historical relationship between polls and election results. It is the product of a statistical analysis of the data and not my personal opinion about what will happen."

The first forecast is replicated below. There are two striking features. The first is that the Conservatives are forecast to do much better than they are currently polling, mainly because they are in government but also partly because they are historically at a relatively low point (from which regression to the mean suggests they should recover) and as a party they have tended to outperform polls at elections. Labour are predicted to do correspondingly worse and so the forecast for the top two is almost symmetrically opposite from the current polls: 40:32 instead of 33:38.

The second of the two most striking features is that the prediction intervals for shares of the vote are enormous.  For the main parties it is clear that there is more uncertainty in the Conservative vote but even the forecast Labour vote could be out by as much as 6.6 points: a huge political difference.  At first glance these prediction intervals may seem to encompass all foreseeable outcomes and more. For the Liberal Democrats the interval from nothing (a hard boundary that had to be invoked!) to 26% seems ridiculously large. While these may seem hilarious at first sight, remember that not all points within the intervals are equally likely to occur. Also as 95% forecast confidence intervals they reflect the historical variation in the votes for these parties and there should be only a 5% chance of a result outside the interval. So they are bound to be very broad to be credible. Even so, the lower bound for the Conservative forecast, at 28%, tells us that it is very unlikely the Conservatives will do much worse than they currently stand in the polls, which is informative. Similarly, the Labour prediction interval suggests it is extremely unlikely that Ed Miliband will do as well as Tony Blair in 1997 or 2001 but it would not be surprising (given polling in previous elections) if he did worse than Gordon Brown or Michael Foot. 
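As a small worked example of how to read these intervals, the sketch below rebuilds each interval from its point forecast and half-width, applies the hard boundary at zero, and then asks how likely a Conservative share below the current poll figure of 33 would be. It assumes, purely for illustration, that the forecast error is normal and that the half-width equals 1.96 standard deviations. The function names are made up for illustration.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

# (point forecast, 95% half-width) in per cent, from the forecast table.
FORECASTS = {"Con": (40.2, 11.8), "Lab": (31.8, 6.6), "LD": (11.8, 14.5)}


def interval(point, half_width):
    """95% interval, clipped to the feasible range of 0 to 100 per cent."""
    return max(0.0, point - half_width), min(100.0, point + half_width)


def prob_below(point, half_width, value):
    """P(share < value), assuming a normal error with sd = half_width / 1.96."""
    sd = half_width / Z95
    return NormalDist(point, sd).cdf(value)


intervals = {party: interval(*f) for party, f in FORECASTS.items()}
for party, (low, high) in intervals.items():
    print(f"{party}: between {low:.0f} and {high:.0f}")

# Chance, under these assumptions, that the Conservatives end up below 33.
p_con_below_poll = prob_below(40.2, 11.8, 33.0)
print(f"Pr(Con share < 33) is roughly {p_con_below_poll:.2f}")
```

Under the normality assumption, outcomes near the edge of an interval are much less likely than outcomes near the point forecast, which is why a wide interval can still be informative.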

The forecast election-day seats are as you would expect them given the forecast shares. They are probabilistic estimates of the kind used in election night forecasting. A classic uniform change prediction would produce slightly different figures but not by much, especially given the large prediction intervals for seats that follow from the large prediction intervals for votes.

The estimated probabilities for key outcomes are perhaps the most helpful feature of the forecast this far from an election. These show that the Tories have an 88% chance of being the largest party but only a 57% chance of an overall majority. These are rather one-sided at this stage in the cycle given there is plenty of scope for change in party support before the election. Even immediately before the election there is considerable uncertainty given the historic record of the polls as shown in Table 1 of the working paper.

Having some estimate for the probability of a hung parliament (currently 28%) is helpful to understand the operation of the electoral system. A Liberal Democrat recovery to May 2010 levels would increase the chances of a hung parliament to over 50%.

Date of forecast: 25.10.2013
Days till the election: 559

Inputted current average poll shares 
Con : 33
Lab : 38
LD  : 11

Forecast Election Day Shares and 95% Prediction Intervals
Con : 40.2 plus or minus 11.8 i.e. between 28 and 52
Lab : 31.8 plus or minus 6.6 i.e. between 25 and 38
LD  : 11.8 plus or minus 14.5 i.e. between 0 and 26

Forecast Election Day Seats
Con : 337
Lab : 265
LD  : 21
Con majority of 24

Forecast Election Day Seat approx. 95% Prediction Intervals
Assuming LD share at 11.8 and allowing Con and Lab to vary as per interval above.
Con between 219 and 471
Lab between 140 and 376
LD between 14 and 30

Probabilities of key outcomes
Pr(Con majority) = 57%
Pr(Lab majority) = 15%
Pr(Hung parliament) = 28%
Pr(Con largest party) = 88%
Pr(Lab largest party) = 12%