Clinton vs. Sanders (Nigel Paray for CNN)

Much has been made about Bernie Sanders’ poor performance with minorities so far in the Democratic primaries (see here and here). Indeed, an analysis by ABC News of exit polls from all Democratic primaries so far reveals that only 15% of Black voters and 36% of Hispanic voters have voted for Sanders (vs. 83% and 63%, respectively, for Clinton); by contrast, Sanders has picked up 48% of the White vote (vs. 50% for Clinton).

Sanders supporters acknowledge this, but counter by pointing to a more favorable landscape in the remaining states, especially after March 22nd. Although this point is true (as we’ll see below), the question remains: is it favorable *enough* for Sanders to win the nomination? As such, we’re interested in answering the following questions:

- Does the “racial makeup” of a state (i.e. White / Black / Hispanic demographic split) have any value in predicting the pledged delegate vote?
- If it does, what do Sanders’ and Clinton’s delegate projections look like for the remaining races?
- What would it take for Sanders to win the pledged delegates race?

In order to answer the first question, we regress pledged delegates won by Clinton vs. Sanders for primaries on March 1st and before against the “racial makeup” of those states and check the quality of the fit. The purpose of the regression is to find a set of optimal coefficients that, when multiplied by the “racial demographics” of each state and then by the number of delegates available for that state, result in a set of calculated pledged delegates that match as closely as possible the actual delegates won by each candidate for states that have already voted. The results of the regression are listed in Table 1:

Table 1. Actual pledged delegates won by Clinton vs. Sanders compared against calculated pledged delegates from regression model
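The regression described above can be sketched roughly as follows. This is a minimal illustration, not our actual model: the five states, their electorate shares, and their delegate counts are hypothetical, and the "delegates won" are generated synthetically from Clinton's exit-poll shares quoted earlier (50% / 83% / 63%) purely so that the fit has a known answer. The function names `solve_linear_system` and `fit_coefficients` are also just illustrative.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_coefficients(states):
    """Least-squares fit (no intercept) of: won ~ delegates * sum(coef * share)."""
    rows = [[s["delegates"] * share for share in s["shares"]] for s in states]
    y = [s["won"] for s in states]
    k = len(rows[0])
    # Normal equations: (A^T A) coefs = A^T y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(ata, aty)


# ASSUMPTION: coefficients used only to generate synthetic "delegates won".
ASSUMED_COEFS = (0.50, 0.83, 0.63)  # (White, Black, Hispanic)

# HYPOTHETICAL states; shares are (White, Black, Hispanic) fractions.
states = [
    {"name": "A", "delegates": 50, "shares": (0.90, 0.05, 0.03)},
    {"name": "B", "delegates": 100, "shares": (0.40, 0.50, 0.05)},
    {"name": "C", "delegates": 200, "shares": (0.45, 0.20, 0.30)},
    {"name": "D", "delegates": 80, "shares": (0.70, 0.15, 0.10)},
    {"name": "E", "delegates": 30, "shares": (0.60, 0.30, 0.05)},
]
for s in states:
    s["won"] = s["delegates"] * sum(
        c * x for c, x in zip(ASSUMED_COEFS, s["shares"])
    )

coefs = fit_coefficients(states)
print([round(c, 3) for c in coefs])
```

Because the synthetic targets contain no noise, the fit should simply recover the assumed coefficients; with real results, the residuals are what the "Projected Delegates Delta" columns capture.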

The quality of the fit can intuitively be appreciated by comparing the results of the regression (columns labeled “Projected Delegates”) vs. the number of delegates actually won (columns labeled “Delegates Won”) on a state-by-state level. The closer the regression results are to the actuals (i.e. the closer the “Projected Delegates Delta” columns are to 0), the better the fit. Even though the regression is not perfect (in the sense that the delta between our regression results and the actuals is not 0 for each state), the deviations are relatively minor. Mathematically, this intuition can be assessed more formally by checking the R^{2} coefficient of the regression (there are other ways as well). The closer this coefficient is to 1, the better the fit. Given that our R^{2} coefficient is 0.99 for Clinton and 0.965 for Sanders (the adjusted R^{2} is 0.986 for Clinton and 0.952 for Sanders), we can conclude that this is a good fit (see here and here for a good introduction to regression). This implies that there is a strong correlation between candidate preference and the “racial makeup” of a state, at least based on the states that voted on March 1st and before. Although this strong correlation doesn’t necessarily imply anything causal *on its own* (in the sense that this could be a spurious correlation), there seems to be a lot of independent evidence that this is indeed a meaningful correlation.
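For readers who want to see how these two numbers are computed, here is a small sketch. It uses the common "centered" definition of R^{2} and the usual adjustment for the number of predictors; that choice is an assumption for illustration (regressions without an intercept are sometimes scored with an uncentered variant instead). The delegate counts below are placeholders, not the values from Table 1.

```python
def r_squared(actual, predicted):
    """Centered R^2: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R^2 for the number of predictors used in the fit."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# PLACEHOLDER delegate counts for five hypothetical states.
delegates_won = [30.0, 62.0, 118.0, 41.0, 17.0]
delegates_projected = [28.0, 65.0, 115.0, 43.0, 18.0]

r2 = r_squared(delegates_won, delegates_projected)
# Three predictors: the White, Black, and Hispanic shares.
adj_r2 = adjusted_r_squared(r2, n_obs=len(delegates_won), n_predictors=3)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}")
```

The adjusted value is always at or below the raw one whenever the raw R^{2} is below 1, which is why both figures are worth reporting when only a handful of states are available.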

And what of the regression coefficients that yielded the above results? It will come as no surprise that the coefficients reflect what we expected, which is to say that minorities play a very important part in Clinton’s success, while the White vote explains most of Sanders’ success. For completeness, we list the coefficients here:

In order to answer the second question, we multiply the regression coefficients by the “racial makeup” of future states and then by the pledged delegates available for those future states to project future primaries. Our total projected delegates for these future states are presented in Table 2:

Table 2. Projected pledged delegates from regression model
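The projection step itself is just a weighted sum. The sketch below shows the arithmetic under stated assumptions: the coefficients are placeholders (not our fitted values), the two upcoming states are hypothetical, and `project_delegates` is an illustrative name.

```python
def project_delegates(coefs, shares, delegates):
    """Projected delegates = delegates * sum(coef_g * share_g) over groups g."""
    return delegates * sum(c * x for c, x in zip(coefs, shares))


# PLACEHOLDER coefficients in (White, Black, Hispanic) order.
clinton_coefs = (0.5, 0.8, 0.6)
sanders_coefs = (0.5, 0.2, 0.4)

# HYPOTHETICAL upcoming states; shares are fractions of the electorate.
future_states = [
    {"name": "X", "delegates": 100, "shares": (0.60, 0.25, 0.15)},
    {"name": "Y", "delegates": 40, "shares": (0.90, 0.05, 0.05)},
]

totals = {"Clinton": 0.0, "Sanders": 0.0}
for s in future_states:
    clinton = project_delegates(clinton_coefs, s["shares"], s["delegates"])
    sanders = project_delegates(sanders_coefs, s["shares"], s["delegates"])
    totals["Clinton"] += clinton
    totals["Sanders"] += sanders
    print(f'{s["name"]}: Clinton {clinton:.1f}, Sanders {sanders:.1f}')

print({name: round(value, 1) for name, value in totals.items()})
```

In this toy setup the two candidates' coefficients sum to 1 for each group and the shares sum to 1 for each state, so the projections add up to the delegates available; with independently fitted coefficients that need not hold exactly.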

If we look at the projections based on our regression method, we see that Sanders performs better than Clinton in states with a large fraction of White voters, while Clinton does better in more diverse states. Overall, we expect that Clinton will win 2321 delegates (57.3% of total pledged delegates) vs. Sanders’ 1730 delegates (42.7%). As such, we can see that Sanders loses the pledged delegates vote handily.

The third and final question can now be asked: what would it take for Sanders to win? We choose to answer this question by asking a proxy question: what would the candidate preference by White / Black / Hispanic voters have to be *going forward* for Sanders to win the nomination?

There are multiple ways to do this, and we run three different scenarios. For all scenarios, we assume that the future popular vote won is proportional to the number of pledged delegates won; this is a reasonable assumption, on average, as the Democratic primaries proportionally allocate their delegates once a 15% threshold of the vote is met:

- Assuming minorities continue voting as they have in states that have already voted, Sanders would have to win about 70% of the White vote going forward. Given that Sanders has only managed to win 48% of the White vote so far, expecting him to gain 20+ percentage points going forward seems implausible.
- Assuming Sanders continues to capture the same share of the White vote as he has so far, he would need to capture 70% of the Black and Hispanic vote to win the nomination. This seems even more implausible given his current percentages with minorities.
- Assuming Sanders captures 60% of the White vote (a 12-point increase vs. today), he would need to capture 50% of the Black and Hispanic vote (a 36-point and a 14-point increase vs. today, respectively) to win the nomination. As such, even with such a drastic increase in the White vote captured, Sanders would somehow have to more than triple his percentage with Black voters and significantly increase his support with Hispanics. Again, this simply does not seem realistic.
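The logic behind these scenarios can be illustrated with a simple break-even calculation. This is only a sketch under loudly stated assumptions: the electorate mix of the remaining states (65% White, 20% Black, 15% Hispanic) and the 55% share of remaining delegates Sanders would need are placeholder values, not figures from our model; only the 15%, 36%, and 48% exit-poll shares come from the analysis quoted at the top.

```python
def required_white_share(target, mix, black_share, hispanic_share):
    """Solve target = w*x + b*black_share + h*hispanic_share for x."""
    white, black, hispanic = mix
    return (target - black * black_share - hispanic * hispanic_share) / white


def required_minority_share(target, mix, white_share):
    """Solve target = w*white_share + (b + h)*m for a common minority share m."""
    white, black, hispanic = mix
    return (target - white * white_share) / (black + hispanic)


# ASSUMPTIONS (placeholders): electorate mix of remaining states as
# (White, Black, Hispanic), and the share of remaining delegates needed.
assumed_mix = (0.65, 0.20, 0.15)
assumed_target = 0.55

# Scenario 1: minorities keep voting as in the exit polls (15% / 36%).
s1 = required_white_share(assumed_target, assumed_mix, 0.15, 0.36)
# Scenario 2: Sanders keeps his current 48% of the White vote.
s2 = required_minority_share(assumed_target, assumed_mix, 0.48)
# Scenario 3: Sanders raises his White share to 60%.
s3 = required_minority_share(assumed_target, assumed_mix, 0.60)

for label, value in (("White share needed", s1),
                     ("Minority share needed at 48% White", s2),
                     ("Minority share needed at 60% White", s3)):
    print(f"{label}: {value:.1%}")
```

Changing `assumed_mix` or `assumed_target` shifts the break-even points, which is why the scenario figures depend on the state-by-state demographics behind Table 2 rather than on a single national mix.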

Note that throughout this analysis, we have ignored super-delegates (who favor Clinton). One of the interesting conclusions resulting from this exercise is that talk of super-delegates in this race is superfluous: unless Clinton gets forced out of the race (because of an indictment, imprisonment, or some other far-fetched scenario), Sanders is extremely unlikely to win the popular vote, because of demographics.

__Some notes:__

The “racial makeup” figures in Tables 1 and 2 are color-coded by source:

- Red: 2016 Democratic primary exit polls.
- Black: 2008 Democratic primary exit polls (note that this helps Sanders, as minorities, whom Clinton wins handily, have increased their share over the past 8 years).
- Blue (Florida): the latest available 2016 Democratic primary poll.
- Green: the “racial makeup” of the state as a whole (no better data was available), based for the most part on the 2010 census.
- Gray (Democrats Abroad): an assumption that neither candidate is favored, given the lack of information.