1 Introduction

In this analysis, we will use survey data collected from British voters. They survey was conducted in 2010 which asked respondents about their vote intention. The Labour party was led by Prime Minister Gordon Brown. The Liberal Democrats, a centrist party on the economic left-right dimension, was led by Nick Clegg. The conservative party, led by David Cameron, ended up winning the largest number of votes and seats in the 2010 general election. The outcome variable is represents voter opinion on whether or not the MPs caught in the expenses scandal should resign.

2 Data

df <- read.table('https://raw.githubusercontent.com/afogarty85/replications/master/Expenses%20Scandal/ps206data3.txt',
                 header = TRUE)

2.1 Data: Variable Descriptions

While the codebook for this survey is unavailable, we list several of the variables used in this analysis and their meaning.

MP_resign - Should MPs who were caught in the expenses scandal resign; (Yes, Maybe, No)
app_Brown - PM Gordon Brown’s Approval; higher = greater approval
age - Respondent’s age
income - Respondent’s income
gender - Respondent’s gender

2.2 Data: Summary

We begin by inspecting the survey data, noting that the dataset contains no missing values and is complete.

summary(df)

##    app_Brown       app_Cameron       app_Clegg         persfin     
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   :1.000  
##  1st Qu.: 2.000   1st Qu.: 4.000   1st Qu.: 4.000   1st Qu.:2.000  
##  Median : 5.000   Median : 6.000   Median : 5.000   Median :2.500  
##  Mean   : 4.541   Mean   : 5.417   Mean   : 4.757   Mean   :2.535  
##  3rd Qu.: 7.000   3rd Qu.: 7.000   3rd Qu.: 6.000   3rd Qu.:3.000  
##  Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :5.000  
##     natecon        affect_fc        app_Afgh        MP_resign    
##  Min.   :1.000   Min.   :1.000   Min.   :0.0000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:2.000   1st Qu.:0.0000   1st Qu.:1.000  
##  Median :2.000   Median :2.000   Median :0.0000   Median :1.000  
##  Mean   :2.047   Mean   :2.501   Mean   :0.3216   Mean   :1.504  
##  3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:1.0000   3rd Qu.:2.000  
##  Max.   :5.000   Max.   :4.000   Max.   :1.0000   Max.   :3.000  
##      gender            age           rent_own        memb_union    
##  Min.   :0.0000   Min.   :18.00   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:39.00   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :53.00   Median :0.0000   Median :0.0000  
##  Mean   :0.4733   Mean   :52.64   Mean   :0.2549   Mean   :0.2318  
##  3rd Qu.:1.0000   3rd Qu.:66.00   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :97.00   Max.   :1.0000   Max.   :1.0000  
##      income        vote_intent   
##  Min.   : 1.000   Min.   :1.000  
##  1st Qu.: 3.000   1st Qu.:1.000  
##  Median : 6.000   Median :2.000  
##  Mean   : 6.853   Mean   :1.826  
##  3rd Qu.:10.000   3rd Qu.:2.000  
##  Max.   :15.000   Max.   :3.000

Because we have an ordered categorical dependent variable, we need to set its order in an intuitive manner.

df <- df %>% mutate(MP_resign = dplyr::recode(MP_resign, 
                         `1`="Yes",
                         `2`="Maybe",
                         `3`="No"))

levels(df$MP_resign) # outcome variable is not ordered!

## NULL

df$MP_resign <- factor(df$MP_resign, levels=c("No", "Maybe", "Yes")) # set factor order

2.3 Data: Dependent Variable

We can see that we have a categorical dependent variable that we are treating as ordered such that: No < Maybe < Yes. The barplot below shows that most respondent’s expect the MPs who engaged in the expenses scandal to resign.

dv_count <- df %>% group_by(MP_resign) %>% tally()

plot_ly(data = dv_count, type = 'bar') %>% 
  add_trace(x = ~MP_resign, y=~n, name = 'Should MPs Resign?')

2.4 Data: Independent Variable

Our primary independent variable in this analysis is app_Brown, or PM Gordon Brown’s approval. The histogram below shows the broad distribution of survey respondents and how they view PM Gordon Brown’s performance. We can see that while a fair amount of respondents disapprove of PM Brown, voters vary considerably in their perceptions of the PM’s performance.

plot_ly(data = df, type = 'histogram') %>% 
  add_trace(x = ~app_Brown,  name = 'PM Brown Approval',
            marker = list(color = 'rgb(49,130,189)'), nbinsx = 20)

3 Analysis

We begin by fitting our ordered logit model. It is important to note the use of as.ordered in our model, to ensure that R treats the factor accordingly. It is almost important to note that we need to set Hess to TRUE when we use polr to ensure the model fits. Our estimated model can be written as the following:

\[\begin{equation} P(Y \leq j) = logit^{-1}(\beta_{j0} - (\beta_{1}\text{PM Approval} + \beta_{2}\text{Age} + \beta_{3}\text{Income} + \beta_{4}\text{Gender})) \end{equation}\]

# Estimate Ordered Logit
ologit <- polr(as.ordered(MP_resign) ~ app_Brown + 
                 age + income + factor(gender), 
               method="logistic", 
               data = df, 
               Hess = TRUE)

We print our coefficients here largely to inspect the sign and significance of the results. We can broadly interpret some results as follows:

Increases in voter’s approval for PM Brown decreases the probability that voters will demand MPs caught in the expense scandal should resign.
As voters get older, they are more likely to demand that MPs caught in the expense scandal should resign.
Females, as compared to males, are less likely to demand that MPs caught in the expense scandal should resign.

stargazer(ologit,
          type = "html",
          title = "Politician Approval and Voter Preferences Over Resignation",
          dep.var.labels = "Should MPs Resign?",
          covariate.labels = c("PM Brown's Approval", "Age", "Income", 
                               "Female"),
          ci = TRUE)

**Politician Approval and Voter Preferences Over Resignation**

	Dependent variable:

	Should MPs Resign?

PM Brown’s Approval	-0.090^***
	(-0.143, -0.038)

Age	0.014^***
	(0.005, 0.023)

Income	-0.028
	(-0.065, 0.009)

Female	-0.269^*
	(-0.559, 0.021)


Observations	824

Note:	p<0.1; p<0.05; p<0.01

3.1 Analysis: Simulations

To generate the type of data we really want, we run simulations to generate predicted probabilities and to generate confidence intervals. We begin by using the sim function to generate 10000 coefficients taken from our ologit model. Then we sequence out our primary independent variable app_Brown, so we can compare predicted probabilities to its value. Then, we create a model which holds all other excoriates at its mean or mode, while allowing the primary independent variable to vary. Finally, we matrix multiply our model by the simulated coefficients, excluding the outpoints generated by our model.

sim.uk <- coef(sim(ologit, n = 10000)) #simulate coefficients
head(sim.uk)

##       No|Maybe  Maybe|Yes   app_Brown         age       income
## [1,] -1.345964 -0.5704596 -0.08084540 0.016826215 -0.009579079
## [2,] -1.501023 -0.6828062 -0.07977511 0.011401395 -0.017251913
## [3,] -2.049607 -1.3571039 -0.11775848 0.011145727 -0.048255646
## [4,] -1.556534 -0.8411165 -0.11978193 0.015217360 -0.024087586
## [5,] -1.593084 -0.8350056 -0.06462041 0.009589238 -0.016450531
## [6,] -2.118487 -1.3936950 -0.10613395 0.005652174 -0.049048821
##      factor(gender)1
## [1,]      -0.5884050
## [2,]      -0.2366486
## [3,]      -0.4603893
## [4,]      -0.1912165
## [5,]      -0.3059874
## [6,]      -0.4952365

app_brown_sq <- seq(min(df$app_Brown), # sequence primary IV by actual values
                    max(df$app_Brown), by = 1)

df_mean <- cbind(app_Brown = app_brown_sq, # vary IV
                 age = mean(df$age), # hold age at mean
                 income = mean(df$income), # hold income at mean
                 gender = 0) # gender set to mode

pred <- df_mean %*% # matrix multiply model that holds all covariates at mean/mode
  t(sim.uk[, 3:6]) # by the transpose of the simulated coefficient, not including cutpoints

Next, we need to generate our threshold outpoints between the likelihood of voters choosing No over Maybe, and Maybe over Yes. We do this by slicing out the model’s estimate for the cutpoint and then calculating out the thresholds accordingly.

threshold1 <- sim.uk[,1] # Pull No|Maybe cutpoint
threshold2 <- sim.uk[,2] # Pull Maybe|Yes cutpoint

probc1 <- plogis(threshold1 - pred) # P(No)
probc2 <- plogis(threshold2 - pred) - probc1 # P(Maybe)
probc3 <- 1 - probc2 - probc1 # P(Yes)

We can calculate the average predicted probabilities in a simple table like so:

# Results
means <- cbind(mean(probc1), mean(probc2), mean(probc3))
sds <- cbind(sd(probc1),sd(probc2),sd(probc3))
presults <- rbind(means,sds)
colnames(presults) <- c("No","Maybe","Yes")
rownames(presults) <- c("Mean","SD")
print(round(presults, digits=4))

##          No  Maybe    Yes
## Mean 0.1722 0.1415 0.6863
## SD   0.0812 0.0391 0.1165

And then we can compare our results to the predicted probabilities determined by the predict function.

# Predict Test
predict <- predict(ologit, newdata=df_mean, "probs" )

# Show Approval With Predictions Correctly
cbind(app_brown_sq, predict)

##    app_brown_sq        No     Maybe       Yes
## 1             0 0.1065733 0.1090510 0.7843756
## 2             1 0.1154911 0.1158137 0.7686952
## 3             2 0.1250507 0.1227147 0.7522346
## 4             3 0.1352805 0.1297131 0.7350064
## 5             4 0.1462074 0.1367617 0.7170309
## 6             5 0.1578557 0.1438078 0.6983365
## 7             6 0.1702470 0.1507929 0.6789601
## 8             7 0.1833991 0.1576540 0.6589469
## 9             8 0.1973257 0.1643237 0.6383506
## 10            9 0.2120352 0.1707318 0.6172330
## 11           10 0.2275304 0.1768063 0.5956633

# find the average for "MP_resign - Yes"
mean(predict[,3])

## [1] 0.6949848

We can see that our calculations produced near identical results to that of the predict function.

c(presults[1, 3], # by-hand calculation
  mean(predict[,3])) # predict function

## [1] 0.6862852 0.6949848

3.2 Analysis: Predicted Probability Graphs

The graph below shows the average predicted probability for a voter who believes MPs should not resign at each level of PM Gordon Brown’s approval. We interpret the results as follows:

For a voter who completely disapproves of PM Gordon Brown’s approval, the probability that the voter believes the MPs should not resign is approximately 0.10, on average, after controlling for age, income, and gender.

x1 <- list(title = "PM Brown's Approval", dtick = 1, zeroline = FALSE)
y1 <- list(title = "P(Should Not Resign)")

plot_ly(type = 'scatter', mode = 'lines') %>%
  add_trace(x =~ app_brown_sq, y =~apply(probc1, 1, quantile, .5), 
            line = list(color='rgb(125,100,125)'), name = 'P(Should Not Resign)') %>%
  add_trace(x =~app_brown_sq, y =~apply(probc1, 1, quantile, .975), 
            line = list(color = 'transparent'), name = 'Upper CI', showlegend = FALSE) %>%
  add_trace(x =~app_brown_sq, y =~apply(probc1, 1, quantile, .025), fill = 'tonexty', 
            fillcolor='rgba(125,100,125,0.2)', 
            line = list(color = 'transparent'), name = 'Lower CI', showlegend = FALSE) %>%
  layout(xaxis = x1,
         yaxis = y1)

The graph below shows the average predicted probability for a voter who believes MPs should maybe resign at each level of PM Gordon Brown’s approval. We interpret the results as follows:

For a voter who completely approves of PM Gordon Brown’s approval, the probability that the voter believes the MPs should maybe resign is approximately 0.17, on average, after controlling for age, income, and gender.

x2 <- list(title = "PM Brown's Approval", dtick = 1, zeroline = FALSE)
y2 <- list(title = "P(Should Maybe Resign)")

plot_ly(type = 'scatter', mode = 'lines') %>%
  add_trace(x =~ app_brown_sq, y =~apply(probc2, 1, quantile, .5), 
            line = list(color='rgb(0,100,80)'), name = 'P(Maybe Resign)') %>%
  add_trace(x =~app_brown_sq, y =~apply(probc2, 1, quantile, .975), 
            line = list(color = 'transparent'), name = 'Upper CI', showlegend = FALSE) %>%
  add_trace(x =~app_brown_sq, y =~apply(probc2, 1, quantile, .025), fill = 'tonexty', 
            fillcolor='rgba(0,100,80,0.2)', 
            line = list(color = 'transparent'), name = 'Lower CI', showlegend = FALSE) %>%
  layout(xaxis = x2,
         yaxis = y2)

Finally, the graph below shows the average predicted probability for a voter who believes MPs should resign at each level of PM Gordon Brown’s approval. We interpret the results as follows:

For a voter who scores PM Gordon Brown’s approval as a zero out of 10, the probability that the voter believes the MPs should resign is approximately 0.78, on average, after controlling for age, income, and gender.

x3 <- list(title = "PM Brown's Approval", dtick = 1, zeroline = FALSE)
y3 <- list(title = "P(Should Resign)")

plot_ly(type = 'scatter', mode = 'lines') %>%
  add_trace(x =~ app_brown_sq, y =~apply(probc3, 1, quantile, .5), 
            line = list(color='rgb(255,100,80)'), name = 'P(Should Resign)') %>%
  add_trace(x =~app_brown_sq, y =~apply(probc3, 1, quantile, .975), 
            line = list(color = 'transparent'), name = 'Upper CI', showlegend = FALSE) %>%
  add_trace(x =~app_brown_sq, y =~apply(probc3, 1, quantile, .025), fill = 'tonexty', 
            fillcolor='rgba(255,100,80,0.2)', 
            line = list(color = 'transparent'), name = 'Lower CI', showlegend = FALSE) %>%
  layout(xaxis = x3,
         yaxis = y3)

4 First Differences

First differences calculations estimate the differences between two levels of our dependent variable while holding all other variables constant.

# Calculate First Differences Along P(Not Resign)-P(Maybe resign)
first.difference <- probc1 - probc2

# Calculate First Differences Along P(Maybe Resign - Yes Resign)
first.difference2 <- probc2 - probc3

# Calculate First Differences Along P(Resign - Should Not Resign)
first.difference3 <- probc3 - probc1

Ultimately, we see little difference among those who think that the MPs should not resign and maybe resign. The large differences in opinions occur between those who think the MPs should resign and those who believe that they should not.

The panel below shows the difference between those who believe the MPs implicated in the expense scandal should not resign and those that believe they should maybe resign. The average difference between these two outcomes is 0.03%

x4 <- list(title = "PM Brown's Approval", dtick = 1, zeroline = FALSE)
y4 <- list(title = "P(Should Not Resign - Maybe Resign)")

plot_ly(type = 'scatter', mode = 'lines') %>%
  add_trace(x =~ app_brown_sq, y =~apply(first.difference, 1, quantile, .5), 
            line = list(color='rgb(125,100,125)'), name = 'P(Should Not Resign - Maybe Resign)') %>%
  add_trace(x =~app_brown_sq, y =~apply(first.difference, 1, quantile, .975), 
            line = list(color = 'transparent'), name = 'Upper CI', showlegend = FALSE) %>%
  add_trace(x =~app_brown_sq, y =~apply(first.difference, 1, quantile, .025), fill = 'tonexty', 
            fillcolor='rgba(125,100,125,0.2)', 
            line = list(color = 'transparent'), name = 'Lower CI', showlegend = FALSE) %>%
  layout(xaxis = x4,
         yaxis = y4)

The panel in below shows the difference between those who believe the MPs implicated in the expense scandal should maybe resign and those that believe that they should resign. The average difference between voters who believe that the MPs should maybe resign and voters who think that the MPs should resign is approximately 0.54%.

x5 <- list(title = "PM Brown's Approval", dtick = 1, zeroline = FALSE)
y5 <- list(title = "P(Maybe Resign - Yes Resign)")

plot_ly(type = 'scatter', mode = 'lines') %>%
  add_trace(x =~ app_brown_sq, y =~apply(first.difference2, 1, quantile, .5), 
            line = list(color='rgb(0,100,80)'), name = 'P(Maybe Resign - Yes Resign)') %>%
  add_trace(x =~app_brown_sq, y =~apply(first.difference2, 1, quantile, .975), 
            line = list(color = 'transparent'), name = 'Upper CI', showlegend = FALSE) %>%
  add_trace(x =~app_brown_sq, y =~apply(first.difference2, 1, quantile, .025), fill = 'tonexty', 
            fillcolor='rgba(0,100,80,0.2)', 
            line = list(color = 'transparent'), name = 'Lower CI', showlegend = FALSE) %>%
  layout(xaxis = x5,
         yaxis = y5)

Finally, the panel below shows the difference between those who believe the MPs implicated in the expense scandal should resign and those that believe that they should not resign. The average difference between voters who believe that the MPs should resign and voters who think that the MPs should not resign is approximately 0.51%.

x6 <- list(title = "PM Brown's Approval", dtick = 1, zeroline = FALSE)
y6 <- list(title = "P(Yes Resign - Should Not Resign)")

plot_ly(type = 'scatter', mode = 'lines') %>%
  add_trace(x =~ app_brown_sq, y =~apply(first.difference3, 1, quantile, .5), 
            line = list(color='rgb(255,100,80)'), name = 'P(Yes Resign - Should Not Resign)') %>%
  add_trace(x =~app_brown_sq, y =~apply(first.difference3, 1, quantile, .975), 
            line = list(color = 'transparent'), name = 'Upper CI', showlegend = FALSE) %>%
  add_trace(x =~app_brown_sq, y =~apply(first.difference3, 1, quantile, .025), fill = 'tonexty', 
            fillcolor='rgba(255,100,80,0.2)', 
            line = list(color = 'transparent'), name = 'Lower CI', showlegend = FALSE) %>%
  layout(xaxis = x6,
         yaxis = y6)

4.1 Analysis: Compare Non-Nested Models

We theoretically chose to use a ordered logit instead of a multinomial because we believed there was a natural ordering to the dependent variable such that: No < Maybe < Yes. We can test our intuitive ordering by generating a multinomial model and comparing its AIC and BIC scores to see which fits the data best.

mlogit <- multinom(MP_resign ~ app_Brown + income + age + gender, data=df, hessian=TRUE)

## # weights:  18 (10 variable)
## initial  value 905.256526 
## iter  10 value 732.404635
## iter  20 value 690.314405
## final  value 690.314391 
## converged

We use AIC and BIC to test the relative fit of these two models to our data instead of a Likelihood Ratio Test because the Likelihood Ratio Test is used for differences between nested models. We can see from the results below that our ordered logit fits the data much better.

# Multinomial Logit
AIC(mlogit) # 1400.629

## [1] 1400.629

BIC(mlogit) # 1447.77

## [1] 1447.77

# Ordered Logit
AIC(ologit) # 1394.565

## [1] 1394.565

BIC(ologit) # 1422.85

## [1] 1422.85

Ordered Categorical Models

Andrew Fogarty

8/22/2019