Ordered Categorical Models

Andrew Fogarty

8/22/2019

# load necessary packages
library(dplyr)
library(plotly)
library(stargazer)
library(knitr)
library(MASS)
library(arm)
library(nnet)
set.seed(123)

1 Introduction

In this analysis, we will use survey data collected from British voters. They survey was conducted in 2010 which asked respondents about their vote intention. The Labour party was led by Prime Minister Gordon Brown. The Liberal Democrats, a centrist party on the economic left-right dimension, was led by Nick Clegg. The conservative party, led by David Cameron, ended up winning the largest number of votes and seats in the 2010 general election. The outcome variable is represents voter opinion on whether or not the MPs caught in the expenses scandal should resign.

2 Data

df <- read.table('https://raw.githubusercontent.com/afogarty85/replications/master/Expenses%20Scandal/ps206data3.txt',
                 header = TRUE)

2.1 Data: Variable Descriptions

While the codebook for this survey is unavailable, we list several of the variables used in this analysis and their meaning.

2.2 Data: Summary

We begin by inspecting the survey data, noting that the dataset contains no missing values and is complete.

##    app_Brown       app_Cameron       app_Clegg         persfin     
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   :1.000  
##  1st Qu.: 2.000   1st Qu.: 4.000   1st Qu.: 4.000   1st Qu.:2.000  
##  Median : 5.000   Median : 6.000   Median : 5.000   Median :2.500  
##  Mean   : 4.541   Mean   : 5.417   Mean   : 4.757   Mean   :2.535  
##  3rd Qu.: 7.000   3rd Qu.: 7.000   3rd Qu.: 6.000   3rd Qu.:3.000  
##  Max.   :10.000   Max.   :10.000   Max.   :10.000   Max.   :5.000  
##     natecon        affect_fc        app_Afgh        MP_resign    
##  Min.   :1.000   Min.   :1.000   Min.   :0.0000   Min.   :1.000  
##  1st Qu.:1.000   1st Qu.:2.000   1st Qu.:0.0000   1st Qu.:1.000  
##  Median :2.000   Median :2.000   Median :0.0000   Median :1.000  
##  Mean   :2.047   Mean   :2.501   Mean   :0.3216   Mean   :1.504  
##  3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:1.0000   3rd Qu.:2.000  
##  Max.   :5.000   Max.   :4.000   Max.   :1.0000   Max.   :3.000  
##      gender            age           rent_own        memb_union    
##  Min.   :0.0000   Min.   :18.00   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:39.00   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :53.00   Median :0.0000   Median :0.0000  
##  Mean   :0.4733   Mean   :52.64   Mean   :0.2549   Mean   :0.2318  
##  3rd Qu.:1.0000   3rd Qu.:66.00   3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :97.00   Max.   :1.0000   Max.   :1.0000  
##      income        vote_intent   
##  Min.   : 1.000   Min.   :1.000  
##  1st Qu.: 3.000   1st Qu.:1.000  
##  Median : 6.000   Median :2.000  
##  Mean   : 6.853   Mean   :1.826  
##  3rd Qu.:10.000   3rd Qu.:2.000  
##  Max.   :15.000   Max.   :3.000

Because we have an ordered categorical dependent variable, we need to set its order in an intuitive manner.

## NULL

2.3 Data: Dependent Variable

We can see that we have a categorical dependent variable that we are treating as ordered such that: No < Maybe < Yes. The barplot below shows that most respondent’s expect the MPs who engaged in the expenses scandal to resign.

2.4 Data: Independent Variable

Our primary independent variable in this analysis is app_Brown, or PM Gordon Brown’s approval. The histogram below shows the broad distribution of survey respondents and how they view PM Gordon Brown’s performance. We can see that while a fair amount of respondents disapprove of PM Brown, voters vary considerably in their perceptions of the PM’s performance.

3 Analysis

We begin by fitting our ordered logit model. It is important to note the use of as.ordered in our model, to ensure that R treats the factor accordingly. It is almost important to note that we need to set Hess to TRUE when we use polr to ensure the model fits. Our estimated model can be written as the following:

\[\begin{equation} P(Y \leq j) = logit^{-1}(\beta_{j0} - (\beta_{1}\text{PM Approval} + \beta_{2}\text{Age} + \beta_{3}\text{Income} + \beta_{4}\text{Gender})) \end{equation}\]

# Estimate Ordered Logit
ologit <- polr(as.ordered(MP_resign) ~ app_Brown + 
                 age + income + factor(gender), 
               method="logistic", 
               data = df, 
               Hess = TRUE)

We print our coefficients here largely to inspect the sign and significance of the results. We can broadly interpret some results as follows:

stargazer(ologit,
          type = "html",
          title = "Politician Approval and Voter Preferences Over Resignation",
          dep.var.labels = "Should MPs Resign?",
          covariate.labels = c("PM Brown's Approval", "Age", "Income", 
                               "Female"),
          ci = TRUE)
Politician Approval and Voter Preferences Over Resignation
Dependent variable:
Should MPs Resign?
PM Brown’s Approval -0.090***
(-0.143, -0.038)
Age 0.014***
(0.005, 0.023)
Income -0.028
(-0.065, 0.009)
Female -0.269*
(-0.559, 0.021)
Observations 824
Note: p<0.1; p<0.05; p<0.01

3.1 Analysis: Simulations

To generate the type of data we really want, we run simulations to generate predicted probabilities and to generate confidence intervals. We begin by using the sim function to generate 10000 coefficients taken from our ologit model. Then we sequence out our primary independent variable app_Brown, so we can compare predicted probabilities to its value. Then, we create a model which holds all other excoriates at its mean or mode, while allowing the primary independent variable to vary. Finally, we matrix multiply our model by the simulated coefficients, excluding the outpoints generated by our model.

##       No|Maybe  Maybe|Yes   app_Brown         age       income
## [1,] -1.345964 -0.5704596 -0.08084540 0.016826215 -0.009579079
## [2,] -1.501023 -0.6828062 -0.07977511 0.011401395 -0.017251913
## [3,] -2.049607 -1.3571039 -0.11775848 0.011145727 -0.048255646
## [4,] -1.556534 -0.8411165 -0.11978193 0.015217360 -0.024087586
## [5,] -1.593084 -0.8350056 -0.06462041 0.009589238 -0.016450531
## [6,] -2.118487 -1.3936950 -0.10613395 0.005652174 -0.049048821
##      factor(gender)1
## [1,]      -0.5884050
## [2,]      -0.2366486
## [3,]      -0.4603893
## [4,]      -0.1912165
## [5,]      -0.3059874
## [6,]      -0.4952365

Next, we need to generate our threshold outpoints between the likelihood of voters choosing No over Maybe, and Maybe over Yes. We do this by slicing out the model’s estimate for the cutpoint and then calculating out the thresholds accordingly.

We can calculate the average predicted probabilities in a simple table like so:

##          No  Maybe    Yes
## Mean 0.1722 0.1415 0.6863
## SD   0.0812 0.0391 0.1165

And then we can compare our results to the predicted probabilities determined by the predict function.

##    app_brown_sq        No     Maybe       Yes
## 1             0 0.1065733 0.1090510 0.7843756
## 2             1 0.1154911 0.1158137 0.7686952
## 3             2 0.1250507 0.1227147 0.7522346
## 4             3 0.1352805 0.1297131 0.7350064
## 5             4 0.1462074 0.1367617 0.7170309
## 6             5 0.1578557 0.1438078 0.6983365
## 7             6 0.1702470 0.1507929 0.6789601
## 8             7 0.1833991 0.1576540 0.6589469
## 9             8 0.1973257 0.1643237 0.6383506
## 10            9 0.2120352 0.1707318 0.6172330
## 11           10 0.2275304 0.1768063 0.5956633
## [1] 0.6949848

We can see that our calculations produced near identical results to that of the predict function.

## [1] 0.6862852 0.6949848

3.2 Analysis: Predicted Probability Graphs

The graph below shows the average predicted probability for a voter who believes MPs should not resign at each level of PM Gordon Brown’s approval. We interpret the results as follows:

The graph below shows the average predicted probability for a voter who believes MPs should maybe resign at each level of PM Gordon Brown’s approval. We interpret the results as follows:

Finally, the graph below shows the average predicted probability for a voter who believes MPs should resign at each level of PM Gordon Brown’s approval. We interpret the results as follows:

4 First Differences

First differences calculations estimate the differences between two levels of our dependent variable while holding all other variables constant.

# Calculate First Differences Along P(Not Resign)-P(Maybe resign)
first.difference <- probc1 - probc2

# Calculate First Differences Along P(Maybe Resign - Yes Resign)
first.difference2 <- probc2 - probc3

# Calculate First Differences Along P(Resign - Should Not Resign)
first.difference3 <- probc3 - probc1

Ultimately, we see little difference among those who think that the MPs should not resign and maybe resign. The large differences in opinions occur between those who think the MPs should resign and those who believe that they should not.

The panel below shows the difference between those who believe the MPs implicated in the expense scandal should not resign and those that believe they should maybe resign. The average difference between these two outcomes is 0.03%

x4 <- list(title = "PM Brown's Approval", dtick = 1, zeroline = FALSE)
y4 <- list(title = "P(Should Not Resign - Maybe Resign)")

plot_ly(type = 'scatter', mode = 'lines') %>%
  add_trace(x =~ app_brown_sq, y =~apply(first.difference, 1, quantile, .5), 
            line = list(color='rgb(125,100,125)'), name = 'P(Should Not Resign - Maybe Resign)') %>%
  add_trace(x =~app_brown_sq, y =~apply(first.difference, 1, quantile, .975), 
            line = list(color = 'transparent'), name = 'Upper CI', showlegend = FALSE) %>%
  add_trace(x =~app_brown_sq, y =~apply(first.difference, 1, quantile, .025), fill = 'tonexty', 
            fillcolor='rgba(125,100,125,0.2)', 
            line = list(color = 'transparent'), name = 'Lower CI', showlegend = FALSE) %>%
  layout(xaxis = x4,
         yaxis = y4)

The panel in below shows the difference between those who believe the MPs implicated in the expense scandal should maybe resign and those that believe that they should resign. The average difference between voters who believe that the MPs should maybe resign and voters who think that the MPs should resign is approximately 0.54%.

x5 <- list(title = "PM Brown's Approval", dtick = 1, zeroline = FALSE)
y5 <- list(title = "P(Maybe Resign - Yes Resign)")

plot_ly(type = 'scatter', mode = 'lines') %>%
  add_trace(x =~ app_brown_sq, y =~apply(first.difference2, 1, quantile, .5), 
            line = list(color='rgb(0,100,80)'), name = 'P(Maybe Resign - Yes Resign)') %>%
  add_trace(x =~app_brown_sq, y =~apply(first.difference2, 1, quantile, .975), 
            line = list(color = 'transparent'), name = 'Upper CI', showlegend = FALSE) %>%
  add_trace(x =~app_brown_sq, y =~apply(first.difference2, 1, quantile, .025), fill = 'tonexty', 
            fillcolor='rgba(0,100,80,0.2)', 
            line = list(color = 'transparent'), name = 'Lower CI', showlegend = FALSE) %>%
  layout(xaxis = x5,
         yaxis = y5)

Finally, the panel below shows the difference between those who believe the MPs implicated in the expense scandal should resign and those that believe that they should not resign. The average difference between voters who believe that the MPs should resign and voters who think that the MPs should not resign is approximately 0.51%.

x6 <- list(title = "PM Brown's Approval", dtick = 1, zeroline = FALSE)
y6 <- list(title = "P(Yes Resign - Should Not Resign)")

plot_ly(type = 'scatter', mode = 'lines') %>%
  add_trace(x =~ app_brown_sq, y =~apply(first.difference3, 1, quantile, .5), 
            line = list(color='rgb(255,100,80)'), name = 'P(Yes Resign - Should Not Resign)') %>%
  add_trace(x =~app_brown_sq, y =~apply(first.difference3, 1, quantile, .975), 
            line = list(color = 'transparent'), name = 'Upper CI', showlegend = FALSE) %>%
  add_trace(x =~app_brown_sq, y =~apply(first.difference3, 1, quantile, .025), fill = 'tonexty', 
            fillcolor='rgba(255,100,80,0.2)', 
            line = list(color = 'transparent'), name = 'Lower CI', showlegend = FALSE) %>%
  layout(xaxis = x6,
         yaxis = y6)

4.1 Analysis: Compare Non-Nested Models

We theoretically chose to use a ordered logit instead of a multinomial because we believed there was a natural ordering to the dependent variable such that: No < Maybe < Yes. We can test our intuitive ordering by generating a multinomial model and comparing its AIC and BIC scores to see which fits the data best.

## # weights:  18 (10 variable)
## initial  value 905.256526 
## iter  10 value 732.404635
## iter  20 value 690.314405
## final  value 690.314391 
## converged

We use AIC and BIC to test the relative fit of these two models to our data instead of a Likelihood Ratio Test because the Likelihood Ratio Test is used for differences between nested models. We can see from the results below that our ordered logit fits the data much better.

## [1] 1400.629
## [1] 1447.77
## [1] 1394.565
## [1] 1422.85