Multinomial Model Replications

Andrew Fogarty

8/11/2019

1 Introduction

How does the international system shape civil wars? Kalyvas and Balcells test whether or not there is a systemic explanation for the ways in which civil wars are fought, arguing that the Soviet Union was largely sponsoring rural-based insurgencies which employed irregular forms of warfare. By disaggregating civil wars into three types of conflicts (conventional, irregular, symmetric non-conventional (SNC)), the authors find a decline in irregular warfare following the end of the Cold War (Kalyvas and Balcells 2010, 418).

Since the authors used Stata, we will replicate and extend their findings in R by offering graphical plots, confidence intervals on the estimated probabilities, and robustness checks. We use multinomial logits primarily to model unordered categorical data.

rm(list = ls())
# load packages
library(dplyr)
library(ggplot2)
library(nnet)
library(MASS)
library(stargazer)
library(arm)
library(lmtest)
library(plotly)
library(car)
set.seed(1)
# load data
df <- read.csv('https://raw.githubusercontent.com/afogarty85/replications/master/Technologies%20of%20Rebellion/KB2010replicationdataset.csv')

# select workhorse model variables
df <- dplyr::select(df, c("technologyrebellion", "post1990",
                          "roughterrain", "ethnicwar", "gdpcapita_fl"))

df <- na.omit(df) # drop missing data

2 Data

2.1 Dependent Variables

##    technologyrebellion      post1990   roughterrain             ethnicwar 
##  Conventional:46       Post-1990:38   Min.   :   0.00   Ethnic CW    :93  
##  Irregular   :74       Pre-1991 :99   1st Qu.:   6.00   Non-Ethnic CW:44  
##  SNC         :17                      Median :  12.60                     
##                                       Mean   :  34.76                     
##                                       3rd Qu.:  35.80                     
##                                       Max.   :1752.00                     
##   gdpcapita_fl  
##  Min.   :0.050  
##  1st Qu.:0.582  
##  Median :1.136  
##  Mean   :1.555  
##  3rd Qu.:2.048  
##  Max.   :6.243

The outcome variable (“Technology of Rebellion”) is an unordered categorical variable with three values representing different forms of warfare: conventional, irregular, and symmetric non-conventional (SNC). This variable is measured by looking at the type of weaponry used by the rebels and the state during the first year of conflict.

Conventional warfare emerges when rebels are able to militarily confront states using heavy weaponry such as field artillery and armor. In conventional wars, military confrontation is direct, either across well defined front lines or between armed columns. Conventional civil war takes place when the military technologies of states and rebels are matched at a high level (Kalyvas and Balcells 2010, 419).

Irregular or guerrilla warfare is a technology of rebellion whereby the rebels privilege small, lightly armed bands operating in rural areas (Fearon and Laitin 2003, 75); it is an expression of relative asymmetry between states and rebels. Rebels have the military capacity to challenge and harass the state, but lack the capacity to confront it in a direct and frontal way. Irregular civil war emerges when the military technologies of the rebels lag vis-a-vis those of the state (Kalyvas and Balcells 2010, 418).

Symmetric non-conventional (SNC) warfare arises when states are unable (or, in a few cases, unwilling) to deploy an organized military against poorly equipped insurgents. Neither incumbents nor insurgents use heavy weaponry. Often mistakenly described as guerrilla wars, SNC wars tend to arise in contexts characterized by extremely weak or collapsed states. The authors argue that the two categories of conventional and SNC war capture a real and important difference. SNC war is observed when the military technologies of states and rebels are matched at a low level (Kalyvas and Balcells 2010, 419).

2.2 Primary Independent Variable

The primary independent variable, post1990, is a dummy variable which takes on two values, Pre-1991 and Post-1990. The authors identify the period of the Cold War as 1944-1990 and post-Cold War from 1991-2004 (the end of their dataset). The authors state that they established 1991 as the cutoff year because it corresponds to the dissolution of the Soviet Union and the emergence of several new states (Kalyvas and Balcells 2010, 423). This makes substantive sense as the official collapse of the Soviet Union occurred on December 26, 1991.

3 Analysis

3.1 Hypothesis

Hypotheses are central to empirical work, and unfortunately the authors do not present one. In its absence, I have specified a reasonable hypothesis for them so that our analysis is grounded in expectations.

\(H_{1}\): In a comparison of civil wars, those that started during the Cold War will be more likely to use irregular warfare than those that started after the Cold War.

3.2 The Model

This is the authors’ primary workhorse model, and it is used to derive their most important findings. They also estimate five other models which test additional variables such as post-communist regimes (dummy), Marxist insurgents (dummy), and (log) military personnel (continuous).

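\[ \log\left(\frac{\pi_{ij}}{\pi_{i1}}\right) = \beta_{0j} + \beta_{1j}\text{Post 1990}_{i} + \beta_{2j}\text{Rough Terrain}_{i} + \beta_{3j}\text{Ethnic War}_{i} + \beta_{4j}\text{GDP Capita}_{i}, \quad j \in \{2, 3\} \]

Here \(\pi_{i1}\) is the probability that conflict \(i\) is fought with irregular warfare (the baseline), and \(j = 2, 3\) index conventional and SNC warfare, each with its own set of coefficients.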

Before running their model, we have to relevel our dependent variable so that irregular warfare is the base category for comparison, matching the authors’ work. We also make a few other alterations to the variable types to replicate the authors’ results.

3.3 Data Cleanup
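
The cleanup code is not shown in this rendering. The following is a minimal sketch of what such a step could look like, using a small invented data frame (`toy`) that borrows the factor labels from the summary in Section 2. The exact recoding used in the replication is an assumption here, not something the source confirms.

```r
# Toy stand-in for the replication data; the values are invented for illustration.
toy <- data.frame(
  technologyrebellion = factor(c("Conventional", "Irregular", "SNC", "Irregular")),
  post1990            = factor(c("Post-1990", "Pre-1991", "Post-1990", "Pre-1991"))
)

# Make irregular warfare the reference category for the multinomial logit.
toy$technologyrebellion <- relevel(toy$technologyrebellion, ref = "Irregular")

# Assumed recoding: turn the period factor into a 0/1 dummy (1 = post-1990).
toy$post1990 <- as.integer(toy$post1990 == "Post-1990")

levels(toy$technologyrebellion)  # the reference level is listed first
toy$post1990
```

With the reference level set this way, every coefficient of a multinomial logit is read relative to irregular warfare.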

Following the mathematical representation above, we fit the model below. Our results are identical to those found by the authors in their Stata analysis.

3.4 Fit the Model

## # weights:  18 (10 variable)
## initial  value 150.509884 
## iter  10 value 114.191405
## iter  20 value 114.109066
## final  value 114.109064 
## converged
## 
## ==============================================
##                       Dependent variable:     
##                   ----------------------------
##                     Conventional       SNC    
##                         (1)            (2)    
## ----------------------------------------------
## roughterrain           0.004         -0.025   
##                       (0.008)        (0.017)  
##                                               
## ethnicwar              0.172         -0.245   
##                       (0.437)        (0.634)  
##                                               
## gdpcapita_fl           0.039         -0.468   
##                       (0.147)        (0.285)  
##                                               
## post1990              1.422***      2.756***  
##                       (0.490)        (0.660)  
##                                               
## Constant               -1.283        -0.973   
##                       (0.788)        (1.137)  
##                                               
## ----------------------------------------------
## Akaike Inf. Crit.     248.218        248.218  
## ==============================================
## Note:              *p<0.1; **p<0.05; ***p<0.01

Now that we have fit our model, what can we do with it?

3.4.1 Interpret the Coefficient’s Sign

Civil wars fought after 1990 are more likely to involve conventional or symmetric non-conventional (SNC) warfare than irregular warfare (the baseline).
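
Exponentiating a multinomial logit coefficient gives the multiplicative change in the odds of that outcome relative to the baseline. A short sketch using the rounded post1990 estimates from the table above; the object names are illustrative only.

```r
# Rounded post1990 coefficients copied from the regression table above.
b_post1990 <- c(Conventional = 1.422, SNC = 2.756)

# Odds ratios relative to irregular warfare (the baseline outcome).
odds_ratio <- exp(b_post1990)
round(odds_ratio, 2)
```

Both ratios are well above 1 (roughly 4 and 16), so the odds of conventional and SNC warfare relative to irregular warfare are higher after 1990, holding the other covariates fixed.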

3.4.2 Interpret the Intercept

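Each intercept is the log-odds of that outcome relative to irregular warfare when all covariates equal zero. A non-zero intercept indicates that the insurgents have some inherent propensity to use one form of warfare over another for reasons that are not captured by the covariates in the model.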

3.4.3 Interpret the Significance of the Coefficient

A statistically significant coefficient tells us that either our data are extraordinary or the assumption that \(\beta\) is 0 is wrong.
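
The stars in the table come from Wald tests. As a rough sketch, the z statistics and two-sided p-values for post1990 can be recomputed from the rounded estimates and standard errors reported above, so the results are approximate.

```r
# Rounded estimates and standard errors for post1990 from the table above.
est <- c(Conventional = 1.422, SNC = 2.756)
se  <- c(Conventional = 0.490, SNC = 0.660)

z <- est / se             # Wald z statistics
p <- 2 * pnorm(-abs(z))   # two-sided p-values under H0: beta = 0
round(cbind(z, p), 4)
```

Both p-values fall below 0.01, which agrees with the three stars shown in the table.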

3.4.4 Likelihood Ratio Test

We can run a likelihood ratio test (LRT) to compare our model with a nested model. The LRT below shows that the null (intercept-only) model has a lower log-likelihood (-131.25 versus -114.11), i.e., it fits our data worse than the fully specified model. The difference between the two models is statistically significant, which means that we can reject the null hypothesis that the two models fit our data equally well. We could swap in different variables to try other nested combinations if we wanted to.

## # weights:  6 (2 variable)
## initial  value 150.509884 
## final  value 131.254439 
## converged
## Likelihood ratio test
## 
## Model 1: technologyrebellion ~ roughterrain + ethnicwar + gdpcapita_fl + 
##     post1990
## Model 2: technologyrebellion ~ 1
##   #Df  LogLik Df  Chisq Pr(>Chisq)    
## 1  10 -114.11                         
## 2   2 -131.25 -8 34.291  3.598e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
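
The test statistic can be reproduced by hand from the two `final value` lines, which are the negative log-likelihoods of the fitted models. A minimal sketch with the numbers copied from the output above:

```r
# Log-likelihoods: the negatives of the "final value" lines printed during fitting.
ll_full <- -114.109064   # full model, 10 parameters
ll_null <- -131.254439   # intercept-only model, 2 parameters

lr_stat <- 2 * (ll_full - ll_null)   # likelihood ratio statistic
df_diff <- 10 - 2                    # difference in the number of parameters
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

c(chisq = lr_stat, df = df_diff, p = p_value)
```

The statistic matches the `Chisq` column above (34.291 on 8 degrees of freedom).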

3.4.5 Predicted Probabilities

What we really want from our model is predicted probabilities. We generate predicted probabilities by sequencing our primary independent variable while holding all other factors constant at their means. This is a three-step process.

3.4.6 Predicted Probabilities: Step 1

Sequence the independent variable along all of its possible values.

3.4.7 Predicted Probabilities: Step 2

Prepare hypothetical data to generate predictions. While we could just as easily create some out-of-sample data, we are going to hold all other variables at their means while allowing our primary independent variable to vary. It is important to follow the model’s variable order (i.e., post1990 is the last variable in mlogit1, so it is last here).
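
Steps 1 and 2 can be sketched as follows. The example fits `nnet::multinom()` to simulated data, so the data frame `toy`, its coefficients, and the resulting numbers are assumptions for illustration and will not match the replication results.

```r
library(nnet)  # multinom(); ships with standard R installations

set.seed(1)
n <- 400
toy <- data.frame(
  gdpcapita_fl = rexp(n, rate = 1),
  post1990     = rbinom(n, size = 1, prob = 0.3)
)

# Simulate a three-category outcome from a known multinomial logit.
eta_conv <- -1 + 1.4 * toy$post1990
eta_snc  <- -2 + 2.7 * toy$post1990 - 0.4 * toy$gdpcapita_fl
probs    <- cbind(Irregular = 1, Conventional = exp(eta_conv), SNC = exp(eta_snc))
probs    <- probs / rowSums(probs)
toy$technologyrebellion <- factor(
  apply(probs, 1, function(p) sample(colnames(probs), size = 1, prob = p)),
  levels = colnames(probs)
)

fit <- multinom(technologyrebellion ~ gdpcapita_fl + post1990,
                data = toy, trace = FALSE)

# Step 1: sequence the independent variable over its possible values.
sqpost1990 <- seq(0, 1, by = 1)

# Step 2: hold the other covariate at its mean and let post1990 vary.
newdata <- data.frame(gdpcapita_fl = mean(toy$gdpcapita_fl),
                      post1990     = sqpost1990)

pred <- cbind(sqpost1990, predict(fit, newdata = newdata, type = "probs"))
pred
```

In this sketch `predict()` matches the columns of `newdata` by name. Column order becomes essential once the same covariate profile is used in matrix multiplication, as in the simulation section.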

3.4.8 Predicted Probabilities: Step 3

Lastly, we interpret the results. Returning to our hypothesis, we stated that: In a comparison of civil wars, those that started during the Cold War will be more likely to use irregular warfare than those that started after the Cold War.

We find strong support for our hypothesis. A civil war that started during the Cold War had a 0.66 probability (66%) of being fought with irregular warfare, holding rough terrain, ethnic war, and GDP per capita at their means. The first difference between the two time periods is roughly -0.39 (0.27 - 0.66), a drop of 39 percentage points. One limitation of this basic form of predicted probabilities is that it does not yield confidence intervals for our estimates. We will fix this problem later.

##   sqpost1990 Irregular Conventional        SNC
## 1          0 0.6645157    0.3017240 0.03376038
## 2          1 0.2715208    0.5112961 0.21718315

We can graph the results for \(H_{1}\) like so:

3.4.9 Predicted Probabilities: By-Hand

We can also calculate the predicted probabilities by-hand relatively easily. The point of this exercise is to show that the mathematics is not all that complicated.

\[ \pi_{i1} = \frac{1}{1 + \exp(-1.283 + 0.004 \times 34.76 + 0.172 \times 1.67 + 0.039 \times 1.55 + 1.422 \times 0) + \exp(-0.973 - 0.025 \times 34.76 - 0.245 \times 1.67 - 0.468 \times 1.55 + 2.756 \times 0)} \]

Below, we multiply our model coefficients by the sample means of the covariates while setting our primary independent variable to 0. This is because we want the predicted probability of irregular warfare during the Cold War (post1990 == 0).

## [1] 0.6657894
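
The calculation can be sketched in a few lines of base R. The coefficients and covariate means are the rounded values from the equation above; the object names are illustrative.

```r
# Rounded coefficients from the table above, in the order:
# intercept, roughterrain, ethnicwar, gdpcapita_fl, post1990.
b_conv <- c(-1.283,  0.004,  0.172,  0.039, 1.422)
b_snc  <- c(-0.973, -0.025, -0.245, -0.468, 2.756)

# Covariate profile: 1 for the intercept, the sample means, and post1990 = 0.
x_pre <- c(1, 34.76, 1.67, 1.55, 0)

# Linear predictors for the two non-baseline outcomes.
eta_conv <- sum(b_conv * x_pre)
eta_snc  <- sum(b_snc * x_pre)

# Probability of the baseline outcome (irregular warfare).
p_irregular <- 1 / (1 + exp(eta_conv) + exp(eta_snc))
p_irregular
```

Setting the last element of `x_pre` to 1 gives the corresponding post-1990 probability.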

3.5 Simulations - Generate Confidence Intervals

To generate confidence intervals for predicted probabilities, we need to use other packages which involve simulations. We will go through the process in steps.

3.5.1 Simulations: Step 1

Our first step involves taking 50,000 draws of the coefficient vector from a multivariate normal distribution centered on the model estimates. This is justified because maximum likelihood estimators are asymptotically normal. Simulation allows us to account for estimation uncertainty when making predictions.

If we run the summary command on our simulation results, we can see that the mean for each coefficient is very close to the estimate found in our model.

##   (Intercept)       roughterrain         ethnicwar      
##  Min.   :-4.4160   Min.   :-0.031501   Min.   :-1.6272  
##  1st Qu.:-1.8159   1st Qu.:-0.001365   1st Qu.:-0.1234  
##  Median :-1.2814   Median : 0.004121   Median : 0.1749  
##  Mean   :-1.2821   Mean   : 0.004087   Mean   : 0.1726  
##  3rd Qu.:-0.7503   3rd Qu.: 0.009507   3rd Qu.: 0.4675  
##  Max.   : 1.8138   Max.   : 0.038225   Max.   : 2.2536  
##   gdpcapita_fl         post1990        (Intercept)       roughterrain     
##  Min.   :-0.54848   Min.   :-0.5472   Min.   :-5.3825   Min.   :-0.10672  
##  1st Qu.:-0.05997   1st Qu.: 1.0925   1st Qu.:-1.7390   1st Qu.:-0.03665  
##  Median : 0.03896   Median : 1.4172   Median :-0.9682   Median :-0.02499  
##  Mean   : 0.03899   Mean   : 1.4215   Mean   :-0.9695   Mean   :-0.02499  
##  3rd Qu.: 0.13824   3rd Qu.: 1.7512   3rd Qu.:-0.2000   3rd Qu.:-0.01337  
##  Max.   : 0.68102   Max.   : 3.4161   Max.   : 3.6402   Max.   : 0.05041  
##    ethnicwar        gdpcapita_fl        post1990     
##  Min.   :-2.7944   Min.   :-1.6048   Min.   :0.2053  
##  1st Qu.:-0.6704   1st Qu.:-0.6613   1st Qu.:2.3074  
##  Median :-0.2415   Median :-0.4687   Median :2.7546  
##  Mean   :-0.2441   Mean   :-0.4680   Mean   :2.7531  
##  3rd Qu.: 0.1824   3rd Qu.:-0.2752   3rd Qu.:3.1978  
##  Max.   : 2.3732   Max.   : 0.7184   Max.   :5.3810

3.5.2 Simulations: Step 2

Next we want to divide our simulated data based on our dependent variable categories. So we slice our sim.coefs object such that the coefficients representing the conventional warfare are separated from those representing SNC warfare.

##      (Intercept)  roughterrain  ethnicwar gdpcapita_fl  post1990
## [1,]  -0.5072747  0.0020486750 -0.1728692  -0.05297848 1.6632390
## [2,]  -1.6162848  0.0139135066  0.3023828   0.04815281 1.3052129
## [3,]  -0.3307171 -0.0074206379 -0.4024245   0.06016678 1.7130657
## [4,]  -1.1659948  0.0009956614  0.1629752  -0.07448738 1.9347810
## [5,]  -1.7503020  0.0089141194  0.4255295   0.32067090 0.5678801
## [6,]  -1.8468763  0.0104484757  0.2999280   0.15324356 1.6133587
##      (Intercept) roughterrain  ethnicwar gdpcapita_fl post1990
## [1,]  -0.4107412 -0.032185375 -0.2986586   -0.5676961 2.157556
## [2,]  -0.9547440 -0.046474332  0.1545530   -0.5919985 2.435404
## [3,]  -0.4150132 -0.012417593 -0.9685103   -0.2842953 3.185353
## [4,]  -3.0938449 -0.001394229  0.4788915   -0.2610837 3.360624
## [5,]  -1.3545490  0.006198947 -0.4718130   -0.3335254 3.189834
## [6,]   0.2076437 -0.021563892 -0.7154925   -0.2995974 1.706272

3.5.3 Simulations: Step 3

Then, we specify two data frames that follow our model and hold all values at their means other than the primary independent variable, post1990. We then transform the data frames into data matrices so that we can use matrix multiplication.

3.5.4 Simulations: Step 4

Lastly, to derive our predicted probabilities, we apply the same formula as in the “by-hand” calculation above to every simulated draw, converting each draw's linear predictors into probabilities.

We repeat the process for post1990.

Next, we calculate the first difference to determine whether there is a statistically significant difference between the two values of our treatment variable.
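
The four simulation steps and the first difference can be sketched end to end. This example uses simulated data and 5,000 draws rather than the replication data and 50,000 draws, so `toy`, the model formula, and every number it produces are assumptions for illustration.

```r
library(nnet)  # multinom()
library(MASS)  # mvrnorm()

set.seed(1)
n <- 400
toy <- data.frame(
  gdpcapita_fl = rexp(n, rate = 1),
  post1990     = rbinom(n, size = 1, prob = 0.3)
)
eta_conv <- -1 + 1.4 * toy$post1990
eta_snc  <- -2 + 2.7 * toy$post1990 - 0.4 * toy$gdpcapita_fl
probs    <- cbind(Irregular = 1, Conventional = exp(eta_conv), SNC = exp(eta_snc))
probs    <- probs / rowSums(probs)
toy$technologyrebellion <- factor(
  apply(probs, 1, function(p) sample(colnames(probs), size = 1, prob = p)),
  levels = colnames(probs)
)
fit <- multinom(technologyrebellion ~ gdpcapita_fl + post1990,
                data = toy, Hess = TRUE, trace = FALSE)

# Step 1: draw coefficient vectors; t() puts coef() in the same order as vcov().
k <- ncol(coef(fit))
sim.coefs <- mvrnorm(5000, mu = as.vector(t(coef(fit))), Sigma = vcov(fit))

# Step 2: split the draws by outcome category.
sim.conv <- sim.coefs[, 1:k]
sim.snc  <- sim.coefs[, (k + 1):(2 * k)]

# Step 3: covariate profiles (intercept, mean GDP, post1990 = 0 or 1).
x_pre  <- c(1, mean(toy$gdpcapita_fl), 0)
x_post <- c(1, mean(toy$gdpcapita_fl), 1)

# Step 4: probability of the baseline outcome (irregular) for every draw.
p_irregular <- function(x) {
  drop(1 / (1 + exp(sim.conv %*% x) + exp(sim.snc %*% x)))
}
p_pre  <- p_irregular(x_pre)
p_post <- p_irregular(x_post)

# First difference with a 95% simulation interval.
first_diff <- p_post - p_pre
ci <- quantile(first_diff, probs = c(0.025, 0.5, 0.975))
ci
```

If the interval for the first difference excludes zero, the change in the probability of irregular warfare between the two periods is statistically distinguishable from no change.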

Finally, we plot our results and review \(H_{1}\) again.

3.5.5 Simulations: Plot H(1) Results

3.5.6 Simulations: Does H(1) Still Hold?

In this article, we have extended the authors’ predictions by adding confidence intervals to the predicted probabilities, which were absent from their article in a flagship political science journal. To see if the hypothesis still holds, let's revisit it:

\(H_{1}\): In a comparison of civil wars, those that started during the Cold War will be more likely to use irregular warfare than those that started after the Cold War.

Judging from the graph above, we still find broad support for the hypothesis: the predicted probability of irregular warfare during the Cold War is large, with a confidence interval far from zero, and the confidence interval of the first difference does not include zero.

4 Assumption Check

When we use a multinomial logit, we use a model that assumes Independence of Irrelevant Alternatives (IIA). This means that the odds of choosing one outcome over another (i.e., between two levels of the DV) are not affected by the presence or absence of a third alternative. While a statistical test for IIA, hmftest, is available in the mlogit package, we can also check IIA informally by removing one choice at a time and inspecting the model’s coefficients for considerable changes. If there are considerable changes, IIA is likely violated.
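
The IIA property follows directly from the model's functional form. As a short derivation (the symbols \(x_i\) for the covariate vector and \(\beta_j\) for the coefficients of outcome \(j\), with \(\beta_1 = 0\) for the baseline, are introduced here for illustration):

\[ \pi_{ij} = \frac{\exp(x_i'\beta_j)}{\sum_{m=1}^{3}\exp(x_i'\beta_m)} \quad\Rightarrow\quad \frac{\pi_{ij}}{\pi_{ik}} = \exp\left(x_i'(\beta_j - \beta_k)\right) \]

The ratio depends only on the coefficients of outcomes \(j\) and \(k\), so removing a third outcome should leave it roughly unchanged if the model is appropriate. This is why similar coefficients in the binary logits below are read as evidence consistent with IIA.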

We can run this test rather quickly. First, we will drop SNC because there are only a few cases of these types of wars. We can see below that the coefficient for our primary independent variable barely changes (1.429 versus 1.422 in the multinomial logit), suggesting no violation of IIA.

dd <- df %>% filter(technologyrebellion != 'SNC') # drop SNC conflicts
# binary logit: Conventional vs. Irregular (the reference level)
logit <- glm(technologyrebellion ~ roughterrain + ethnicwar + gdpcapita_fl + post1990,
             family = binomial(link = "logit"), data = dd)
summary(logit)
## 
## Call:
## glm(formula = technologyrebellion ~ roughterrain + ethnicwar + 
##     gdpcapita_fl + post1990, family = binomial(link = "logit"), 
##     data = dd)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5213  -0.8543  -0.8206   0.9727   1.6128  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  -1.113902   0.788088  -1.413  0.15753   
## roughterrain  0.003581   0.007679   0.466  0.64096   
## ethnicwar     0.091535   0.441953   0.207  0.83592   
## gdpcapita_fl  0.026408   0.146368   0.180  0.85682   
## post1990      1.428683   0.498121   2.868  0.00413 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 159.76  on 119  degrees of freedom
## Residual deviance: 147.10  on 115  degrees of freedom
## AIC: 157.1
## 
## Number of Fisher Scoring iterations: 7
stargazer(logit, type="text")
## 
## =============================================
##                       Dependent variable:    
##                   ---------------------------
##                       technologyrebellion    
## ---------------------------------------------
## roughterrain                 0.004           
##                             (0.008)          
##                                              
## ethnicwar                    0.092           
##                             (0.442)          
##                                              
## gdpcapita_fl                 0.026           
##                             (0.146)          
##                                              
## post1990                   1.429***          
##                             (0.498)          
##                                              
## Constant                    -1.114           
##                             (0.788)          
##                                              
## ---------------------------------------------
## Observations                  120            
## Log Likelihood              -73.549          
## Akaike Inf. Crit.           157.098          
## =============================================
## Note:             *p<0.1; **p<0.05; ***p<0.01

Next, we will drop Conventional. As we can see below, the coefficient on our primary independent variable for SNC (2.628) is close to the multinomial logit estimate (2.756). This informal check therefore gives no indication that we are violating IIA.

dd <- df %>% filter(technologyrebellion != 'Conventional') # drop conventional conflicts, leaving SNC vs. Irregular
logit <- glm(technologyrebellion ~ 
               roughterrain + ethnicwar + gdpcapita_fl + post1990, 
             family=binomial(link="logit"), data=dd)

stargazer(logit, type="text")
## 
## =============================================
##                       Dependent variable:    
##                   ---------------------------
##                       technologyrebellion    
## ---------------------------------------------
## roughterrain                -0.028           
##                             (0.018)          
##                                              
## ethnicwar                   -0.117           
##                             (0.669)          
##                                              
## gdpcapita_fl                -0.484           
##                             (0.332)          
##                                              
## post1990                   2.628***          
##                             (0.664)          
##                                              
## Constant                    -1.094           
##                             (1.255)          
##                                              
## ---------------------------------------------
## Observations                  91             
## Log Likelihood              -31.897          
## Akaike Inf. Crit.           73.794           
## =============================================
## Note:             *p<0.1; **p<0.05; ***p<0.01

5 Sources

Kalyvas, Stathis N., and Laia Balcells. “International system and technologies of rebellion: How the end of the Cold War shaped internal conflict.” American Political Science Review 104, no. 3 (2010): 415-429. Dataset: here