Introduction
How does the international system shape civil wars? Kalyvas and Balcells test whether or not there is a systemic explanation for the ways in which civil wars are fought, arguing that the Soviet Union was largely sponsoring rural-based insurgencies which employed irregular forms of warfare. By disaggregating civil wars into three types of conflicts (conventional, irregular, symmetric non-conventional (SNC)), the authors find a decline in irregular warfare following the end of the Cold War (Kalyvas and Balcells 2010, 418).
Since the authors used Stata, we replicate and extend their findings in R by adding graphical plots, confidence intervals on the estimated probabilities, and robustness checks. We use a multinomial logit, a model designed primarily for unordered categorical outcomes.
Data
Dependent Variables
##    technologyrebellion      post1990     roughterrain            ethnicwar
##  Conventional:46       Post-1990:38   Min.   :   0.00   Ethnic CW    :93
##  Irregular   :74       Pre-1991 :99   1st Qu.:   6.00   Non-Ethnic CW:44
##  SNC         :17                      Median :  12.60
##                                       Mean   :  34.76
##                                       3rd Qu.:  35.80
##                                       Max.   :1752.00
##   gdpcapita_fl
##  Min.   :0.050
##  1st Qu.:0.582
##  Median :1.136
##  Mean   :1.555
##  3rd Qu.:2.048
##  Max.   :6.243
The outcome variable (“Technology of Rebellion”) is an unordered categorical variable with three values representing different forms of warfare: conventional, irregular, and symmetric non-conventional (SNC). This variable is measured by looking at the type of weaponry used by the rebels and the state during the first year of conflict.
Conventional warfare emerges when rebels are able to militarily confront states using heavy weaponry such as field artillery and armor. In conventional wars, military confrontation is direct, either across well defined front lines or between armed columns. Conventional civil war takes place when the military technologies of states and rebels are matched at a high level (Kalyvas and Balcells 2010, 419).
Irregular or guerrilla warfare is a technology of rebellion whereby the rebels privilege small, lightly armed bands operating in rural areas (Fearon and Laitin 2003, 75); it is an expression of relative asymmetry between states and rebels. Rebels have the military capacity to challenge and harass the state, but lack the capacity to confront it in a direct and frontal way. Irregular civil war emerges when the military technologies of the rebels lag vis-a-vis those of the state (Kalyvas and Balcells 2010, 418).
Symmetric non-conventional (SNC) warfare arises when states are unable (or, in a few cases, unwilling) to deploy an organized military against poorly equipped insurgents. Neither incumbents nor insurgents use heavy weaponry. Often mistakenly described as guerrilla wars, SNC wars tend to arise in contexts characterized by extremely weak or collapsed states. We believe that the two categories of conventional and SNC war capture a real and important difference. SNC war is observed when the military technologies of states and rebels are matched at a low level (Kalyvas and Balcells 2010, 419).
Primary Independent Variable
The primary independent variable, post1990, is a dummy variable which takes on two values, Pre-1991 and Post-1990. The authors identify the period of the Cold War as 1944-1990 and the post-Cold War period as 1991-2004 (the end of their dataset). The authors state that they established 1991 as the cutoff year because it corresponds to the dissolution of the Soviet Union and the emergence of several new states (Kalyvas and Balcells 2010, 423). This makes substantive sense as the official collapse of the Soviet Union occurred on December 26, 1991.
Analysis
Hypothesis
Hypotheses are central to empirical work, and unfortunately the authors do not present a hypothesis. In its absence, we have specified a reasonable one for them so that our analysis may be grounded in expectations.
\(H_{1}\): In a comparison of civil wars, those that started during the Cold War will be more likely to use irregular warfare than those that started after the Cold War.
The Model
This model is the authors’ primary workhorse model and is used to derive their most important findings. They also include five other models which test different variables such as post-communist regimes (dummy), Marxist insurgents (dummy), and (log) military personnel (continuous).
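\[ \log\left(\frac{\pi_{ij}}{\pi_{i1}}\right) = \beta_{0j} + \beta_{1j}\text{Post 1990}_{i} + \beta_{2j}\text{Rough Terrain}_{i} + \beta_{3j}\text{Ethnic War}_{i} + \beta_{4j}\text{GDP Capita}_{i}, \quad j \in \{2, 3\} \]
Here \(\pi_{ij}\) is the probability that civil war \(i\) is fought with technology \(j\). Category 1 (irregular warfare) is the baseline, and \(j = 2, 3\) index conventional and SNC warfare, each with its own set of coefficients.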
Before running their model, we have to relevel our dependent variable so that irregular warfare is the base category for comparison, matching the authors’ work. We also make a few other alterations to the variable types to replicate the authors’ results.
Data Cleanup
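A minimal sketch of these cleanup steps, using a small toy data frame rather than the replication data. The object name `toy` and its four rows are assumptions made for illustration; the variable names and level labels follow the summary shown earlier.

```r
# Toy stand-in for the replication data (values are illustrative only)
toy <- data.frame(
  technologyrebellion = factor(c("Conventional", "Irregular", "SNC", "Irregular")),
  post1990 = factor(c("Post-1990", "Pre-1991", "Post-1990", "Pre-1991"))
)

# Factor levels are alphabetical by default, so "Conventional" starts as the base
levels(toy$technologyrebellion)

# Make irregular warfare the reference category
toy$technologyrebellion <- relevel(toy$technologyrebellion, ref = "Irregular")

# Recode the period factor as a 0/1 dummy (1 = Post-1990)
toy$post1990 <- as.numeric(toy$post1990 == "Post-1990")

levels(toy$technologyrebellion)
toy$post1990
```

`relevel()` only moves the chosen level to the front; the remaining levels keep their original order.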
Following the mathematical representation above, we fit the model below. Our results are identical to those found by the authors in their Stata analysis.
Fit the Model
## # weights: 18 (10 variable)
## initial value 150.509884
## iter 10 value 114.191405
## iter 20 value 114.109066
## final value 114.109064
## converged
##
## ==============================================
## Dependent variable:
## ----------------------------
## Conventional SNC
## (1) (2)
## ----------------------------------------------
## roughterrain 0.004 -0.025
## (0.008) (0.017)
##
## ethnicwar 0.172 -0.245
## (0.437) (0.634)
##
## gdpcapita_fl 0.039 -0.468
## (0.147) (0.285)
##
## post1990 1.422*** 2.756***
## (0.490) (0.660)
##
## Constant -1.283 -0.973
## (0.788) (1.137)
##
## ----------------------------------------------
## Akaike Inf. Crit. 248.218 248.218
## ==============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
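The header of the output above (`# weights: 18 (10 variable)`) looks like the trace printed by `multinom()` from the nnet package, so a fit along the following lines seems likely; that is an inference, not something the source states. The sketch below runs on simulated stand-in data (the data frame `toy` and all of its values are invented for illustration), so it will not reproduce the table.

```r
library(nnet)  # multinom(); nnet ships with R as a recommended package

set.seed(1)
n <- 200

# Simulated stand-in data: NOT the Kalyvas and Balcells dataset
toy <- data.frame(
  roughterrain = runif(n, min = 0, max = 80),
  ethnicwar    = rbinom(n, size = 1, prob = 0.6),
  gdpcapita_fl = rexp(n, rate = 1 / 1.5),
  post1990     = rbinom(n, size = 1, prob = 0.3)
)
toy$technologyrebellion <- factor(
  sample(c("Irregular", "Conventional", "SNC"), n, replace = TRUE,
         prob = c(0.50, 0.35, 0.15)),
  levels = c("Irregular", "Conventional", "SNC")  # first level is the baseline
)

# Same right-hand side as the model in this article
fit_toy <- multinom(
  technologyrebellion ~ roughterrain + ethnicwar + gdpcapita_fl + post1990,
  data = toy, trace = FALSE
)

# One row of coefficients per non-baseline outcome
coef(fit_toy)
```

With three outcome levels and five terms per equation, the coefficient matrix has two rows and five columns, which is where the "10 variable" count in the trace comes from.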
Now that we have fit our model, what can we do with it?
Interpret the Coefficient’s Sign
The positive coefficients on post1990 indicate that civil wars fought after 1990 are more likely to involve conventional or symmetric non-conventional (SNC) warfare, relative to irregular warfare (the baseline).
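Beyond the sign, exponentiating a multinomial logit coefficient gives the multiplicative change in the odds of that outcome relative to the baseline. A minimal sketch using the rounded post1990 coefficients from the table above (the object names are illustrative):

```r
# Rounded post1990 coefficients from the regression table
b_post1990 <- c(Conventional = 1.422, SNC = 2.756)

# Odds of each outcome versus irregular warfare, post-1990 relative to pre-1991
odds_ratio <- exp(b_post1990)
round(odds_ratio, 2)
```

Under these rounded estimates, the odds of conventional rather than irregular warfare are roughly four times higher after 1990, and the odds of SNC rather than irregular warfare are roughly sixteen times higher.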
Interpret the Intercept
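The intercept is the log-odds of each form of warfare relative to irregular warfare when every covariate equals zero. A non-zero intercept indicates that insurgents have some inherent propensity to use one form of warfare over another for reasons that are not captured in the model.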
Interpret the Significance of the Coefficient
A statistically significant coefficient tells us that either our data are extraordinary or the null hypothesis that \(\beta = 0\) is wrong.
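The significance stars in the table can be checked with a Wald test: divide each estimate by its standard error and compare the result to a standard normal distribution. A minimal sketch using the rounded post1990 estimates and standard errors from the table (object names are illustrative):

```r
# Rounded post1990 estimates and standard errors from the regression table
est <- c(Conventional = 1.422, SNC = 2.756)
se  <- c(Conventional = 0.490, SNC = 0.660)

# Wald z statistics and two-sided p-values under the null that beta = 0
z <- est / se
p <- 2 * pnorm(-abs(z))

round(cbind(z = z, p = p), 4)
```

Both p-values fall below 0.01, which agrees with the three stars reported for post1990 in each equation.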
Likelihood Ratio Test
We can run a likelihood ratio test (LRT) to compare our model with a nested model. As the LRT below shows, the null (intercept-only) model has a lower log-likelihood than the fully specified model (i.e., the null model fits our data worse). The difference between the two models is statistically significant, which means that we can reject the null hypothesis that the two models fit our data equally well. We could also drop individual variables to test other nested specifications.
## # weights: 6 (2 variable)
## initial value 150.509884
## final value 131.254439
## converged
## Likelihood ratio test
##
## Model 1: technologyrebellion ~ roughterrain + ethnicwar + gdpcapita_fl +
## post1990
## Model 2: technologyrebellion ~ 1
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 10 -114.11
## 2 2 -131.25 -8 34.291 3.598e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
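The test statistic above can be recomputed by hand from the two log-likelihoods. Each "final value" in the optimizer traces is a negative log-likelihood, so the figures below are taken directly from this article's output; only the object names are assumptions.

```r
# Log-likelihoods: the negatives of the "final value" lines in the traces
ll_full <- -114.109064  # full model, 10 parameters
ll_null <- -131.254439  # intercept-only model, 2 parameters

# LR statistic: twice the gap in log-likelihoods
lr_stat <- 2 * (ll_full - ll_null)

# Degrees of freedom: difference in the number of estimated parameters
df_diff <- 10 - 2

# Upper-tail chi-squared probability
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

c(chisq = lr_stat, df = df_diff, p = p_value)
```

The statistic matches the 34.291 reported in the table up to rounding.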
Predicted Probabilities
What we really want from our model is predicted probabilities. We generate predicted probabilities by sequencing our primary independent variable while holding all other factors constant at their means. This is a three-step process.
Predicted Probabilities: Step 1
Sequence the independent variable along all of its possible values.
Predicted Probabilities: Step 2
Prepare hypothetical data to generate predictions. While we could create some out-of-sample data just as easily, we are going to hold all of our variables at their means while allowing our primary independent variable to vary. It is important that the model’s variable order is followed (i.e., post1990 is the last variable in mlogit1, thus it is last here).
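Steps 1 and 2 can be sketched as follows. The means for roughterrain and gdpcapita_fl come from the summary table; the value 1.67 for ethnicwar is the one used in this article's by-hand calculation. The object name `hypothetical` is an assumption for illustration.

```r
# Step 1: every value the dummy can take (0 = Pre-1991, 1 = Post-1990)
sqpost1990 <- seq(from = 0, to = 1, by = 1)

# Step 2: hold the controls at their means and let post1990 vary.
# Columns follow the order of the model formula, with post1990 last.
hypothetical <- data.frame(
  roughterrain = 34.76,
  ethnicwar    = 1.67,
  gdpcapita_fl = 1.555,
  post1990     = sqpost1990
)

hypothetical
```

If the fitted model is an nnet `multinom` object (an assumption about `mlogit1`), a data frame like this could be passed as `newdata` to `predict()` with `type = "probs"` to obtain one row of probabilities per value of post1990.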
Predicted Probabilities: Step 3
Lastly, we interpret the results. Returning to our hypothesis, we stated that: In a comparison of civil wars, those that started during the Cold War will be more likely to use irregular warfare than those that started after the Cold War.
We find strong support for our hypothesis. If a civil war started during the Cold War, it had a 0.66 probability (66%) of being fought with irregular warfare, on average, holding rough terrain, ethnic war, and GDP per capita at their means. The first difference between the two time periods is roughly -0.39 (0.27 - 0.66), a drop of 39 percentage points. One limitation of this basic form of predicted probabilities is that it does not yield confidence intervals for our estimates. We will fix this problem later.
## sqpost1990 Irregular Conventional SNC
## 1 0 0.6645157 0.3017240 0.03376038
## 2 1 0.2715208 0.5112961 0.21718315
We can graph the results for \(H_{1}\) like so:
library(plotly)

# Pull P(Irregular) for each period by column name, then the first difference
x_data <- c(predict1[1, "Irregular"], predict1[2, "Irregular"],
            predict1[2, "Irregular"] - predict1[1, "Irregular"])
y_data <- c("P(Irregular War|Pre-1991)", "P(Irregular War|Post-1990)",
            "P(Irregular War|Post-1990)-P(Irregular War|Pre-1991)")
x <- list(title = "Probability", dtick = 0.10)
y <- list(title = "")
plot_ly(type = 'scatter', mode = 'markers') %>%
  add_trace(x = ~x_data, y = ~y_data, name = '') %>%
  layout(title = "H(1): Predicted Probabilities",
         showlegend = FALSE,
         xaxis = x,
         yaxis = y,
         margin = list(l = 350))
Predicted Probabilities: By-Hand
We can also calculate the predicted probabilities by-hand relatively easily. The point of this exercise is to show that the mathematics is not all that complicated.
\(\pi_{i1} = \frac{1}{1 + \exp(-1.283 + 0.004 \times 34.76 + 0.172 \times 1.67 + 0.039 \times 1.55 + 1.422 \times 0) + \exp(-0.973 - 0.025 \times 34.76 - 0.245 \times 1.67 - 0.468 \times 1.55 + 2.756 \times 0)}\)
Below, we multiply our model coefficients by the sample means while setting our primary independent variable, post1990, to 0 (the pre-1991 period). This is because we want to derive the predicted probability of irregular warfare during the Cold War (post1990 == 0).
## [1] 0.6657894
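A minimal sketch of this by-hand calculation, using the rounded coefficients from the regression table and the means that appear in the formula above. The helper `probs_by_hand()` and the other object names are hypothetical and introduced only for illustration.

```r
# Rounded coefficients from the regression table, one row per non-baseline outcome
B <- rbind(
  Conventional = c(-1.283,  0.004,  0.172,  0.039, 1.422),
  SNC          = c(-0.973, -0.025, -0.245, -0.468, 2.756)
)
colnames(B) <- c("(Intercept)", "roughterrain", "ethnicwar",
                 "gdpcapita_fl", "post1990")

# Covariate profiles: a leading 1 for the intercept, controls at the means
# used in the formula above, and post1990 switched off or on
x_pre  <- c(1, 34.76, 1.67, 1.55, 0)
x_post <- c(1, 34.76, 1.67, 1.55, 1)

# Multinomial logit probabilities with irregular warfare as the baseline
probs_by_hand <- function(B, x) {
  eta <- drop(B %*% x)  # linear predictor for each non-baseline outcome
  c(Irregular = 1, exp(eta)) / (1 + sum(exp(eta)))
}

p_pre  <- probs_by_hand(B, x_pre)
p_post <- probs_by_hand(B, x_post)

round(rbind(pre1991 = p_pre, post1990 = p_post), 4)
p_post[["Irregular"]] - p_pre[["Irregular"]]  # first difference
```

Because the inputs are rounded to three decimals, the results can differ slightly in the third decimal place from probabilities produced by the fitted model.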
Simulations - Generate Confidence Intervals
To generate confidence intervals for predicted probabilities, we need to use other packages which involve simulations. We will go through the process in steps.
Simulations: Step 1
Our first step is to take 50,000 draws from a (multivariate) normal distribution centered on the estimated coefficients. This is justified because maximum likelihood estimates are asymptotically normal. Simulation allows us to account for estimation uncertainty when making predictions.
If we run the summary command on our simulation results, we can see that the mean for each coefficient is very close to the estimate found in our model.
## (Intercept) roughterrain ethnicwar
## Min. :-4.4160 Min. :-0.031501 Min. :-1.6272
## 1st Qu.:-1.8159 1st Qu.:-0.001365 1st Qu.:-0.1234
## Median :-1.2814 Median : 0.004121 Median : 0.1749
## Mean :-1.2821 Mean : 0.004087 Mean : 0.1726
## 3rd Qu.:-0.7503 3rd Qu.: 0.009507 3rd Qu.: 0.4675
## Max. : 1.8138 Max. : 0.038225 Max. : 2.2536
## gdpcapita_fl post1990 (Intercept) roughterrain
## Min. :-0.54848 Min. :-0.5472 Min. :-5.3825 Min. :-0.10672
## 1st Qu.:-0.05997 1st Qu.: 1.0925 1st Qu.:-1.7390 1st Qu.:-0.03665
## Median : 0.03896 Median : 1.4172 Median :-0.9682 Median :-0.02499
## Mean : 0.03899 Mean : 1.4215 Mean :-0.9695 Mean :-0.02499
## 3rd Qu.: 0.13824 3rd Qu.: 1.7512 3rd Qu.:-0.2000 3rd Qu.:-0.01337
## Max. : 0.68102 Max. : 3.4161 Max. : 3.6402 Max. : 0.05041
## ethnicwar gdpcapita_fl post1990
## Min. :-2.7944 Min. :-1.6048 Min. :0.2053
## 1st Qu.:-0.6704 1st Qu.:-0.6613 1st Qu.:2.3074
## Median :-0.2415 Median :-0.4687 Median :2.7546
## Mean :-0.2441 Mean :-0.4680 Mean :2.7531
## 3rd Qu.: 0.1824 3rd Qu.:-0.2752 3rd Qu.:3.1978
## Max. : 2.3732 Max. : 0.7184 Max. :5.3810
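A minimal sketch of this step using `mvrnorm()` from the MASS package. Two assumptions are made loudly here: the inputs are the rounded estimates and standard errors from the regression table, and the covariance matrix is taken to be diagonal (all covariances between coefficients set to zero), because the full variance-covariance matrix is not printed in this article. With a fitted nnet model one would typically pass `vcov()` of the fit instead.

```r
library(MASS)  # mvrnorm(); MASS ships with R as a recommended package

set.seed(42)

# Rounded estimates: conventional equation first, then SNC
beta_hat <- c(-1.283,  0.004,  0.172,  0.039, 1.422,
              -0.973, -0.025, -0.245, -0.468, 2.756)
se_hat   <- c( 0.788,  0.008,  0.437,  0.147, 0.490,
               1.137,  0.017,  0.634,  0.285, 0.660)

# ASSUMPTION: zero covariance between coefficients (diagonal matrix)
Sigma <- diag(se_hat^2)

# 50,000 draws, one row per draw and one column per coefficient
sim.coefs <- mvrnorm(n = 50000, mu = beta_hat, Sigma = Sigma)

# Column means should sit very close to the point estimates
round(colMeans(sim.coefs), 3)
```

Ignoring the covariances changes the spread of any quantity that combines several coefficients, so intervals built from these draws are for illustration only.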
Simulations: Step 2
Next, we want to divide our simulated data based on our dependent variable categories. So we slice our sim.coefs object such that the coefficients representing conventional warfare are separated from those representing SNC warfare.
## (Intercept) roughterrain ethnicwar gdpcapita_fl post1990
## [1,] -0.5072747 0.0020486750 -0.1728692 -0.05297848 1.6632390
## [2,] -1.6162848 0.0139135066 0.3023828 0.04815281 1.3052129
## [3,] -0.3307171 -0.0074206379 -0.4024245 0.06016678 1.7130657
## [4,] -1.1659948 0.0009956614 0.1629752 -0.07448738 1.9347810
## [5,] -1.7503020 0.0089141194 0.4255295 0.32067090 0.5678801
## [6,] -1.8468763 0.0104484757 0.2999280 0.15324356 1.6133587
## (Intercept) roughterrain ethnicwar gdpcapita_fl post1990
## [1,] -0.4107412 -0.032185375 -0.2986586 -0.5676961 2.157556
## [2,] -0.9547440 -0.046474332 0.1545530 -0.5919985 2.435404
## [3,] -0.4150132 -0.012417593 -0.9685103 -0.2842953 3.185353
## [4,] -3.0938449 -0.001394229 0.4788915 -0.2610837 3.360624
## [5,] -1.3545490 0.006198947 -0.4718130 -0.3335254 3.189834
## [6,] 0.2076437 -0.021563892 -0.7154925 -0.2995974 1.706272
Then, we specify two data frames, one for each value of the primary independent variable post1990, that hold every other variable at its mean. We then convert the data frames into matrices so that we can use matrix multiplication.
Simulations: Step 3
Third, we loop over the simulated draws and matrix multiply our sample’s average values, stored in pre1991, by the simulated coefficients for conventional warfare and, separately, by those for SNC warfare. We then wrap each product in the exponential function \(\exp\), which undoes the log in the log-odds. Remember that our base category throughout this process is irregular warfare. Before running the calculations, however, we need to build some containers to store our results.
Simulations: Step 4
Lastly, to derive our predicted probabilities, we follow our “by-hand” calculation above, minus the exponentiation, which we already applied in Step 3.
We repeat the process for post1990.
Next, we calculate the first difference to determine whether there is a statistically significant difference between the two values of our treatment variable.
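Steps 2 through 4 and the first difference can be sketched in one pass. This sketch swaps the loop and containers described above for vectorized matrix multiplication, which yields the same quantities. It also assumes rounded table estimates and a diagonal covariance matrix (covariances set to zero), since the full variance-covariance matrix is not printed in this article; all object names are illustrative.

```r
library(MASS)  # mvrnorm()

set.seed(42)

# ASSUMPTION: rounded estimates and standard errors, no covariances
beta_hat <- c(-1.283,  0.004,  0.172,  0.039, 1.422,
              -0.973, -0.025, -0.245, -0.468, 2.756)
se_hat   <- c( 0.788,  0.008,  0.437,  0.147, 0.490,
               1.137,  0.017,  0.634,  0.285, 0.660)
sim.coefs <- mvrnorm(n = 10000, mu = beta_hat, Sigma = diag(se_hat^2))

# Step 2: split the draws by outcome and build the two covariate profiles
sim.conv   <- sim.coefs[, 1:5]
sim.snc    <- sim.coefs[, 6:10]
x.pre1991  <- c(1, 34.76, 1.67, 1.55, 0)  # intercept, means, post1990 = 0
x.post1990 <- c(1, 34.76, 1.67, 1.55, 1)  # intercept, means, post1990 = 1

# Steps 3 and 4: exponentiate the linear predictors and normalize,
# giving one P(Irregular) per simulated draw
irregular_prob <- function(x) {
  drop(1 / (1 + exp(sim.conv %*% x) + exp(sim.snc %*% x)))
}
ir.pre  <- irregular_prob(x.pre1991)
ir.post <- irregular_prob(x.post1990)

# First difference, draw by draw
ir.fd <- ir.post - ir.pre

# Mean and 95% interval for each quantity
summarise_draws <- function(d) c(mean = mean(d), quantile(d, c(0.025, 0.975)))
round(rbind(pre1991    = summarise_draws(ir.pre),
            post1990   = summarise_draws(ir.post),
            first_diff = summarise_draws(ir.fd)), 3)
```

Taking percentiles of the simulated draws gives the interval directly, so no standard error formula for the probabilities is needed.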
Finally, we can plot our results and review \(H_{1}\) again.
Simulations: Plot H(1) Results
library(plotly)

x_data <- c(ir.mean.pre1991_b, ir.mean.post1990_b, ir.mean.post1990_b - ir.mean.pre1991_b)
y_data <- c("P(Irregular War|Pre-1991)", "P(Irregular War|Post-1990)",
            "P(Irregular War|Post-1990) - P(Irregular War|Pre-1991)")
# Interval bounds in the same order as x_data
ci_hi  <- c(ir.ci.hi.pre1991_b, ir.ci.hi.post1990_b, ci.hi.ir.fd)
ci_low <- c(ir.ci.low.pre1991_b, ir.ci.low.post1990_b, ci.low.ir.fd)
x <- list(title = "Probability", dtick = 0.10)
y <- list(title = "")
plot_ly(type = 'scatter', mode = 'markers') %>%
  add_trace(x = ~x_data, y = ~y_data, name = '',
            # Error bars are offsets from the point, so pass the distance to
            # each bound rather than the full width of the interval
            error_x = list(
              type = 'data',
              symmetric = FALSE,
              array = ci_hi - x_data,
              arrayminus = x_data - ci_low)
  ) %>%
  layout(title = "H(1): Simulated Predicted Probabilities",
         showlegend = FALSE,
         xaxis = x,
         yaxis = y,
         margin = list(l = 350))
Simulations: Does H(1) Still Hold?
In this article, we have improved and extended the authors’ predictions by adding confidence intervals to their predicted probabilities, which were absent from their article in a flagship political science journal. To see whether the hypothesis we specified still holds, let’s revisit it:
\(H_{1}\): In a comparison of civil wars, those that started during the Cold War will be more likely to use irregular warfare than those that started after the Cold War.
Judging from the graph above, we still find broad support for our hypothesis: the confidence intervals around both predicted probabilities are far from zero, and the confidence interval around the first difference does not include zero.
Assumption Check
When we use a multinomial logit, we assume Independence of Irrelevant Alternatives (IIA). This means that the relative odds of choosing between any two alternatives (i.e., two levels of the DV) are not affected by the presence or absence of a third alternative. While a statistical test for IIA, hmftest, is available in the mlogit package, we can also check IIA informally by removing one choice at a time and inspecting the model’s coefficients for considerable changes. If there are considerable changes, IIA is likely violated.
We can run this check rather quickly. First, we drop SNC because there are only a few cases of this type of war. We can see below that the coefficient for our primary independent variable barely changes (1.429 versus 1.422 in the multinomial logit), which gives us no evidence that IIA is violated.
##
## Call:
## glm(formula = technologyrebellion ~ roughterrain + ethnicwar +
## gdpcapita_fl + post1990, family = binomial(link = "logit"),
## data = dd)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5213 -0.8543 -0.8206 0.9727 1.6128
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.113902 0.788088 -1.413 0.15753
## roughterrain 0.003581 0.007679 0.466 0.64096
## ethnicwar 0.091535 0.441953 0.207 0.83592
## gdpcapita_fl 0.026408 0.146368 0.180 0.85682
## post1990 1.428683 0.498121 2.868 0.00413 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 159.76 on 119 degrees of freedom
## Residual deviance: 147.10 on 115 degrees of freedom
## AIC: 157.1
##
## Number of Fisher Scoring iterations: 7
##
## =============================================
## Dependent variable:
## ---------------------------
## technologyrebellion
## ---------------------------------------------
## roughterrain 0.004
## (0.008)
##
## ethnicwar 0.092
## (0.442)
##
## gdpcapita_fl 0.026
## (0.146)
##
## post1990 1.429***
## (0.498)
##
## Constant -1.114
## (0.788)
##
## ---------------------------------------------
## Observations 120
## Log Likelihood -73.549
## Akaike Inf. Crit. 157.098
## =============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Next, we drop Conventional. As we can see below, the coefficient on our primary independent variable for SNC (2.628) is very close to its counterpart in our multinomial logit (2.756). Thus, we again find no evidence that we are violating IIA.
##
## =============================================
## Dependent variable:
## ---------------------------
## technologyrebellion
## ---------------------------------------------
## roughterrain -0.028
## (0.018)
##
## ethnicwar -0.117
## (0.669)
##
## gdpcapita_fl -0.484
## (0.332)
##
## post1990 2.628***
## (0.664)
##
## Constant -1.094
## (1.255)
##
## ---------------------------------------------
## Observations 91
## Log Likelihood -31.897
## Akaike Inf. Crit. 73.794
## =============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
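One way to make "very close" concrete is to express each shift in the post1990 coefficient in units of its multinomial logit standard error. This is an informal yardstick rather than a formal test, and the object names are illustrative; all figures are the rounded values from the tables in this article.

```r
# post1990 coefficients: multinomial logit versus the two binary logits
mnl    <- c(Conventional = 1.422, SNC = 2.756)
binary <- c(Conventional = 1.429, SNC = 2.628)

# Multinomial logit standard errors for the same coefficients
se_mnl <- c(Conventional = 0.490, SNC = 0.660)

# Shift in each coefficient, measured in standard errors
shift_in_se <- (binary - mnl) / se_mnl
round(shift_in_se, 3)
```

Both shifts are a small fraction of one standard error, which is consistent with the informal reading that dropping an alternative leaves the estimates essentially unchanged.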
Sources
Kalyvas, Stathis N., and Laia Balcells. “International system and technologies of rebellion: How the end of the Cold War shaped internal conflict.” American Political Science Review 104, no. 3 (2010): 415-429. Dataset: here