Projects

Latent Control: Hidden Markov Models

Published:

Who controls territory in civil war? Territorial control is a central variable in the research and analysis of civil wars, yet it is incredibly difficult to measure. In this post, I model territorial control as a latent variable: an unobserved variable that is presumed to cause its observed indicators. The project estimates this latent variable across the entire country of Afghanistan using sub-national event data, a Hidden Markov Model, Uber’s hexagonal spatial index (H3), and logistic spatial and temporal decay functions that account for serial correlation in time and space. To view this project, please click here.
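
As a rough illustration of how a Hidden Markov Model recovers an unobserved state from observed events, here is a minimal Viterbi decoder for a single location. It is a sketch only, not the project's implementation: the two states, the two event types, and every probability below are hypothetical values chosen for readability, and the spatial index and decay functions are left out.

```python
import math

# Hypothetical latent states and observed event types (illustration only).
STATES = ("government", "insurgent")

# Hypothetical probabilities; a real model would estimate these from data.
START = {"government": 0.5, "insurgent": 0.5}
TRANS = {
    "government": {"government": 0.9, "insurgent": 0.1},
    "insurgent": {"government": 0.1, "insurgent": 0.9},
}
EMIT = {
    "government": {"gov_event": 0.7, "ins_event": 0.3},
    "insurgent": {"gov_event": 0.2, "ins_event": 0.8},
}


def viterbi(observations):
    """Return the most likely sequence of latent states for the observations."""
    # best[s] holds the log-probability of the best path that ends in state s.
    best = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
        for s in STATES
    }
    backpointers = []
    for obs in observations[1:]:
        step, pointers = {}, {}
        for s in STATES:
            # Pick the predecessor that maximizes the path score into s.
            prev = max(STATES, key=lambda p: best[p] + math.log(TRANS[p][s]))
            step[s] = best[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            pointers[s] = prev
        best = step
        backpointers.append(pointers)

    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


events = ["gov_event", "gov_event", "ins_event", "ins_event", "ins_event"]
control = viterbi(events)
```

Because the transition probabilities favor staying in the same state, a single insurgent event is not enough to flip the decoded state; the path switches only once the evidence accumulates.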

PetaFlights

Published:

What accounts for flight delays in the U.S.? This project presents the machine learning side of a large data engineering project that joined 630 million rows of weather data with 31 million rows of flight data. I use state-of-the-art tools for distributed deep learning (Petastorm, Horovod, and PyTorch) to train a multilayer perceptron distributed across 8 workers in Databricks. Importantly, I use novel approaches to transform categorical data into continuous features through an embedding table. To view this project, please click here.
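
To make the embedding-table idea concrete, here is a minimal sketch in plain Python. It only shows the lookup: each category gets an integer index, and each index points to a dense vector of floats. The `EmbeddingTable` class, the carrier codes, and the dimension are hypothetical. In PyTorch the equivalent layer is `torch.nn.Embedding`, whose vectors are learned during training rather than left at their random initial values.

```python
import random


class EmbeddingTable:
    """Hypothetical lookup table mapping each category to a dense vector."""

    def __init__(self, categories, dim, seed=0):
        rng = random.Random(seed)
        # Assign each distinct category a stable integer index.
        self.index = {c: i for i, c in enumerate(sorted(set(categories)))}
        # One randomly initialized row per category; training would update these.
        self.weights = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in self.index
        ]

    def lookup(self, category):
        """Return the continuous feature vector for a category."""
        return self.weights[self.index[category]]


# Hypothetical categorical column, e.g. airline carrier codes.
carriers = ["AA", "DL", "UA", "AA"]
table = EmbeddingTable(carriers, dim=4)
features = [table.lookup(c) for c in carriers]
```

Each row of `features` can then be concatenated with the numeric columns and fed to the multilayer perceptron. Repeated categories share the same vector.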

NLP: Natural Language Propaganda

Published:

Who are the targets of insurgent propaganda? I investigate the ability to classify the targets (e.g., the U.S. or Kabul) of insurgent propaganda messages using a novel corpus containing over 11,000 Taliban statements from 2014 to 2020. In experiments with Convolutional Neural Network (CNN) and transformer architectures, I demonstrate that the audiences of insurgent messages are best captured by transformers, likely owing to their encoder-decoder architecture. This paper’s contribution is twofold. First, it offers a novel data set with utility in classification and summarization tasks for machine learning. Second, it suggests that because the audience of a message can be reliably identified, analysts have new opportunities to look more closely at contrasts in language and better understand the targets of information.

Nutritionalcart

Published:

How healthy is the average Instacart user? Are certain types of food buyers (e.g., vegetarians, carnivores) healthier than others? I bring new data to bear on these questions to better understand the health benefits afforded to Instacart users who choose some types of foods (e.g., plant-based, meat-based) over others. To determine the relative health of Instacart users, I matched the 10 most-ordered products in each aisle with USDA nutrient data, which I retrieved through the USDA-provided API to their database in JavaScript Object Notation (JSON) format. To view this project, please click here. An upgraded algorithm that better searches the USDA database can be found here.
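
The matching step can be sketched as follows. This is an illustration under stated assumptions, not the project's algorithm: the JSON payload below is a hypothetical stand-in and does not reproduce the real USDA response schema, and `best_match` is a hypothetical helper that scores candidates by simple word overlap.

```python
import json

# Hypothetical response body for illustration; not the real USDA schema.
RESPONSE = """
{"foods": [
  {"description": "Bananas, raw", "nutrients": {"energy_kcal": 89, "sugar_g": 12.2}},
  {"description": "Spinach, raw", "nutrients": {"energy_kcal": 23, "sugar_g": 0.4}}
]}
"""


def best_match(product_name, payload):
    """Return the food whose description shares the most words with product_name."""
    words = set(product_name.lower().split())
    foods = json.loads(payload)["foods"]

    def overlap(food):
        # Strip commas so "Bananas, raw" tokenizes to {"bananas", "raw"}.
        desc = set(food["description"].lower().replace(",", " ").split())
        return len(words & desc)

    return max(foods, key=overlap)


# Hypothetical product name from an order history.
match = best_match("Organic Bananas", RESPONSE)
```

Word overlap is a deliberately crude scoring rule. It breaks on plurals and brand names, which is the kind of weakness an improved search algorithm would need to address.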

Map Off

Published:

Map Off is a game designed to test your geography skills in the United States or around the world. The inspiration for this game comes from my wife, Hannah: we often test our spatial skills against one another whenever a map is nearby. Now we have access to maps and competition anytime we want.

Reexamining Civilian Preferences in Civil War: A Survey in Afghanistan

Published:

How do civilians react to changing authority in civil war? I investigate this question in Afghanistan using survey data from The Asia Foundation collected after the end of U.S.-led combat operations in 2014. I find clear evidence that civilian attitudes are conditional on a three-way interaction among territorial control, ethnicity, and survival. For instance, there is a notable and statistically significant difference between Pashtuns and non-Pashtuns under Taliban control in their approval of the Afghan Government. I bring largely unused country-wide, individual-level data to bear on analyzing civilian wartime beliefs. To view this research project, please click here.