Nutritionalcart: Adding Nutritional Information to Instacart Groceries

Andrew Fogarty


## python:         C:\Users\Andrew\ANACON~1\python.exe
## libpython:      C:/Users/Andrew/ANACON~1/python37.dll
## pythonhome:     C:\Users\Andrew\ANACON~1
## version:        3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)]
## Architecture:   64bit
## numpy:          C:\Users\Andrew\ANACON~1\lib\site-packages\numpy
## numpy_version:  1.16.2
## python versions found: 
##  C:\Users\Andrew\ANACON~1\python.exe
##  C:\Users\Andrew\Anaconda3\python.exe
# load python packages
import requests
import json
import numpy as np
import pandas as pd
import plotly
import plotly.graph_objs as go

1 Introduction

How healthy is the average Instacart user? Are certain types of food buyers (e.g., vegetarians, carnivores) healthier than others? I bring new data to bear on these questions to better understand how healthy the average Instacart user is and what health benefits accrue to users who choose some types of food (e.g., plant-based, meat-based) over others. I begin by describing the data generation and measurement process. Next, I describe the data set and Instacart users in terms of their health. I conclude by evaluating specific hypotheses associated with my research question.

To determine the relative health of Instacart users, I matched the top 10 most ordered products in each aisle with USDA nutrient data, which I retrieved in JavaScript Object Notation (JSON) format through the USDA-provided API to their database. I chose the top 10 most ordered products by aisle because, on average, this set of products accounted for over 30% of the items ordered from each aisle. Because the top 10 products collectively account for a substantial share of the products ordered in each aisle, I assume that these products and their nutrients are also broadly representative of the nutrients found in the rest of the aisle. Consequently, my aisle-level nutrient data is the mean of the nutrients found in each aisle's top 10 products.
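The product-selection step can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code used for the analysis: the `(aisle, product)` pair layout, the function names, and the toy order lines are all assumptions made for the example, and it uses `n=2` in place of the top 10 so the toy data stays small.

```python
from collections import Counter, defaultdict


def count_orders(order_lines):
    """Count how often each product was ordered within each aisle.

    `order_lines` is an iterable of (aisle, product) pairs, one pair per
    ordered item (a hypothetical layout chosen for this sketch).
    """
    counts = defaultdict(Counter)
    for aisle, product in order_lines:
        counts[aisle][product] += 1
    return counts


def top_products(counts, n=10):
    """Map each aisle to its n most-ordered (product, count) pairs."""
    return {aisle: c.most_common(n) for aisle, c in counts.items()}


def top_share(counts, n=10):
    """Fraction of each aisle's ordered items covered by its top-n products."""
    return {
        aisle: sum(k for _, k in c.most_common(n)) / sum(c.values())
        for aisle, c in counts.items()
    }


# Toy order lines (invented for illustration, not Instacart data).
toy_lines = (
    [("fresh fruits", "banana")] * 3
    + [("fresh fruits", "apple")] * 2
    + [("fresh fruits", "pear")]
    + [("yogurt", "plain")] * 2
    + [("yogurt", "vanilla")]
)

toy_counts = count_orders(toy_lines)
toy_top = top_products(toy_counts, n=2)
toy_share = top_share(toy_counts, n=2)

# Average share across aisles, the quantity the text reports as "over 30%".
mean_share = sum(toy_share.values()) / len(toy_share)
```

With real data, the same counts would drive both the product list sent to the USDA lookup and the coverage check that justifies treating the top products as representative.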

2 Measurement

To generate and measure the health variable, I relied on an algorithm published in an academic journal.[1] Given 82 different pieces of nutrient data for 1010 items, I used the algorithm to summarize the nutrients into a single statistic called the Weighted Nutrient Density Score (WNDS). The WNDS is a continuous variable that takes positive and negative values; positive values indicate higher nutrient quality and thus better health. The researchers derived the algorithm through statistical analysis, determining how much of the variation in the USDA’s Healthy Eating Index, a composite score, each nutrient explains. After generating the WNDS for each item, I gave each aisle its own WNDS by taking the mean over that aisle's top 10 most ordered products. I was then able to generate a user-WNDS by averaging the aisle-WNDS of the items each user ordered.
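The three levels of aggregation (item, aisle, user) can be sketched as follows. This is an illustration under stated assumptions rather than the published algorithm: the weights in `ILLUSTRATIVE_WEIGHTS` are placeholders invented for the example and are not the coefficients from the cited paper, and the nutrient names, data layout, and function names are likewise hypothetical. The sketch only shows the general shape of a weighted score, where beneficial nutrients carry positive weights and nutrients to limit carry negative ones.

```python
from collections import defaultdict
from statistics import mean

# Placeholder weights, NOT the published WNDS coefficients.
ILLUSTRATIVE_WEIGHTS = {
    "protein": 1.0,
    "fiber": 2.0,
    "saturated_fat": -2.0,
    "sodium": -1.0,
}


def item_score(nutrients, weights):
    """Weighted sum of an item's nutrient values (missing nutrients count as 0)."""
    return sum(w * nutrients.get(name, 0.0) for name, w in weights.items())


def aisle_scores(items, weights):
    """Mean item score per aisle; `items` are dicts with 'aisle' and 'nutrients'."""
    by_aisle = defaultdict(list)
    for item in items:
        by_aisle[item["aisle"]].append(item_score(item["nutrients"], weights))
    return {aisle: mean(scores) for aisle, scores in by_aisle.items()}


def user_scores(purchases, aisle_wnds):
    """Mean aisle score over every item a user ordered.

    `purchases` is an iterable of (user_id, aisle) pairs, one per ordered item.
    """
    by_user = defaultdict(list)
    for user_id, aisle in purchases:
        by_user[user_id].append(aisle_wnds[aisle])
    return {user: mean(scores) for user, scores in by_user.items()}


# Toy inputs (invented values, not USDA or Instacart data).
toy_items = [
    {"aisle": "fresh fruits", "nutrients": {"protein": 1.0, "fiber": 3.0}},
    {"aisle": "fresh fruits", "nutrients": {"protein": 1.0, "fiber": 1.0}},
    {
        "aisle": "chips",
        "nutrients": {"protein": 2.0, "fiber": 1.0, "saturated_fat": 3.0, "sodium": 4.0},
    },
]
toy_purchases = [
    (1, "fresh fruits"),
    (1, "fresh fruits"),
    (1, "chips"),
    (2, "chips"),
]

toy_aisle_wnds = aisle_scores(toy_items, ILLUSTRATIVE_WEIGHTS)
toy_user_wnds = user_scores(toy_purchases, toy_aisle_wnds)
```

Because the user score averages aisle scores per ordered item, a user who orders many items from one aisle is weighted toward that aisle's score.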

3 Descriptive Inference

The aisle- and user-WNDS are instructive because they help us think about the nutrient quality of the food sold to Instacart users and about the health effects it might have. The graph below shows the average WNDS for 25 randomly selected Instacart aisles. The sample necessarily excludes 35 of the 134 aisles because they contain non-food products such as pet food, hygiene items, and medicine. The graph shows that most aisles contain low-to-moderately healthy foods, as evidenced by the majority of values being positive. Because healthy items predominate, I expect the average Instacart user to have a positive WNDS.
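The sampling step behind such a graph, along with the "majority of values are positive" check, can be sketched as below. This is a hedged illustration using only the standard library: the function names, the fixed seed, the toy aisle scores, and the non-food set are assumptions for the example, and the sample size is reduced from 25 to 3 to fit the toy data.

```python
import random


def sample_food_aisles(aisle_wnds, non_food, k=25, seed=0):
    """Randomly pick up to k aisles after dropping the non-food ones.

    Sorting first makes the draw reproducible for a given seed.
    """
    food = sorted(aisle for aisle in aisle_wnds if aisle not in non_food)
    rng = random.Random(seed)
    return rng.sample(food, min(k, len(food)))


def share_positive(scores):
    """Fraction of scores that are strictly greater than zero."""
    scores = list(scores)
    return sum(s > 0 for s in scores) / len(scores)


# Toy aisle scores (invented for illustration).
toy_aisle_wnds = {
    "fresh fruits": 5.0,
    "yogurt": 2.5,
    "chips": -6.0,
    "canned beans": 4.0,
    "pet food": 0.0,
}
toy_non_food = {"pet food"}

sampled = sample_food_aisles(toy_aisle_wnds, toy_non_food, k=3, seed=0)
sampled_share = share_positive(toy_aisle_wnds[a] for a in sampled)
```

The sampled aisles and their scores are what a bar chart of average WNDS by aisle would display.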