## python: C:\Users\Andrew\ANACON~1\python.exe
## libpython: C:/Users/Andrew/ANACON~1/python37.dll
## pythonhome: C:\Users\Andrew\ANACON~1
## version: 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)]
## Architecture: 64bit
## numpy: C:\Users\Andrew\ANACON~1\lib\site-packages\numpy
## numpy_version: 1.16.2
##
## python versions found:
## C:\Users\Andrew\ANACON~1\python.exe
## C:\Users\Andrew\Anaconda3\python.exe
# load python packages
import requests
import json
import numpy as np
import pandas as pd
import plotly
import plotly.graph_objs as go
How healthy is the average Instacart user? Are certain types of food buyers (e.g., vegetarians, carnivores) healthier than others? I bring new data to bear on these questions to better understand how healthy the average Instacart user is and what health benefits accrue to Instacart users who choose some types of foods (i.e., plant-based, meat-based) over others. I begin this section by describing the data generation and measurement process; next, I describe the data set and Instacart users in terms of their health; and I conclude by evaluating specific hypotheses associated with my research question.
To determine the relative health of Instacart users, I matched the top 10 most ordered products in each aisle with USDA nutrient data, using the USDA's API, which returns records as JavaScript Object Notation (JSON). I chose the top 10 most ordered products per aisle because, on average, this set accounts for over 30% of the items ordered from each aisle. Because the top 10 products account for such a large share of each aisle's orders, I assume that these products, and their nutrients, are broadly representative of the nutrients found in the rest of the products in the aisle. Consequently, my aisle-level nutrient data is the mean of the nutrients found in the top 10 products.
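The aisle-level averaging step described above can be sketched with a toy example (the aisle IDs, product names, and fiber values here are hypothetical, and a single nutrient column stands in for the 82 used in the real data):

```python
import pandas as pd

# Hypothetical top-ordered products; in the real data each item carries
# 82 nutrient fields rather than the single 'fiber' column shown here
top_products = pd.DataFrame({
    'aisle_id': [24, 24, 83, 83],
    'product_name': ['Banana', 'Strawberries', 'Spinach', 'Cucumber'],
    'fiber': [2.6, 2.0, 2.2, 0.5]})

# The aisle-level nutrient profile is the mean over that aisle's top products
aisle_nutrients = top_products.groupby('aisle_id')['fiber'].mean()
print(aisle_nutrients.to_dict())
```

In the full pipeline, the same mean is taken over every nutrient column for each aisle's top 10 products.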
To generate and measure the health variable, I relied on an algorithm from an academic journal.1 Given 82 different pieces of nutrient data for 1010 items, I used the algorithm to summarize the nutrients into a single statistic called the Weighted Nutrient Density Score (WNDS). The WNDS is a continuous variable containing positive and negative values, where higher values connote greater nutrient density and thus better health. The algorithm's authors derived it statistically by determining how much of the variation in a composite score from the USDA's Healthy Eating Index each nutrient explains. After generating the WNDS for each item, I gave each aisle its own WNDS by taking the mean over its top 10 most commonly ordered products. I was then able to generate a user-level WNDS by averaging the aisle-level WNDS of the items each user ordered.
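To make the algorithm concrete, here is a toy calculation for a single hypothetical item, using the nutrient weights and daily values that appear in the WNDS code later in this document (the item's nutrient amounts are invented for illustration):

```python
# Weights and daily values as applied in this document's WNDS formula;
# 'unsat_fat' labels the combined fat term that is weighted positively
weights = {'protein': 1.4, 'fiber': 3.13, 'calcium': 1.0, 'unsat_fat': 2.51,
           'vit_c': 0.37, 'sat_fat': 2.95, 'sugars': 0.52, 'sodium': 1.34}
daily_values = {'protein': 50, 'fiber': 25, 'calcium': 1000, 'unsat_fat': 44,
                'vit_c': 60, 'sat_fat': 20, 'sugars': 50, 'sodium': 2400}
beneficial = ['protein', 'fiber', 'calcium', 'unsat_fat', 'vit_c']
limiting = ['sat_fat', 'sugars', 'sodium']

def wnds_single_item(per_100kcal):
    '''Weighted sum of percent daily values per 100 kcal: beneficial
    nutrients add to the score, limiting nutrients subtract from it.'''
    score = sum(weights[n] * per_100kcal.get(n, 0) / daily_values[n]
                for n in beneficial)
    score -= sum(weights[n] * per_100kcal.get(n, 0) / daily_values[n]
                 for n in limiting)
    return score * 100

# Hypothetical item: amounts per 100 kcal (mg for sodium and calcium)
item = {'protein': 5, 'fiber': 3, 'calcium': 40, 'sugars': 8, 'sodium': 120}
print(round(wnds_single_item(item), 2))
```

A nutrient-dense item scores high because its beneficial nutrients outweigh its limiting ones per calorie; an energy-dense, nutrient-poor item scores negative.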
The aisle- and user-level WNDS are instructive because they help us think about the nutrient quality of the food sold to Instacart users as well as the health effects it might have. The graph below shows the average WNDS for 25 randomly selected food aisles; 35 of the 134 aisles are excluded altogether because they contain non-food products such as pet food, hygiene items, and medicine. The graph shows that most aisles contain low-to-moderately healthy foods, as evidenced by the majority of values being positive. Given this predominance of healthy items, I should expect the average Instacart user to have a positive WNDS.
Indeed, the graph of each Instacart user by their WNDS shows that this is the case: the average Instacart user has a positive WNDS. Although the graph contains over 200,000 data points, it clearly shows that most users maintain a positive WNDS.
Instacart users were classified into three types: vegetarian, flexitarian, and carnivore. Vegetarians purchased only plant-based products, flexitarians purchased a mix of plant- and meat-based products, and carnivores purchased only meat-based products. This classification yielded the following counts of each type, from which I draw my WNDS and health inferences, shown in the table below:
User Type | Count |
---|---|
Vegetarian | 12812 |
Flexitarian | 26522 |
Carnivore | 107134 |
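The classification rule can be sketched as a function of each user's meat share of purchases (the user IDs and ratios below are hypothetical; the script itself applies the same rule with boolean masks):

```python
import pandas as pd

# Hypothetical meat-to-total purchase ratios for three users
ratios = pd.DataFrame({'user_id': [1, 2, 3],
                       'ratio': [1.0, 0.4, 0.0]})

def classify(ratio):
    '''Map a user's meat share of purchases to a shopper type.'''
    if ratio == 1.0:
        return 'Carnivore'    # purchased only meat-based items
    if ratio == 0.0:
        return 'Vegetarian'   # purchased only plant-based items
    return 'Flexitarian'      # purchased a mix of both

ratios['categories'] = ratios['ratio'].apply(classify)
print(ratios['categories'].tolist())
```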
In this section, I formally specify and test a hypothesis using my novel health data in conjunction with the Instacart data. My hypothesis is: In a comparison of Instacart users, plant-based consumers (i.e., flexitarians and vegetarians) are more likely to have a higher WNDS than meat-based consumers (i.e., carnivores).
Put less formally, the hypothesis states that as Instacart users purchase more plant-based products, their WNDS should be higher, meaning they should be healthier than if they had bought fewer plant-based products. The null hypothesis states that there is no difference in WNDS between types of consumers. If the null is true, then each type of consumer should have roughly the same WNDS. If the null is false, then each type should have a different WNDS, and the differences should be ordered such that the WNDS increases as more plant-based products are consumed. Put differently, the WNDS for vegetarians should be strictly greater than the WNDS for flexitarians, which in turn should be strictly greater than the WNDS for carnivores.
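The predicted ordering can be expressed as a simple check on group means. The sketch below uses simulated WNDS scores (the means, spreads, and sample sizes are invented) purely to illustrate the comparison the hypothesis implies:

```python
import numpy as np

# Simulated user-level WNDS scores by shopper type; all distribution
# parameters are hypothetical and chosen only for illustration
np.random.seed(0)  # reproducible draws
groups = {'Vegetarian': np.random.normal(35, 4, 500),
          'Flexitarian': np.random.normal(33, 4, 500),
          'Carnivore': np.random.normal(30, 4, 500)}

# The alternative hypothesis predicts a strict ordering of the group means
means = {name: scores.mean() for name, scores in groups.items()}
ordered = means['Vegetarian'] > means['Flexitarian'] > means['Carnivore']
print(means, ordered)
```

With the real data, the same comparison is made on the per-user WNDS means within each shopper type.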
The graph below shows that I can reject the null of no difference between consumer types because of the ordered differences found between them. The starkest difference is between vegetarians and carnivores: on average, vegetarians hold a five-point WNDS advantage over carnivores.
I probed whether certain types of Instacart shoppers (e.g., vegetarians) were healthier than other types of shoppers. I found clear support for my hypothesis, which states that as Instacart users buy more plant-based products, their WNDS is more likely to increase. Because plant-based products contain more nutrients, I judge that Instacart users who purchase more plant-based products are more likely to be healthier than Instacart users who do not.
# Build Data
def prepare_data():
    ''' This function loads and merges the Instacart data,
    which is split across a number of different CSV files.
    The codebook can be found here:
    https://gist.github.com/jeremystan/c3b39d947d9b88b3ccff3147dbcf6c6b '''
# load necessary CSVs
order_products_train = pd.read_csv('order_products__train.csv') # contains order/product info
order_products_prior = pd.read_csv('order_products__prior.csv') # contains order/product info
orders = pd.read_csv('orders.csv') # contains variables about orders
products = pd.read_csv('products.csv') # contains variables about range of products
departments = pd.read_csv('departments.csv') # categorizes products by department
aisles = pd.read_csv('aisles.csv') # categorizes products by aisle
# combine orders_product_train/prior into single dataframe
order_products = pd.concat([order_products_train, order_products_prior])
# merge in all other datasets
df = orders.merge(order_products, on = ['order_id'])
df = df.merge(products, on = ['product_id'])
df = df.merge(departments, on = ['department_id'])
df = df.merge(aisles, on = ['aisle_id'])
return df # return df for analysis
df = prepare_data() # instantiate df; will consume ~4 GB of RAM
# Find most ordered products, by aisle; filter out non-food products
# List aisle_ids:
# 6 - other; 10 - kitchen supplies; 11 - cold flu allergy; 20 - oral hygiene
# 22 - hair care; 25 - soap; 40 - dog food care; 41 - cat food care;
# 44 - eye ear care; 46 - mint gum; 54 - paper goods; 55 - shave needs
# 56 - diapers wipes; 60 - trash bag liners; 70 - digestion; 73 - facial care
# 74 - dish detergents, 75 - laundry
# 80 - deodorants; 82 - baby accessories; 85 - food storage; 87 - more households
# 92 - baby food formula; 97 - baking supplies decor; 100 - missing;
# 101 - air fresheners candles; 102 - baby bath body care; 109 - skin care;
# 111 - plates bowls cups flatware; 114 - cleaning products; 118 - first aid
# 126 - feminine care; 127 - body lotions soap; 132 - beauty; 133 - muscles joint pain relief
# Create an aisle_id filter list
filter_list = [6, 10, 11, 20, 22, 25, 40, 41, 44, 46, 54, 55, 56, 60, 70, 73,
74, 75, 80, 82, 85, 87, 92, 97, 100, 101, 102, 109, 111, 114, 118, 126,
127, 132, 133]
# Filter the df - return rows that are not in the filter list
filtered_df = df.loc[~df['aisle_id'].isin(filter_list)]
# Get the top 10 products to collect nutrient statistics on
top_products_by_aisle = filtered_df.groupby(['aisle_id', 'aisle'])['product_name'].value_counts()
top_products_by_aisle = top_products_by_aisle.reset_index(name="count")
# Get the top 10 by using head
top_products_by_aisle = top_products_by_aisle.groupby('aisle_id').head(10)
# Search each entry in top_products_by_aisle by USDA database through API
def fdcID_retrieval():
    ''' This function finds the FDC ID for each of the top 10 most
    commonly ordered items by aisle. It does so by capturing the first record
    returned by the USDA database, FoodData Central (https://fdc.nal.usda.gov/).
    The first record is the most likely match according to USDA's search
    algorithm. This function could be improved by a preceding step that pulls
    the top 10 or so results and then fuzzy-matches them to find the product
    most likely to correspond to the entry in top_products_by_aisle. This would
    give more confidence that the product whose nutritional data I collect
    closely matches the one found in the Instacart aisles.
    I retrieve the FDC ID because it is required for the next
    step: pulling nutrition data. '''
# Set API details
requested_url = 'https://api.nal.usda.gov/fdc/v1/search'
api_key = '?api_key=dvCyz1caFZ12A2Q04pm7ZQ9b9Z8h4pcK7dl4GI8K'
headers = {'Content-Type': 'application/json'}
# Put top products in a list
top_products_by_aisle_list = top_products_by_aisle['product_name'].tolist()
fdcID_container = [] # container for results
for item in top_products_by_aisle_list:
data = {"generalSearchInput": item} # pull item in list
data_str = json.dumps(data).encode("utf-8") # convert to json format
response = requests.post(requested_url + api_key,
                                 headers=headers, data=data_str) # commit an API request for the item
        parsed = json.loads(response.content) # parse the JSON response
        try: # handle the case where an item returns no results
            temp_fdcID = parsed['foods'][0]['fdcId'] # pull first search result
            fdcID_container.append(temp_fdcID) # store the FDCID for later use
        except (KeyError, IndexError): # no result: append np.nan instead
            fdcID_container.append(np.nan)
return fdcID_container
fdcID_list = fdcID_retrieval()
# Using the FDCID data pulled above, conduct another API request to get nutrition
# Nutrient list found on U.S. food product labels
# 1257 - trans fat
# 1293 - polyunsaturated fat
# 1292 - monounsaturated fat
# 1258 - sat. fat
# 1253 - cholesterol
# 1093 - sodium
# 1005 - carbohydrates
# 1079 - fiber
# 2000 - sugars
# 1003 - protein
# 1104 - vit a
# 1162 - vit c
# 1087 - calcium
# 1089 - iron
# 1008 - energy
# Search each FDCID in the USDA database through API for nutritional data
def nutrition_retrieval():
''' This function collects the most important nutritional data for each
of the top 10 most commonly ordered products. It does so by making calls
to the USDA database, FoodData Central (https://fdc.nal.usda.gov/), and it
then retrieves the returned JSON data for the relevant nutritional data. '''
# Set container storage and ordering
nutrient_container = []
    nutrient_list = ['trans_fat', 'sat_fat', 'cholesterol', 'sodium',
        'carbs', 'fiber', 'sugars', 'protein', 'vit_a', 'vit_c', 'calcium',
        'iron', 'energy', 'fdcID']
nutrient_container.append(nutrient_list)
# Set API details
USDA_URL = 'https://api.nal.usda.gov/fdc/v1/'
API_KEY = 'api_key=dvCyz1caFZ12A2Q04pm7ZQ9b9Z8h4pcK7dl4GI8K'
headers = {'Content-Type': 'application/json'}
    # Loop over each FDCID; commit an API request for each
for i in range(0, len(fdcID_list)):
fdcId = str(fdcID_list[i])
requested_url = USDA_URL + fdcId + '?' + API_KEY
response = requests.get(requested_url, headers=headers)
parsed = json.loads(response.content)
        # Map FDC nutrient IDs to names; default every amount to 0
        nutrient_ids = {1257: 'trans_fat', 1293: 'poly_fat', 1292: 'mono_fat',
                        1258: 'sat_fat', 1253: 'cholesterol', 1093: 'sodium',
                        1005: 'carbs', 1079: 'fiber', 2000: 'sugars',
                        1003: 'protein', 1104: 'vit_a', 1162: 'vit_c',
                        1087: 'calcium', 1089: 'iron', 1008: 'energy'}
        amounts = dict.fromkeys(nutrient_ids.values(), 0)
        # Walk the returned nutrient records, keeping only the ones we need
        for record in parsed.get('foodNutrients', []):
            nutrient_id = record.get('nutrient', {}).get('id')
            if nutrient_id in nutrient_ids and 'amount' in record:
                amounts[nutrient_ids[nutrient_id]] = record['amount']
        # Combine trans, mono-, and polyunsaturated fats into a single figure
        trans_fat = (amounts['trans_fat'] + amounts['poly_fat'] +
                     amounts['mono_fat'])
        nutrient_container.append([trans_fat, amounts['sat_fat'],
            amounts['cholesterol'], amounts['sodium'], amounts['carbs'],
            amounts['fiber'], amounts['sugars'], amounts['protein'],
            amounts['vit_a'], amounts['vit_c'], amounts['calcium'],
            amounts['iron'], amounts['energy'], fdcID_list[i]])
return nutrient_container
nutrient_list = nutrition_retrieval()
# Turn nutrient_list into df for preprocessing
nutrient_df = pd.DataFrame(nutrient_list[1::], columns = ['trans_fat', 'sat_fat', 'cholesterol',
'sodium', 'carbs', 'fiber', 'sugars',
'protein', 'vit_a', 'vit_c', 'calcium',
'iron', 'energy', 'fdcID']) # sliced to remove column titles
# Preprocess the nutrient data
def nutrient_preprocessing(dataframe):
    ''' This function preprocesses the nutrient data by converting each
    nutrient used in the analysis to a common base of 1 kcal. This
    common base is helpful for the next step, which is to create a
    weighted nutrient density score (WNDS). Since the WNDS algorithm bases its
    calculations on 100 kcal, this makes the multiplication
    more convenient and easier to think about. '''
    # Convert nutrients to a per-kcal basis
    for col in ['protein', 'fiber', 'trans_fat', 'sat_fat',
                'sugars', 'calcium', 'vit_c', 'sodium']:
        dataframe[col] = dataframe[col] / dataframe['energy']
return dataframe
nutrient_df = nutrient_preprocessing(nutrient_df)
def weighted_nutrient_density_score(dataframe):
    ''' This function calculates the WNDS, which is based on an algorithm
    devised by Arsenault, Fulgoni, Hersey, and Muth. The algorithm is based on
    a statistical analysis of the USDA Healthy Eating Index to determine
    which nutrients explain the most variation in the component scores.
    Those nutrients are: protein, fiber, calcium, trans fat, vitamin C,
    saturated fat, sugars, and sodium. '''
# Calculate WNDS based on journal article
wnds_protein = (1.4 * ((dataframe['protein'] * 100) / 50))
wnds_fiber = (3.13 * ((dataframe['fiber'] * 100) / 25))
wnds_calcium = (1 * ((dataframe['calcium'] * 100) / 1000))
wnds_trans_fat = (2.51 * ((dataframe['trans_fat'] * 100) / 44))
wnds_vit_c = (0.37 * ((dataframe['vit_c'] * 100) / 60))
wnds_sat_fat = (2.95 * ((dataframe['sat_fat'] * 100) / 20))
wnds_sugars = (0.52 * ((dataframe['sugars'] * 100) / 50))
wnds_sodium = (1.34 * ((dataframe['sodium'] * 100) / 2400))
wnds = (wnds_protein + wnds_fiber + wnds_calcium + wnds_trans_fat +
wnds_vit_c - wnds_sat_fat - wnds_sugars - wnds_sodium) * 100
return wnds
wnds = weighted_nutrient_density_score(nutrient_df)
# Place the analyzed data into a new dataframe
matching_df = pd.DataFrame(list(zip(wnds, fdcID_list,
top_products_by_aisle['aisle_id'], top_products_by_aisle['product_name'])),
columns = ['wnds_item_score', 'fdcID', 'aisle_id', 'product_name'])
# Apply WNDS score to all foods in the aisle; top 10 most ordered is proxy score for all items in the aisle
matching_df['wnds_aisle_mean'] = matching_df.groupby('aisle_id')['wnds_item_score'].transform('mean')
# Collapse to one WNDS row per aisle
matching_df = matching_df.groupby('aisle_id').agg({'wnds_aisle_mean' : 'max'}).reset_index()
# Join aisle WNDS scores back to main dataframe (filtered DF)
df2 = filtered_df.merge(matching_df, on = ['aisle_id'])
# Build a WNDS score for each user_id by taking the average of their items
df2['userid_wnds_score'] = df2.groupby('user_id')['wnds_aisle_mean'].transform('mean')
# How healthy is the average instacart user?
user_health_aggregate = df2.groupby('user_id')['userid_wnds_score'].agg('max').reset_index()
user_health_aggregate['userid_wnds_score'].mean() # mean roughly equal to the median
user_health_aggregate['userid_wnds_score'].median()
# Plot
N = user_health_aggregate['user_id'].shape[0]
trace = go.Scattergl( # scattergl for higher performance
x = user_health_aggregate['user_id'],
y = user_health_aggregate['userid_wnds_score'],
hoverinfo = 'text',
name = '',
hovertemplate =
'User ID: %{x}' +
'<br>WNDS Score: %{y:.2f}',
mode = 'markers',
marker = dict(
color = np.random.randn(N),
colorscale = 'Viridis',
line_width = 0.1))
layout = go.Layout(title="<b>Figure 5: WNDS for each Instacart User </b>",
hovermode = 'closest',
font = dict(size = 18),
xaxis = dict(title_text = 'User ID'),
yaxis = dict(title_text = 'WNDS'))
fig = dict(data = [trace], layout = layout)
plotly.offline.plot(fig) # offline plotting
# How healthy is each aisle?
aisle_mean = df2.groupby('aisle')['wnds_aisle_mean'].agg('max').reset_index()
aisle_mean['wnds_aisle_mean'].median()
aisle_mean['wnds_aisle_mean'].max()
aisle_mean['wnds_aisle_mean'].min()
# filter out NaN from aisle_mean
aisle_mean = aisle_mean.loc[aisle_mean['wnds_aisle_mean'].notnull()]
# Plot
random_subset = aisle_mean.sample(n=25)
N = random_subset['aisle'].shape[0]
random_subset = random_subset.sort_values(by = 'wnds_aisle_mean')
trace = go.Scattergl( # scattergl for higher performance
x = random_subset['aisle'].str.title(), # title case the aisles
y = random_subset['wnds_aisle_mean'],
hoverinfo = 'text',
name = '',
hovertemplate =
'Aisle: %{x}' +
'<br>WNDS Score: %{y:.2f}',
mode = 'markers',
marker = dict(
color = np.random.randn(N),
colorscale = 'Viridis',
line_width = 0.1))
layout = go.Layout(title="<b>Figure 4: WNDS for each Instacart Aisle </b>",
hovermode = 'closest',
font = dict(size = 18),
xaxis = dict(tickangle = 45,
title_text = 'Aisle'),
yaxis = dict(title_text = 'WNDS'),
margin = dict(b=150))
fig = dict(data = [trace], layout = layout)
plotly.offline.plot(fig) # offline plotting
# How healthy are plant-based consumers vs meat-based consumers?
# if running script live -- ignore this load and replace h3 with df2 from above
#h3 = pd.read_csv('C:\\Users\\Andrew\\Desktop\\h3_dataset.csv')
emp = pd.read_csv('flex-emp.csv') # classifies products as plant (P), meat (M), or excluded (E)
h3 = df2.merge(emp, on = ['product_id'])
h3 = h3[(h3.aisle_id==96) | (h3.aisle_id==14) | (h3.aisle_id==106) |
(h3.aisle_id==122) | (h3.aisle_id==7) | (h3.aisle_id==49) |
(h3.aisle_id==35) | (h3.aisle_id==34) | (h3.aisle_id==42)] # filter specified foods only
h3 = h3[h3.emp != 'E'] # remove excluded items
def h3_comparison(dataframe):
counts = dataframe.groupby(['user_id', 'emp']).size().reset_index()
counts = counts.rename(columns={0: "item_count"})
    # get each user's total count of M + P items
    total_counts = counts.groupby('user_id')['item_count'].sum().reset_index()
    total_counts = total_counts.rename(columns={'item_count': "total_count"})
# build meat_count var
meat_count = counts[counts.emp == 'M'].groupby('user_id')['item_count'].max().reset_index()
meat_count = meat_count.rename(columns={'item_count': "meat_count"})
# build plant_count var
plant_count = counts[counts.emp == 'P'].groupby('user_id')['item_count'].max().reset_index()
plant_count = plant_count.rename(columns={'item_count': "plant_count"})
# join counts with original df
dataframe = dataframe.merge(total_counts, how = 'outer', on = ['user_id'])
dataframe = dataframe.merge(meat_count, how = 'outer', on = ['user_id'])
dataframe = dataframe.merge(plant_count, how = 'outer', on = ['user_id'])
# fill missing data with 0
dataframe['meat_count'] = dataframe['meat_count'].fillna(0)
dataframe['plant_count'] = dataframe['plant_count'].fillna(0)
    # get ratio of meat items to total items -- base is meat
    # (row-wise division; meat_count and total_count are already per-user)
    dataframe['ratio'] = dataframe['meat_count'] / dataframe['total_count']
# aggregate
summarized_stats = dataframe.groupby('user_id').agg({'ratio' : 'max',
'userid_wnds_score':'max'}).reset_index()
return summarized_stats
ratios = h3_comparison(h3)
# Create empty column
ratios['categories'] = None
# Find user types:
carnivore = ratios['ratio'] == 1.00 # only buys meat
flexitarian = (ratios['ratio'] > 0) & (ratios['ratio'] < 1.00) # buys a mix
vegetarian = (ratios['ratio'] == 0.00) # only buys plants
# Apply mask
ratios.loc[carnivore, 'categories'] = 'Carnivore'
ratios.loc[flexitarian, 'categories'] = 'Flexitarian'
ratios.loc[vegetarian, 'categories'] = 'Vegetarian'
# Check distribution
ratios['categories'].value_counts()
# Subset data
carnivores = ratios[ratios.categories == 'Carnivore']
flexitarians = ratios[ratios.categories == 'Flexitarian']
vegetarians = ratios[ratios.categories == 'Vegetarian']
# Generate sample means
carnivores_mean = carnivores['userid_wnds_score'].mean()
flexitarians_mean = flexitarians['userid_wnds_score'].mean()
vegetarians_mean = vegetarians['userid_wnds_score'].mean()
# Plant eaters slightly healthier
# Dot plot
y_axis = ['Carnivores', 'Flexitarians', 'Vegetarians', 'Average User']
user_type_mean = [carnivores_mean, flexitarians_mean, vegetarians_mean, 32.29]
df_dotplot = pd.DataFrame({'y' : y_axis, 'x' : user_type_mean})
df_dotplot = df_dotplot.sort_values(by = 'x')
trace = go.Scattergl(
x = df_dotplot['x'],
y = df_dotplot['y'],
mode = 'markers',
marker = dict(
color = 'rgba(156, 165, 196, 0.95)',
line_color = 'rgba(156, 165, 196, 1.0)',
line_width = 1,
symbol = 'circle',
size = 16))
layout = go.Layout(
    title = "<b>Figure 6: Mean WNDS by Shopper Type (Higher is Better) </b>",
font = dict(size = 18),
yaxis = dict(
title_text = 'Instacart User Type'),
xaxis = dict(
showgrid = False,
showline = True,
title_text = "Weighted Nutrient Density Score",
linecolor = 'rgb(102, 102, 102)',
tickfont_color = 'rgb(102, 102, 102)',
showticklabels = True,
dtick = 0.5,
ticks = 'outside',
tickcolor = 'rgb(102, 102, 102)',),
margin = dict(l=140, r=40, b=50, t=80),
legend = dict(
font_size = 10,
yanchor = 'middle',
xanchor = 'right',
),
width = 1200,
height = 800,
paper_bgcolor = 'white',
plot_bgcolor = 'white',
hovermode = 'closest')
fig = dict(data = [trace], layout = layout)
plotly.offline.plot(fig) # offline plotting
“The Instacart Online Grocery Shopping Dataset 2017”, accessed from https://www.instacart.com/datasets/grocery-shopping-2017 on 13 January 2020.
Arsenault, Joanne E., Victor L. Fulgoni III, James C. Hersey, and Mary K. Muth. “A novel approach to selecting and weighting nutrients for nutrient profiling of foods and diets.” Journal of the Academy of Nutrition and Dietetics 112, no. 12 (2012): 1968-1975.↩︎
The algorithm only considers the following nutrients: protein, fiber, trans fat, saturated fat, sugars, calcium, vitamin c, and sodium.↩︎