To Yin Yu (GitHub: tonyx1998)
Obesity is a common, serious, and costly disease in the US. According to the Centers for Disease Control and Prevention, obesity-related conditions include heart disease, stroke, type 2 diabetes, and certain types of cancer. These are among the leading causes of preventable, premature death. Adults with obesity are also at greater risk during the COVID-19 pandemic, since obesity can worsen the outcomes of COVID-19.
However, not everyone is equally likely to be obese: obesity affects some groups more than others. This tutorial therefore analyzes how socioeconomic status and environment correlate with obesity across states. We are going to see whether some factors are more strongly linked to obesity than others.
You can learn more about obesity here:
In this part, we import all the libraries and functions required for the rest of the tutorial.
1. pandas (https://pandas.pydata.org/) & NumPy (https://numpy.org/): Required for dataframe manipulation
2. Matplotlib (https://matplotlib.org/) & Seaborn (https://seaborn.pydata.org/): Required for data visualization
3. scikit-learn (https://scikit-learn.org/stable/) & statsmodels (https://www.statsmodels.org/stable/index.html): Required for hypothesis testing and machine learning
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn import tree
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error as mae
from sklearn.preprocessing import StandardScaler
import statsmodels.api as sm
Before we dive into finding interesting patterns and analyzing correlations between factors, we first need some data. The data I am using for the tutorial are the annual obesity rate, air pollution rate, high school graduation rate, physical inactivity rate, and public health funding by state (https://www.americashealthrankings.org/explore/annual/measure/Obesity/state/ALL), annual average temperature by state (https://www.ncdc.noaa.gov/cag/statewide/time-series/1/tavg/ann/6/1990-2020?base_prd=true&begbaseyear=1901&endbaseyear=2000), annual gross domestic product (GDP) in current dollars (USD), as a percentage, and per capita by state (https://apps.bea.gov/), and annual median household income by state (https://www.census.gov/data/tables/time-series/demo/income-poverty/historical-income-households.html). We will leave the data raw and messy in this part and clean it up in the next part of the tutorial. We will only be looking at 2015-2019 data.
obesity_2015_raw = pd.read_csv("obesity_2015.csv")
obesity_2016_raw = pd.read_csv("obesity_2016.csv")
obesity_2017_raw = pd.read_csv("obesity_2017.csv")
obesity_2018_raw = pd.read_csv("obesity_2018.csv")
obesity_2019_raw = pd.read_csv("obesity_2019.csv")
median_income_raw = pd.read_excel("median_income.xlsx")
GDP_by_state_raw = pd.read_csv("GDP_by_state.csv")
GDP_by_state_pct_raw = pd.read_csv("GDP_by_state_pct.csv")
GDP_cap_2019_raw = pd.read_csv("gdp_cap_2019.csv", thousands=',')
GDP_cap_rest_raw = pd.read_csv("gdp_cap_2018_2015.csv", thousands=',')
avg_temp_raw = pd.read_csv("avg_temp.csv")
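Because the five obesity files share one naming pattern, they could also be read in a loop. The following is only a sketch under that assumption; the helper name `load_yearly_csvs` and its `template` parameter are hypothetical and not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_yearly_csvs(data_dir, years, template="obesity_{year}.csv"):
    """Read one CSV per year and return a dict mapping year -> DataFrame.

    `template` is an assumed file-name pattern matching the files read above.
    """
    data_dir = Path(data_dir)
    frames = {}
    for year in years:
        path = data_dir / template.format(year=year)
        frames[year] = pd.read_csv(path)
    return frames


# Usage, assuming the five files sit in the working directory:
# obesity_raw = load_yearly_csvs(".", range(2015, 2020))
# obesity_raw[2015] would then hold the same data as obesity_2015_raw.
```

Keeping the frames in a dictionary keyed by year makes it possible to apply the same cleaning step to every year without repeating the code.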
For the 2015 to 2019 obesity datasets, only the Edition, Measure Name, State Name, and Value columns are useful for our analysis. Among the rows, only those whose measure name is Air Pollution, High School Graduation, Physical Inactivity, Public Health Funding, or Obesity are meaningful for our purpose, so we will delete the rest. We will also remove rows with "United States" or "District of Columbia" as the State Name, since they are not states.
*** For the 2015 high school graduation data, Idaho's value is missing, but its rank is not. Hence, I decided to use single imputation. Idaho is at rank 17; rank 16 has the value 85.5 and rank 18 has the value 85.0. I estimated the value for Idaho as 85.3, approximately the midpoint between ranks 16 and 18 (85.25), and used that throughout the tutorial.
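The midpoint estimate for Idaho can be reproduced with a linear interpolation over rank-ordered values. This is a minimal sketch, not the way the value was actually entered in the tutorial's data. The ranks and values come from the note above, while the labels for the rank 16 and rank 18 states are placeholders, since the note does not name them.

```python
import numpy as np
import pandas as pd

# Ranks and values from the note above; the two neighbouring state labels
# are placeholders because the text does not say which states they are.
ranked = pd.DataFrame({
    "state": ["rank-16 state", "Idaho", "rank-18 state"],
    "rank": [16, 17, 18],
    "value": [85.5, np.nan, 85.0],
})

# Order by rank, then fill the gap linearly between its two neighbours.
ranked = ranked.sort_values("rank").reset_index(drop=True)
ranked["value_filled"] = ranked["value"].interpolate(method="linear")

idaho_estimate = ranked.loc[ranked["state"] == "Idaho", "value_filled"].iloc[0]
print(idaho_estimate)  # midpoint of 85.5 and 85.0
```

The tutorial uses 85.3, the midpoint reported to one decimal place.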
obesity_2015_raw
| | Edition | Report Type | Measure Name | State Name | Rank | Value | Score | Lower CI | Upper CI | Source | Source Year |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015 | 2015 Annual | Air Pollution | Alabama | 34.0 | 9.5 | 0.00 | NaN | NaN | U.S. Environmental Protection Agency; U.S. Cen... | 2012-2014 |
| 1 | 2015 | 2015 Annual | Air Pollution | Alaska | 4.0 | 6.0 | -2.00 | NaN | NaN | U.S. Environmental Protection Agency; U.S. Cen... | 2012-2014 |
| 2 | 2015 | 2015 Annual | Air Pollution | Arizona | 37.0 | 9.7 | 0.12 | NaN | NaN | U.S. Environmental Protection Agency; U.S. Cen... | 2012-2014 |
| 3 | 2015 | 2015 Annual | Air Pollution | Arkansas | 37.0 | 9.7 | 0.12 | NaN | NaN | U.S. Environmental Protection Agency; U.S. Cen... | 2012-2014 |
| 4 | 2015 | 2015 Annual | Air Pollution | California | 50.0 | 12.5 | 1.78 | NaN | NaN | U.S. Environmental Protection Agency; U.S. Cen... | 2012-2014 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 43048 | 2015 | 2015 Annual | Youth Smoking | West Virginia | NaN | 19.6 | NaN | 22.7 | 16.8 | CDC, Youth Behavioral Risk Surveillance System | 2013 |
| 43049 | 2015 | 2015 Annual | Youth Smoking | Wisconsin | NaN | 11.8 | NaN | 14.1 | 9.9 | CDC, Youth Behavioral Risk Surveillance System | 2013 |
| 43050 | 2015 | 2015 Annual | Youth Smoking | Wyoming | NaN | 17.4 | NaN | 20.4 | 14.7 | CDC, Youth Behavioral Risk Surveillance System | 2013 |
| 43051 | 2015 | 2015 Annual | Youth Smoking | United States | NaN | 15.7 | NaN | 18.1 | 13.5 | CDC, Youth Behavioral Risk Surveillance System | 2013 |
| 43052 | 2015 | 2015 Annual | Youth Smoking | District of Columbia | NaN | NaN | NaN | NaN | NaN | CDC, Youth Behavioral Risk Surveillance System | 2013 |
43053 rows × 11 columns
obesity_2015 = obesity_2015_raw[["Edition","Measure Name","State Name", "Value"]]
obesity_2015 = obesity_2015.rename(columns={"Edition": "year", "Measure Name": "data_type", "State Name": "state", "Value": "value"})
obesity_2015
| | year | data_type | state | value |
|---|---|---|---|---|
| 0 | 2015 | Air Pollution | Alabama | 9.5 |
| 1 | 2015 | Air Pollution | Alaska | 6.0 |
| 2 | 2015 | Air Pollution | Arizona | 9.7 |
| 3 | 2015 | Air Pollution | Arkansas | 9.7 |
| 4 | 2015 | Air Pollution | California | 12.5 |
| ... | ... | ... | ... | ... |
| 43048 | 2015 | Youth Smoking | West Virginia | 19.6 |
| 43049 | 2015 | Youth Smoking | Wisconsin | 11.8 |
| 43050 | 2015 | Youth Smoking | Wyoming | 17.4 |
| 43051 | 2015 | Youth Smoking | United States | 15.7 |
| 43052 | 2015 | Youth Smoking | District of Columbia | NaN |
43053 rows × 4 columns
air_pollution_2015 = obesity_2015.copy()
air_pollution_2015 = air_pollution_2015[air_pollution_2015["data_type"] == "Air Pollution"]
air_pollution_2015.drop(air_pollution_2015[(air_pollution_2015.state == "United States") | (air_pollution_2015.state == "District of Columbia")].index, inplace=True)
air_pollution_2015
| | year | data_type | state | value |
|---|---|---|---|---|
| 0 | 2015 | Air Pollution | Alabama | 9.5 |
| 1 | 2015 | Air Pollution | Alaska | 6.0 |
| 2 | 2015 | Air Pollution | Arizona | 9.7 |
| 3 | 2015 | Air Pollution | Arkansas | 9.7 |
| 4 | 2015 | Air Pollution | California | 12.5 |
| 5 | 2015 | Air Pollution | Colorado | 7.0 |
| 6 | 2015 | Air Pollution | Connecticut | 8.8 |
| 7 | 2015 | Air Pollution | Delaware | 9.7 |
| 8 | 2015 | Air Pollution | Florida | 7.2 |
| 9 | 2015 | Air Pollution | Georgia | 9.8 |
| 10 | 2015 | Air Pollution | Hawaii | 7.6 |
| 11 | 2015 | Air Pollution | Idaho | 11.7 |
| 12 | 2015 | Air Pollution | Illinois | 11.1 |
| 13 | 2015 | Air Pollution | Indiana | 11.3 |
| 14 | 2015 | Air Pollution | Iowa | 9.3 |
| 15 | 2015 | Air Pollution | Kansas | 8.6 |
| 16 | 2015 | Air Pollution | Kentucky | 10.1 |
| 17 | 2015 | Air Pollution | Louisiana | 8.6 |
| 18 | 2015 | Air Pollution | Maine | 7.4 |
| 19 | 2015 | Air Pollution | Maryland | 9.6 |
| 20 | 2015 | Air Pollution | Massachusetts | 7.2 |
| 21 | 2015 | Air Pollution | Michigan | 8.8 |
| 22 | 2015 | Air Pollution | Minnesota | 8.0 |
| 23 | 2015 | Air Pollution | Mississippi | 8.9 |
| 24 | 2015 | Air Pollution | Missouri | 9.7 |
| 25 | 2015 | Air Pollution | Montana | 5.7 |
| 26 | 2015 | Air Pollution | Nebraska | 7.8 |
| 27 | 2015 | Air Pollution | Nevada | 10.0 |
| 28 | 2015 | Air Pollution | New Hampshire | 7.2 |
| 29 | 2015 | Air Pollution | New Jersey | 8.8 |
| 30 | 2015 | Air Pollution | New Mexico | 6.6 |
| 31 | 2015 | Air Pollution | New York | 8.0 |
| 32 | 2015 | Air Pollution | North Carolina | 8.7 |
| 33 | 2015 | Air Pollution | North Dakota | 5.2 |
| 34 | 2015 | Air Pollution | Ohio | 10.6 |
| 35 | 2015 | Air Pollution | Oklahoma | 9.5 |
| 36 | 2015 | Air Pollution | Oregon | 6.7 |
| 37 | 2015 | Air Pollution | Pennsylvania | 11.4 |
| 38 | 2015 | Air Pollution | Rhode Island | 7.8 |
| 39 | 2015 | Air Pollution | South Carolina | 9.0 |
| 40 | 2015 | Air Pollution | South Dakota | 6.3 |
| 41 | 2015 | Air Pollution | Tennessee | 9.1 |
| 42 | 2015 | Air Pollution | Texas | 9.9 |
| 43 | 2015 | Air Pollution | Utah | 8.9 |
| 44 | 2015 | Air Pollution | Vermont | 6.2 |
| 45 | 2015 | Air Pollution | Virginia | 8.3 |
| 46 | 2015 | Air Pollution | Washington | 8.0 |
| 47 | 2015 | Air Pollution | West Virginia | 9.4 |
| 48 | 2015 | Air Pollution | Wisconsin | 9.1 |
| 49 | 2015 | Air Pollution | Wyoming | 5.0 |
high_school_2015 = obesity_2015.copy()
high_school_2015 = high_school_2015[high_school_2015["data_type"] == "High School Graduation"]
high_school_2015.drop(high_school_2015[(high_school_2015.state == "United States") | (high_school_2015.state == "District of Columbia")].index, inplace=True)
high_school_2015 = high_school_2015.dropna()
high_school_2015
| | year | data_type | state | value |
|---|---|---|---|---|
| 25896 | 2015 | High School Graduation | Alabama | 80.0 |
| 25897 | 2015 | High School Graduation | Alaska | 71.8 |
| 25898 | 2015 | High School Graduation | Arizona | 75.1 |
| 25899 | 2015 | High School Graduation | Arkansas | 84.9 |
| 25900 | 2015 | High School Graduation | California | 80.4 |
| 25901 | 2015 | High School Graduation | Colorado | 76.9 |
| 25902 | 2015 | High School Graduation | Connecticut | 85.5 |
| 25903 | 2015 | High School Graduation | Delaware | 80.4 |
| 25904 | 2015 | High School Graduation | Florida | 75.6 |
| 25905 | 2015 | High School Graduation | Georgia | 71.7 |
| 25906 | 2015 | High School Graduation | Hawaii | 82.4 |
| 25907 | 2015 | High School Graduation | Idaho | 85.3 |
| 25908 | 2015 | High School Graduation | Illinois | 83.2 |
| 25909 | 2015 | High School Graduation | Indiana | 87.0 |
| 25910 | 2015 | High School Graduation | Iowa | 89.7 |
| 25911 | 2015 | High School Graduation | Kansas | 85.7 |
| 25912 | 2015 | High School Graduation | Kentucky | 86.1 |
| 25913 | 2015 | High School Graduation | Louisiana | 73.5 |
| 25914 | 2015 | High School Graduation | Maine | 86.4 |
| 25915 | 2015 | High School Graduation | Maryland | 85.0 |
| 25916 | 2015 | High School Graduation | Massachusetts | 85.0 |
| 25917 | 2015 | High School Graduation | Michigan | 77.0 |
| 25918 | 2015 | High School Graduation | Minnesota | 79.8 |
| 25919 | 2015 | High School Graduation | Mississippi | 75.5 |
| 25920 | 2015 | High School Graduation | Missouri | 85.7 |
| 25921 | 2015 | High School Graduation | Montana | 84.4 |
| 25922 | 2015 | High School Graduation | Nebraska | 88.5 |
| 25923 | 2015 | High School Graduation | Nevada | 70.7 |
| 25924 | 2015 | High School Graduation | New Hampshire | 87.3 |
| 25925 | 2015 | High School Graduation | New Jersey | 87.5 |
| 25926 | 2015 | High School Graduation | New Mexico | 70.3 |
| 25927 | 2015 | High School Graduation | New York | 76.8 |
| 25928 | 2015 | High School Graduation | North Carolina | 82.5 |
| 25929 | 2015 | High School Graduation | North Dakota | 87.5 |
| 25930 | 2015 | High School Graduation | Ohio | 82.2 |
| 25931 | 2015 | High School Graduation | Oklahoma | 84.8 |
| 25932 | 2015 | High School Graduation | Oregon | 68.7 |
| 25933 | 2015 | High School Graduation | Pennsylvania | 85.5 |
| 25934 | 2015 | High School Graduation | Rhode Island | 79.7 |
| 25935 | 2015 | High School Graduation | South Carolina | 77.6 |
| 25936 | 2015 | High School Graduation | South Dakota | 82.7 |
| 25937 | 2015 | High School Graduation | Tennessee | 86.3 |
| 25938 | 2015 | High School Graduation | Texas | 88.0 |
| 25939 | 2015 | High School Graduation | Utah | 83.0 |
| 25940 | 2015 | High School Graduation | Vermont | 86.6 |
| 25941 | 2015 | High School Graduation | Virginia | 84.5 |
| 25942 | 2015 | High School Graduation | Washington | 76.4 |
| 25943 | 2015 | High School Graduation | West Virginia | 81.4 |
| 25944 | 2015 | High School Graduation | Wisconsin | 88.0 |
| 25945 | 2015 | High School Graduation | Wyoming | 77.0 |
phys_2015 = obesity_2015.copy()
phys_2015 = phys_2015[phys_2015["data_type"] == "Physical Inactivity"]
phys_2015.drop(phys_2015[(phys_2015.state == "United States") | (phys_2015.state == "District of Columbia")].index, inplace=True)
phys_2015
| | year | data_type | state | value |
|---|---|---|---|---|
| 32549 | 2015 | Physical Inactivity | Alabama | 27.6 |
| 32550 | 2015 | Physical Inactivity | Alaska | 19.2 |
| 32551 | 2015 | Physical Inactivity | Arizona | 21.2 |
| 32552 | 2015 | Physical Inactivity | Arkansas | 30.7 |
| 32553 | 2015 | Physical Inactivity | California | 21.7 |
| 32554 | 2015 | Physical Inactivity | Colorado | 16.4 |
| 32555 | 2015 | Physical Inactivity | Connecticut | 20.6 |
| 32556 | 2015 | Physical Inactivity | Delaware | 24.9 |
| 32557 | 2015 | Physical Inactivity | Florida | 23.7 |
| 32558 | 2015 | Physical Inactivity | Georgia | 23.6 |
| 32559 | 2015 | Physical Inactivity | Hawaii | 19.6 |
| 32560 | 2015 | Physical Inactivity | Idaho | 18.7 |
| 32561 | 2015 | Physical Inactivity | Illinois | 23.9 |
| 32562 | 2015 | Physical Inactivity | Indiana | 26.1 |
| 32563 | 2015 | Physical Inactivity | Iowa | 22.6 |
| 32564 | 2015 | Physical Inactivity | Kansas | 23.8 |
| 32565 | 2015 | Physical Inactivity | Kentucky | 28.2 |
| 32566 | 2015 | Physical Inactivity | Louisiana | 29.5 |
| 32567 | 2015 | Physical Inactivity | Maine | 19.7 |
| 32568 | 2015 | Physical Inactivity | Maryland | 21.4 |
| 32569 | 2015 | Physical Inactivity | Massachusetts | 20.1 |
| 32570 | 2015 | Physical Inactivity | Michigan | 25.5 |
| 32571 | 2015 | Physical Inactivity | Minnesota | 20.2 |
| 32572 | 2015 | Physical Inactivity | Mississippi | 31.6 |
| 32573 | 2015 | Physical Inactivity | Missouri | 25.0 |
| 32574 | 2015 | Physical Inactivity | Montana | 19.6 |
| 32575 | 2015 | Physical Inactivity | Nebraska | 21.3 |
| 32576 | 2015 | Physical Inactivity | Nevada | 22.5 |
| 32577 | 2015 | Physical Inactivity | New Hampshire | 19.3 |
| 32578 | 2015 | Physical Inactivity | New Jersey | 23.3 |
| 32579 | 2015 | Physical Inactivity | New Mexico | 23.3 |
| 32580 | 2015 | Physical Inactivity | New York | 25.9 |
| 32581 | 2015 | Physical Inactivity | North Carolina | 23.2 |
| 32582 | 2015 | Physical Inactivity | North Dakota | 21.3 |
| 32583 | 2015 | Physical Inactivity | Ohio | 25.0 |
| 32584 | 2015 | Physical Inactivity | Oklahoma | 28.3 |
| 32585 | 2015 | Physical Inactivity | Oregon | 16.5 |
| 32586 | 2015 | Physical Inactivity | Pennsylvania | 23.3 |
| 32587 | 2015 | Physical Inactivity | Rhode Island | 22.5 |
| 32588 | 2015 | Physical Inactivity | South Carolina | 25.3 |
| 32589 | 2015 | Physical Inactivity | South Dakota | 21.2 |
| 32590 | 2015 | Physical Inactivity | Tennessee | 26.8 |
| 32591 | 2015 | Physical Inactivity | Texas | 27.6 |
| 32592 | 2015 | Physical Inactivity | Utah | 16.8 |
| 32593 | 2015 | Physical Inactivity | Vermont | 19.0 |
| 32594 | 2015 | Physical Inactivity | Virginia | 23.5 |
| 32595 | 2015 | Physical Inactivity | Washington | 18.1 |
| 32596 | 2015 | Physical Inactivity | West Virginia | 28.7 |
| 32597 | 2015 | Physical Inactivity | Wisconsin | 21.2 |
| 32598 | 2015 | Physical Inactivity | Wyoming | 22.1 |
health_fund_2015 = obesity_2015.copy()
health_fund_2015 = health_fund_2015[health_fund_2015["data_type"] == "Public Health Funding"]
health_fund_2015.drop(health_fund_2015[(health_fund_2015.state == "United States") | (health_fund_2015.state == "District of Columbia")].index, inplace=True)
health_fund_2015
| | year | data_type | state | value |
|---|---|---|---|---|
| 37229 | 2015 | Public Health Funding | Alabama | 111.0 |
| 37230 | 2015 | Public Health Funding | Alaska | 233.0 |
| 37231 | 2015 | Public Health Funding | Arizona | 44.0 |
| 37232 | 2015 | Public Health Funding | Arkansas | 98.0 |
| 37233 | 2015 | Public Health Funding | California | 103.0 |
| 37234 | 2015 | Public Health Funding | Colorado | 87.0 |
| 37235 | 2015 | Public Health Funding | Connecticut | 74.0 |
| 37236 | 2015 | Public Health Funding | Delaware | 105.0 |
| 37237 | 2015 | Public Health Funding | Florida | 58.0 |
| 37238 | 2015 | Public Health Funding | Georgia | 62.0 |
| 37239 | 2015 | Public Health Funding | Hawaii | 209.0 |
| 37240 | 2015 | Public Health Funding | Idaho | 135.0 |
| 37241 | 2015 | Public Health Funding | Illinois | 63.0 |
| 37242 | 2015 | Public Health Funding | Indiana | 44.0 |
| 37243 | 2015 | Public Health Funding | Iowa | 56.0 |
| 37244 | 2015 | Public Health Funding | Kansas | 47.0 |
| 37245 | 2015 | Public Health Funding | Kentucky | 77.0 |
| 37246 | 2015 | Public Health Funding | Louisiana | 71.0 |
| 37247 | 2015 | Public Health Funding | Maine | 83.0 |
| 37248 | 2015 | Public Health Funding | Maryland | 80.0 |
| 37249 | 2015 | Public Health Funding | Massachusetts | 105.0 |
| 37250 | 2015 | Public Health Funding | Michigan | 56.0 |
| 37251 | 2015 | Public Health Funding | Minnesota | 47.0 |
| 37252 | 2015 | Public Health Funding | Mississippi | 67.0 |
| 37253 | 2015 | Public Health Funding | Missouri | 46.0 |
| 37254 | 2015 | Public Health Funding | Montana | 96.0 |
| 37255 | 2015 | Public Health Funding | Nebraska | 81.0 |
| 37256 | 2015 | Public Health Funding | Nevada | 38.0 |
| 37257 | 2015 | Public Health Funding | New Hampshire | 61.0 |
| 37258 | 2015 | Public Health Funding | New Jersey | 60.0 |
| 37259 | 2015 | Public Health Funding | New Mexico | 117.0 |
| 37260 | 2015 | Public Health Funding | New York | 165.0 |
| 37261 | 2015 | Public Health Funding | North Carolina | 49.0 |
| 37262 | 2015 | Public Health Funding | North Dakota | 114.0 |
| 37263 | 2015 | Public Health Funding | Ohio | 46.0 |
| 37264 | 2015 | Public Health Funding | Oklahoma | 81.0 |
| 37265 | 2015 | Public Health Funding | Oregon | 60.0 |
| 37266 | 2015 | Public Health Funding | Pennsylvania | 51.0 |
| 37267 | 2015 | Public Health Funding | Rhode Island | 115.0 |
| 37268 | 2015 | Public Health Funding | South Carolina | 68.0 |
| 37269 | 2015 | Public Health Funding | South Dakota | 89.0 |
| 37270 | 2015 | Public Health Funding | Tennessee | 84.0 |
| 37271 | 2015 | Public Health Funding | Texas | 55.0 |
| 37272 | 2015 | Public Health Funding | Utah | 70.0 |
| 37273 | 2015 | Public Health Funding | Vermont | 106.0 |
| 37274 | 2015 | Public Health Funding | Virginia | 68.0 |
| 37275 | 2015 | Public Health Funding | Washington | 86.0 |
| 37276 | 2015 | Public Health Funding | West Virginia | 125.0 |
| 37277 | 2015 | Public Health Funding | Wisconsin | 43.0 |
| 37278 | 2015 | Public Health Funding | Wyoming | 101.0 |
# .copy() makes the filtered frame independent, so the in-place drop below does not raise SettingWithCopyWarning
obesity_2015 = obesity_2015[obesity_2015["data_type"] == "Obesity"].copy()
obesity_2015.drop(obesity_2015[(obesity_2015.state == "United States") | (obesity_2015.state == "District of Columbia")].index, inplace=True)
obesity_2015
| | year | data_type | state | value |
|---|---|---|---|---|
| 30989 | 2015 | Obesity | Alabama | 33.5 |
| 30990 | 2015 | Obesity | Alaska | 29.7 |
| 30991 | 2015 | Obesity | Arizona | 28.9 |
| 30992 | 2015 | Obesity | Arkansas | 35.9 |
| 30993 | 2015 | Obesity | California | 24.7 |
| 30994 | 2015 | Obesity | Colorado | 21.3 |
| 30995 | 2015 | Obesity | Connecticut | 26.3 |
| 30996 | 2015 | Obesity | Delaware | 30.7 |
| 30997 | 2015 | Obesity | Florida | 26.2 |
| 30998 | 2015 | Obesity | Georgia | 30.5 |
| 30999 | 2015 | Obesity | Hawaii | 22.1 |
| 31000 | 2015 | Obesity | Idaho | 28.9 |
| 31001 | 2015 | Obesity | Illinois | 29.3 |
| 31002 | 2015 | Obesity | Indiana | 32.7 |
| 31003 | 2015 | Obesity | Iowa | 30.9 |
| 31004 | 2015 | Obesity | Kansas | 31.3 |
| 31005 | 2015 | Obesity | Kentucky | 31.6 |
| 31006 | 2015 | Obesity | Louisiana | 34.9 |
| 31007 | 2015 | Obesity | Maine | 28.2 |
| 31008 | 2015 | Obesity | Maryland | 29.6 |
| 31009 | 2015 | Obesity | Massachusetts | 23.3 |
| 31010 | 2015 | Obesity | Michigan | 30.7 |
| 31011 | 2015 | Obesity | Minnesota | 27.6 |
| 31012 | 2015 | Obesity | Mississippi | 35.5 |
| 31013 | 2015 | Obesity | Missouri | 30.2 |
| 31014 | 2015 | Obesity | Montana | 26.4 |
| 31015 | 2015 | Obesity | Nebraska | 30.2 |
| 31016 | 2015 | Obesity | Nevada | 27.7 |
| 31017 | 2015 | Obesity | New Hampshire | 27.4 |
| 31018 | 2015 | Obesity | New Jersey | 26.9 |
| 31019 | 2015 | Obesity | New Mexico | 28.4 |
| 31020 | 2015 | Obesity | New York | 27.0 |
| 31021 | 2015 | Obesity | North Carolina | 29.7 |
| 31022 | 2015 | Obesity | North Dakota | 32.2 |
| 31023 | 2015 | Obesity | Ohio | 32.6 |
| 31024 | 2015 | Obesity | Oklahoma | 33.0 |
| 31025 | 2015 | Obesity | Oregon | 27.9 |
| 31026 | 2015 | Obesity | Pennsylvania | 30.2 |
| 31027 | 2015 | Obesity | Rhode Island | 27.0 |
| 31028 | 2015 | Obesity | South Carolina | 32.1 |
| 31029 | 2015 | Obesity | South Dakota | 29.8 |
| 31030 | 2015 | Obesity | Tennessee | 31.2 |
| 31031 | 2015 | Obesity | Texas | 31.9 |
| 31032 | 2015 | Obesity | Utah | 25.7 |
| 31033 | 2015 | Obesity | Vermont | 24.8 |
| 31034 | 2015 | Obesity | Virginia | 28.5 |
| 31035 | 2015 | Obesity | Washington | 27.3 |
| 31036 | 2015 | Obesity | West Virginia | 35.7 |
| 31037 | 2015 | Obesity | Wisconsin | 31.2 |
| 31038 | 2015 | Obesity | Wyoming | 29.5 |
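The 2015 cells above repeat the same steps for every measure: keep the rows for one measure, then drop the "United States" and "District of Columbia" rows. As a sketch of how that could be factored out, the helper below does both in one step. The name `select_measure` and the constant `NON_STATES` are hypothetical, and the demo frame is a tiny stand-in with placeholder values for the non-state rows, not real data.

```python
import pandas as pd

# Entries that appear in the "state" column but are not states.
NON_STATES = ["United States", "District of Columbia"]


def select_measure(df, measure, non_states=NON_STATES):
    """Return the rows for one measure, excluding non-state entries.

    Expects the renamed columns used above: "data_type" and "state".
    Returning a copy keeps later in-place edits from touching `df`.
    """
    keep = (df["data_type"] == measure) & ~df["state"].isin(non_states)
    return df.loc[keep].copy()


# Small stand-in frame; the non-state values are placeholders only.
demo = pd.DataFrame({
    "year": [2015, 2015, 2015, 2015],
    "data_type": ["Obesity", "Obesity", "Obesity", "Air Pollution"],
    "state": ["Alabama", "United States", "District of Columbia", "Alabama"],
    "value": [33.5, 0.0, 0.0, 9.5],
})

demo_obesity = select_measure(demo, "Obesity")
print(demo_obesity)
```

With such a helper, a call like `select_measure(obesity_2016, "Air Pollution")` would stand in for the three-line pattern used for each measure and year.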
obesity_2016 = obesity_2016_raw[["Edition","Measure Name","State Name", "Value"]]
obesity_2016 = obesity_2016.rename(columns={"Edition": "year", "Measure Name": "data_type", "State Name": "state", "Value": "value"})
obesity_2016
| | year | data_type | state | value |
|---|---|---|---|---|
| 0 | 2016 | Air Pollution | Alaska | 8.8 |
| 1 | 2016 | Air Pollution | Alabama | 9.1 |
| 2 | 2016 | Air Pollution | Arkansas | 7.5 |
| 3 | 2016 | Air Pollution | Arizona | 9.3 |
| 4 | 2016 | Air Pollution | California | 11.4 |
| ... | ... | ... | ... | ... |
| 51929 | 2016 | Water Fluoridation | Vermont | 56.3 |
| 51930 | 2016 | Water Fluoridation | Washington | 63.9 |
| 51931 | 2016 | Water Fluoridation | Wisconsin | 88.9 |
| 51932 | 2016 | Water Fluoridation | West Virginia | 90.5 |
| 51933 | 2016 | Water Fluoridation | Wyoming | 57.1 |
51934 rows × 4 columns
air_pollution_2016 = obesity_2016.copy()
air_pollution_2016 = air_pollution_2016[air_pollution_2016["data_type"] == "Air Pollution"]
air_pollution_2016.drop(air_pollution_2016[(air_pollution_2016.state == "United States") | (air_pollution_2016.state == "District of Columbia")].index, inplace=True)
air_pollution_2016
| | year | data_type | state | value |
|---|---|---|---|---|
| 0 | 2016 | Air Pollution | Alaska | 8.8 |
| 1 | 2016 | Air Pollution | Alabama | 9.1 |
| 2 | 2016 | Air Pollution | Arkansas | 7.5 |
| 3 | 2016 | Air Pollution | Arizona | 9.3 |
| 4 | 2016 | Air Pollution | California | 11.4 |
| 5 | 2016 | Air Pollution | Colorado | 6.6 |
| 6 | 2016 | Air Pollution | Connecticut | 8.8 |
| 7 | 2016 | Air Pollution | Delaware | 9.5 |
| 8 | 2016 | Air Pollution | Florida | 6.8 |
| 9 | 2016 | Air Pollution | Georgia | 9.1 |
| 10 | 2016 | Air Pollution | Hawaii | 7.0 |
| 11 | 2016 | Air Pollution | Iowa | 8.6 |
| 12 | 2016 | Air Pollution | Idaho | 8.5 |
| 13 | 2016 | Air Pollution | Illinois | 10.8 |
| 14 | 2016 | Air Pollution | Indiana | 10.5 |
| 15 | 2016 | Air Pollution | Kansas | 8.0 |
| 16 | 2016 | Air Pollution | Kentucky | 9.1 |
| 17 | 2016 | Air Pollution | Louisiana | 8.1 |
| 18 | 2016 | Air Pollution | Massachusetts | 6.4 |
| 19 | 2016 | Air Pollution | Maryland | 9.1 |
| 20 | 2016 | Air Pollution | Maine | 6.8 |
| 21 | 2016 | Air Pollution | Michigan | 8.6 |
| 22 | 2016 | Air Pollution | Minnesota | 8.0 |
| 23 | 2016 | Air Pollution | Missouri | 9.1 |
| 24 | 2016 | Air Pollution | Mississippi | 8.1 |
| 25 | 2016 | Air Pollution | Montana | 6.3 |
| 26 | 2016 | Air Pollution | North Carolina | 8.0 |
| 27 | 2016 | Air Pollution | North Dakota | 4.9 |
| 28 | 2016 | Air Pollution | Nebraska | 7.3 |
| 29 | 2016 | Air Pollution | New Hampshire | 6.6 |
| 30 | 2016 | Air Pollution | New Jersey | 8.8 |
| 31 | 2016 | Air Pollution | New Mexico | 6.0 |
| 32 | 2016 | Air Pollution | Nevada | 9.2 |
| 33 | 2016 | Air Pollution | New York | 7.5 |
| 34 | 2016 | Air Pollution | Ohio | 10.2 |
| 35 | 2016 | Air Pollution | Oklahoma | 8.7 |
| 36 | 2016 | Air Pollution | Oregon | 7.3 |
| 37 | 2016 | Air Pollution | Pennsylvania | 11.0 |
| 38 | 2016 | Air Pollution | Rhode Island | 7.5 |
| 39 | 2016 | Air Pollution | South Carolina | 7.9 |
| 40 | 2016 | Air Pollution | South Dakota | 6.3 |
| 41 | 2016 | Air Pollution | Tennessee | 8.6 |
| 42 | 2016 | Air Pollution | Texas | 9.4 |
| 43 | 2016 | Air Pollution | Utah | 9.2 |
| 44 | 2016 | Air Pollution | Virginia | 7.8 |
| 45 | 2016 | Air Pollution | Vermont | 5.6 |
| 46 | 2016 | Air Pollution | Washington | 8.3 |
| 47 | 2016 | Air Pollution | Wisconsin | 7.9 |
| 48 | 2016 | Air Pollution | West Virginia | 7.9 |
| 49 | 2016 | Air Pollution | Wyoming | 4.4 |
high_school_2016 = obesity_2016.copy()
high_school_2016 = high_school_2016[high_school_2016["data_type"] == "High School Graduation"]
high_school_2016.drop(high_school_2016[(high_school_2016.state == "United States") | (high_school_2016.state == "District of Columbia")].index, inplace=True)
high_school_2016
| | year | data_type | state | value |
|---|---|---|---|---|
| 32334 | 2016 | High School Graduation | Alabama | 86.3 |
| 32335 | 2016 | High School Graduation | Arizona | 75.7 |
| 32336 | 2016 | High School Graduation | California | 81.0 |
| 32338 | 2016 | High School Graduation | Florida | 76.1 |
| 32339 | 2016 | High School Graduation | Georgia | 72.5 |
| 32340 | 2016 | High School Graduation | Illinois | 86.0 |
| 32341 | 2016 | High School Graduation | Alaska | 71.1 |
| 32343 | 2016 | High School Graduation | Arkansas | 86.9 |
| 32344 | 2016 | High School Graduation | Colorado | 77.3 |
| 32345 | 2016 | High School Graduation | Connecticut | 87.0 |
| 32346 | 2016 | High School Graduation | Delaware | 87.0 |
| 32347 | 2016 | High School Graduation | Hawaii | 81.8 |
| 32348 | 2016 | High School Graduation | Iowa | 90.5 |
| 32349 | 2016 | High School Graduation | Idaho | 77.3 |
| 32350 | 2016 | High School Graduation | Kansas | 85.7 |
| 32351 | 2016 | High School Graduation | Kentucky | 87.5 |
| 32352 | 2016 | High School Graduation | Maryland | 86.4 |
| 32353 | 2016 | High School Graduation | Maine | 86.5 |
| 32354 | 2016 | High School Graduation | Missouri | 87.3 |
| 32355 | 2016 | High School Graduation | Mississippi | 77.6 |
| 32356 | 2016 | High School Graduation | North Carolina | 83.9 |
| 32357 | 2016 | High School Graduation | North Dakota | 87.2 |
| 32358 | 2016 | High School Graduation | New Jersey | 88.6 |
| 32359 | 2016 | High School Graduation | New Mexico | 68.5 |
| 32360 | 2016 | High School Graduation | Ohio | 81.8 |
| 32361 | 2016 | High School Graduation | Oklahoma | 82.7 |
| 32362 | 2016 | High School Graduation | Rhode Island | 80.8 |
| 32363 | 2016 | High School Graduation | South Carolina | 80.1 |
| 32364 | 2016 | High School Graduation | Tennessee | 87.2 |
| 32365 | 2016 | High School Graduation | Texas | 88.3 |
| 32366 | 2016 | High School Graduation | Vermont | 87.8 |
| 32367 | 2016 | High School Graduation | Washington | 78.2 |
| 32368 | 2016 | High School Graduation | Wyoming | 78.6 |
| 32369 | 2016 | High School Graduation | Indiana | 87.9 |
| 32370 | 2016 | High School Graduation | Louisiana | 74.6 |
| 32371 | 2016 | High School Graduation | Massachusetts | 86.1 |
| 32372 | 2016 | High School Graduation | Michigan | 78.6 |
| 32373 | 2016 | High School Graduation | Minnesota | 81.2 |
| 32374 | 2016 | High School Graduation | Montana | 85.4 |
| 32375 | 2016 | High School Graduation | Nebraska | 89.7 |
| 32376 | 2016 | High School Graduation | New Hampshire | 88.1 |
| 32377 | 2016 | High School Graduation | Wisconsin | 88.6 |
| 32378 | 2016 | High School Graduation | Nevada | 70.0 |
| 32379 | 2016 | High School Graduation | New York | 77.8 |
| 32380 | 2016 | High School Graduation | Oregon | 72.0 |
| 32381 | 2016 | High School Graduation | Pennsylvania | 85.3 |
| 32382 | 2016 | High School Graduation | South Dakota | 82.7 |
| 32383 | 2016 | High School Graduation | Utah | 83.9 |
| 32384 | 2016 | High School Graduation | Virginia | 85.3 |
| 32385 | 2016 | High School Graduation | West Virginia | 84.5 |
phys_2016 = obesity_2016.copy()
phys_2016 = phys_2016[phys_2016["data_type"] == "Physical Inactivity"]
phys_2016.drop(phys_2016[(phys_2016.state == "United States") | (phys_2016.state == "District of Columbia")].index, inplace=True)
phys_2016
| | year | data_type | state | value |
|---|---|---|---|---|
| 39976 | 2016 | Physical Inactivity | Alabama | 31.9 |
| 39977 | 2016 | Physical Inactivity | Alaska | 22.0 |
| 39978 | 2016 | Physical Inactivity | Arizona | 24.7 |
| 39979 | 2016 | Physical Inactivity | Arkansas | 34.2 |
| 39980 | 2016 | Physical Inactivity | California | 20.0 |
| 39981 | 2016 | Physical Inactivity | Colorado | 17.9 |
| 39982 | 2016 | Physical Inactivity | Connecticut | 23.5 |
| 39983 | 2016 | Physical Inactivity | Delaware | 29.4 |
| 39984 | 2016 | Physical Inactivity | Florida | 26.2 |
| 39985 | 2016 | Physical Inactivity | Georgia | 27.3 |
| 39986 | 2016 | Physical Inactivity | Hawaii | 22.5 |
| 39987 | 2016 | Physical Inactivity | Idaho | 21.2 |
| 39988 | 2016 | Physical Inactivity | Illinois | 24.8 |
| 39989 | 2016 | Physical Inactivity | Indiana | 29.4 |
| 39990 | 2016 | Physical Inactivity | Iowa | 26.3 |
| 39991 | 2016 | Physical Inactivity | Kansas | 26.5 |
| 39992 | 2016 | Physical Inactivity | Kentucky | 32.5 |
| 39993 | 2016 | Physical Inactivity | Louisiana | 31.9 |
| 39994 | 2016 | Physical Inactivity | Maine | 24.8 |
| 39995 | 2016 | Physical Inactivity | Maryland | 24.1 |
| 39996 | 2016 | Physical Inactivity | Massachusetts | 26.5 |
| 39997 | 2016 | Physical Inactivity | Michigan | 25.5 |
| 39998 | 2016 | Physical Inactivity | Minnesota | 21.8 |
| 39999 | 2016 | Physical Inactivity | Mississippi | 36.8 |
| 40000 | 2016 | Physical Inactivity | Missouri | 27.0 |
| 40001 | 2016 | Physical Inactivity | Montana | 22.5 |
| 40002 | 2016 | Physical Inactivity | Nebraska | 25.3 |
| 40003 | 2016 | Physical Inactivity | Nevada | 24.7 |
| 40004 | 2016 | Physical Inactivity | New Hampshire | 22.6 |
| 40005 | 2016 | Physical Inactivity | New Jersey | 27.2 |
| 40006 | 2016 | Physical Inactivity | New Mexico | 22.6 |
| 40007 | 2016 | Physical Inactivity | New York | 29.3 |
| 40008 | 2016 | Physical Inactivity | North Carolina | 26.2 |
| 40009 | 2016 | Physical Inactivity | North Dakota | 26.8 |
| 40010 | 2016 | Physical Inactivity | Ohio | 27.0 |
| 40011 | 2016 | Physical Inactivity | Oklahoma | 33.2 |
| 40012 | 2016 | Physical Inactivity | Oregon | 18.8 |
| 40013 | 2016 | Physical Inactivity | Pennsylvania | 27.8 |
| 40014 | 2016 | Physical Inactivity | Rhode Island | 28.1 |
| 40015 | 2016 | Physical Inactivity | South Carolina | 26.7 |
| 40016 | 2016 | Physical Inactivity | South Dakota | 21.5 |
| 40017 | 2016 | Physical Inactivity | Tennessee | 30.4 |
| 40018 | 2016 | Physical Inactivity | Texas | 29.5 |
| 40019 | 2016 | Physical Inactivity | Utah | 20.3 |
| 40020 | 2016 | Physical Inactivity | Vermont | 22.2 |
| 40021 | 2016 | Physical Inactivity | Virginia | 25.1 |
| 40022 | 2016 | Physical Inactivity | Washington | 19.0 |
| 40023 | 2016 | Physical Inactivity | West Virginia | 30.8 |
| 40024 | 2016 | Physical Inactivity | Wisconsin | 21.6 |
| 40025 | 2016 | Physical Inactivity | Wyoming | 26.2 |
health_fund_2016 = obesity_2016.copy()
health_fund_2016 = health_fund_2016[health_fund_2016["data_type"] == "Public Health Funding"]
health_fund_2016.drop(health_fund_2016[(health_fund_2016.state == "United States") | (health_fund_2016.state == "District of Columbia")].index, inplace=True)
health_fund_2016
| | year | data_type | state | value |
|---|---|---|---|---|
| 44758 | 2016 | Public Health Funding | Alabama | 111.0 |
| 44759 | 2016 | Public Health Funding | Alaska | 267.0 |
| 44760 | 2016 | Public Health Funding | Arizona | 48.0 |
| 44761 | 2016 | Public Health Funding | Arkansas | 104.0 |
| 44762 | 2016 | Public Health Funding | California | 103.0 |
| 44763 | 2016 | Public Health Funding | Colorado | 94.0 |
| 44764 | 2016 | Public Health Funding | Connecticut | 80.0 |
| 44765 | 2016 | Public Health Funding | Delaware | 106.0 |
| 44766 | 2016 | Public Health Funding | Florida | 62.0 |
| 44767 | 2016 | Public Health Funding | Georgia | 69.0 |
| 44768 | 2016 | Public Health Funding | Hawaii | 237.0 |
| 44769 | 2016 | Public Health Funding | Idaho | 147.0 |
| 44770 | 2016 | Public Health Funding | Illinois | 70.0 |
| 44771 | 2016 | Public Health Funding | Indiana | 46.0 |
| 44772 | 2016 | Public Health Funding | Iowa | 92.0 |
| 44773 | 2016 | Public Health Funding | Kansas | 53.0 |
| 44774 | 2016 | Public Health Funding | Kentucky | 77.0 |
| 44775 | 2016 | Public Health Funding | Louisiana | 81.0 |
| 44776 | 2016 | Public Health Funding | Maine | 90.0 |
| 44777 | 2016 | Public Health Funding | Maryland | 95.0 |
| 44778 | 2016 | Public Health Funding | Massachusetts | 107.0 |
| 44779 | 2016 | Public Health Funding | Michigan | 59.0 |
| 44780 | 2016 | Public Health Funding | Minnesota | 73.0 |
| 44781 | 2016 | Public Health Funding | Mississippi | 73.0 |
| 44782 | 2016 | Public Health Funding | Missouri | 50.0 |
| 44783 | 2016 | Public Health Funding | Montana | 105.0 |
| 44784 | 2016 | Public Health Funding | Nebraska | 88.0 |
| 44785 | 2016 | Public Health Funding | Nevada | 39.0 |
| 44786 | 2016 | Public Health Funding | New Hampshire | 70.0 |
| 44787 | 2016 | Public Health Funding | New Jersey | 63.0 |
| 44788 | 2016 | Public Health Funding | New Mexico | 123.0 |
| 44789 | 2016 | Public Health Funding | New York | 160.0 |
| 44790 | 2016 | Public Health Funding | North Carolina | 54.0 |
| 44791 | 2016 | Public Health Funding | North Dakota | 128.0 |
| 44792 | 2016 | Public Health Funding | Ohio | 50.0 |
| 44793 | 2016 | Public Health Funding | Oklahoma | 89.0 |
| 44794 | 2016 | Public Health Funding | Oregon | 67.0 |
| 44795 | 2016 | Public Health Funding | Pennsylvania | 56.0 |
| 44796 | 2016 | Public Health Funding | Rhode Island | 130.0 |
| 44797 | 2016 | Public Health Funding | South Carolina | 71.0 |
| 44798 | 2016 | Public Health Funding | South Dakota | 100.0 |
| 44799 | 2016 | Public Health Funding | Tennessee | 90.0 |
| 44800 | 2016 | Public Health Funding | Texas | 63.0 |
| 44801 | 2016 | Public Health Funding | Utah | 74.0 |
| 44802 | 2016 | Public Health Funding | Vermont | 121.0 |
| 44803 | 2016 | Public Health Funding | Virginia | 70.0 |
| 44804 | 2016 | Public Health Funding | Washington | 91.0 |
| 44805 | 2016 | Public Health Funding | West Virginia | 211.0 |
| 44806 | 2016 | Public Health Funding | Wisconsin | 47.0 |
| 44807 | 2016 | Public Health Funding | Wyoming | 108.0 |
# .copy() makes the filtered frame independent of the original, so the in-place drop below does not raise SettingWithCopyWarning
obesity_2016 = obesity_2016[obesity_2016["data_type"] == "Obesity"].copy()
obesity_2016.drop(obesity_2016[(obesity_2016.state == "United States") | (obesity_2016.state == "District of Columbia")].index, inplace=True)
obesity_2016
| | year | data_type | state | value |
|---|---|---|---|---|
| 38314 | 2016 | Obesity | Alabama | 35.6 |
| 38315 | 2016 | Obesity | Alaska | 29.8 |
| 38316 | 2016 | Obesity | Arizona | 28.4 |
| 38317 | 2016 | Obesity | Arkansas | 34.5 |
| 38318 | 2016 | Obesity | California | 24.2 |
| 38319 | 2016 | Obesity | Colorado | 20.2 |
| 38320 | 2016 | Obesity | Connecticut | 25.3 |
| 38321 | 2016 | Obesity | Delaware | 29.7 |
| 38322 | 2016 | Obesity | Florida | 26.8 |
| 38323 | 2016 | Obesity | Georgia | 30.7 |
| 38324 | 2016 | Obesity | Hawaii | 22.7 |
| 38325 | 2016 | Obesity | Idaho | 28.6 |
| 38326 | 2016 | Obesity | Illinois | 30.8 |
| 38327 | 2016 | Obesity | Indiana | 31.3 |
| 38328 | 2016 | Obesity | Iowa | 32.1 |
| 38329 | 2016 | Obesity | Kansas | 34.2 |
| 38330 | 2016 | Obesity | Kentucky | 34.6 |
| 38331 | 2016 | Obesity | Louisiana | 36.2 |
| 38332 | 2016 | Obesity | Maine | 30.0 |
| 38333 | 2016 | Obesity | Maryland | 28.9 |
| 38334 | 2016 | Obesity | Massachusetts | 24.3 |
| 38335 | 2016 | Obesity | Michigan | 31.2 |
| 38336 | 2016 | Obesity | Minnesota | 26.1 |
| 38337 | 2016 | Obesity | Mississippi | 35.6 |
| 38338 | 2016 | Obesity | Missouri | 32.4 |
| 38339 | 2016 | Obesity | Montana | 23.6 |
| 38340 | 2016 | Obesity | Nebraska | 31.4 |
| 38341 | 2016 | Obesity | Nevada | 26.7 |
| 38342 | 2016 | Obesity | New Hampshire | 26.3 |
| 38343 | 2016 | Obesity | New Jersey | 25.6 |
| 38344 | 2016 | Obesity | New Mexico | 28.8 |
| 38345 | 2016 | Obesity | New York | 25.0 |
| 38346 | 2016 | Obesity | North Carolina | 30.1 |
| 38347 | 2016 | Obesity | North Dakota | 31.0 |
| 38348 | 2016 | Obesity | Ohio | 29.8 |
| 38349 | 2016 | Obesity | Oklahoma | 33.9 |
| 38350 | 2016 | Obesity | Oregon | 30.1 |
| 38351 | 2016 | Obesity | Pennsylvania | 30.0 |
| 38352 | 2016 | Obesity | Rhode Island | 26.0 |
| 38353 | 2016 | Obesity | South Carolina | 31.7 |
| 38354 | 2016 | Obesity | South Dakota | 30.4 |
| 38355 | 2016 | Obesity | Tennessee | 33.8 |
| 38356 | 2016 | Obesity | Texas | 32.4 |
| 38357 | 2016 | Obesity | Utah | 24.5 |
| 38358 | 2016 | Obesity | Vermont | 25.1 |
| 38359 | 2016 | Obesity | Virginia | 29.2 |
| 38360 | 2016 | Obesity | Washington | 26.4 |
| 38361 | 2016 | Obesity | West Virginia | 35.6 |
| 38362 | 2016 | Obesity | Wisconsin | 30.7 |
| 38363 | 2016 | Obesity | Wyoming | 29.0 |
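
The cells for each measure repeat the same steps: copy the yearly table, keep one `data_type`, and drop the "United States" and "District of Columbia" rows. As a minimal sketch, assuming the column names `data_type` and `state` used in this tutorial, those steps could be wrapped in one function. The name `extract_measure` and the small demo frame are hypothetical and are not part of the original notebook.

```python
import pandas as pd

# Rows that are not one of the 50 states, as dropped throughout this tutorial.
NON_STATES = ["United States", "District of Columbia"]


def extract_measure(df: pd.DataFrame, measure: str) -> pd.DataFrame:
    """Hypothetical helper: return the rows for one measure, 50 states only."""
    is_measure = df["data_type"] == measure
    is_state = ~df["state"].isin(NON_STATES)
    # .copy() returns an independent frame, so later in-place edits
    # do not raise SettingWithCopyWarning.
    return df.loc[is_measure & is_state].copy()


# Tiny made-up frame, only to show the call; the values are not real data.
demo = pd.DataFrame({
    "year": [2016, 2016, 2016, 2016],
    "data_type": ["Obesity", "Obesity", "Obesity", "Air Pollution"],
    "state": ["Alabama", "United States", "District of Columbia", "Alabama"],
    "value": [1.0, 2.0, 3.0, 4.0],
})
demo_obesity = extract_measure(demo, "Obesity")
print(demo_obesity)
```

With a helper like this, a call such as `extract_measure(obesity_2016, "Public Health Funding")` would stand in for the three-line pattern, provided `obesity_2016` still holds all measures at that point.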
obesity_2017 = obesity_2017_raw[["Edition","Measure Name","State Name", "Value"]]
obesity_2017 = obesity_2017.rename(columns={"Edition": "year", "Measure Name": "data_type", "State Name": "state", "Value": "value"})
obesity_2017
| | year | data_type | state | value |
|---|---|---|---|---|
| 0 | 2017 | Air Pollution | Alaska | 8.7 |
| 1 | 2017 | Air Pollution | Alabama | 8.9 |
| 2 | 2017 | Air Pollution | Arkansas | 7.2 |
| 3 | 2017 | Air Pollution | Arizona | 9.7 |
| 4 | 2017 | Air Pollution | California | 11.7 |
| ... | ... | ... | ... | ... |
| 55309 | 2017 | Water Fluoridation | Vermont | 56.3 |
| 55310 | 2017 | Water Fluoridation | Washington | 63.9 |
| 55311 | 2017 | Water Fluoridation | Wisconsin | 88.9 |
| 55312 | 2017 | Water Fluoridation | West Virginia | 90.5 |
| 55313 | 2017 | Water Fluoridation | Wyoming | 57.1 |
55314 rows × 4 columns
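
The column selection and renaming is also identical for every yearly file. A minimal sketch of that step as a reusable function follows, using the same column mapping as the `rename` calls in this tutorial. The name `tidy_edition`, the demo frame, and its extra `Rank` column are hypothetical and only illustrate the call.

```python
import pandas as pd

# Mapping taken from the rename calls used for each yearly table.
COLUMN_MAP = {
    "Edition": "year",
    "Measure Name": "data_type",
    "State Name": "state",
    "Value": "value",
}


def tidy_edition(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep the four columns used here and rename them."""
    return raw[list(COLUMN_MAP)].rename(columns=COLUMN_MAP)


# Made-up two-row frame with one extra column ("Rank" is an assumption),
# only to demonstrate the call; the values are not real data.
demo_raw = pd.DataFrame({
    "Edition": [2017, 2017],
    "Measure Name": ["Obesity", "Air Pollution"],
    "State Name": ["Alabama", "Alaska"],
    "Value": [1.0, 2.0],
    "Rank": [1, 2],
})
demo_tidy = tidy_edition(demo_raw)
print(demo_tidy.columns.tolist())
```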
air_pollution_2017 = obesity_2017.copy()
air_pollution_2017 = air_pollution_2017[air_pollution_2017["data_type"] == "Air Pollution"]
air_pollution_2017.drop(air_pollution_2017[(air_pollution_2017.state == "United States") | (air_pollution_2017.state == "District of Columbia")].index, inplace=True)
air_pollution_2017
| | year | data_type | state | value |
|---|---|---|---|---|
| 0 | 2017 | Air Pollution | Alaska | 8.7 |
| 1 | 2017 | Air Pollution | Alabama | 8.9 |
| 2 | 2017 | Air Pollution | Arkansas | 7.2 |
| 3 | 2017 | Air Pollution | Arizona | 9.7 |
| 4 | 2017 | Air Pollution | California | 11.7 |
| 5 | 2017 | Air Pollution | Colorado | 6.6 |
| 6 | 2017 | Air Pollution | Connecticut | 8.6 |
| 7 | 2017 | Air Pollution | Delaware | 9.1 |
| 8 | 2017 | Air Pollution | Florida | 6.8 |
| 9 | 2017 | Air Pollution | Georgia | 9.0 |
| 10 | 2017 | Air Pollution | Hawaii | 5.9 |
| 11 | 2017 | Air Pollution | Iowa | 7.8 |
| 12 | 2017 | Air Pollution | Idaho | 5.9 |
| 13 | 2017 | Air Pollution | Illinois | 10.2 |
| 14 | 2017 | Air Pollution | Indiana | 9.7 |
| 15 | 2017 | Air Pollution | Kansas | 7.3 |
| 16 | 2017 | Air Pollution | Kentucky | 8.8 |
| 17 | 2017 | Air Pollution | Louisiana | 7.8 |
| 18 | 2017 | Air Pollution | Massachusetts | 6.2 |
| 19 | 2017 | Air Pollution | Maryland | 9.0 |
| 20 | 2017 | Air Pollution | Maine | 6.4 |
| 21 | 2017 | Air Pollution | Michigan | 8.7 |
| 22 | 2017 | Air Pollution | Minnesota | 7.5 |
| 23 | 2017 | Air Pollution | Missouri | 8.3 |
| 24 | 2017 | Air Pollution | Mississippi | 7.5 |
| 25 | 2017 | Air Pollution | Montana | 6.0 |
| 26 | 2017 | Air Pollution | North Carolina | 7.8 |
| 27 | 2017 | Air Pollution | North Dakota | 4.2 |
| 28 | 2017 | Air Pollution | Nebraska | 7.0 |
| 29 | 2017 | Air Pollution | New Hampshire | 5.9 |
| 30 | 2017 | Air Pollution | New Jersey | 8.5 |
| 31 | 2017 | Air Pollution | New Mexico | 5.7 |
| 32 | 2017 | Air Pollution | Nevada | 9.1 |
| 33 | 2017 | Air Pollution | New York | 7.2 |
| 34 | 2017 | Air Pollution | Ohio | 9.6 |
| 35 | 2017 | Air Pollution | Oklahoma | 8.1 |
| 36 | 2017 | Air Pollution | Oregon | 6.8 |
| 37 | 2017 | Air Pollution | Pennsylvania | 10.1 |
| 38 | 2017 | Air Pollution | Rhode Island | 7.5 |
| 39 | 2017 | Air Pollution | South Carolina | 7.8 |
| 40 | 2017 | Air Pollution | South Dakota | 5.5 |
| 41 | 2017 | Air Pollution | Tennessee | 8.2 |
| 42 | 2017 | Air Pollution | Texas | 8.9 |
| 43 | 2017 | Air Pollution | Utah | 8.1 |
| 44 | 2017 | Air Pollution | Virginia | 7.5 |
| 45 | 2017 | Air Pollution | Vermont | 5.5 |
| 46 | 2017 | Air Pollution | Washington | 7.8 |
| 47 | 2017 | Air Pollution | Wisconsin | 7.4 |
| 48 | 2017 | Air Pollution | West Virginia | 7.7 |
| 49 | 2017 | Air Pollution | Wyoming | 3.8 |
high_school_2017 = obesity_2017.copy()
high_school_2017 = high_school_2017[high_school_2017["data_type"] == "High School Graduation"]
high_school_2017.drop(high_school_2017[(high_school_2017.state == "United States") | (high_school_2017.state == "District of Columbia")].index, inplace=True)
high_school_2017
| | year | data_type | state | value |
|---|---|---|---|---|
| 34050 | 2017 | High School Graduation | Alabama | 89.3 |
| 34051 | 2017 | High School Graduation | Alaska | 75.6 |
| 34052 | 2017 | High School Graduation | Arizona | 77.4 |
| 34053 | 2017 | High School Graduation | Arkansas | 84.9 |
| 34054 | 2017 | High School Graduation | California | 82.0 |
| 34055 | 2017 | High School Graduation | Colorado | 77.3 |
| 34056 | 2017 | High School Graduation | Connecticut | 87.2 |
| 34057 | 2017 | High School Graduation | Delaware | 85.6 |
| 34058 | 2017 | High School Graduation | Florida | 77.9 |
| 34059 | 2017 | High School Graduation | Georgia | 78.8 |
| 34060 | 2017 | High School Graduation | Hawaii | 81.6 |
| 34061 | 2017 | High School Graduation | Idaho | 78.9 |
| 34062 | 2017 | High School Graduation | Illinois | 85.6 |
| 34063 | 2017 | High School Graduation | Indiana | 87.1 |
| 34064 | 2017 | High School Graduation | Iowa | 90.8 |
| 34065 | 2017 | High School Graduation | Kansas | 85.7 |
| 34066 | 2017 | High School Graduation | Kentucky | 88.0 |
| 34067 | 2017 | High School Graduation | Louisiana | 77.5 |
| 34068 | 2017 | High School Graduation | Maine | 87.5 |
| 34069 | 2017 | High School Graduation | Maryland | 87.0 |
| 34070 | 2017 | High School Graduation | Massachusetts | 87.3 |
| 34071 | 2017 | High School Graduation | Michigan | 79.8 |
| 34072 | 2017 | High School Graduation | Minnesota | 81.9 |
| 34073 | 2017 | High School Graduation | Mississippi | 75.4 |
| 34074 | 2017 | High School Graduation | Missouri | 87.8 |
| 34075 | 2017 | High School Graduation | Montana | 86.0 |
| 34076 | 2017 | High School Graduation | Nebraska | 88.9 |
| 34077 | 2017 | High School Graduation | Nevada | 71.3 |
| 34078 | 2017 | High School Graduation | New Hampshire | 88.1 |
| 34079 | 2017 | High School Graduation | New Jersey | 89.7 |
| 34080 | 2017 | High School Graduation | New Mexico | 68.6 |
| 34081 | 2017 | High School Graduation | New York | 79.2 |
| 34082 | 2017 | High School Graduation | North Carolina | 85.6 |
| 34083 | 2017 | High School Graduation | North Dakota | 86.6 |
| 34084 | 2017 | High School Graduation | Ohio | 80.7 |
| 34085 | 2017 | High School Graduation | Oklahoma | 82.5 |
| 34086 | 2017 | High School Graduation | Oregon | 73.8 |
| 34087 | 2017 | High School Graduation | Pennsylvania | 84.8 |
| 34088 | 2017 | High School Graduation | Rhode Island | 83.2 |
| 34089 | 2017 | High School Graduation | South Carolina | 80.3 |
| 34090 | 2017 | High School Graduation | South Dakota | 83.9 |
| 34091 | 2017 | High School Graduation | Tennessee | 87.9 |
| 34092 | 2017 | High School Graduation | Texas | 89.0 |
| 34093 | 2017 | High School Graduation | Utah | 84.8 |
| 34094 | 2017 | High School Graduation | Vermont | 87.7 |
| 34095 | 2017 | High School Graduation | Virginia | 85.7 |
| 34096 | 2017 | High School Graduation | Washington | 78.2 |
| 34097 | 2017 | High School Graduation | West Virginia | 86.5 |
| 34098 | 2017 | High School Graduation | Wisconsin | 88.4 |
| 34099 | 2017 | High School Graduation | Wyoming | 79.3 |
phys_2017 = obesity_2017.copy()
phys_2017 = phys_2017[phys_2017["data_type"] == "Physical Inactivity"]
phys_2017.drop(phys_2017[(phys_2017.state == "United States") | (phys_2017.state == "District of Columbia")].index, inplace=True)
phys_2017
| | year | data_type | state | value |
|---|---|---|---|---|
| 41900 | 2017 | Physical Inactivity | Alabama | 29.4 |
| 41901 | 2017 | Physical Inactivity | Alaska | 19.1 |
| 41902 | 2017 | Physical Inactivity | Arizona | 23.1 |
| 41903 | 2017 | Physical Inactivity | Arkansas | 32.5 |
| 41904 | 2017 | Physical Inactivity | California | 20.5 |
| 41905 | 2017 | Physical Inactivity | Colorado | 15.8 |
| 41906 | 2017 | Physical Inactivity | Connecticut | 21.3 |
| 41907 | 2017 | Physical Inactivity | Delaware | 26.6 |
| 41908 | 2017 | Physical Inactivity | Florida | 29.8 |
| 41909 | 2017 | Physical Inactivity | Georgia | 29.4 |
| 41910 | 2017 | Physical Inactivity | Hawaii | 20.8 |
| 41911 | 2017 | Physical Inactivity | Idaho | 20.2 |
| 41912 | 2017 | Physical Inactivity | Illinois | 23.9 |
| 41913 | 2017 | Physical Inactivity | Indiana | 26.8 |
| 41914 | 2017 | Physical Inactivity | Iowa | 22.7 |
| 41915 | 2017 | Physical Inactivity | Kansas | 23.5 |
| 41916 | 2017 | Physical Inactivity | Kentucky | 29.8 |
| 41917 | 2017 | Physical Inactivity | Louisiana | 29.1 |
| 41918 | 2017 | Physical Inactivity | Maine | 20.6 |
| 41919 | 2017 | Physical Inactivity | Maryland | 23.1 |
| 41920 | 2017 | Physical Inactivity | Massachusetts | 20.0 |
| 41921 | 2017 | Physical Inactivity | Michigan | 23.9 |
| 41922 | 2017 | Physical Inactivity | Minnesota | 18.0 |
| 41923 | 2017 | Physical Inactivity | Mississippi | 30.3 |
| 41924 | 2017 | Physical Inactivity | Missouri | 24.9 |
| 41925 | 2017 | Physical Inactivity | Montana | 19.9 |
| 41926 | 2017 | Physical Inactivity | Nebraska | 22.4 |
| 41927 | 2017 | Physical Inactivity | Nevada | 24.7 |
| 41928 | 2017 | Physical Inactivity | New Hampshire | 19.3 |
| 41929 | 2017 | Physical Inactivity | New Jersey | 29.8 |
| 41930 | 2017 | Physical Inactivity | New Mexico | 20.3 |
| 41931 | 2017 | Physical Inactivity | New York | 26.3 |
| 41932 | 2017 | Physical Inactivity | North Carolina | 23.3 |
| 41933 | 2017 | Physical Inactivity | North Dakota | 22.2 |
| 41934 | 2017 | Physical Inactivity | Ohio | 25.9 |
| 41935 | 2017 | Physical Inactivity | Oklahoma | 28.5 |
| 41936 | 2017 | Physical Inactivity | Oregon | 17.2 |
| 41937 | 2017 | Physical Inactivity | Pennsylvania | 22.9 |
| 41938 | 2017 | Physical Inactivity | Rhode Island | 24.4 |
| 41939 | 2017 | Physical Inactivity | South Carolina | 26.9 |
| 41940 | 2017 | Physical Inactivity | South Dakota | 18.9 |
| 41941 | 2017 | Physical Inactivity | Tennessee | 28.4 |
| 41942 | 2017 | Physical Inactivity | Texas | 25.2 |
| 41943 | 2017 | Physical Inactivity | Utah | 15.7 |
| 41944 | 2017 | Physical Inactivity | Vermont | 19.5 |
| 41945 | 2017 | Physical Inactivity | Virginia | 23.3 |
| 41946 | 2017 | Physical Inactivity | Washington | 17.6 |
| 41947 | 2017 | Physical Inactivity | West Virginia | 28.5 |
| 41948 | 2017 | Physical Inactivity | Wisconsin | 20.0 |
| 41949 | 2017 | Physical Inactivity | Wyoming | 23.1 |
health_fund_2017 = obesity_2017.copy()
health_fund_2017 = health_fund_2017[health_fund_2017["data_type"] == "Public Health Funding"]
health_fund_2017.drop(health_fund_2017[(health_fund_2017.state == "United States") | (health_fund_2017.state == "District of Columbia")].index, inplace=True)
health_fund_2017
| | year | data_type | state | value |
|---|---|---|---|---|
| 46838 | 2017 | Public Health Funding | Alabama | 112.0 |
| 46839 | 2017 | Public Health Funding | Alaska | 285.0 |
| 46840 | 2017 | Public Health Funding | Arizona | 50.0 |
| 46841 | 2017 | Public Health Funding | Arkansas | 108.0 |
| 46842 | 2017 | Public Health Funding | California | 103.0 |
| 46843 | 2017 | Public Health Funding | Colorado | 98.0 |
| 46844 | 2017 | Public Health Funding | Connecticut | 82.0 |
| 46845 | 2017 | Public Health Funding | Delaware | 107.0 |
| 46846 | 2017 | Public Health Funding | Florida | 63.0 |
| 46847 | 2017 | Public Health Funding | Georgia | 72.0 |
| 46848 | 2017 | Public Health Funding | Hawaii | 257.0 |
| 46849 | 2017 | Public Health Funding | Idaho | 153.0 |
| 46850 | 2017 | Public Health Funding | Illinois | 71.0 |
| 46851 | 2017 | Public Health Funding | Indiana | 49.0 |
| 46852 | 2017 | Public Health Funding | Iowa | 127.0 |
| 46853 | 2017 | Public Health Funding | Kansas | 56.0 |
| 46854 | 2017 | Public Health Funding | Kentucky | 79.0 |
| 46855 | 2017 | Public Health Funding | Louisiana | 86.0 |
| 46856 | 2017 | Public Health Funding | Maine | 95.0 |
| 46857 | 2017 | Public Health Funding | Maryland | 95.0 |
| 46858 | 2017 | Public Health Funding | Massachusetts | 108.0 |
| 46859 | 2017 | Public Health Funding | Michigan | 59.0 |
| 46860 | 2017 | Public Health Funding | Minnesota | 94.0 |
| 46861 | 2017 | Public Health Funding | Mississippi | 77.0 |
| 46862 | 2017 | Public Health Funding | Missouri | 53.0 |
| 46863 | 2017 | Public Health Funding | Montana | 111.0 |
| 46864 | 2017 | Public Health Funding | Nebraska | 95.0 |
| 46865 | 2017 | Public Health Funding | Nevada | 41.0 |
| 46866 | 2017 | Public Health Funding | New Hampshire | 71.0 |
| 46867 | 2017 | Public Health Funding | New Jersey | 65.0 |
| 46868 | 2017 | Public Health Funding | New Mexico | 126.0 |
| 46869 | 2017 | Public Health Funding | New York | 153.0 |
| 46870 | 2017 | Public Health Funding | North Carolina | 56.0 |
| 46871 | 2017 | Public Health Funding | North Dakota | 147.0 |
| 46872 | 2017 | Public Health Funding | Ohio | 53.0 |
| 46873 | 2017 | Public Health Funding | Oklahoma | 89.0 |
| 46874 | 2017 | Public Health Funding | Oregon | 76.0 |
| 46875 | 2017 | Public Health Funding | Pennsylvania | 57.0 |
| 46876 | 2017 | Public Health Funding | Rhode Island | 140.0 |
| 46877 | 2017 | Public Health Funding | South Carolina | 74.0 |
| 46878 | 2017 | Public Health Funding | South Dakota | 105.0 |
| 46879 | 2017 | Public Health Funding | Tennessee | 94.0 |
| 46880 | 2017 | Public Health Funding | Texas | 67.0 |
| 46881 | 2017 | Public Health Funding | Utah | 77.0 |
| 46882 | 2017 | Public Health Funding | Vermont | 138.0 |
| 46883 | 2017 | Public Health Funding | Virginia | 72.0 |
| 46884 | 2017 | Public Health Funding | Washington | 93.0 |
| 46885 | 2017 | Public Health Funding | West Virginia | 296.0 |
| 46886 | 2017 | Public Health Funding | Wisconsin | 50.0 |
| 46887 | 2017 | Public Health Funding | Wyoming | 110.0 |
# .copy() makes the filtered frame independent of the original, so the in-place drop below does not raise SettingWithCopyWarning
obesity_2017 = obesity_2017[obesity_2017["data_type"] == "Obesity"].copy()
obesity_2017.drop(obesity_2017[(obesity_2017.state == "United States") | (obesity_2017.state == "District of Columbia")].index, inplace=True)
obesity_2017
| | year | data_type | state | value |
|---|---|---|---|---|
| 40394 | 2017 | Obesity | Alabama | 35.7 |
| 40395 | 2017 | Obesity | Alaska | 31.4 |
| 40396 | 2017 | Obesity | Arizona | 29.0 |
| 40397 | 2017 | Obesity | Arkansas | 35.7 |
| 40398 | 2017 | Obesity | California | 25.0 |
| 40399 | 2017 | Obesity | Colorado | 22.3 |
| 40400 | 2017 | Obesity | Connecticut | 26.0 |
| 40401 | 2017 | Obesity | Delaware | 30.7 |
| 40402 | 2017 | Obesity | Florida | 27.4 |
| 40403 | 2017 | Obesity | Georgia | 31.4 |
| 40404 | 2017 | Obesity | Hawaii | 23.8 |
| 40405 | 2017 | Obesity | Idaho | 27.4 |
| 40406 | 2017 | Obesity | Illinois | 31.6 |
| 40407 | 2017 | Obesity | Indiana | 32.5 |
| 40408 | 2017 | Obesity | Iowa | 32.0 |
| 40409 | 2017 | Obesity | Kansas | 31.2 |
| 40410 | 2017 | Obesity | Kentucky | 34.2 |
| 40411 | 2017 | Obesity | Louisiana | 35.5 |
| 40412 | 2017 | Obesity | Maine | 29.9 |
| 40413 | 2017 | Obesity | Maryland | 29.9 |
| 40414 | 2017 | Obesity | Massachusetts | 23.6 |
| 40415 | 2017 | Obesity | Michigan | 32.5 |
| 40416 | 2017 | Obesity | Minnesota | 27.8 |
| 40417 | 2017 | Obesity | Mississippi | 37.3 |
| 40418 | 2017 | Obesity | Missouri | 31.7 |
| 40419 | 2017 | Obesity | Montana | 25.5 |
| 40420 | 2017 | Obesity | Nebraska | 32.0 |
| 40421 | 2017 | Obesity | Nevada | 25.8 |
| 40422 | 2017 | Obesity | New Hampshire | 26.6 |
| 40423 | 2017 | Obesity | New Jersey | 27.3 |
| 40424 | 2017 | Obesity | New Mexico | 28.3 |
| 40425 | 2017 | Obesity | New York | 25.5 |
| 40426 | 2017 | Obesity | North Carolina | 31.8 |
| 40427 | 2017 | Obesity | North Dakota | 31.9 |
| 40428 | 2017 | Obesity | Ohio | 31.5 |
| 40429 | 2017 | Obesity | Oklahoma | 32.8 |
| 40430 | 2017 | Obesity | Oregon | 28.7 |
| 40431 | 2017 | Obesity | Pennsylvania | 30.3 |
| 40432 | 2017 | Obesity | Rhode Island | 26.6 |
| 40433 | 2017 | Obesity | South Carolina | 32.3 |
| 40434 | 2017 | Obesity | South Dakota | 29.6 |
| 40435 | 2017 | Obesity | Tennessee | 34.8 |
| 40436 | 2017 | Obesity | Texas | 33.6 |
| 40437 | 2017 | Obesity | Utah | 25.3 |
| 40438 | 2017 | Obesity | Vermont | 27.1 |
| 40439 | 2017 | Obesity | Virginia | 29.0 |
| 40440 | 2017 | Obesity | Washington | 28.6 |
| 40441 | 2017 | Obesity | West Virginia | 37.7 |
| 40442 | 2017 | Obesity | Wisconsin | 30.7 |
| 40443 | 2017 | Obesity | Wyoming | 27.7 |
obesity_2018 = obesity_2018_raw[["Edition","Measure Name","State Name", "Value"]]
obesity_2018 = obesity_2018.rename(columns={"Edition": "year", "Measure Name": "data_type", "State Name": "state", "Value": "value"})
obesity_2018
| | year | data_type | state | value |
|---|---|---|---|---|
| 0 | 2018 | Adverse Childhood Experiences | Alaska | 23.8 |
| 1 | 2018 | Adverse Childhood Experiences | Alabama | 27.7 |
| 2 | 2018 | Adverse Childhood Experiences | United States | 21.7 |
| 3 | 2018 | Adverse Childhood Experiences | Arkansas | 29.6 |
| 4 | 2018 | Adverse Childhood Experiences | Arizona | 30.6 |
| ... | ... | ... | ... | ... |
| 60401 | 2018 | Water Fluoridation | Vermont | 56.3 |
| 60402 | 2018 | Water Fluoridation | Washington | 63.9 |
| 60403 | 2018 | Water Fluoridation | Wisconsin | 88.9 |
| 60404 | 2018 | Water Fluoridation | West Virginia | 90.5 |
| 60405 | 2018 | Water Fluoridation | Wyoming | 57.1 |
60406 rows × 4 columns
air_pollution_2018 = obesity_2018.copy()
air_pollution_2018 = air_pollution_2018[air_pollution_2018["data_type"] == "Air Pollution"]
air_pollution_2018.drop(air_pollution_2018[(air_pollution_2018.state == "United States") | (air_pollution_2018.state == "District of Columbia")].index, inplace=True)
air_pollution_2018
| | year | data_type | state | value |
|---|---|---|---|---|
| 104 | 2018 | Air Pollution | Alaska | 7.4 |
| 105 | 2018 | Air Pollution | Alabama | 8.4 |
| 106 | 2018 | Air Pollution | Arkansas | 7.1 |
| 107 | 2018 | Air Pollution | Arizona | 9.7 |
| 108 | 2018 | Air Pollution | California | 11.9 |
| 109 | 2018 | Air Pollution | Colorado | 6.7 |
| 110 | 2018 | Air Pollution | Connecticut | 7.7 |
| 111 | 2018 | Air Pollution | Delaware | 8.6 |
| 112 | 2018 | Air Pollution | Florida | 7.1 |
| 113 | 2018 | Air Pollution | Georgia | 8.6 |
| 114 | 2018 | Air Pollution | Hawaii | 5.8 |
| 115 | 2018 | Air Pollution | Iowa | 7.2 |
| 116 | 2018 | Air Pollution | Idaho | 6.7 |
| 117 | 2018 | Air Pollution | Illinois | 9.6 |
| 118 | 2018 | Air Pollution | Indiana | 8.7 |
| 119 | 2018 | Air Pollution | Kansas | 6.9 |
| 120 | 2018 | Air Pollution | Kentucky | 8.2 |
| 121 | 2018 | Air Pollution | Louisiana | 8.0 |
| 122 | 2018 | Air Pollution | Massachusetts | 6.0 |
| 123 | 2018 | Air Pollution | Maryland | 8.3 |
| 124 | 2018 | Air Pollution | Maine | 6.5 |
| 125 | 2018 | Air Pollution | Michigan | 8.3 |
| 126 | 2018 | Air Pollution | Minnesota | 7.1 |
| 127 | 2018 | Air Pollution | Missouri | 7.9 |
| 128 | 2018 | Air Pollution | Mississippi | 7.6 |
| 129 | 2018 | Air Pollution | Montana | 6.8 |
| 130 | 2018 | Air Pollution | North Carolina | 7.4 |
| 131 | 2018 | Air Pollution | North Dakota | 4.5 |
| 132 | 2018 | Air Pollution | Nebraska | 7.1 |
| 133 | 2018 | Air Pollution | New Hampshire | 5.0 |
| 134 | 2018 | Air Pollution | New Jersey | 8.3 |
| 135 | 2018 | Air Pollution | New Mexico | 5.8 |
| 136 | 2018 | Air Pollution | Nevada | 8.8 |
| 137 | 2018 | Air Pollution | New York | 7.0 |
| 138 | 2018 | Air Pollution | Ohio | 9.0 |
| 139 | 2018 | Air Pollution | Oklahoma | 7.9 |
| 140 | 2018 | Air Pollution | Oregon | 7.7 |
| 141 | 2018 | Air Pollution | Pennsylvania | 9.7 |
| 142 | 2018 | Air Pollution | Rhode Island | 7.6 |
| 143 | 2018 | Air Pollution | South Carolina | 7.4 |
| 144 | 2018 | Air Pollution | South Dakota | 5.4 |
| 145 | 2018 | Air Pollution | Tennessee | 7.7 |
| 146 | 2018 | Air Pollution | Texas | 8.6 |
| 147 | 2018 | Air Pollution | Utah | 8.3 |
| 148 | 2018 | Air Pollution | Virginia | 7.2 |
| 149 | 2018 | Air Pollution | Vermont | 5.2 |
| 150 | 2018 | Air Pollution | Washington | 8.0 |
| 151 | 2018 | Air Pollution | Wisconsin | 6.8 |
| 152 | 2018 | Air Pollution | West Virginia | 7.8 |
| 153 | 2018 | Air Pollution | Wyoming | 5.0 |
high_school_2018 = obesity_2018.copy()
high_school_2018 = high_school_2018[high_school_2018["data_type"] == "High School Graduation"]
high_school_2018.drop(high_school_2018[(high_school_2018.state == "United States") | (high_school_2018.state == "District of Columbia")].index, inplace=True)
high_school_2018
| | year | data_type | state | value |
|---|---|---|---|---|
| 38210 | 2018 | High School Graduation | Alabama | 87.1 |
| 38211 | 2018 | High School Graduation | Alaska | 76.1 |
| 38212 | 2018 | High School Graduation | Arizona | 79.5 |
| 38213 | 2018 | High School Graduation | Arkansas | 87.0 |
| 38214 | 2018 | High School Graduation | California | 83.0 |
| 38215 | 2018 | High School Graduation | Colorado | 78.9 |
| 38216 | 2018 | High School Graduation | Connecticut | 87.4 |
| 38217 | 2018 | High School Graduation | Delaware | 85.5 |
| 38218 | 2018 | High School Graduation | Florida | 80.7 |
| 38219 | 2018 | High School Graduation | Georgia | 79.4 |
| 38220 | 2018 | High School Graduation | Hawaii | 82.7 |
| 38221 | 2018 | High School Graduation | Idaho | 79.7 |
| 38222 | 2018 | High School Graduation | Illinois | 85.5 |
| 38223 | 2018 | High School Graduation | Indiana | 86.8 |
| 38224 | 2018 | High School Graduation | Iowa | 91.3 |
| 38225 | 2018 | High School Graduation | Kansas | 85.7 |
| 38226 | 2018 | High School Graduation | Kentucky | 88.6 |
| 38227 | 2018 | High School Graduation | Louisiana | 78.6 |
| 38228 | 2018 | High School Graduation | Maine | 87.0 |
| 38229 | 2018 | High School Graduation | Maryland | 87.6 |
| 38230 | 2018 | High School Graduation | Massachusetts | 87.5 |
| 38231 | 2018 | High School Graduation | Michigan | 79.7 |
| 38232 | 2018 | High School Graduation | Minnesota | 82.2 |
| 38233 | 2018 | High School Graduation | Mississippi | 82.3 |
| 38234 | 2018 | High School Graduation | Missouri | 89.0 |
| 38235 | 2018 | High School Graduation | Montana | 85.6 |
| 38236 | 2018 | High School Graduation | Nebraska | 89.3 |
| 38237 | 2018 | High School Graduation | Nevada | 73.6 |
| 38238 | 2018 | High School Graduation | New Hampshire | 88.2 |
| 38239 | 2018 | High School Graduation | New Jersey | 90.1 |
| 38240 | 2018 | High School Graduation | New Mexico | 71.0 |
| 38241 | 2018 | High School Graduation | New York | 80.4 |
| 38242 | 2018 | High School Graduation | North Carolina | 85.9 |
| 38243 | 2018 | High School Graduation | North Dakota | 87.5 |
| 38244 | 2018 | High School Graduation | Ohio | 83.5 |
| 38245 | 2018 | High School Graduation | Oklahoma | 81.6 |
| 38246 | 2018 | High School Graduation | Oregon | 74.8 |
| 38247 | 2018 | High School Graduation | Pennsylvania | 86.1 |
| 38248 | 2018 | High School Graduation | Rhode Island | 82.8 |
| 38249 | 2018 | High School Graduation | South Carolina | 82.6 |
| 38250 | 2018 | High School Graduation | South Dakota | 83.9 |
| 38251 | 2018 | High School Graduation | Tennessee | 88.5 |
| 38252 | 2018 | High School Graduation | Texas | 89.1 |
| 38253 | 2018 | High School Graduation | Utah | 85.2 |
| 38254 | 2018 | High School Graduation | Vermont | 87.7 |
| 38255 | 2018 | High School Graduation | Virginia | 86.7 |
| 38256 | 2018 | High School Graduation | Washington | 79.7 |
| 38257 | 2018 | High School Graduation | West Virginia | 89.8 |
| 38258 | 2018 | High School Graduation | Wisconsin | 88.2 |
| 38259 | 2018 | High School Graduation | Wyoming | 80.0 |
phys_2018 = obesity_2018.copy()
phys_2018 = phys_2018[phys_2018["data_type"] == "Physical Inactivity"]
phys_2018.drop(phys_2018[(phys_2018.state == "United States") | (phys_2018.state == "District of Columbia")].index, inplace=True)
phys_2018
| | year | data_type | state | value |
|---|---|---|---|---|
| 46628 | 2018 | Physical Inactivity | Alabama | 32.0 |
| 46629 | 2018 | Physical Inactivity | Alaska | 20.6 |
| 46630 | 2018 | Physical Inactivity | Arizona | 25.1 |
| 46631 | 2018 | Physical Inactivity | Arkansas | 32.5 |
| 46632 | 2018 | Physical Inactivity | California | 20.0 |
| 46633 | 2018 | Physical Inactivity | Colorado | 19.5 |
| 46634 | 2018 | Physical Inactivity | Connecticut | 24.0 |
| 46635 | 2018 | Physical Inactivity | Delaware | 31.0 |
| 46636 | 2018 | Physical Inactivity | Florida | 29.2 |
| 46637 | 2018 | Physical Inactivity | Georgia | 31.0 |
| 46638 | 2018 | Physical Inactivity | Hawaii | 23.5 |
| 46639 | 2018 | Physical Inactivity | Idaho | 24.2 |
| 46640 | 2018 | Physical Inactivity | Illinois | 24.0 |
| 46641 | 2018 | Physical Inactivity | Indiana | 29.8 |
| 46642 | 2018 | Physical Inactivity | Iowa | 25.0 |
| 46643 | 2018 | Physical Inactivity | Kansas | 27.9 |
| 46644 | 2018 | Physical Inactivity | Kentucky | 34.4 |
| 46645 | 2018 | Physical Inactivity | Louisiana | 31.8 |
| 46646 | 2018 | Physical Inactivity | Maine | 25.2 |
| 46647 | 2018 | Physical Inactivity | Maryland | 25.6 |
| 46648 | 2018 | Physical Inactivity | Massachusetts | 24.8 |
| 46649 | 2018 | Physical Inactivity | Michigan | 27.2 |
| 46650 | 2018 | Physical Inactivity | Minnesota | 24.6 |
| 46651 | 2018 | Physical Inactivity | Mississippi | 33.2 |
| 46652 | 2018 | Physical Inactivity | Missouri | 29.2 |
| 46653 | 2018 | Physical Inactivity | Montana | 25.0 |
| 46654 | 2018 | Physical Inactivity | Nebraska | 25.4 |
| 46655 | 2018 | Physical Inactivity | Nevada | 28.0 |
| 46656 | 2018 | Physical Inactivity | New Hampshire | 23.9 |
| 46657 | 2018 | Physical Inactivity | New Jersey | 29.0 |
| 46658 | 2018 | Physical Inactivity | New Mexico | 24.5 |
| 46659 | 2018 | Physical Inactivity | New York | 27.2 |
| 46660 | 2018 | Physical Inactivity | North Carolina | 25.6 |
| 46661 | 2018 | Physical Inactivity | North Dakota | 27.6 |
| 46662 | 2018 | Physical Inactivity | Ohio | 29.6 |
| 46663 | 2018 | Physical Inactivity | Oklahoma | 32.4 |
| 46664 | 2018 | Physical Inactivity | Oregon | 21.4 |
| 46665 | 2018 | Physical Inactivity | Pennsylvania | 24.9 |
| 46666 | 2018 | Physical Inactivity | Rhode Island | 26.3 |
| 46667 | 2018 | Physical Inactivity | South Carolina | 28.4 |
| 46668 | 2018 | Physical Inactivity | South Dakota | 24.9 |
| 46669 | 2018 | Physical Inactivity | Tennessee | 30.6 |
| 46670 | 2018 | Physical Inactivity | Texas | 32.1 |
| 46671 | 2018 | Physical Inactivity | Utah | 21.1 |
| 46672 | 2018 | Physical Inactivity | Vermont | 21.6 |
| 46673 | 2018 | Physical Inactivity | Virginia | 25.9 |
| 46674 | 2018 | Physical Inactivity | Washington | 19.2 |
| 46675 | 2018 | Physical Inactivity | West Virginia | 31.6 |
| 46676 | 2018 | Physical Inactivity | Wisconsin | 22.4 |
| 46677 | 2018 | Physical Inactivity | Wyoming | 25.7 |
health_fund_2018 = obesity_2018.copy()
health_fund_2018 = health_fund_2018[health_fund_2018["data_type"] == "Public Health Funding"]
health_fund_2018.drop(health_fund_2018[(health_fund_2018.state == "United States") | (health_fund_2018.state == "District of Columbia")].index, inplace=True)
health_fund_2018
| | year | data_type | state | value |
|---|---|---|---|---|
| 51670 | 2018 | Public Health Funding | Alabama | 112.0 |
| 51671 | 2018 | Public Health Funding | Alaska | 279.0 |
| 51672 | 2018 | Public Health Funding | Arizona | 51.0 |
| 51673 | 2018 | Public Health Funding | Arkansas | 108.0 |
| 51674 | 2018 | Public Health Funding | California | 109.0 |
| 51675 | 2018 | Public Health Funding | Colorado | 98.0 |
| 51676 | 2018 | Public Health Funding | Connecticut | 81.0 |
| 51677 | 2018 | Public Health Funding | Delaware | 107.0 |
| 51678 | 2018 | Public Health Funding | Florida | 62.0 |
| 51679 | 2018 | Public Health Funding | Georgia | 73.0 |
| 51680 | 2018 | Public Health Funding | Hawaii | 224.0 |
| 51681 | 2018 | Public Health Funding | Idaho | 149.0 |
| 51682 | 2018 | Public Health Funding | Illinois | 71.0 |
| 51683 | 2018 | Public Health Funding | Indiana | 51.0 |
| 51684 | 2018 | Public Health Funding | Iowa | 109.0 |
| 51685 | 2018 | Public Health Funding | Kansas | 56.0 |
| 51686 | 2018 | Public Health Funding | Kentucky | 83.0 |
| 51687 | 2018 | Public Health Funding | Louisiana | 85.0 |
| 51688 | 2018 | Public Health Funding | Maine | 97.0 |
| 51689 | 2018 | Public Health Funding | Maryland | 96.0 |
| 51690 | 2018 | Public Health Funding | Massachusetts | 124.0 |
| 51691 | 2018 | Public Health Funding | Michigan | 58.0 |
| 51692 | 2018 | Public Health Funding | Minnesota | 89.0 |
| 51693 | 2018 | Public Health Funding | Mississippi | 82.0 |
| 51694 | 2018 | Public Health Funding | Missouri | 56.0 |
| 51695 | 2018 | Public Health Funding | Montana | 115.0 |
| 51696 | 2018 | Public Health Funding | Nebraska | 97.0 |
| 51697 | 2018 | Public Health Funding | Nevada | 43.0 |
| 51698 | 2018 | Public Health Funding | New Hampshire | 74.0 |
| 51699 | 2018 | Public Health Funding | New Jersey | 65.0 |
| 51700 | 2018 | Public Health Funding | New Mexico | 173.0 |
| 51701 | 2018 | Public Health Funding | New York | 149.0 |
| 51702 | 2018 | Public Health Funding | North Carolina | 57.0 |
| 51703 | 2018 | Public Health Funding | North Dakota | 130.0 |
| 51704 | 2018 | Public Health Funding | Ohio | 53.0 |
| 51705 | 2018 | Public Health Funding | Oklahoma | 87.0 |
| 51706 | 2018 | Public Health Funding | Oregon | 81.0 |
| 51707 | 2018 | Public Health Funding | Pennsylvania | 57.0 |
| 51708 | 2018 | Public Health Funding | Rhode Island | 140.0 |
| 51709 | 2018 | Public Health Funding | South Carolina | 76.0 |
| 51710 | 2018 | Public Health Funding | South Dakota | 111.0 |
| 51711 | 2018 | Public Health Funding | Tennessee | 97.0 |
| 51712 | 2018 | Public Health Funding | Texas | 64.0 |
| 51713 | 2018 | Public Health Funding | Utah | 78.0 |
| 51714 | 2018 | Public Health Funding | Vermont | 144.0 |
| 51715 | 2018 | Public Health Funding | Virginia | 73.0 |
| 51716 | 2018 | Public Health Funding | Washington | 94.0 |
| 51717 | 2018 | Public Health Funding | West Virginia | 215.0 |
| 51718 | 2018 | Public Health Funding | Wisconsin | 53.0 |
| 51719 | 2018 | Public Health Funding | Wyoming | 109.0 |
# .copy() makes the filtered frame independent of the original, so the in-place drop below does not raise SettingWithCopyWarning
obesity_2018 = obesity_2018[obesity_2018["data_type"] == "Obesity"].copy()
obesity_2018.drop(obesity_2018[(obesity_2018.state == "United States") | (obesity_2018.state == "District of Columbia")].index, inplace=True)
obesity_2018
| | year | data_type | state | value |
|---|---|---|---|---|
| 44966 | 2018 | Obesity | Alabama | 36.3 |
| 44967 | 2018 | Obesity | Alaska | 34.2 |
| 44968 | 2018 | Obesity | Arizona | 29.5 |
| 44969 | 2018 | Obesity | Arkansas | 35.0 |
| 44970 | 2018 | Obesity | California | 25.1 |
| 44971 | 2018 | Obesity | Colorado | 22.6 |
| 44972 | 2018 | Obesity | Connecticut | 26.9 |
| 44973 | 2018 | Obesity | Delaware | 31.8 |
| 44974 | 2018 | Obesity | Florida | 28.4 |
| 44975 | 2018 | Obesity | Georgia | 31.6 |
| 44976 | 2018 | Obesity | Hawaii | 23.8 |
| 44977 | 2018 | Obesity | Idaho | 29.3 |
| 44978 | 2018 | Obesity | Illinois | 31.1 |
| 44979 | 2018 | Obesity | Indiana | 33.6 |
| 44980 | 2018 | Obesity | Iowa | 36.4 |
| 44981 | 2018 | Obesity | Kansas | 32.3 |
| 44982 | 2018 | Obesity | Kentucky | 34.3 |
| 44983 | 2018 | Obesity | Louisiana | 36.2 |
| 44984 | 2018 | Obesity | Maine | 29.1 |
| 44985 | 2018 | Obesity | Maryland | 31.3 |
| 44986 | 2018 | Obesity | Massachusetts | 25.8 |
| 44987 | 2018 | Obesity | Michigan | 32.3 |
| 44988 | 2018 | Obesity | Minnesota | 28.4 |
| 44989 | 2018 | Obesity | Mississippi | 37.3 |
| 44990 | 2018 | Obesity | Missouri | 32.5 |
| 44991 | 2018 | Obesity | Montana | 25.3 |
| 44992 | 2018 | Obesity | Nebraska | 32.8 |
| 44993 | 2018 | Obesity | Nevada | 26.7 |
| 44994 | 2018 | Obesity | New Hampshire | 28.1 |
| 44995 | 2018 | Obesity | New Jersey | 27.2 |
| 44996 | 2018 | Obesity | New Mexico | 28.4 |
| 44997 | 2018 | Obesity | New York | 25.7 |
| 44998 | 2018 | Obesity | North Carolina | 32.1 |
| 44999 | 2018 | Obesity | North Dakota | 33.1 |
| 45000 | 2018 | Obesity | Ohio | 33.8 |
| 45001 | 2018 | Obesity | Oklahoma | 36.5 |
| 45002 | 2018 | Obesity | Oregon | 29.4 |
| 45003 | 2018 | Obesity | Pennsylvania | 31.6 |
| 45004 | 2018 | Obesity | Rhode Island | 30.0 |
| 45005 | 2018 | Obesity | South Carolina | 34.1 |
| 45006 | 2018 | Obesity | South Dakota | 31.9 |
| 45007 | 2018 | Obesity | Tennessee | 32.8 |
| 45008 | 2018 | Obesity | Texas | 33.0 |
| 45009 | 2018 | Obesity | Utah | 25.2 |
| 45010 | 2018 | Obesity | Vermont | 27.6 |
| 45011 | 2018 | Obesity | Virginia | 30.0 |
| 45012 | 2018 | Obesity | Washington | 27.7 |
| 45013 | 2018 | Obesity | West Virginia | 38.1 |
| 45014 | 2018 | Obesity | Wisconsin | 32.0 |
| 45015 | 2018 | Obesity | Wyoming | 28.8 |
obesity_2019 = obesity_2019_raw[["Edition","Measure Name","State Name", "Value"]]
obesity_2019 = obesity_2019.rename(columns={"Edition": "year", "Measure Name": "data_type", "State Name": "state", "Value": "value"})
obesity_2019
| | year | data_type | state | value |
|---|---|---|---|---|
| 0 | 2019 | Adverse Childhood Experiences | Alaska | 24.1 |
| 1 | 2019 | Adverse Childhood Experiences | Alabama | 26.3 |
| 2 | 2019 | Adverse Childhood Experiences | United States | 20.5 |
| 3 | 2019 | Adverse Childhood Experiences | Arkansas | 27.1 |
| 4 | 2019 | Adverse Childhood Experiences | Arizona | 27.3 |
| ... | ... | ... | ... | ... |
| 61504 | 2019 | Water Fluoridation | Vermont | 56.3 |
| 61505 | 2019 | Water Fluoridation | Washington | 63.9 |
| 61506 | 2019 | Water Fluoridation | Wisconsin | 88.9 |
| 61507 | 2019 | Water Fluoridation | West Virginia | 90.5 |
| 61508 | 2019 | Water Fluoridation | Wyoming | 57.1 |
61509 rows × 4 columns
air_pollution_2019 = obesity_2019.copy()
air_pollution_2019 = air_pollution_2019[air_pollution_2019["data_type"] == "Air Pollution"]
air_pollution_2019.drop(air_pollution_2019[(air_pollution_2019.state == "United States") | (air_pollution_2019.state == "District of Columbia")].index, inplace=True)
air_pollution_2019
| | year | data_type | state | value |
|---|---|---|---|---|
| 52 | 2019 | Air Pollution | Alaska | 6.4 |
| 53 | 2019 | Air Pollution | Alabama | 8.1 |
| 54 | 2019 | Air Pollution | Arkansas | 7.1 |
| 55 | 2019 | Air Pollution | Arizona | 9.7 |
| 56 | 2019 | Air Pollution | California | 12.8 |
| 57 | 2019 | Air Pollution | Colorado | 6.7 |
| 58 | 2019 | Air Pollution | Connecticut | 7.2 |
| 59 | 2019 | Air Pollution | Delaware | 8.3 |
| 60 | 2019 | Air Pollution | Florida | 7.4 |
| 61 | 2019 | Air Pollution | Georgia | 8.3 |
| 62 | 2019 | Air Pollution | Hawaii | 5.4 |
| 63 | 2019 | Air Pollution | Iowa | 7.1 |
| 64 | 2019 | Air Pollution | Idaho | 6.8 |
| 65 | 2019 | Air Pollution | Illinois | 9.3 |
| 66 | 2019 | Air Pollution | Indiana | 8.4 |
| 67 | 2019 | Air Pollution | Kansas | 7.0 |
| 68 | 2019 | Air Pollution | Kentucky | 8.1 |
| 69 | 2019 | Air Pollution | Louisiana | 7.9 |
| 70 | 2019 | Air Pollution | Massachusetts | 6.3 |
| 71 | 2019 | Air Pollution | Maryland | 7.7 |
| 72 | 2019 | Air Pollution | Maine | 5.9 |
| 73 | 2019 | Air Pollution | Michigan | 8.0 |
| 74 | 2019 | Air Pollution | Minnesota | 6.6 |
| 75 | 2019 | Air Pollution | Missouri | 7.5 |
| 76 | 2019 | Air Pollution | Mississippi | 7.7 |
| 77 | 2019 | Air Pollution | Montana | 6.6 |
| 78 | 2019 | Air Pollution | North Carolina | 7.2 |
| 79 | 2019 | Air Pollution | North Dakota | 4.6 |
| 80 | 2019 | Air Pollution | Nebraska | 7.1 |
| 81 | 2019 | Air Pollution | New Hampshire | 4.4 |
| 82 | 2019 | Air Pollution | New Jersey | 8.1 |
| 83 | 2019 | Air Pollution | New Mexico | 6.0 |
| 84 | 2019 | Air Pollution | Nevada | 9.0 |
| 85 | 2019 | Air Pollution | New York | 6.6 |
| 86 | 2019 | Air Pollution | Ohio | 8.5 |
| 87 | 2019 | Air Pollution | Oklahoma | 8.2 |
| 88 | 2019 | Air Pollution | Oregon | 7.8 |
| 89 | 2019 | Air Pollution | Pennsylvania | 9.2 |
| 90 | 2019 | Air Pollution | Rhode Island | 7.3 |
| 91 | 2019 | Air Pollution | South Carolina | 7.4 |
| 92 | 2019 | Air Pollution | South Dakota | 5.1 |
| 93 | 2019 | Air Pollution | Tennessee | 7.4 |
| 94 | 2019 | Air Pollution | Texas | 8.3 |
| 95 | 2019 | Air Pollution | Utah | 8.4 |
| 96 | 2019 | Air Pollution | Virginia | 6.9 |
| 97 | 2019 | Air Pollution | Vermont | 5.1 |
| 98 | 2019 | Air Pollution | Washington | 8.0 |
| 99 | 2019 | Air Pollution | Wisconsin | 6.8 |
| 100 | 2019 | Air Pollution | West Virginia | 7.6 |
| 101 | 2019 | Air Pollution | Wyoming | 5.0 |
high_school_2019 = obesity_2019.copy()
high_school_2019 = high_school_2019[high_school_2019["data_type"] == "High School Graduation"]
high_school_2019.drop(high_school_2019[(high_school_2019.state == "United States") | (high_school_2019.state == "District of Columbia")].index, inplace=True)
high_school_2019
| | year | data_type | state | value |
|---|---|---|---|---|
| 33587 | 2019 | High School Graduation | Alabama | 89.3 |
| 33588 | 2019 | High School Graduation | Alaska | 78.2 |
| 33589 | 2019 | High School Graduation | Arizona | 78.0 |
| 33590 | 2019 | High School Graduation | Arkansas | 88.0 |
| 33591 | 2019 | High School Graduation | California | 82.7 |
| 33592 | 2019 | High School Graduation | Colorado | 79.1 |
| 33593 | 2019 | High School Graduation | Connecticut | 87.9 |
| 33594 | 2019 | High School Graduation | Delaware | 86.9 |
| 33595 | 2019 | High School Graduation | Florida | 82.3 |
| 33596 | 2019 | High School Graduation | Georgia | 80.6 |
| 33597 | 2019 | High School Graduation | Hawaii | 82.7 |
| 33598 | 2019 | High School Graduation | Idaho | 79.7 |
| 33599 | 2019 | High School Graduation | Illinois | 87.0 |
| 33600 | 2019 | High School Graduation | Indiana | 83.8 |
| 33601 | 2019 | High School Graduation | Iowa | 91.0 |
| 33602 | 2019 | High School Graduation | Kansas | 86.5 |
| 33603 | 2019 | High School Graduation | Kentucky | 89.7 |
| 33604 | 2019 | High School Graduation | Louisiana | 78.1 |
| 33605 | 2019 | High School Graduation | Maine | 86.9 |
| 33606 | 2019 | High School Graduation | Maryland | 87.7 |
| 33607 | 2019 | High School Graduation | Massachusetts | 88.3 |
| 33608 | 2019 | High School Graduation | Michigan | 80.2 |
| 33609 | 2019 | High School Graduation | Minnesota | 82.7 |
| 33610 | 2019 | High School Graduation | Mississippi | 83.0 |
| 33611 | 2019 | High School Graduation | Missouri | 88.3 |
| 33612 | 2019 | High School Graduation | Montana | 85.8 |
| 33613 | 2019 | High School Graduation | Nebraska | 89.1 |
| 33614 | 2019 | High School Graduation | Nevada | 80.9 |
| 33615 | 2019 | High School Graduation | New Hampshire | 88.9 |
| 33616 | 2019 | High School Graduation | New Jersey | 90.5 |
| 33617 | 2019 | High School Graduation | New Mexico | 71.1 |
| 33618 | 2019 | High School Graduation | New York | 81.8 |
| 33619 | 2019 | High School Graduation | North Carolina | 86.6 |
| 33620 | 2019 | High School Graduation | North Dakota | 87.2 |
| 33621 | 2019 | High School Graduation | Ohio | 84.2 |
| 33622 | 2019 | High School Graduation | Oklahoma | 82.6 |
| 33623 | 2019 | High School Graduation | Oregon | 76.7 |
| 33624 | 2019 | High School Graduation | Pennsylvania | 86.6 |
| 33625 | 2019 | High School Graduation | Rhode Island | 84.1 |
| 33626 | 2019 | High School Graduation | South Carolina | 83.6 |
| 33627 | 2019 | High School Graduation | South Dakota | 83.7 |
| 33628 | 2019 | High School Graduation | Tennessee | 89.8 |
| 33629 | 2019 | High School Graduation | Texas | 89.7 |
| 33630 | 2019 | High School Graduation | Utah | 86.0 |
| 33631 | 2019 | High School Graduation | Vermont | 89.1 |
| 33632 | 2019 | High School Graduation | Virginia | 86.9 |
| 33633 | 2019 | High School Graduation | Washington | 79.4 |
| 33634 | 2019 | High School Graduation | West Virginia | 89.4 |
| 33635 | 2019 | High School Graduation | Wisconsin | 88.6 |
| 33636 | 2019 | High School Graduation | Wyoming | 86.2 |
phys_2019 = obesity_2019.copy()
phys_2019 = phys_2019[phys_2019["data_type"] == "Physical Inactivity"]
phys_2019.drop(phys_2019[(phys_2019.state == "United States") | (phys_2019.state == "District of Columbia")].index, inplace=True)
phys_2019
| | year | data_type | state | value |
|---|---|---|---|---|
| 47054 | 2019 | Physical Inactivity | Alabama | 30.7 |
| 47055 | 2019 | Physical Inactivity | Alaska | 19.6 |
| 47056 | 2019 | Physical Inactivity | Arizona | 22.1 |
| 47057 | 2019 | Physical Inactivity | Arkansas | 31.0 |
| 47058 | 2019 | Physical Inactivity | California | 21.0 |
| 47059 | 2019 | Physical Inactivity | Colorado | 16.4 |
| 47060 | 2019 | Physical Inactivity | Connecticut | 22.4 |
| 47061 | 2019 | Physical Inactivity | Delaware | 26.8 |
| 47062 | 2019 | Physical Inactivity | Florida | 26.8 |
| 47063 | 2019 | Physical Inactivity | Georgia | 26.2 |
| 47064 | 2019 | Physical Inactivity | Hawaii | 19.9 |
| 47065 | 2019 | Physical Inactivity | Idaho | 20.5 |
| 47066 | 2019 | Physical Inactivity | Illinois | 24.6 |
| 47067 | 2019 | Physical Inactivity | Indiana | 27.4 |
| 47068 | 2019 | Physical Inactivity | Iowa | 22.9 |
| 47069 | 2019 | Physical Inactivity | Kansas | 22.5 |
| 47070 | 2019 | Physical Inactivity | Kentucky | 32.4 |
| 47071 | 2019 | Physical Inactivity | Louisiana | 30.8 |
| 47072 | 2019 | Physical Inactivity | Maine | 22.5 |
| 47073 | 2019 | Physical Inactivity | Maryland | 22.9 |
| 47074 | 2019 | Physical Inactivity | Massachusetts | 22.4 |
| 47075 | 2019 | Physical Inactivity | Michigan | 23.8 |
| 47076 | 2019 | Physical Inactivity | Minnesota | 20.5 |
| 47077 | 2019 | Physical Inactivity | Mississippi | 32.0 |
| 47078 | 2019 | Physical Inactivity | Missouri | 26.1 |
| 47079 | 2019 | Physical Inactivity | Montana | 22.7 |
| 47080 | 2019 | Physical Inactivity | Nebraska | 23.8 |
| 47081 | 2019 | Physical Inactivity | Nevada | 25.0 |
| 47082 | 2019 | Physical Inactivity | New Hampshire | 21.5 |
| 47083 | 2019 | Physical Inactivity | New Jersey | 28.5 |
| 47084 | 2019 | Physical Inactivity | New Mexico | 22.2 |
| 47085 | 2019 | Physical Inactivity | New York | 23.8 |
| 47086 | 2019 | Physical Inactivity | North Carolina | 23.9 |
| 47087 | 2019 | Physical Inactivity | North Dakota | 22.3 |
| 47088 | 2019 | Physical Inactivity | Ohio | 25.4 |
| 47089 | 2019 | Physical Inactivity | Oklahoma | 27.2 |
| 47090 | 2019 | Physical Inactivity | Oregon | 19.3 |
| 47091 | 2019 | Physical Inactivity | Pennsylvania | 24.0 |
| 47092 | 2019 | Physical Inactivity | Rhode Island | 25.3 |
| 47093 | 2019 | Physical Inactivity | South Carolina | 26.7 |
| 47094 | 2019 | Physical Inactivity | South Dakota | 24.0 |
| 47095 | 2019 | Physical Inactivity | Tennessee | 30.9 |
| 47096 | 2019 | Physical Inactivity | Texas | 25.6 |
| 47097 | 2019 | Physical Inactivity | Utah | 17.5 |
| 47098 | 2019 | Physical Inactivity | Vermont | 18.9 |
| 47099 | 2019 | Physical Inactivity | Virginia | 22.0 |
| 47100 | 2019 | Physical Inactivity | Washington | 17.6 |
| 47101 | 2019 | Physical Inactivity | West Virginia | 28.2 |
| 47102 | 2019 | Physical Inactivity | Wisconsin | 21.8 |
| 47103 | 2019 | Physical Inactivity | Wyoming | 21.7 |
health_fund_2019 = obesity_2019.copy()
health_fund_2019 = health_fund_2019[health_fund_2019["data_type"] == "Public Health Funding"]
health_fund_2019.drop(health_fund_2019[(health_fund_2019.state == "United States") | (health_fund_2019.state == "District of Columbia")].index, inplace=True)
health_fund_2019
| | year | data_type | state | value |
|---|---|---|---|---|
| 50433 | 2019 | Public Health Funding | Alabama | 115.0 |
| 50434 | 2019 | Public Health Funding | Alaska | 281.0 |
| 50435 | 2019 | Public Health Funding | Arizona | 53.0 |
| 50436 | 2019 | Public Health Funding | Arkansas | 108.0 |
| 50437 | 2019 | Public Health Funding | California | 114.0 |
| 50438 | 2019 | Public Health Funding | Colorado | 101.0 |
| 50439 | 2019 | Public Health Funding | Connecticut | 86.0 |
| 50440 | 2019 | Public Health Funding | Delaware | 111.0 |
| 50441 | 2019 | Public Health Funding | Florida | 64.0 |
| 50442 | 2019 | Public Health Funding | Georgia | 76.0 |
| 50443 | 2019 | Public Health Funding | Hawaii | 192.0 |
| 50444 | 2019 | Public Health Funding | Idaho | 150.0 |
| 50445 | 2019 | Public Health Funding | Illinois | 73.0 |
| 50446 | 2019 | Public Health Funding | Indiana | 53.0 |
| 50447 | 2019 | Public Health Funding | Iowa | 91.0 |
| 50448 | 2019 | Public Health Funding | Kansas | 60.0 |
| 50449 | 2019 | Public Health Funding | Kentucky | 87.0 |
| 50450 | 2019 | Public Health Funding | Louisiana | 89.0 |
| 50451 | 2019 | Public Health Funding | Maine | 99.0 |
| 50452 | 2019 | Public Health Funding | Maryland | 104.0 |
| 50453 | 2019 | Public Health Funding | Massachusetts | 137.0 |
| 50454 | 2019 | Public Health Funding | Michigan | 58.0 |
| 50455 | 2019 | Public Health Funding | Minnesota | 85.0 |
| 50456 | 2019 | Public Health Funding | Mississippi | 85.0 |
| 50457 | 2019 | Public Health Funding | Missouri | 57.0 |
| 50458 | 2019 | Public Health Funding | Montana | 120.0 |
| 50459 | 2019 | Public Health Funding | Nebraska | 98.0 |
| 50460 | 2019 | Public Health Funding | Nevada | 46.0 |
| 50461 | 2019 | Public Health Funding | New Hampshire | 82.0 |
| 50462 | 2019 | Public Health Funding | New Jersey | 66.0 |
| 50463 | 2019 | Public Health Funding | New Mexico | 220.0 |
| 50464 | 2019 | Public Health Funding | New York | 148.0 |
| 50465 | 2019 | Public Health Funding | North Carolina | 59.0 |
| 50466 | 2019 | Public Health Funding | North Dakota | 113.0 |
| 50467 | 2019 | Public Health Funding | Ohio | 53.0 |
| 50468 | 2019 | Public Health Funding | Oklahoma | 89.0 |
| 50469 | 2019 | Public Health Funding | Oregon | 81.0 |
| 50470 | 2019 | Public Health Funding | Pennsylvania | 57.0 |
| 50471 | 2019 | Public Health Funding | Rhode Island | 141.0 |
| 50472 | 2019 | Public Health Funding | South Carolina | 80.0 |
| 50473 | 2019 | Public Health Funding | South Dakota | 113.0 |
| 50474 | 2019 | Public Health Funding | Tennessee | 99.0 |
| 50475 | 2019 | Public Health Funding | Texas | 60.0 |
| 50476 | 2019 | Public Health Funding | Utah | 80.0 |
| 50477 | 2019 | Public Health Funding | Vermont | 144.0 |
| 50478 | 2019 | Public Health Funding | Virginia | 77.0 |
| 50479 | 2019 | Public Health Funding | Washington | 96.0 |
| 50480 | 2019 | Public Health Funding | West Virginia | 140.0 |
| 50481 | 2019 | Public Health Funding | Wisconsin | 55.0 |
| 50482 | 2019 | Public Health Funding | Wyoming | 112.0 |
# .copy() makes the filtered frame independent of the original, so the in-place drop below does not raise SettingWithCopyWarning
obesity_2019 = obesity_2019[obesity_2019["data_type"] == "Obesity"].copy()
obesity_2019.drop(obesity_2019[(obesity_2019.state == "United States") | (obesity_2019.state == "District of Columbia")].index, inplace=True)
obesity_2019
| | year | data_type | state | value |
|---|---|---|---|---|
| 45391 | 2019 | Obesity | Alabama | 36.2 |
| 45392 | 2019 | Obesity | Alaska | 29.5 |
| 45393 | 2019 | Obesity | Arizona | 29.5 |
| 45394 | 2019 | Obesity | Arkansas | 37.1 |
| 45395 | 2019 | Obesity | California | 25.8 |
| 45396 | 2019 | Obesity | Colorado | 22.9 |
| 45397 | 2019 | Obesity | Connecticut | 27.4 |
| 45398 | 2019 | Obesity | Delaware | 33.5 |
| 45399 | 2019 | Obesity | Florida | 30.7 |
| 45400 | 2019 | Obesity | Georgia | 32.5 |
| 45401 | 2019 | Obesity | Hawaii | 24.9 |
| 45402 | 2019 | Obesity | Idaho | 28.4 |
| 45403 | 2019 | Obesity | Illinois | 31.8 |
| 45404 | 2019 | Obesity | Indiana | 34.1 |
| 45405 | 2019 | Obesity | Iowa | 35.3 |
| 45406 | 2019 | Obesity | Kansas | 34.4 |
| 45407 | 2019 | Obesity | Kentucky | 36.6 |
| 45408 | 2019 | Obesity | Louisiana | 36.8 |
| 45409 | 2019 | Obesity | Maine | 30.4 |
| 45410 | 2019 | Obesity | Maryland | 30.9 |
| 45411 | 2019 | Obesity | Massachusetts | 25.7 |
| 45412 | 2019 | Obesity | Michigan | 33.0 |
| 45413 | 2019 | Obesity | Minnesota | 30.1 |
| 45414 | 2019 | Obesity | Mississippi | 39.5 |
| 45415 | 2019 | Obesity | Missouri | 35.0 |
| 45416 | 2019 | Obesity | Montana | 26.9 |
| 45417 | 2019 | Obesity | Nebraska | 34.1 |
| 45418 | 2019 | Obesity | Nevada | 29.5 |
| 45419 | 2019 | Obesity | New Hampshire | 29.6 |
| 45420 | 2019 | Obesity | New Jersey | 25.6 |
| 45421 | 2019 | Obesity | New Mexico | 32.3 |
| 45422 | 2019 | Obesity | New York | 27.6 |
| 45423 | 2019 | Obesity | North Carolina | 33.0 |
| 45424 | 2019 | Obesity | North Dakota | 35.1 |
| 45425 | 2019 | Obesity | Ohio | 34.0 |
| 45426 | 2019 | Obesity | Oklahoma | 34.8 |
| 45427 | 2019 | Obesity | Oregon | 29.9 |
| 45428 | 2019 | Obesity | Pennsylvania | 30.9 |
| 45429 | 2019 | Obesity | Rhode Island | 27.7 |
| 45430 | 2019 | Obesity | South Carolina | 34.3 |
| 45431 | 2019 | Obesity | South Dakota | 30.1 |
| 45432 | 2019 | Obesity | Tennessee | 34.4 |
| 45433 | 2019 | Obesity | Texas | 34.8 |
| 45434 | 2019 | Obesity | Utah | 27.7 |
| 45435 | 2019 | Obesity | Vermont | 27.5 |
| 45436 | 2019 | Obesity | Virginia | 30.3 |
| 45437 | 2019 | Obesity | Washington | 28.7 |
| 45438 | 2019 | Obesity | West Virginia | 39.5 |
| 45439 | 2019 | Obesity | Wisconsin | 32.0 |
| 45440 | 2019 | Obesity | Wyoming | 29.0 |
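The cells above repeat the same three steps for every measure and every year: copy the frame, keep one `data_type`, and drop the "United States" and "District of Columbia" rows. As a minimal sketch, those steps could be folded into one helper. The name `select_measure` and the toy frame are hypothetical and only illustrate the pattern; they are not part of the original notebook.

```python
import pandas as pd

# Rows that are not states and should be excluded (same two names as in the cells above)
EXCLUDED = ["United States", "District of Columbia"]

def select_measure(df, measure, excluded=EXCLUDED):
    """Hypothetical helper: rows for one measure, without the excluded rows."""
    mask = (df["data_type"] == measure) & ~df["state"].isin(excluded)
    # .copy() returns an independent frame, so later in-place edits are safe
    return df.loc[mask].copy()

# Made-up frame with the same four columns as obesity_2019
toy = pd.DataFrame({
    "year": [2019, 2019, 2019, 2019],
    "data_type": ["Obesity", "Obesity", "Obesity", "Air Pollution"],
    "state": ["Alabama", "United States", "District of Columbia", "Alabama"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

toy_obesity = select_measure(toy, "Obesity")
print(toy_obesity)
```

With a helper like this, a call such as `select_measure(obesity_2019, "Physical Inactivity")` would stand in for each three-line cell.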
After handling the 2015-2019 air pollution, high school graduation, physical inactivity, public health funding, and obesity datasets, we clean up the GDP datasets. We keep only the GeoName column and the 2015-2019 year columns, which drops the GeoFips column. We then melt the dataframe so that year becomes a single column, which keeps the dataframe tidy. More about tidy data here: https://cmsc320.github.io/files/tidy_data.pdf
GDP_by_state = GDP_by_state_raw[["GeoName","2015", "2016", "2017", "2018", "2019"]]
GDP_by_state = pd.melt(GDP_by_state, id_vars =["GeoName"], value_vars =["2015", "2016", "2017", "2018", "2019"])
GDP_by_state = GDP_by_state.rename(columns={"GeoName": "state", "variable": "year", "value": "gdp"})
GDP_by_state['year'] = GDP_by_state['year'].astype(int)
GDP_by_state
| | state | year | gdp |
|---|---|---|---|
| 0 | Alabama | 2015 | 200197.5 |
| 1 | Alaska | 2015 | 50728.1 |
| 2 | Arizona | 2015 | 298615.0 |
| 3 | Arkansas | 2015 | 117734.3 |
| 4 | California | 2015 | 2559643.2 |
| ... | ... | ... | ... |
| 290 | Plains | 2019 | 1325462.4 |
| 291 | Southeast | 2019 | 4531095.2 |
| 292 | Southwest | 2019 | 2521101.2 |
| 293 | Rocky Mountain | 2019 | 762525.5 |
| 294 | Far West | 2019 | 4327749.6 |
295 rows × 3 columns
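To make the melting step concrete, here is a small sketch on made-up numbers. The state names and values are invented for illustration; `var_name` and `value_name` are standard `pd.melt` parameters that name the new columns directly, as an alternative to the separate rename used in the cell above.

```python
import pandas as pd

# Wide layout: one column per year (values are made up)
wide = pd.DataFrame({
    "GeoName": ["State A", "State B"],
    "2018": [10.0, 20.0],
    "2019": [11.0, 21.0],
})

# Long (tidy) layout: one row per (state, year) pair
long = pd.melt(
    wide,
    id_vars=["GeoName"],
    value_vars=["2018", "2019"],
    var_name="year",
    value_name="gdp",
)
long = long.rename(columns={"GeoName": "state"})
long["year"] = long["year"].astype(int)  # year labels start out as strings
print(long)
```

Note that the real output above still contains rows such as Plains and Far West, which are regional aggregates rather than states.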
GDP_by_state_pct = GDP_by_state_pct_raw[["GeoName","2015", "2016", "2017", "2018", "2019"]]
GDP_by_state_pct = pd.melt(GDP_by_state_pct, id_vars =["GeoName"], value_vars =["2015", "2016", "2017", "2018", "2019"])
GDP_by_state_pct = GDP_by_state_pct.rename(columns={"GeoName": "state", "variable": "year", "value": "gdp_pct"})
GDP_by_state_pct['year'] = GDP_by_state_pct['year'].astype(int)
GDP_by_state_pct
| | state | year | gdp_pct |
|---|---|---|---|
| 0 | Alabama | 2015 | 1.1 |
| 1 | Alaska | 2015 | 0.3 |
| 2 | Arizona | 2015 | 1.6 |
| 3 | Arkansas | 2015 | 0.6 |
| 4 | California | 2015 | 14.0 |
| ... | ... | ... | ... |
| 290 | Plains | 2019 | 6.2 |
| 291 | Southeast | 2019 | 21.1 |
| 292 | Southwest | 2019 | 11.8 |
| 293 | Rocky Mountain | 2019 | 3.6 |
| 294 | Far West | 2019 | 20.2 |
295 rows × 3 columns
GDP_cap_2019 = pd.melt(GDP_cap_2019_raw, id_vars =["state"], value_vars =["2019"])
GDP_cap_2019
| | state | variable | value |
|---|---|---|---|
| 0 | Alabama | 2019 | 47735 |
| 1 | Alaska | 2019 | 76220 |
| 2 | American Samoa | 2019 | 11200 |
| 3 | Arizona | 2019 | 51179 |
| 4 | Arkansas | 2019 | 44808 |
| 5 | California | 2019 | 80563 |
| 6 | Colorado | 2019 | 68828 |
| 7 | Connecticut | 2019 | 81055 |
| 8 | Delaware | 2019 | 78468 |
| 9 | District of Columbia | 2019 | 200277 |
| 10 | Florida | 2019 | 51745 |
| 11 | Georgia | 2019 | 58896 |
| 12 | Guam | 2019 | 35600 |
| 13 | Hawaii | 2019 | 69593 |
| 14 | Idaho | 2019 | 46043 |
| 15 | Illinois | 2019 | 71727 |
| 16 | Indiana | 2019 | 56702 |
| 17 | Iowa | 2019 | 62493 |
| 18 | Kansas | 2019 | 60310 |
| 19 | Kentucky | 2019 | 48697 |
| 20 | Louisiana | 2019 | 57445 |
| 21 | Maine | 2019 | 50915 |
| 22 | Maryland | 2019 | 71838 |
| 23 | Massachusetts | 2019 | 86942 |
| 24 | Michigan | 2019 | 54928 |
| 25 | Minnesota | 2019 | 68427 |
| 26 | Mississippi | 2019 | 40464 |
| 27 | Missouri | 2019 | 54879 |
| 28 | Montana | 2019 | 49540 |
| 29 | Nebraska | 2019 | 66737 |
| 30 | Nevada | 2019 | 58570 |
| 31 | New Hampshire | 2019 | 66069 |
| 32 | New Jersey | 2019 | 73451 |
| 33 | New Mexico | 2019 | 50022 |
| 34 | New York | 2019 | 90043 |
| 35 | North Carolina | 2019 | 56862 |
| 36 | North Dakota | 2019 | 75321 |
| 37 | Northern Mariana Islands | 2019 | 24500 |
| 38 | Ohio | 2019 | 60464 |
| 39 | Oklahoma | 2019 | 52409 |
| 40 | Oregon | 2019 | 60558 |
| 41 | Pennsylvania | 2019 | 64412 |
| 42 | Puerto Rico | 2019 | 31651 |
| 43 | Rhode Island | 2019 | 60830 |
| 44 | South Carolina | 2019 | 48547 |
| 45 | South Dakota | 2019 | 61104 |
| 46 | Tennessee | 2019 | 56451 |
| 47 | Texas | 2019 | 66149 |
| 48 | U.S. Virgin Islands | 2019 | 37000 |
| 49 | United States | 2019 | 65281 |
| 50 | Utah | 2019 | 59892 |
| 51 | Vermont | 2019 | 56525 |
| 52 | Virginia | 2019 | 65824 |
| 53 | Washington | 2019 | 80170 |
| 54 | West Virginia | 2019 | 43806 |
| 55 | Wisconsin | 2019 | 60425 |
| 56 | Wyoming | 2019 | 68757 |
GDP_cap = pd.melt(GDP_cap_rest_raw, id_vars =["state"], value_vars =["2015", "2016", "2017", "2018"])
GDP_cap = pd.concat([GDP_cap,GDP_cap_2019])
GDP_cap = GDP_cap.rename(columns={"variable": "year", "value": "gdp_cap"})
GDP_cap['year'] = GDP_cap['year'].astype(int)
GDP_cap
| | state | year | gdp_cap |
|---|---|---|---|
| 0 | Alabama | 2015 | 36660 |
| 1 | Alaska | 2015 | 69700 |
| 2 | Arizona | 2015 | 38303 |
| 3 | Arkansas | 2015 | 35865 |
| 4 | California | 2015 | 53855 |
| ... | ... | ... | ... |
| 52 | Virginia | 2019 | 65824 |
| 53 | Washington | 2019 | 80170 |
| 54 | West Virginia | 2019 | 43806 |
| 55 | Wisconsin | 2019 | 60425 |
| 56 | Wyoming | 2019 | 68757 |
265 rows × 3 columns
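In the output above the row index restarts partway through, because `pd.concat` keeps each input frame's own index by default. That does not affect the values, but it can be surprising when selecting rows by label. A small sketch on made-up frames shows the difference that `ignore_index=True` makes:

```python
import pandas as pd

# Two made-up frames that both use the default 0..n-1 index
part_a = pd.DataFrame({"state": ["State A", "State B"], "year": [2018, 2018], "gdp_cap": [1, 2]})
part_b = pd.DataFrame({"state": ["State A", "State B"], "year": [2019, 2019], "gdp_cap": [3, 4]})

stacked = pd.concat([part_a, part_b])                        # index labels repeat
renumbered = pd.concat([part_a, part_b], ignore_index=True)  # fresh 0..n-1 index

print(stacked.index.tolist())
print(renumbered.index.tolist())
```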
After the GDP datasets, we clean the annual average temperature by state.
avg_temp = avg_temp_raw.drop(["Anomaly"], axis=1)
avg_temp = avg_temp.rename(columns={"State": "state", "Date": "year", "Value": "temp"})
avg_temp
| | state | year | temp |
|---|---|---|---|
| 0 | Alabama | 2015 | 64.6 |
| 1 | Alabama | 2016 | 65.3 |
| 2 | Alabama | 2017 | 65.0 |
| 3 | Alabama | 2018 | 64.5 |
| 4 | Alabama | 2019 | 65.4 |
| ... | ... | ... | ... |
| 240 | Alaska | 2015 | 30.0 |
| 241 | Alaska | 2016 | 31.9 |
| 242 | Alaska | 2017 | 29.3 |
| 243 | Alaska | 2018 | 30.4 |
| 244 | Alaska | 2019 | 32.2 |
245 rows × 3 columns
Next we clean up the state median income dataset. We keep only the columns for 2015-2019 and remove the United States row.
median_income_raw.columns = median_income_raw.iloc[0]
median_income_raw = median_income_raw[1:]
median_income_raw.columns = median_income_raw.columns.astype(str)
median_income = median_income_raw[["Year","2015", "2016", "2017", "2018", "2019"]]
median_income = median_income.rename(columns={"Year": "state"})
median_income.drop(median_income[median_income.state == "United States"].index, inplace=True)
median_income = pd.melt(median_income, id_vars =["state"], value_vars =["2015", "2016", "2017", "2018", "2019"])
median_income = median_income.rename(columns={0: "year", "value": "median_income"})
median_income['year'] = median_income['year'].astype(int)
median_income
| | state | year | median_income |
|---|---|---|---|
| 0 | Alabama | 2015 | 44509 |
| 1 | Alaska | 2015 | 75112 |
| 2 | Arizona | 2015 | 52248 |
| 3 | Arkansas | 2015 | 42798 |
| 4 | California | 2015 | 63636 |
| ... | ... | ... | ... |
| 250 | Virginia | 2019 | 81313 |
| 251 | Washington | 2019 | 82454 |
| 252 | West Virginia | 2019 | 53706 |
| 253 | Wisconsin | 2019 | 67355 |
| 254 | Wyoming | 2019 | 65134 |
255 rows × 3 columns
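The median income cell first promotes the first data row to the header and then melts. The sketch below walks through the same steps on a made-up three-row frame. Passing `var_name="year"` to `pd.melt` names the year column explicitly; in the cell above that column appears to end up named `0` (the label of the promoted row), which would explain the `0: "year"` entry in the rename.

```python
import pandas as pd

# Made-up raw frame whose real header sits in the first data row
raw = pd.DataFrame([
    ["Year", 2018, 2019],
    ["United States", 100, 110],
    ["State A", 90, 95],
])

raw.columns = raw.iloc[0]                 # promote the first row to the header
body = raw[1:].copy()                     # drop that row from the data
body.columns = body.columns.astype(str)   # year labels become strings

body = body.rename(columns={"Year": "state"})
body = body[body["state"] != "United States"]

# One row per (state, year) pair, with explicit column names
tidy = pd.melt(body, id_vars=["state"], var_name="year", value_name="median_income")
tidy["year"] = tidy["year"].astype(int)
print(tidy)
```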
After cleaning up each dataset separately, we now combine the useful data into one dataframe, since keeping track of many datasets at the same time would be difficult.
air_pollution = pd.concat([air_pollution_2015,air_pollution_2016,air_pollution_2017,air_pollution_2018,air_pollution_2019])
air_pollution = air_pollution.rename(columns={"value": "ap_value"})
air_pollution = air_pollution.drop(["data_type"], axis=1)
air_pollution
| | year | state | ap_value |
|---|---|---|---|
| 0 | 2015 | Alabama | 9.5 |
| 1 | 2015 | Alaska | 6.0 |
| 2 | 2015 | Arizona | 9.7 |
| 3 | 2015 | Arkansas | 9.7 |
| 4 | 2015 | California | 12.5 |
| ... | ... | ... | ... |
| 97 | 2019 | Vermont | 5.1 |
| 98 | 2019 | Washington | 8.0 |
| 99 | 2019 | Wisconsin | 6.8 |
| 100 | 2019 | West Virginia | 7.6 |
| 101 | 2019 | Wyoming | 5.0 |
250 rows × 3 columns
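The 250 rows above match 50 states times 5 years. A quick check like the following sketch can confirm that a stacked frame has exactly one row per state and year. The function name `check_state_year` is hypothetical, and the toy frame uses made-up values.

```python
import pandas as pd

def check_state_year(df, n_states=50, n_years=5):
    """Hypothetical check: True if df has exactly one row per (state, year) pair."""
    no_dupes = not df.duplicated(subset=["state", "year"]).any()
    right_size = len(df) == n_states * n_years
    right_states = df["state"].nunique() == n_states
    return no_dupes and right_size and right_states

# Made-up frame: 2 states x 2 years
toy = pd.DataFrame({
    "state": ["State A", "State B", "State A", "State B"],
    "year": [2018, 2018, 2019, 2019],
    "ap_value": [1.0, 2.0, 3.0, 4.0],
})

ok = check_state_year(toy, n_states=2, n_years=2)
# Appending a repeated row should make the check fail
bad = check_state_year(pd.concat([toy, toy.iloc[[0]]]), n_states=2, n_years=2)
print(ok, bad)
```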
hs_grad = pd.concat([high_school_2015,high_school_2016,high_school_2017,high_school_2018,high_school_2019])
hs_grad = hs_grad.rename(columns={"value": "hs_value"})
hs_grad = hs_grad.drop(["data_type"], axis=1)
hs_grad
| | year | state | hs_value |
|---|---|---|---|
| 25896 | 2015 | Alabama | 80.0 |
| 25897 | 2015 | Alaska | 71.8 |
| 25898 | 2015 | Arizona | 75.1 |
| 25899 | 2015 | Arkansas | 84.9 |
| 25900 | 2015 | California | 80.4 |
| ... | ... | ... | ... |
| 33632 | 2019 | Virginia | 86.9 |
| 33633 | 2019 | Washington | 79.4 |
| 33634 | 2019 | West Virginia | 89.4 |
| 33635 | 2019 | Wisconsin | 88.6 |
| 33636 | 2019 | Wyoming | 86.2 |
250 rows × 3 columns
phys_act = pd.concat([phys_2015,phys_2016,phys_2017,phys_2018,phys_2019])
phys_act = phys_act.rename(columns={"value": "phy_value"})
phys_act = phys_act.drop(["data_type"], axis=1)
phys_act
| | year | state | phy_value |
|---|---|---|---|
| 32549 | 2015 | Alabama | 27.6 |
| 32550 | 2015 | Alaska | 19.2 |
| 32551 | 2015 | Arizona | 21.2 |
| 32552 | 2015 | Arkansas | 30.7 |
| 32553 | 2015 | California | 21.7 |
| ... | ... | ... | ... |
| 47099 | 2019 | Virginia | 22.0 |
| 47100 | 2019 | Washington | 17.6 |
| 47101 | 2019 | West Virginia | 28.2 |
| 47102 | 2019 | Wisconsin | 21.8 |
| 47103 | 2019 | Wyoming | 21.7 |
250 rows × 3 columns
health_fund = pd.concat([health_fund_2015,health_fund_2016,health_fund_2017,health_fund_2018,health_fund_2019])
health_fund = health_fund.rename(columns={"value": "hf_value"})
health_fund = health_fund.drop(["data_type"], axis=1)
health_fund
| | year | state | hf_value |
|---|---|---|---|
| 37229 | 2015 | Alabama | 111.0 |
| 37230 | 2015 | Alaska | 233.0 |
| 37231 | 2015 | Arizona | 44.0 |
| 37232 | 2015 | Arkansas | 98.0 |
| 37233 | 2015 | California | 103.0 |
| ... | ... | ... | ... |
| 50478 | 2019 | Virginia | 77.0 |
| 50479 | 2019 | Washington | 96.0 |
| 50480 | 2019 | West Virginia | 140.0 |
| 50481 | 2019 | Wisconsin | 55.0 |
| 50482 | 2019 | Wyoming | 112.0 |
250 rows × 3 columns
obesity = pd.concat([obesity_2015,obesity_2016,obesity_2017,obesity_2018,obesity_2019])
obesity = obesity.rename(columns={"value": "ob_value"})
obesity = obesity.drop(["data_type"], axis=1)
obesity
| | year | state | ob_value |
|---|---|---|---|
| 30989 | 2015 | Alabama | 33.5 |
| 30990 | 2015 | Alaska | 29.7 |
| 30991 | 2015 | Arizona | 28.9 |
| 30992 | 2015 | Arkansas | 35.9 |
| 30993 | 2015 | California | 24.7 |
| ... | ... | ... | ... |
| 45436 | 2019 | Virginia | 30.3 |
| 45437 | 2019 | Washington | 28.7 |
| 45438 | 2019 | West Virginia | 39.5 |
| 45439 | 2019 | Wisconsin | 32.0 |
| 45440 | 2019 | Wyoming | 29.0 |
250 rows × 3 columns
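The concat, rename and drop steps above follow the same pattern for every measure, so they could be wrapped in one small helper. The following is only a sketch: the function name `stack_years` and the toy frames are made up for illustration, and it assumes each yearly frame has the `year`, `state`, `value` and `data_type` columns seen above.

```python
import pandas as pd


def stack_years(yearly_frames, value_name):
    """Stack per-year frames and rename the generic 'value' column.

    Hypothetical helper: assumes every frame has 'value' and 'data_type' columns.
    """
    # ignore_index=True gives a clean 0..n-1 index instead of repeated source indices
    combined = pd.concat(yearly_frames, ignore_index=True)
    combined = combined.rename(columns={"value": value_name})
    return combined.drop(columns=["data_type"])


# Toy frames with made-up values, only to show the call shape
demo_2015 = pd.DataFrame(
    {"year": [2015], "state": ["A"], "value": [1.0], "data_type": ["x"]}
)
demo_2016 = pd.DataFrame(
    {"year": [2016], "state": ["A"], "value": [2.0], "data_type": ["x"]}
)

stacked = stack_years([demo_2015, demo_2016], "demo_value")
print(stacked)
```

With a helper like this, each measure would need a single call, for example `stack_years([obesity_2015, ..., obesity_2019], "ob_value")`.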
df = pd.merge(air_pollution,obesity,on=["year","state"],how="outer")
df = pd.merge(df,hs_grad,on=["year","state"],how="left")
df = pd.merge(df,phys_act,on=["year","state"],how="left")
df = pd.merge(df,health_fund,on=["year","state"],how="left")
df = pd.merge(df, GDP_by_state, on=["year","state"], how="left")
df = pd.merge(df, GDP_by_state_pct, on=["year","state"], how="left")
df = pd.merge(df, median_income, on=["year","state"], how="left")
df = pd.merge(df, avg_temp, on=["year","state"], how="left")
df = pd.merge(df, GDP_cap, on=["year","state"], how="left")
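The chain of merges above can also be written as a fold over a list of dataframes. The sketch below is an alternative under stated assumptions, not the method used in this tutorial: `merge_on_keys` is a hypothetical helper, it uses a left merge for every step (the first merge above is an outer merge), and the toy frames contain made-up values. The `validate="one_to_one"` argument of `pd.merge` makes pandas raise an error if a (year, state) pair appears more than once on either side.

```python
from functools import reduce

import pandas as pd


def merge_on_keys(frames, keys=("year", "state")):
    """Left-merge a list of dataframes on shared key columns, one after another."""
    keys = list(keys)

    def merge_pair(left, right):
        # validate="one_to_one" raises MergeError if a key pair is duplicated
        return pd.merge(left, right, on=keys, how="left", validate="one_to_one")

    return reduce(merge_pair, frames)


# Tiny made-up frames, only to show the call shape
a = pd.DataFrame(
    {"year": [2015, 2016], "state": ["Alabama", "Alabama"], "ap_value": [1.0, 2.0]}
)
b = pd.DataFrame(
    {"year": [2015, 2016], "state": ["Alabama", "Alabama"], "ob_value": [3.0, 4.0]}
)
c = pd.DataFrame({"year": [2015], "state": ["Alabama"], "temp": [60.0]})

merged = merge_on_keys([a, b, c])
print(merged)
```

Rows that have no match in a later frame keep their place and get NaN in the new column, which is how a state with missing data shows up after merging.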
We will also add a new column for the normalized year (year minus 2015), which might help ease the varying-scales issue in the upcoming sections.
df["norm_year"] = df["year"] - 2015
We can see that we do not have temperature data for the state of Hawaii, so we will leave Hawaii out of the dataset in this tutorial.
df[df.isnull().any(axis=1)]
| | year | state | ap_value | ob_value | hs_value | phy_value | hf_value | gdp | gdp_pct | median_income | temp | gdp_cap | norm_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 2015 | Hawaii | 7.6 | 22.1 | 82.4 | 19.6 | 209.0 | 82644.0 | 0.5 | 64514 | NaN | 49539 | 0 |
| 60 | 2016 | Hawaii | 7.0 | 22.7 | 81.8 | 22.5 | 237.0 | 85899.9 | 0.5 | 72133 | NaN | 49497 | 1 |
| 110 | 2017 | Hawaii | 5.9 | 23.8 | 81.6 | 20.8 | 257.0 | 89618.6 | 0.5 | 73575 | NaN | 50320 | 2 |
| 160 | 2018 | Hawaii | 5.8 | 23.8 | 82.7 | 23.5 | 224.0 | 93100.5 | 0.5 | 80108 | NaN | 51277 | 3 |
| 210 | 2019 | Hawaii | 5.4 | 24.9 | 82.7 | 19.9 | 192.0 | 95744.3 | 0.4 | 88006 | NaN | 69593 | 4 |
df = df[df["state"] != "Hawaii"]
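Before dropping rows, it can help to count the missing values per column and list which states are affected. The snippet below is a minimal sketch on a made-up toy frame (the values are placeholders, not the tutorial's data); it also shows `dropna(subset=...)` as one alternative way to remove rows with a missing value in a specific column.

```python
import numpy as np
import pandas as pd

# Toy frame with placeholder values; Hawaii has no temperature, as in the text
toy = pd.DataFrame(
    {
        "state": ["Alabama", "Hawaii", "Alaska", "Hawaii"],
        "year": [2015, 2015, 2016, 2016],
        "temp": [60.0, np.nan, 30.0, np.nan],
    }
)

# Number of missing values in each column
missing_per_column = toy.isna().sum()

# States that have at least one row with a missing value
states_with_gaps = sorted(toy.loc[toy.isna().any(axis=1), "state"].unique())

# Keep only the rows where the temperature is present
cleaned = toy.dropna(subset=["temp"])

print(missing_per_column)
print(states_with_gaps)
print(len(cleaned))
```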
After all the processing and cleaning, we finally got a single dataframe that contains all the data we need.
ap_value: Average exposure of the general public to particulate matter of 2.5 microns or less measured in micrograms per cubic meter (3-year estimate)
ob_value: Percentage of adults with a body mass index of 30.0 or higher based on reported height and weight
hs_value: Percentage of high school students who graduated with a regular high school diploma within four years of starting ninth grade
phy_value: Percentage of adults who reported doing no physical activity or exercise other than their regular job in the past 30 days
hf_value: State dollars dedicated to public health and federal dollars directed to states by the Centers for Disease Control and Prevention and the Health Resources Services Administration per person
gdp: Annual gross domestic product(GDP) in current United States Dollar for all industry total
gdp_pct: Annual gross domestic product(GDP) in percent of United States for all industry total
median_income: Median household income in United States Dollar
temp: Average annual temperature in Fahrenheit
gdp_cap: Annual gross domestic product(GDP) in current United States Dollar for all industry total per capita
norm_year: normalized year(0: 2015, 1: 2016, 2: 2017, 3: 2018, 4:2019)
df
| | year | state | ap_value | ob_value | hs_value | phy_value | hf_value | gdp | gdp_pct | median_income | temp | gdp_cap | norm_year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015 | Alabama | 9.5 | 33.5 | 80.0 | 27.6 | 111.0 | 200197.5 | 1.1 | 44509 | 64.6 | 36660 | 0 |
| 1 | 2015 | Alaska | 6.0 | 29.7 | 71.8 | 19.2 | 233.0 | 50728.1 | 0.3 | 75112 | 30.0 | 69700 | 0 |
| 2 | 2015 | Arizona | 9.7 | 28.9 | 75.1 | 21.2 | 44.0 | 298615.0 | 1.6 | 52248 | 61.8 | 38303 | 0 |
| 3 | 2015 | Arkansas | 9.7 | 35.9 | 84.9 | 30.7 | 98.0 | 117734.3 | 0.6 | 42798 | 61.4 | 35865 | 0 |
| 4 | 2015 | California | 12.5 | 24.7 | 80.4 | 21.7 | 103.0 | 2559643.2 | 14.0 | 63636 | 60.8 | 53855 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 245 | 2019 | Vermont | 5.1 | 27.5 | 89.1 | 18.9 | 144.0 | 34013.4 | 0.2 | 74305 | 41.9 | 56525 | 4 |
| 246 | 2019 | Washington | 8.0 | 28.7 | 79.4 | 17.6 | 96.0 | 612996.5 | 2.9 | 82454 | 46.5 | 80170 | 4 |
| 247 | 2019 | Wisconsin | 6.8 | 32.0 | 88.6 | 21.8 | 55.0 | 349416.5 | 1.6 | 67355 | 42.1 | 60425 | 4 |
| 248 | 2019 | West Virginia | 7.6 | 39.5 | 89.4 | 28.2 | 140.0 | 78863.9 | 0.4 | 53706 | 54.2 | 43806 | 4 |
| 249 | 2019 | Wyoming | 5.0 | 29.0 | 86.2 | 21.7 | 112.0 | 40420.1 | 0.2 | 65134 | 40.3 | 68757 | 4 |
245 rows × 13 columns
Although we now have a single dataframe for all our data, it is still difficult to find any interesting pattern. Hence, we will be plotting some graphs with our data to visualize any meaningful patterns for our analysis.
plt.figure(figsize = (13,6))
sns.regplot(data=df, x="norm_year", y="ob_value")
plt.title("Obesity value across normalized year")
plt.xlabel("Normalized year")
plt.ylabel("Obesity")
plt.tight_layout()
In this plot, we are trying to see the relationship between year and obesity. We see that obesity and year have a mild positive linear correlation. This indicates that we need to be careful when we look at the relationship between obesity and other variables, since the obesity rate in this data tends to rise from year to year.
plt.figure(figsize = (13,6))
sns.regplot(data=df, x="ap_value", y="ob_value")
plt.title("Air pollution value vs obesity value")
plt.xlabel("Air pollution")
plt.ylabel("Obesity")
plt.tight_layout()
In this plot, we look at air pollution and obesity. There might be a slight positive linear relationship; however, it might be too mild to be an interesting pattern for our purpose.
plt.figure(figsize = (13,6))
sns.regplot(data=df, x="hs_value", y="ob_value")
plt.title("High school grad value vs obesity value across time")
plt.xlabel("High school grad")
plt.ylabel("Obesity")
plt.tight_layout()
When it comes to high school graduation rate vs obesity rate, we can see a more obvious positive linear relation. The shaded confidence band around the regression line also looks narrower.
plt.figure(figsize = (13,6))
sns.regplot(data=df, x="phy_value", y="ob_value")
plt.title("Physical inactivity value vs obesity value across time")
plt.xlabel("Physical inactivity")
plt.ylabel("Obesity")
plt.tight_layout()
We can see that physical inactivity has the strongest positive linear relation so far. This hints that the physical inactivity rate might be positively correlated with the obesity rate.
plt.figure(figsize = (13,6))
sns.regplot(data=df, x="hf_value", y="ob_value")
plt.title("Public Health Funding value vs obesity value across time")
plt.xlabel("Public Health Funding")
plt.ylabel("Obesity")
plt.tight_layout()
We can see that there is almost no linear relationship between public health funding and obesity.
plt.figure(figsize = (13,6))
plt.xlim(0.0*1e6,1.25*1e6)
sns.regplot(data=df, x="gdp", y="ob_value")
plt.title("GDP vs obesity value across time by state")
plt.xlabel("GDP")
plt.ylabel("Obesity")
plt.tight_layout()
For the GDP vs obesity rate plot, we can see a slight negative linear relation.
plt.figure(figsize = (13,6))
sns.regplot(data=df, x="gdp_cap", y="ob_value")
plt.title("GDP per capita vs obesity value by state")
plt.xlabel("GDP per capita")
plt.ylabel("Obesity")
plt.tight_layout()
For the GDP per capita vs obesity rate plot, we can see a stronger negative linear relation than in the GDP vs obesity rate plot.
plt.figure(figsize = (13,6))
sns.regplot(data=df, x="median_income", y="ob_value")
plt.title("Median household income vs obesity value by state")
plt.xlabel("Median household income")
plt.ylabel("Obesity")
plt.tight_layout()
We can see a clear negative linear relationship in this plot. This might show that lower median household income is correlated with a higher obesity rate.
plt.figure(figsize = (13,6))
sns.regplot(data=df, x="temp", y="ob_value")
plt.title("Annual average temperature vs obesity value by state")
plt.xlabel("Temperature")
plt.ylabel("Obesity")
plt.tight_layout()
Annual average temperature vs obesity might also be interesting for us to study, since it shows a positive relation.
plt.figure(figsize = (15,8))
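# Keep only numeric columns: recent pandas versions raise an error in corr() on the string "state" column
sns.heatmap(df.select_dtypes("number").corr(), annot=True, cmap='Blues')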
plt.show()
In this heatmap, we can see the pairwise correlations between all numeric variables in the dataframe. This can help check and verify what we found from the plots above.
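To read a heatmap like this more easily, the correlations of every variable with one target column can be pulled out and sorted by absolute strength. The following is only a sketch: `rank_correlations` is a hypothetical helper and the toy frame holds made-up numbers, not the tutorial's data.

```python
import pandas as pd


def rank_correlations(frame, target):
    """Return Pearson correlations with `target`, strongest (by absolute value) first."""
    numeric = frame.select_dtypes("number")
    corr = numeric.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Made-up toy data; the string column is ignored by select_dtypes
toy = pd.DataFrame(
    {
        "y": [1, 2, 3, 4],
        "a": [2, 4, 6, 8],
        "b": [4, 1, 3, 2],
        "label": list("wxyz"),
    }
)

ranked = rank_correlations(toy, "y")
print(ranked)
```

On the tutorial's dataframe, the equivalent call would be `rank_correlations(df, "ob_value")`.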
In this section, we will use several machine learning models to predict obesity and analyze the patterns and possible correlations we found in the previous section. Before diving into the algorithms, we have to split our data into predictors (X) and the target (y), and then into training and test sets. Since we found in the last section that public health funding has basically no correlation with the obesity rate, we are not going to use it as a predictor. Also, GDP, GDP percentage and GDP per capita measure closely related things, so we will only use GDP per capita as a predictor.
X = df.drop(["year","state","ob_value","gdp","gdp_pct","hf_value"], axis = 1)
X = StandardScaler().fit_transform(X)
y = df.ob_value
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
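The code above scales all of X before splitting, so the scaler also sees the test rows. A common alternative, sketched below, is to put the scaler and the model into a scikit-learn pipeline so that the scaling statistics are learned from the training rows only. This is not what the tutorial does; the data here is synthetic and the variable names (`X_demo`, `y_demo`, `model`) are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: three predictors with a known linear relation plus small noise
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1
)

# The pipeline fits the scaler on X_tr only, then reuses those statistics on X_te
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_tr, y_tr)
score = model.score(X_te, y_te)

print("test r^2:", score)
```

For linear regression the difference is usually small, but the pipeline keeps the test set fully unseen during fitting.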
The first machine learning model we will use is linear regression. Since we were visualizing linear patterns between the variables and obesity in the last section, we will fit the linear regression model first and see how it performs. We will look at the mean of the obesity rate, the mean absolute error and the coefficient of determination as indicators of whether our model is a good fit.
lr = LinearRegression()
lr.fit(X_train, y_train)
pred = lr.predict(X_test)
print("Mean: " + str(y.mean()))
print("Mean absolute error: " + str(mae(y_test, pred)))
print("r^2: " + str(lr.score(X_test,y_test)))
Mean: 30.31673469387757
Mean absolute error: 1.7918090343423365
r^2: 0.503026609854395
We have a mean of ~30.32 and a mean absolute error of ~1.79, which is about 6% of the mean. This might indicate that the typical prediction error of our model is relatively small. However, the coefficient of determination (~0.50) is relatively low: the model explains only about half of the variance in obesity on the test set, which tells us that there might not be a very strong linear relationship between the variables and obesity.
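To make the two metrics concrete: the mean absolute error is the average of |y - prediction|, and the coefficient of determination is 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of y from its mean. The sketch below computes both by hand with NumPy on made-up numbers; `mae_and_r2` is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np


def mae_and_r2(y_true, y_pred):
    """Compute mean absolute error and R^2 directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae_value = np.mean(np.abs(y_true - y_pred))
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return mae_value, 1.0 - ss_res / ss_tot


# Made-up observations and predictions
y_true = [30.0, 32.0, 28.0, 34.0]
y_pred = [31.0, 31.0, 29.0, 33.0]

mae_value, r2_value = mae_and_r2(y_true, y_pred)
relative_error = mae_value / np.mean(y_true)

print(mae_value, r2_value, relative_error)
```

In this toy case every prediction is off by exactly 1, so the error is small relative to the mean, yet R^2 is below 1 because the errors are compared with the spread of y, not with its level. The same effect explains why a ~6% relative error can coexist with an R^2 of about 0.5.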
Since we are not getting a very good result from the linear regression model, we are going to try a different model now: gradient boosting.
gb = GradientBoostingRegressor(random_state = 1)
gb.fit(X_train, y_train)
pred = gb.predict(X_test)
print("Mean: " + str(y.mean()))
print("Mean absolute error: " + str(mae(y_test, pred)))
print("r^2: " + str(gb.score(X_test,y_test)))
Mean: 30.31673469387757
Mean absolute error: 1.6093247640474244
r^2: 0.5211672772966947
We can see that we have a slightly smaller mean absolute error of ~1.61, which is about 5% of the mean. The coefficient of determination, ~0.52 for the gradient boosting model, is also very similar to that of the linear regression model. Since there is not much improvement over linear regression, this might indicate that the linear regression model has already captured most of the relation between the variables and obesity. However, we are still going to try some other machine learning models and see if there is any unexpected improvement or change in the result.
The next model we are looking at is decision tree.
dt = tree.DecisionTreeRegressor(random_state = 1)
dt.fit(X_train, y_train)
pred = dt.predict(X_test)
print("Mean: " + str(y.mean()))
print("Mean absolute error: " + str(mae(y_test, pred)))
print("r^2: " + str(dt.score(X_test,y_test)))
Mean: 30.31673469387757
Mean absolute error: 2.3000000000000003
r^2: 0.11664263606585423
We can see that our results have changed dramatically. The mean absolute error is ~2.3 here, which is about 7.6% of the mean, and the coefficient of determination is only ~0.12. This tells us that a single decision tree fits our data poorly, so it might not be a good approach for our dataset.
The last machine learning model we are going to look at is random forest.
rf = RandomForestRegressor(random_state = 1)
rf.fit(X_train, y_train)
pred = rf.predict(X_test)
print("Mean: " + str(y.mean()))
print("Mean absolute error: " + str(mae(y_test, pred)))
print("r^2: " + str(rf.score(X_test,y_test)))
Mean: 30.31673469387757
Mean absolute error: 1.7630816326530643
r^2: 0.5174936626911253
After looking at the random forest model, we can be more certain about what we claimed above: the linear regression model captures most of the relation between the variables in our dataset and the obesity rate. The mean absolute error (~1.76) and the coefficient of determination (~0.52) are extremely similar to those of the linear regression model. It is disappointing to see such a low coefficient of determination; however, the result is not meaningless, and we can still get some insights into the data.
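All four scores above come from a single 80/20 split, so some of the differences between models may be due to which rows happened to land in the test set. One way to check this, sketched below, is k-fold cross-validation with `cross_val_score`, which was imported at the start of the tutorial. The data here is synthetic and the names (`X_demo`, `y_demo`, `cv_results`) are made up; the numbers it prints say nothing about the obesity dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data with a mostly linear signal
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(120, 4))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=120)

models = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=50, random_state=1),
}

# Shuffled 5-fold split, fixed seed so the folds are reproducible
cv = KFold(n_splits=5, shuffle=True, random_state=1)

cv_results = {}
for name, estimator in models.items():
    scores = cross_val_score(estimator, X_demo, y_demo, cv=cv, scoring="r2")
    cv_results[name] = (scores.mean(), scores.std())
    print(f"{name}: mean r^2 = {scores.mean():.3f}, std = {scores.std():.3f}")
```

Comparing the mean and the spread across folds gives a steadier picture than one test score per model.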
Besides machine learning, we will also perform a hypothesis test.
Our null hypothesis is that socioeconomic status and environmental factors are not correlated with obesity in US states. We will now fit an ordinary least squares (OLS) regression, which reports a t-test for each coefficient and an F-test for the model as a whole, in order to find out whether we reject our null hypothesis.
X2 = sm.add_constant(X)
est = sm.OLS(y, X2)
est2 = est.fit()
print(est2.summary())
OLS Regression Results
==============================================================================
Dep. Variable: ob_value R-squared: 0.573
Model: OLS Adj. R-squared: 0.560
Method: Least Squares F-statistic: 45.37
Date: Wed, 21 Jul 2021 Prob (F-statistic): 2.14e-40
Time: 05:24:21 Log-Likelihood: -558.07
No. Observations: 245 AIC: 1132.
Df Residuals: 237 BIC: 1160.
Df Model: 7
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 30.3167 0.153 197.713 0.000 30.015 30.619
x1 0.3959 0.180 2.198 0.029 0.041 0.751
x2 0.6652 0.170 3.909 0.000 0.330 1.000
x3 1.1815 0.236 5.017 0.000 0.718 1.645
x4 -2.1569 0.288 -7.490 0.000 -2.724 -1.590
x5 -0.2420 0.216 -1.122 0.263 -0.667 0.183
x6 0.4111 0.233 1.764 0.079 -0.048 0.870
x7 1.3036 0.194 6.709 0.000 0.921 1.686
==============================================================================
Omnibus: 3.161 Durbin-Watson: 1.846
Prob(Omnibus): 0.206 Jarque-Bera (JB): 3.208
Skew: -0.272 Prob(JB): 0.201
Kurtosis: 2.865 Cond. No. 4.00
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
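The summary labels the predictors x1 to x7 because X is a NumPy array after scaling and carries no column names. Assuming the column order of df is preserved by the drop call (which is how pandas behaves), x1 to x7 correspond to ap_value, hs_value, phy_value, median_income, temp, gdp_cap and norm_year. The sketch below copies the coefficients and p-values from the summary into a labeled table; the 0.05 threshold is a conventional choice, not something fixed by the tutorial.

```python
import pandas as pd

# Assumed mapping of x1..x7, following the column order of df after the drop
feature_names = [
    "ap_value", "hs_value", "phy_value", "median_income",
    "temp", "gdp_cap", "norm_year",
]

# Values copied from the OLS summary above (coef and P>|t| columns)
coefs = [0.3959, 0.6652, 1.1815, -2.1569, -0.2420, 0.4111, 1.3036]
p_values = [0.029, 0.000, 0.000, 0.000, 0.263, 0.079, 0.000]

table = pd.DataFrame({"coef": coefs, "p_value": p_values}, index=feature_names)
table["significant_at_5pct"] = table["p_value"] < 0.05

# Predictors whose t-test does not reach the 5% level
not_significant = table.index[~table["significant_at_5pct"]].tolist()

print(table)
print(not_significant)
```

Because the predictors were standardized, the coefficient sizes are comparable: each one is the change in obesity rate, in percentage points, for a one standard deviation change in that predictor.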
We can see that the p-value of the F-statistic is extremely low (2.14e-40), and most of the individual coefficients have p-values below 0.05. A low p-value is evidence against the null hypothesis, hence we reject h0: taken together, the socioeconomic and environmental factors are significantly related to obesity in US states. Only x5 (temp, p = 0.263) and x6 (gdp_cap, p = 0.079) are not significant at the 5% level. Still, the R-squared of 0.573 shows that a large share of the variation in obesity remains unexplained, which might be disappointing as a result. There are a lot of interesting facts we found throughout the analysis, which we will discuss in the following conclusion and insights section.
Some of the variables in our dataset, for instance physical inactivity and public health funding, seemed related to obesity to me at the beginning of the tutorial. However, after the tutorial, I learnt that it might not be as simple as I thought: physical inactivity did turn out to be related to obesity, but public health funding showed almost no linear relationship. Although we rejected our null hypothesis, does this mean that socioeconomic status and environmental factors fully explain obesity? Absolutely not. Our models account for only about half of the variation, and correlation alone does not establish causation. There are always more approaches to a single problem, and there are many ways to examine different features. This is what makes data science and machine learning challenging and exciting: there is always more to learn from what has already been established, and I am looking forward to diving into this topic from another perspective.