Uninsurance and Unemployment

We wanted to see whether a county’s unemployment rate is related to its uninsured rate. We expected a correlation: most people get health insurance through their employer, so losing a job often means losing coverage. However, we found almost no correlation between the two. Instead, the data shows weak correlations with other factors, including the prevalence of some minority racial groups.

import pandas as pd

# Load the data for the unemployment rate and uninsured rate
uninsured_by_county = pd.read_excel("./datasets/aspe-uninsured-estimates-by-county.xlsx", sheet_name=1)
unemployed_by_county = pd.read_excel("./datasets/bls-unemployed-stats-by-county-2020.xlsx")

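# Merge the unemployment data with the uninsured data column-wise
# (rows are paired by index position, so both sheets must list counties in the same order)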
joined_df = pd.concat([uninsured_by_county, unemployed_by_county], axis=1)

# Clean data
joined_df = joined_df[joined_df["Unemployment Rate (%)"] != "N.A."]
joined_df.replace("**", 0, inplace=True)
joined_df["Unemployment Rate (%)"] /= 100
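A caveat about the merge step: `pd.concat(..., axis=1)` pairs rows by index position, so counties line up correctly only if both spreadsheets list them in the same order. A key-based merge is a safer alternative. The sketch below runs on made-up data. The column names `FIPS Code`, `State FIPS Code`, and `County FIPS Code` appear in the columns dropped later in this notebook, but which sheet each one belongs to and how the codes are formatted are assumptions here.

```python
import pandas as pd

# Toy stand-ins for the two spreadsheets (values are made up for illustration)
uninsured = pd.DataFrame({
    "FIPS Code": ["01001", "01003", "02013"],
    "Percent Uninsured": [0.10, 0.12, 0.20],
})
unemployed = pd.DataFrame({
    "State FIPS Code": ["02", "01", "01"],
    "County FIPS Code": ["013", "003", "001"],
    "Unemployment Rate (%)": [3.1, 5.4, 4.9],
})

# Build a five-digit county key from the state and county parts (assumed layout)
unemployed["FIPS Code"] = unemployed["State FIPS Code"] + unemployed["County FIPS Code"]

# Inner merge keeps only counties present in both tables, matched by key
merged = uninsured.merge(
    unemployed[["FIPS Code", "Unemployment Rate (%)"]],
    on="FIPS Code",
    how="inner",
)
print(merged)
```

Even though the toy tables list counties in different orders, each row of `merged` pairs the two rates for the same county.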

Visuals of Unemployment Rate and Uninsurance Rate

Below is a histogram of the unemployment rate (orange) and uninsurance rate (blue) across counties. Both distributions are roughly symmetrical, though the uninsurance rate has a higher median and is more right-skewed.
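The median and skewness read off the histogram can also be checked numerically. This is a minimal sketch on made-up values, not the notebook's data; it uses the pandas `median()` and `skew()` methods, where a positive skew indicates a longer right tail.

```python
import pandas as pd

# Made-up values standing in for the two rate columns
rates = pd.DataFrame({
    "Percent Uninsured": [0.05, 0.08, 0.10, 0.12, 0.30],
    "Unemployment Rate (%)": [0.03, 0.04, 0.05, 0.06, 0.07],
})

# One row per column, with its median and sample skewness
summary = pd.DataFrame({
    "median": rates.median(),
    "skew": rates.skew(),
})
print(summary)
```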

Also below is a scatterplot of the unemployment rate against the uninsurance rate. There is almost no correlation between the two.

from matplotlib import pyplot as plt

joined_df["Percent Uninsured"].hist(bins=20)
joined_df["Unemployment Rate (%)"].hist(bins=20)
plt.xlabel("Percent Uninsured and Unemployment Rate")
plt.ylabel("Count")
plt.title("Histogram of Percent Uninsured and Unemployment Rate")
plt.show()

plt.scatter(joined_df["Unemployment Rate (%)"], joined_df["Percent Uninsured"])
plt.xlabel("Unemployment Rate")
plt.ylabel("Percent Uninsured")
plt.title("Percent Uninsured vs. Unemployment Rate")
plt.show()
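The visual impression from the scatterplot can be backed by a single number, the Pearson correlation coefficient. The sketch below uses made-up values rather than the county data; on the real data frame, the same `Series.corr` call would be applied to the two columns of `joined_df`, assuming both hold numeric values.

```python
import pandas as pd

# Made-up rates for five hypothetical counties
toy = pd.DataFrame({
    "Unemployment Rate (%)": [0.03, 0.05, 0.04, 0.08, 0.06],
    "Percent Uninsured": [0.12, 0.09, 0.15, 0.11, 0.10],
})

# Pearson correlation between the two columns (Series.corr defaults to Pearson)
r = toy["Unemployment Rate (%)"].corr(toy["Percent Uninsured"])
print(f"Pearson r = {r:.3f}")
```

A value near 0 indicates little linear relationship, while values near -1 or 1 indicate a strong one.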


What Other Variables Correlate with Uninsurance?

Since unemployment rate does not show a sizable correlation, we’ll look at other numeric variables that might correlate with uninsurance rate.

import numpy as np

# Drop all categorical columns
no_categorical = joined_df.drop(columns=["State Name", "FIPS Code", "County Name", "LAUS Code", "State FIPS Code", "County FIPS Code", "County Name/State Abbreviation", "Unnamed: 5", "Year"])

# Get the column index of percent uninsured
pu_index = list(no_categorical.columns).index("Percent Uninsured")

# Calculate a correlation coefficient matrix
A = np.corrcoef(no_categorical, rowvar=False)

# Sort the variables by the absolute value of their correlation with percent uninsured, strongest first
x = sorted(zip(no_categorical.columns, A[pu_index]), key=lambda pair: abs(pair[1]), reverse=True)

# Print the top 10 (skipping the first entry, which is percent uninsured itself)
for row in x[1:11]:
    print(row)

('Employed', -0.1253685386747601)
('Labor Force', -0.1232580355418897)
('Total Non-Elderly Population (Excluding Undocumented)', -0.11846189618250953)
('American Indian / Alaska Native', 0.1101397961824688)
('Unemployed', -0.1005180105089568)
('SNAP Recipient', 0.0677076551079456)
('HIU Income < 100% FPL', 0.05806924503342023)
('Asian / Native-Hawaiian / Pac Islander', -0.0558013266488515)
('Less than High School', 0.05140100165739355)
('Spanish/Hispanic/Latino Origin', 0.05098932178146017)
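The same kind of ranking can be produced with pandas alone, which keeps the column labels attached to each coefficient and skips missing values pairwise (whereas `np.corrcoef` propagates NaN). This is a sketch on made-up data, not the method used above; the toy column names are borrowed from the output only for illustration.

```python
import pandas as pd

# Made-up data; None becomes NaN and is ignored pairwise by DataFrame.corr
toy = pd.DataFrame({
    "Percent Uninsured": [0.10, 0.12, 0.08, 0.20, 0.15],
    "Employed": [5000, 4200, 9000, 800, 1500],
    "SNAP Recipient": [0.20, 0.25, 0.10, 0.40, None],
})

# Correlation of every column with percent uninsured, excluding itself
corr = toy.corr()["Percent Uninsured"].drop("Percent Uninsured")

# Rank by absolute strength, strongest first, keeping the original signs
ranked = corr.reindex(corr.abs().sort_values(ascending=False).index)
print(ranked.head(10))
```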

Positive Correlations

We can see that the following factors correlate most positively with uninsurance rates in each county:

1. Number of American Indian / Alaska Natives who are uninsured
2. Number of SNAP (food stamps) recipients who are uninsured
3. Number of people whose income is below the poverty line who are uninsured
4. Number of people who completed less than high school education who are uninsured
5. Number of people who are of Spanish / Hispanic / Latino origin who are uninsured

Two of these factors (1 and 5) are related to racial demographics. Two of these factors (2 and 3) are related to income. One of these factors (4) is related to education.

Negative Correlations

We can see that the following factors correlate most negatively with uninsurance rates in each county:

1. Total number of employed people
2. Total number of people in the labor force
3. Total number of Non-Elderly people (Excluding Undocumented)
4. Total number of unemployed people
5. Number of Asians / Native Hawaiians / Pacific Islanders who are uninsured

Four of these factors (1, 2, 3, and 4) are related to population size. One of these factors (5) is related to racial demographics.