Exploring the Relationship Between Diabetes and Social Vulnerability: A Data-Driven Approach

In this post, we’ll delve into a county-level dataset of diagnosed diabetes percentages and examine how they relate to the Social Vulnerability Index (SVI). Understanding this relationship can offer insight into how social factors might influence health outcomes. We’ll use several statistical and machine learning techniques to examine it.

Interactions and Nonlinearity:

  • We checked for an interaction between ‘Diagnosed Diabetes Percentage’ and ‘Overall SVI’.
  • An interaction means that the effect of ‘Diagnosed Diabetes Percentage’ on the dependent variable changes with the value of ‘Overall SVI’, and vice versa.
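
To make the idea of an interaction concrete, here is a minimal sketch on synthetic data, not the county dataset itself. The `outcome` column and all coefficients are made up for illustration; the only point is that adding the product of the two predictors to a linear model lets the slope for one predictor depend on the value of the other.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for the two columns (assumed ranges, not the real data).
diabetes_pct = rng.uniform(4.0, 15.0, n)   # 'Diagnosed Diabetes Percentage'
svi = rng.uniform(0.0, 1.0, n)             # 'Overall SVI'

# Hypothetical dependent variable with a known interaction coefficient of 0.8.
outcome = (1.0 + 0.5 * diabetes_pct + 2.0 * svi
           + 0.8 * diabetes_pct * svi
           + rng.normal(0.0, 0.5, n))

# Design matrix: intercept, both main effects, and their product.
X = np.column_stack([np.ones(n), diabetes_pct, svi, diabetes_pct * svi])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
b0, b_diab, b_svi, b_inter = coef

def diabetes_slope(svi_value):
    """Estimated effect of one extra point of diabetes percentage at a given SVI."""
    return b_diab + b_inter * svi_value

print(f"interaction coefficient: {b_inter:.3f}")
print(f"slope at SVI=0: {diabetes_slope(0.0):.3f}, at SVI=1: {diabetes_slope(1.0):.3f}")
```

If the interaction coefficient were close to zero, the two slopes would be nearly equal and the predictors could reasonably be treated as acting independently.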

Moving Beyond Linearity

  • We plotted ‘Diagnosed Diabetes Percentage’ against ‘Overall SVI’. The plot suggests a positive correlation between the two variables, but the relationship doesn’t appear to be strictly linear.
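
One quick numerical check to go alongside a plot like this is to compare the Pearson correlation (which measures linear association) with the Spearman rank correlation (which measures monotonic association). The sketch below uses synthetic, deliberately curved data as an assumption; the helper names are ours. The simple ranking trick assumes no tied values, so for real data with ties a library routine such as `scipy.stats.spearmanr` would be the safer choice.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300

# Synthetic, deliberately curved relationship (assumed, not the real data).
svi = rng.uniform(0.0, 1.0, n)
diabetes_pct = 5.0 + 10.0 * svi**4 + rng.normal(0.0, 0.3, n)

def pearson(x, y):
    """Pearson correlation: strength of the linear association."""
    return float(np.corrcoef(x, y)[0, 1])

def ranks(values):
    """Rank positions 0..n-1; valid only when there are no tied values."""
    return np.argsort(np.argsort(values)).astype(float)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

r_pearson = pearson(svi, diabetes_pct)
r_spearman = spearman(svi, diabetes_pct)
print(f"Pearson: {r_pearson:.3f}  Spearman: {r_spearman:.3f}")
```

A Spearman value noticeably higher than the Pearson value would be consistent with a relationship that is monotonic but curved.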

 

Polynomial Regression:

  • We introduced squared terms for both predictors to capture any quadratic relationships.
  • The mean squared error (MSE) for the polynomial regression model is 0.0. An MSE of exactly zero indicates a perfect fit, which is unusual for real data and could suggest overfitting.
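
As a rough illustration of why a zero MSE deserves suspicion, the sketch below fits a model with squared terms for both predictors on synthetic data and evaluates it on held-out rows. The `outcome` column, the coefficients, and the noise level are all assumptions. It also shows one common way (not necessarily what happened in our analysis) to get a spuriously perfect fit: accidentally including the target among the predictors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Synthetic stand-ins for the two predictors and a hypothetical outcome.
diabetes_pct = rng.uniform(4.0, 15.0, n)
svi = rng.uniform(0.0, 1.0, n)
noise_sd = 0.5
outcome = (2.0 + 0.3 * diabetes_pct + 0.05 * diabetes_pct**2
           + 1.5 * svi - 2.0 * svi**2
           + rng.normal(0.0, noise_sd, n))

def quadratic_design(x1, x2):
    """Intercept, linear terms, and squared terms for both predictors."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2])

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# Hold out 100 rows so the error is measured on data the fit never saw.
idx = rng.permutation(n)
train, test = idx[:300], idx[300:]

X = quadratic_design(diabetes_pct, svi)
beta, *_ = np.linalg.lstsq(X[train], outcome[train], rcond=None)
test_mse = mse(outcome[test], X[test] @ beta)

# Leaky variant: the target is (wrongly) appended as an extra predictor column.
X_leaky = np.column_stack([X, outcome])
beta_leaky, *_ = np.linalg.lstsq(X_leaky[train], outcome[train], rcond=None)
leaky_mse = mse(outcome[test], X_leaky[test] @ beta_leaky)

print(f"held-out MSE, honest model: {test_mse:.3f}")
print(f"held-out MSE, leaky model:  {leaky_mse:.2e}")
```

With noisy data, an honest model's held-out MSE should sit near the noise variance (here `noise_sd ** 2`), while the leaky model's error collapses to essentially zero.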

Step Functions:

  • The ‘Diagnosed Diabetes Percentage’ was divided into intervals (bins), and a separate constant was fit for each interval.
  • The step function model also resulted in an MSE of 0.0, reinforcing concerns about overfitting or data quality.
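
The binning idea can be sketched in a few lines. The example below is a minimal version on synthetic data, assuming quartile-based bins and ‘Overall SVI’ as the quantity being predicted; the number of bins and the way the data are generated are illustrative choices, not the settings from our analysis.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400

# Synthetic stand-ins (assumed relationship, not the real data).
diabetes_pct = rng.uniform(4.0, 15.0, n)
svi = np.clip(0.08 * (diabetes_pct - 4.0) + rng.normal(0.0, 0.1, n), 0.0, 1.0)

n_bins = 4
# Interior cut points at the quartiles of the predictor.
cuts = np.quantile(diabetes_pct, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
bin_id = np.digitize(diabetes_pct, cuts)          # integer labels 0 .. n_bins - 1

# A step function fits one constant per bin: the mean response in that bin.
bin_means = np.array([svi[bin_id == b].mean() for b in range(n_bins)])
fitted = bin_means[bin_id]

step_mse = float(np.mean((svi - fitted) ** 2))
print("bin means:", np.round(bin_means, 3))
print(f"in-sample MSE: {step_mse:.4f}")
```

Note that the in-sample MSE shrinks as the number of bins grows; if every observation ends up in its own bin, the MSE is trivially zero, which is one way a step function can look perfect without having learned anything.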

Our exploration suggests a non-linear relationship between ‘Diagnosed Diabetes Percentage’ and ‘Overall SVI’. However, the unusually perfect fits from our models warrant caution. In real-world scenarios, deeper data diagnostics, validation on separate datasets, and domain expertise are crucial for confirming findings.
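
One simple form of the validation mentioned above is k-fold cross-validation, where each model is scored only on rows it was not fitted to. The sketch below compares a straight-line fit with a quadratic fit on synthetic data; the helper `cv_mse`, the fold count, and the data-generating formula are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300

# Synthetic curved relationship with noise (assumed, not the real data).
svi = rng.uniform(0.0, 1.0, n)
diabetes_pct = 6.0 + 1.0 * svi + 6.0 * svi**2 + rng.normal(0.0, 0.6, n)

def cv_mse(x, y, degree, k=5, seed=0):
    """Average held-out MSE of a degree-`degree` polynomial over k folds."""
    order = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(order, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coefs, x[test])
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))

cv_linear = cv_mse(svi, diabetes_pct, degree=1)
cv_quadratic = cv_mse(svi, diabetes_pct, degree=2)
print(f"5-fold CV MSE, linear:    {cv_linear:.3f}")
print(f"5-fold CV MSE, quadratic: {cv_quadratic:.3f}")
```

A held-out error that is clearly above zero and lower for the more flexible model is the kind of evidence that would support a genuinely non-linear relationship, whereas a held-out error of exactly zero would point back to a data problem.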
