Mental disorders, from depression to schizophrenia, are a significant global concern that affects millions of people
worldwide. Given their complexity and prevalence, understanding the connections between different disorders is
crucial for developing effective prevention strategies, interventions, and treatments.
This project explores these connections and influencing factors, providing insights accessible to the
general public. It delves into two key questions: How do various mental health disorders interconnect, and
what factors influence them?
April 2024
Python, R, Google Sheets, D3.js
58% of people feel anxious
52% of people feel agitated
52% of people feel fatigued
41% of people feel depressed
28% of people feel despondent
18% of people feel inadequate
Sources: CDC, NYC Health
Beyond co-occurrence, I explored how different mental health conditions relate to one another using hierarchical
clustering. The clustering is based on a matrix of pairwise Hamming distances: for each pair of disorders, the
Hamming distance is the share of records in which one disorder is present and the other is absent, so a smaller
distance means the two disorders tend to be present or absent together. To make the results easier to interpret, I
removed records in which only one disorder type was present, since they carry no information about how disorders
co-occur.
Source: SAMHSA
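The clustering step can be sketched in plain Python. This is a minimal illustration, not the project's code: the disorder names and records are hypothetical toy data, and average linkage is an assumption, since the text does not say which linkage method was used. With SciPy available, the same computation is commonly written as `linkage(pdist(X.T, metric="hamming"), method="average")` using `scipy.spatial.distance.pdist` and `scipy.cluster.hierarchy.linkage`.

```python
from itertools import combinations

# Hypothetical toy data: one row per client, one 0/1 column per disorder.
disorders = ["anxiety", "depression", "bipolar", "schizophrenia"]
records = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 0],  # only one disorder present: dropped below
    [0, 0, 0, 1],  # only one disorder present: dropped below
]

# Keep only records with at least two disorders present.
filtered = [row for row in records if sum(row) >= 2]

# One binary vector per disorder, taken across the remaining records.
columns = [[row[j] for row in filtered] for j in range(len(disorders))]


def hamming(a, b):
    """Fraction of positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage(names, vectors):
    """Agglomerative clustering with (assumed) average linkage; returns the merge history."""
    dist = {
        (i, j): hamming(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }

    def cluster_distance(c1, c2):
        # Mean pairwise Hamming distance between the leaves of two clusters.
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    clusters = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters (ties go to the first pair found).
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append(([names[i] for i in a], [names[i] for i in b], cluster_distance(a, b)))
    return merges


for left, right, height in average_linkage(disorders, columns):
    print(left, "+", right, f"at distance {height:.2f}")
```

On this toy data, the first two merges each happen at a distance of 0.20 and the final merge joins the two pairs at 0.80; the merge heights are what a dendrogram would plot.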
Using Multiple Correspondence Analysis (MCA), I analyzed the client-level mental health dataset from SAMHSA to
uncover hidden patterns between demographic attributes, such as age and gender, and mental health disorders. I then
quantified the strength of each relationship with cosine similarity. Cosine similarity can range from -1 to 1 in
general; the relationships reported here fall between 0 and 1, and a value nearing 1 signifies a closer association
between a demographic factor and a specific mental health condition.
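The MCA step can be illustrated with NumPy alone. The project credits the Prince library for its MCA; this sketch instead writes out the classic indicator-matrix formulation (correspondence analysis of a one-hot table) so the computation is visible. The records, the column names `age`, `gender`, and `disorder`, and the choice of two dimensions are hypothetical.

```python
import numpy as np

# Hypothetical toy records standing in for client-level data.
records = [
    {"age": "18-24", "gender": "female", "disorder": "anxiety"},
    {"age": "18-24", "gender": "female", "disorder": "depression"},
    {"age": "25-44", "gender": "male", "disorder": "schizophrenia"},
    {"age": "25-44", "gender": "female", "disorder": "depression"},
    {"age": "45+", "gender": "male", "disorder": "schizophrenia"},
    {"age": "45+", "gender": "male", "disorder": "bipolar"},
]
variables = ["age", "gender", "disorder"]

# One-hot encode every (variable, category) pair into an indicator matrix Z.
labels = sorted({(v, rec[v]) for rec in records for v in variables})
index = {label: j for j, label in enumerate(labels)}
Z = np.zeros((len(records), len(labels)))
for i, rec in enumerate(records):
    for v in variables:
        Z[i, index[(v, rec[v])]] = 1.0

# Correspondence analysis of Z: standardized residuals, then SVD.
P = Z / Z.sum()
row_mass = P.sum(axis=1)
col_mass = P.sum(axis=0)
expected = np.outer(row_mass, col_mass)
S = (P - expected) / np.sqrt(expected)
U, sigma, Vt = np.linalg.svd(S, full_matrices=False)

# Principal coordinates of each category on the first k (assumed: 2) dimensions.
k = 2
col_coords = Vt.T[:, :k] * sigma[:k] / np.sqrt(col_mass)[:, None]
category_coords = {label: col_coords[j] for label, j in index.items()}

# Share of total inertia captured by each of the first k dimensions.
explained = sigma[:k] ** 2 / (sigma ** 2).sum()
print(explained)
```

For an indicator matrix with Q variables and J categories, the total inertia (the sum of squared singular values) equals J/Q - 1, which makes a quick sanity check for this kind of implementation.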
The similarity values were rounded, and relationships weaker than 0.10 were filtered out to keep the focus on the key
findings.
Source: SAMHSA
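The scoring and filtering steps can be sketched as follows. This assumes, as one plausible reading, that cosine similarity is computed between a demographic category's and a disorder's MCA coordinate vectors. The category names and coordinates are made up for illustration, and rounding to two decimals is an assumption, since the text only says the values were rounded. Negative similarities fall below the 0.10 cutoff, so every retained score lies between 0.10 and 1.

```python
import math
from itertools import product


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


# Hypothetical 2-D MCA coordinates with placeholder names (illustrative values only).
demographics = {"demographic A": (0.8, 0.3), "demographic B": (-0.5, 0.6)}
disorders = {"disorder X": (0.9, 0.1), "disorder Y": (-0.4, 0.9), "disorder Z": (0.35, -0.9)}

THRESHOLD = 0.10  # weaker relationships are dropped, as in the text
DECIMALS = 2      # assumed rounding precision

links = []
for (demo, d_vec), (disorder, m_vec) in product(demographics.items(), disorders.items()):
    score = round(cosine_similarity(d_vec, m_vec), DECIMALS)
    if score >= THRESHOLD:
        links.append((demo, disorder, score))

for demo, disorder, score in links:
    print(f"{demo} - {disorder}: {score}")
```

With these placeholder values, only two pairs survive (0.97 and 0.96); the demographic A / disorder Z pair scores about 0.01 and is filtered out along with the negative pairs. Rounding before filtering means a score such as 0.095 would round up to 0.10 and be kept.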
The data used in this study came from several reputable sources: the Centers for Disease Control and Prevention
(CDC), the New York City Health Department, and the Substance Abuse and Mental Health Services Administration
(SAMHSA).
The binary data, representing the presence or absence of specific mental health conditions, was analyzed with
hierarchical clustering on a Hamming distance matrix to identify co-occurrence patterns among disorders. The
categorical data, covering various demographic and diagnostic categories, was analyzed with multiple correspondence
analysis (MCA) to reveal associations and trends within the dataset.
Further research in this area could explore additional statistical techniques and expand the scope of analysis to uncover more nuanced patterns and correlations.
This project taught me a lot about mental health data analysis. Using techniques such as Principal Component Analysis (PCA) and Multiple Correspondence Analysis (MCA), I found important connections in the data. MCA stood out because it is designed for categorical data, which made it a better fit than PCA for the demographic and diagnostic variables in this project. Working with Python for data modeling was a big step forward for me, and these experiences boosted my statistical skills and prepared me well for future projects.
I began this project with limited knowledge of how to conduct Multiple Correspondence Analysis (MCA) in Python. Special thanks to Max Halford for the invaluable Prince MCA tutorial, which provided crucial guidance and insights throughout the project.