Classification

Data preprocessing

make column names consistent (lowercase + replace ' ' with _)
make categorical values consistent (lowercase + replace ' ' with _)
pd.to_numeric → convert a column to a numeric dtype
- errors='coerce' → values that cannot be parsed become NaN
convert the target variable to numeric, e.g. 0/1 (if the original data is categorical)
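
The preprocessing steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the original notebook: the DataFrame, its column names, and the "yes"/"no" target encoding are all made-up assumptions.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "Customer ID": ["A1", "B2", "C3"],
    "Payment Method": ["Electronic Check", "Mailed Check", "Bank Transfer"],
    "Total Charges": ["29.85", " ", "108.15"],  # numeric data stored as text
    "Churn": ["No", "Yes", "No"],
})

# 1. Column names: lowercase, spaces -> underscores
df.columns = df.columns.str.lower().str.replace(" ", "_", regex=False)

# 2. Categorical values: same normalization (columns listed by hand here)
categorical = ["payment_method", "churn"]
for col in categorical:
    df[col] = df[col].str.lower().str.replace(" ", "_", regex=False)

# 3. Text -> numeric; anything that cannot be parsed becomes NaN
df["total_charges"] = pd.to_numeric(df["total_charges"], errors="coerce")

# 4. Target variable: "yes"/"no" -> 1/0
df["churn"] = (df["churn"] == "yes").astype(int)

print(df)
```

Without errors="coerce", the blank string in total_charges would raise an error instead of becoming NaN.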

EDA

view target variable distribution
view churn rate
view which columns contain NaN
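
A small sketch of these three checks, assuming a DataFrame whose target column is a 0/1 column named churn (the data and column names below are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: churn is already encoded as 0/1.
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 8],
    "total_charges": [29.85, np.nan, 108.15, 1840.75, np.nan],
    "churn": [0, 0, 1, 0, 1],
})

# Target distribution as proportions of each class
target_dist = df["churn"].value_counts(normalize=True)

# Churn rate: share of rows where churn == 1
churn_rate = df["churn"].mean()

# Number of NaN values per column
nan_counts = df.isnull().sum()

print(target_dist)
print(round(churn_rate, 2))
print(nan_counts)
```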

Feature importance

churn rate

mean of churn
- churn = 1 → customer who churned, churn = 0 → customer who stayed
- mean of churn = number of customers who churned / number of samples = churn rate (works because churn is 0/1)

Risk ratio (group churn rate / global churn rate) → relative measure

churn rate within one value (group) of the feature / global churn rate
> 1 → the group is more likely to churn than average
< 1 → the group is less likely to churn than average
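
As a rough sketch, the risk ratio for one categorical feature can be computed with a groupby. The feature name partner and the toy values are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical data: one categorical feature and a 0/1 churn target.
df = pd.DataFrame({
    "partner": ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "no"],
    "churn":   [0, 0, 0, 1, 1, 1, 0, 1, 0, 0],
})

# Global churn rate = mean of the 0/1 target
global_churn = df["churn"].mean()

# Churn rate per group, then risk ratio relative to the global rate
group = df.groupby("partner")["churn"].agg(["mean", "count"])
group["risk"] = group["mean"] / global_churn

print(group)
```

In this toy data the global churn rate is 0.4. Customers without a partner churn at 0.5 (risk ratio 1.25, more likely to churn than average), while customers with a partner churn at 0.25 (risk ratio 0.625, less likely).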