Exploratory Data Analysis Tips

Checking basic details of the dataset #

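df.shape  # (rows, columns): (5819079, 31)
df.columns  # column names

# Keep only the date and airline columns
selected_col_df = df[['year', 'month', 'day', 'day_of_week', 'airline']]

# Count missing values in each selected column
selected_col_df.isna().sum()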

Goal: predict delays and their causes #

Find outliers #

Use the .hist() method to screen each column for outliers.
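
A histogram shows outliers as a few sparsely populated bins far from the bulk of the data. The sketch below is a numeric counterpart to that visual check, not the original workflow: it computes the bin counts that a 10-bin histogram would draw, then flags extreme values with the 1.5 × IQR rule, which is a heuristic added here for illustration. The column name `departure_delay` and the toy values are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: mostly small delays plus two extreme values.
rng = np.random.default_rng(0)
delays = np.concatenate([rng.normal(10, 5, size=998), [600.0, 900.0]])
toy_df = pd.DataFrame({"departure_delay": delays})

# Counts per equal-width bin, i.e. the bar heights toy_df.hist(bins=10) would plot.
counts, bin_edges = np.histogram(toy_df["departure_delay"], bins=10)
print(counts)  # nearly all rows fall in the first bin; the extremes sit alone

# 1.5 * IQR rule (an added heuristic, not part of the original notes).
q1 = toy_df["departure_delay"].quantile(0.25)
q3 = toy_df["departure_delay"].quantile(0.75)
iqr = q3 - q1
outlier_mask = (
    (toy_df["departure_delay"] < q1 - 1.5 * iqr)
    | (toy_df["departure_delay"] > q3 + 1.5 * iqr)
)
outliers = toy_df[outlier_mask]
print(outliers.sort_values("departure_delay").tail())
```

On the real data, `df.hist(bins=50)` draws one histogram per numeric column (it requires matplotlib), which is usually the quicker first pass.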

Train #

30% for test #
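
A minimal sketch of a 70/30 split using only pandas. The toy frame and its column names are hypothetical stand-ins for the real dataset, and the approach assumes the index is unique.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: 1,000 rows with a unique index.
toy_df = pd.DataFrame({
    "day_of_week": [i % 7 + 1 for i in range(1000)],
    "delayed": [int(i % 5 == 0) for i in range(1000)],
})

# Randomly pick 30% of the rows for the test set; random_state makes it repeatable.
test_df = toy_df.sample(frac=0.30, random_state=42)

# Every row not sampled goes to the training set (relies on a unique index).
train_df = toy_df.drop(test_df.index)

print(len(train_df), len(test_df))  # 700 300
```

scikit-learn's `train_test_split` with `test_size=0.3` is a common alternative and can also stratify by the target column.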

15% for test and 20% for validation #
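
With 15% held out for test and 20% for validation, 65% remains for training. The sketch below is one way to do this with pandas alone, assuming the percentages refer to the full dataset; the toy frame and column names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: 1,000 rows with a unique index.
toy_df = pd.DataFrame({
    "day_of_week": [i % 7 + 1 for i in range(1000)],
    "delayed": [int(i % 5 == 0) for i in range(1000)],
})

# 15% of all rows for the test set.
test_df = toy_df.sample(frac=0.15, random_state=0)
remaining = toy_df.drop(test_df.index)

# 20% of all rows (not 20% of the remainder) for the validation set.
n_val = round(0.20 * len(toy_df))
val_df = remaining.sample(n=n_val, random_state=0)

# The rest, 65% of all rows, is used for training.
train_df = remaining.drop(val_df.index)

print(len(train_df), len(val_df), len(test_df))  # 650 200 150
```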

Imbalanced dataset #
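
A rough sketch of how class imbalance might be measured and compensated for. The binary label `delayed` and the 90/10 ratio are assumptions for illustration. The weights use the inverse-frequency formula n_samples / (n_classes × class_count), which is one common heuristic among several (resampling is another).

```python
import pandas as pd

# Hypothetical binary target: 900 on-time flights (0) and 100 delayed flights (1).
labels = pd.Series([0] * 900 + [1] * 100, name="delayed")

# Share of each class; a heavily skewed share signals imbalance.
class_share = labels.value_counts(normalize=True)
print(class_share)

# Inverse-frequency class weights: rarer classes receive larger weights.
class_counts = labels.value_counts()
class_weight = len(labels) / (class_counts.size * class_counts)
print(class_weight)
```

Weights like these can be passed to models that accept per-class weights, so errors on the minority class count for more during training.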