Code snippets used for data cleansing
Split a string by a delimiter #
2560x1600
1440x900
1920x1080
2880x1800
2560x1600
df["screenresolution"].str.split("x").str[0]
Result:
2560
1440
1920
2880
2560
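If both halves of the resolution are needed, the same split can be expanded into separate columns. This is a minimal sketch, not the only way to do it; the column names `screen_width` and `screen_height` are hypothetical, and the sample frame simply mirrors the values listed above.

```python
import pandas as pd

# Sample data mirroring the resolutions listed above.
df = pd.DataFrame(
    {"screenresolution": ["2560x1600", "1440x900", "1920x1080", "2880x1800", "2560x1600"]}
)

# expand=True returns a DataFrame with one column per piece of the split.
parts = df["screenresolution"].str.split("x", expand=True)

# The pieces are still strings, so convert them before doing arithmetic.
# "screen_width" and "screen_height" are hypothetical names for illustration.
df["screen_width"] = parts[0].astype(int)
df["screen_height"] = parts[1].astype(int)
```

Note that the split pieces remain strings until they are converted, which is why the sketch ends with `astype(int)`.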
Set the data type of a data frame column #
df["cpu_freq"] = df["cpu_freq"].astype("float")
df["ram"] = df["ram"].astype("int")
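`astype` raises an error if a value cannot be converted. When a column may contain stray text, `pd.to_numeric` with `errors="coerce"` is one possible alternative. The sketch below assumes a made-up frame in which `cpu_freq` contains an unparseable `"n/a"` entry.

```python
import pandas as pd

# Hypothetical sample data: "n/a" cannot be parsed as a number.
df = pd.DataFrame({"cpu_freq": ["2.3", "1.8", "n/a"], "ram": ["8", "16", "4"]})

# errors="coerce" turns unparseable entries into NaN instead of raising.
df["cpu_freq"] = pd.to_numeric(df["cpu_freq"], errors="coerce")

# Clean columns can still be converted directly with astype.
df["ram"] = df["ram"].astype("int")
```

The coerced column ends up as float, with `NaN` marking the rows that need further attention.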
Drop an unused column #
df = df.drop("column", axis=1)
Extract part of a string by position #
25kg
10kg
45kg
555kg
df["weight"].str[:-2]
Result:
25
10
45
555
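Slicing with `[:-2]` only works when every value ends in exactly two unit characters. If the suffix length might vary, a regular expression is one option. This is a sketch under that assumption; the `weight_kg` column name is hypothetical.

```python
import pandas as pd

# Sample data mirroring the weights listed above.
df = pd.DataFrame({"weight": ["25kg", "10kg", "45kg", "555kg"]})

# Capture the leading number (optionally with decimals), whatever unit follows.
# expand=False returns a Series because there is a single capture group.
df["weight_kg"] = (
    df["weight"].str.extract(r"^(\d+(?:\.\d+)?)", expand=False).astype(float)
)
```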
Quickly count the values in a column #
df.column_name.value_counts()
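Two optional arguments of `value_counts` are often useful while cleaning: `dropna=False` includes missing values in the tally, and `normalize=True` returns fractions instead of counts. The sample frame below is invented for illustration.

```python
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({"gpu_brand": ["Intel", "Nvidia", "Intel", None, "AMD", "Intel"]})

# Include missing values so gaps in the data show up in the tally.
counts = df["gpu_brand"].value_counts(dropna=False)

# Fractions of the non-missing rows rather than raw counts.
shares = df["gpu_brand"].value_counts(normalize=True)
```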
Convert categorical features into binary (one-hot) form #
df = df.join(pd.get_dummies(df["column"]))
df = df.drop("column", axis=1)
If the same value (name) appears in more than one column, rename the dummy columns accordingly #
gpu_categories = pd.get_dummies(df["gpu_brand"])
gpu_categories.columns = [col + "_gpu" for col in gpu_categories.columns]
df = df.join(gpu_categories)
df = df.drop("gpu_brand", axis=1)
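As an alternative sketch, `pd.get_dummies` can encode, rename, and drop in a single call through its `columns` and `prefix` arguments. Note that this adds a prefix (`gpu_Intel`) rather than the suffix used above (`Intel_gpu`). The `cpu_brand` column and the sample values are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data: "Intel" and "AMD" appear in both columns,
# so unprefixed dummy columns would collide.
df = pd.DataFrame(
    {
        "cpu_brand": ["Intel", "AMD", "Intel"],
        "gpu_brand": ["Intel", "Nvidia", "AMD"],
    }
)

# columns= selects what to encode; the originals are dropped automatically.
# prefix= gives each encoded column its own namespace.
encoded = pd.get_dummies(
    df, columns=["cpu_brand", "gpu_brand"], prefix=["cpu", "gpu"]
)
```

The prefix keeps the dummy columns from the two source columns distinct, which addresses the same naming clash as the manual rename.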