
Code snippets for data cleansing


Split a string by a delimiter #

Input:

2560x1600
1440x900
1920x1080
2880x1800
2560x1600

df["screenresolution"].str.split("x").str[0]

Result:

2560
1440
1920
2880
2560
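
If both halves of the resolution are needed, a minimal sketch along the following lines may help. The sample frame and the `width` and `height` column names are assumptions made up for illustration; `expand=True` asks `str.split` to return one column per piece instead of a list per row.

```python
import pandas as pd

# Hypothetical sample data mirroring the resolutions listed above.
df = pd.DataFrame(
    {"screenresolution": ["2560x1600", "1440x900", "1920x1080", "2880x1800", "2560x1600"]}
)

# expand=True returns a DataFrame with one column per split piece.
parts = df["screenresolution"].str.split("x", expand=True)

# The pieces are still strings, so convert them before doing arithmetic.
df["width"] = parts[0].astype(int)
df["height"] = parts[1].astype(int)

print(df)
```

Note that the split pieces stay strings until they are converted, which is why the sketch casts them explicitly.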

Set data type on a column of data frame #

df["cpu_freq"] = df["cpu_freq"].astype("float")
df["ram"] = df["ram"].astype("int")
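
`astype` returns a new Series rather than changing the column in place, and it raises an error when a value cannot be converted. The sketch below, built on made-up sample values, shows the assignment and one possible fallback with `pd.to_numeric(..., errors="coerce")` for columns that contain unparseable entries.

```python
import pandas as pd

# Hypothetical sample data; "n/a" stands in for a dirty value.
df = pd.DataFrame({"cpu_freq": ["2.3", "1.8", "n/a"], "ram": ["8", "16", "4"]})

# astype returns a new Series, so assign it back to keep the change.
df["ram"] = df["ram"].astype("int")

# astype("float") would raise on "n/a"; errors="coerce" turns such
# entries into NaN instead, which can be inspected or filled later.
df["cpu_freq"] = pd.to_numeric(df["cpu_freq"], errors="coerce")

print(df.dtypes)
```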

Drop an unused column #

df = df.drop("column", axis=1)
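
`drop` also returns a new DataFrame and leaves the original untouched unless the result is assigned. As a small sketch with hypothetical column names, the `columns=` keyword is an equivalent spelling of `axis=1`, and `errors="ignore"` skips names that are not present.

```python
import pandas as pd

# Hypothetical frame with three columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# columns= is equivalent to passing axis=1; several names can be dropped at once.
trimmed = df.drop(columns=["b", "c"])

# errors="ignore" skips labels that do not exist instead of raising KeyError.
safe = df.drop(columns=["b", "missing"], errors="ignore")

print(list(trimmed.columns), list(safe.columns), list(df.columns))
```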

Extract part of a string by slicing #

Input:

25kg
10kg
45kg
555kg

df["weight"].str[:-2]

Result:

25
10
45
555
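
Slicing with `str[:-2]` assumes every value ends in exactly two unit characters, and the result is still a string. The sketch below uses the sample weights from above (the `weight_kg` and `weight_num` column names are illustrative) and shows the slice followed by a cast, plus a regex-based variant that does not depend on the unit's length.

```python
import pandas as pd

# Sample weights from the example above.
df = pd.DataFrame({"weight": ["25kg", "10kg", "45kg", "555kg"]})

# Slice off the two-character unit, then cast the remaining digits to int.
df["weight_kg"] = df["weight"].str[:-2].astype(int)

# Alternative: capture the leading digits, whatever the unit suffix is.
df["weight_num"] = df["weight"].str.extract(r"(\d+)", expand=False).astype(int)

print(df)
```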

Quickly count the values in a column #

df.column_name.value_counts()
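
By default `value_counts` ignores missing values and returns absolute counts. A short sketch on made-up data shows two options that are often useful while cleaning: `dropna=False` to see how many values are missing, and `normalize=True` to get shares instead of counts.

```python
import pandas as pd

# Hypothetical column with one missing value.
df = pd.DataFrame({"gpu_brand": ["Intel", "Nvidia", "Intel", None, "AMD", "Intel"]})

# Absolute counts, most frequent first; missing values are excluded.
counts = df["gpu_brand"].value_counts()

# Include missing values as their own entry.
counts_with_nan = df["gpu_brand"].value_counts(dropna=False)

# Relative frequencies over the non-missing values.
shares = df["gpu_brand"].value_counts(normalize=True)

print(counts, counts_with_nan, shares, sep="\n\n")
```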

Convert categorical features into binary form #

df = df.join(pd.get_dummies(df["column"]))
df = df.drop("column", axis=1)
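
As an end-to-end sketch of the two steps above, the following uses a hypothetical `brand` column. Passing `dtype=int` is an optional choice made here so the indicator columns hold 0/1 rather than booleans.

```python
import pandas as pd

# Hypothetical frame with one categorical column.
df = pd.DataFrame({"brand": ["Dell", "HP", "Dell"], "ram": [8, 16, 8]})

# One indicator column per distinct value; dtype=int gives 0/1 values.
dummies = pd.get_dummies(df["brand"], dtype=int)

# Attach the indicators, then drop the original categorical column.
df = df.join(dummies).drop(columns="brand")

print(df)
```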

If the same value (name) appears in more than one column, rename the new columns accordingly #

gpu_categories = pd.get_dummies(df["gpu_brand"])
gpu_categories.columns = [col + "_gpu" for col in gpu_categories.columns]
df = df.join(gpu_categories)
df = df.drop("gpu_brand", axis=1)
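
The renaming matters because `join` raises an error when both frames contain a column with the same name. As a possible shorthand for the list comprehension, `DataFrame.add_suffix` does the same renaming in one call. The sketch below assumes a second, hypothetical `cpu_brand` column that shares the value "Intel" with `gpu_brand`.

```python
import pandas as pd

# Hypothetical frame where "Intel" and "AMD" occur in both columns.
df = pd.DataFrame(
    {
        "cpu_brand": ["Intel", "AMD", "Intel"],
        "gpu_brand": ["Intel", "Nvidia", "AMD"],
    }
)

# add_suffix appends the given text to every column name.
cpu = pd.get_dummies(df["cpu_brand"], dtype=int).add_suffix("_cpu")
gpu = pd.get_dummies(df["gpu_brand"], dtype=int).add_suffix("_gpu")

# With distinct names, both sets of indicators can be joined safely.
df = df.join(cpu).join(gpu).drop(columns=["cpu_brand", "gpu_brand"])

print(df)
```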