Recoding a Column
Sometime data consists categorical column, like gender, in text form. But in the data analysis or in machine learning we need to fit then into the model which demands numerical data. So we need to convert them into binaries. With the help of recoding we manage this conversion.
Recoding means replacing existing values in a column with new ones often to make them numeric or standardized for analysis or machine learning. In machine learning, one such categorical column converted into number of dichotomous columns equal to the number of categories.
For example: gender={‘Male’, ‘Female’, ‘Transgender’}. After conversion we will get three binary (dummy) columns.
Simple replacement Using .map()
import pandas as pd
data = pd.DataFrame({
'gender': ['Male','Trans', 'Female', 'Female', 'Male', 'Male','Trans']
})
data['gender_Code'] = data['gender'].map({'Male': 1, 'Female': 0, 'Trans': 2})
data
|
gender |
gender_Code |
|
|
0 |
Male |
1 |
|
1 |
Trans |
2 |
|
2 |
Female |
0 |
|
3 |
Female |
0 |
|
4 |
Male |
1 |
|
5 |
Male |
1 |
|
6 |
Trans |
2 |
Statlearner
Statlearner