Use Series.map with a dictionary, then fillna for the values that have no match:
d = {'CollgCr': 'Middle',
     'Veenker': 'Middle',
     "Mitchel": 'Lower',
     "OldTown": 'Lower',
     "BrkSide": 'Lower',
     "Sawyer": 'Lower',
     "NAmes": 'Lower',
     "IDOTRR": 'Lower',
     "MeadowV": 'Lower',
     "Edwards": 'Lower',
     "NPkVill": 'Lower',
     "BrDale": 'Lower',
     "SWISU": 'Lower',
     "Blueste": 'Lower'}
Or build the dictionary dynamically:
Mi = ['CollgCr', 'Veenker']
Lo = ["Mitchel", "OldTown", "BrkSide", "Sawyer", "NAmes", "IDOTRR",
      "MeadowV", "Edwards", "NPkVill", "BrDale", "SWISU", "Blueste"]
d = {**dict.fromkeys(Lo, 'Lower'), **dict.fromkeys(Mi, 'Middle')}
df_full['new'] = df_full['city'].map(d).fillna('Upper')
print(df_full)
      city     new
0  CollgCr  Middle
1  Veenker  Middle
2  CollgCr  Middle
3  Crawfor   Upper
4  NoRidge   Upper
5  Mitchel   Lower
6  Somerst   Upper
7   NWAmes   Upper
8  OldTown   Lower
9  BrkSide   Lower
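To see why fillna is needed, here is a minimal sketch of the intermediate step. The three-row Series and the shortened dictionary are made up for illustration and are not the data from the question. map leaves a missing value wherever the city is not a key of the dictionary, and fillna then turns those into the default label. Note that this also labels rows where city itself is missing as 'Upper', which may or may not be what you want.

```python
import pandas as pd

# Hypothetical, shortened inputs (not the full data from the question).
s = pd.Series(['CollgCr', 'Crawfor', 'Mitchel'], name='city')
d = {**dict.fromkeys(['Mitchel'], 'Lower'),
     **dict.fromkeys(['CollgCr'], 'Middle')}

mapped = s.map(d)                 # 'Crawfor' is not a key, so it becomes missing
filled = mapped.fillna('Upper')   # missing values get the default label

print(mapped.isna().tolist())
print(filled.tolist())
```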
Timings depend on the data, but map should be the fastest of the three approaches:
In [25]: %timeit (jez(df_full.copy()))
15 ms ± 260 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [26]: %timeit (raf(df_full.copy()))
20.3 ms ± 347 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [27]: %timeit (ct(df_full.copy()))
26.9 ms ± 286 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Code for the timings:
import numpy as np
import pandas as pd

df_full = pd.DataFrame({'city': ['CollgCr', 'Veenker', 'CollgCr', 'Crawfor',
                                 'NoRidge', 'Mitchel', 'Somerst', 'NWAmes',
                                 'OldTown', 'BrkSide']})
# Repeat the 10 sample rows 10000 times -> [100000 rows x 1 columns]
df_full = pd.concat([df_full] * 10000, ignore_index=True)
def jez(df_full):
    # Dictionary lookup with map; unmatched cities become NaN, then 'Upper'
    d = {'CollgCr': 'Middle',
         'Veenker': 'Middle',
         "Mitchel": 'Lower',
         "OldTown": 'Lower',
         "BrkSide": 'Lower',
         "Sawyer": 'Lower',
         "NAmes": 'Lower',
         "IDOTRR": 'Lower',
         "MeadowV": 'Lower',
         "Edwards": 'Lower',
         "NPkVill": 'Lower',
         "BrDale": 'Lower',
         "SWISU": 'Lower',
         "Blueste": 'Lower'}
    df_full['new'] = df_full['city'].map(d).fillna('Upper')
    return df_full
def raf(df):
    # Two isin masks fed to np.select; the first matching condition wins
    middle = ['CollgCr', 'Veenker']
    lower = ["Mitchel", "OldTown", "BrkSide", "Sawyer", "NAmes",
             "IDOTRR", "MeadowV", "Edwards", "NPkVill", "BrDale", "SWISU", "Blueste"]
    df['new_col'] = np.select([df.city.isin(lower), df.city.isin(middle)],
                              ['lower', 'middle'], default='upper')
    return df
def ct(df):
    # One-column lookup table ('Type') indexed by city name, joined with a left merge
    df_types = pd.DataFrame({'CollgCr': 'Middle',
                             'Veenker': 'Middle',
                             "Mitchel": 'Lower',
                             "OldTown": 'Lower',
                             "BrkSide": 'Lower',
                             "Sawyer": 'Lower',
                             "NAmes": 'Lower',
                             "IDOTRR": 'Lower',
                             "MeadowV": 'Lower',
                             "Edwards": 'Lower',
                             "NPkVill": 'Lower',
                             "BrDale": 'Lower',
                             "SWISU": 'Lower',
                             "Blueste": 'Lower'}, index=['Type']).T
    return df.merge(df_types, left_on='city', right_index=True, how='left').fillna('Upper')
print(jez(df_full.copy()))
print(raf(df_full.copy()))
print(ct(df_full.copy()))
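The %timeit lines above are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The frame size, the shortened dictionary, and the helper name map_fillna below are assumptions for illustration only; absolute numbers will differ from the ones shown above.

```python
import timeit

import pandas as pd

# Hypothetical smaller frame and dictionary, only to keep the sketch quick.
df = pd.DataFrame({'city': ['CollgCr', 'Crawfor', 'Mitchel'] * 1000})
d = {'CollgCr': 'Middle', 'Mitchel': 'Lower'}


def map_fillna(frame):
    # Same approach as jez above: dictionary lookup, then a default label.
    frame['new'] = frame['city'].map(d).fillna('Upper')
    return frame


# Copy inside the timed call so every run starts from an unmodified frame,
# as the %timeit calls above do with df_full.copy().
elapsed = timeit.repeat(lambda: map_fillna(df.copy()), number=10, repeat=3)
print(f"best of 3: {min(elapsed) / 10 * 1e3:.3f} ms per call")
```

Because the copy happens inside the timed call, its cost is included in every measurement. It is the same for each candidate, so the comparison between approaches stays fair.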