Use numpy.where:
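import numpy as np
import pandas as pd

df = pd.DataFrame({
    'year': [91, 99, 1, 15, 17, 93],
    'A': [7, 8, 9, 4, 2, 3],
})
# Two-digit years above 20 are treated as 19xx, the rest as 20xx
df['year1'] = np.where(df['year'] > 20, df['year'] + 1900, df['year'] + 2000)
print(df)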
   year  A  year1
0    91  7   1991
1    99  8   1999
2     1  9   2001
3    15  4   2015
4    17  2   2017
5    93  3   1993
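The same logic can be wrapped in a small helper so the cutoff is not hard-coded. This is only a sketch: the function name `expand_two_digit_year` and its `pivot` parameter are hypothetical and not part of the original answer; the default of 20 simply mirrors the threshold used above.

```python
import numpy as np
import pandas as pd


def expand_two_digit_year(years: pd.Series, pivot: int = 20) -> pd.Series:
    """Map two-digit years to four digits: values above `pivot` become 19xx, the rest 20xx."""
    # np.where returns a plain ndarray, so wrap it to keep the original index
    full = np.where(years > pivot, years + 1900, years + 2000)
    return pd.Series(full, index=years.index, name=years.name)


df = pd.DataFrame({'year': [91, 99, 1, 15, 17, 93]})
df['year1'] = expand_two_digit_year(df['year'])
print(df['year1'].tolist())
```

Choosing the pivot is a data decision: it should sit above the latest 20xx year and below the earliest 19xx year that can occur in the column.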
If the column contains strings:
y = df['year'].astype(int)
df['year1'] = np.where(y>20, y+1900, y+2000)
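If the string column may hold values that are not numbers, `astype(int)` raises an error. One possible workaround, shown here as a sketch on made-up sample data, is `pd.to_numeric` with `errors='coerce'`; the sample values and the `'n/a'` placeholder are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

s = pd.Series(['91', '05', 'n/a', '17'])

# errors='coerce' turns unparseable strings into NaN instead of raising
y = pd.to_numeric(s, errors='coerce')

# NaN > 20 is False, so NaN rows fall to the second branch and stay NaN
year1 = pd.Series(np.where(y > 20, y + 1900, y + 2000), index=s.index)
print(year1)
```

Because of the NaN, the result is a float column; rows with missing years need to be handled separately before converting back to integers.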
Performance:
np.random.seed(123)
N = 1000
df = pd.DataFrame({
    'year': np.random.randint(1, 99, size=N),
})
In [55]: %timeit df['year1'] = np.where(df['year']>20, df['year']+1900, df['year']+2000)
615 µs ± 79.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [58]: %timeit df['year2'] = pd.to_datetime(df['year'].astype(str).str.zfill(2), format='%y').dt.year
3.49 ms ± 31.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Performance for a string column:
N = 1000
df = pd.DataFrame({
    'year': np.random.randint(1, 99, size=N),
})
df['year'] = df['year'].astype(str).str.zfill(2)
print(df.head())
  year
0   36
1   55
2   39
3   05
4   55
In [80]: %%timeit
...: y = df['year'].astype(int)
...: df['year1'] = np.where(y>20, y+1900, y+2000)
...:
761 µs ± 14.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [81]: %%timeit
...: df['year2'] = pd.to_datetime(df['year'], format='%y').dt.year
...:
2.33 ms ± 44.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
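Note that the two timed variants are not guaranteed to produce the same years. The `%y` directive follows the POSIX convention (00-68 map to 2000-2068, 69-99 to 1969-1999), whereas the `np.where` version above uses 20 as the cutoff. The sketch below, on a hand-picked sample that is not from the original answer, shows where they differ and how moving the cutoff to 68 makes them agree.

```python
import numpy as np
import pandas as pd

years = pd.Series([5, 20, 21, 68, 69, 99])

# Threshold used in the answer: values above 20 become 19xx
by_where_20 = np.where(years > 20, years + 1900, years + 2000)

# %y parsing: 00-68 -> 2000-2068, 69-99 -> 1969-1999
by_datetime = pd.to_datetime(years.astype(str).str.zfill(2), format='%y').dt.year

# Moving the np.where threshold to 68 reproduces the %y convention
by_where_68 = np.where(years > 68, years + 1900, years + 2000)

print(by_where_20.tolist())
print(by_datetime.tolist())
print(by_where_68.tolist())
```

So the timings compare speed only; which result is right depends on which century the values between 21 and 68 should belong to.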