Использовать пользовательскую функцию с np.arange
:
def f(x):
a = list(map(int, x.split(' to ')))
return pd.Series(1, index= np.arange(a[0], a[1] + 1))
df = df.join(df['active'].apply(f).add_prefix('Type '))
print (df)
ID start_date end_date active Type 1 Type 2 Type 3 Type 4 \
0 1,111 6/30/2015 8/6/1904 1 to 10 1.0 1.0 1.0 1.0
1 1,111 6/28/2016 3/30/1905 1 to 10 1.0 1.0 1.0 1.0
2 1,111 7/31/2017 6/6/1905 1 to 10 1.0 1.0 1.0 1.0
3 1,111 7/31/2018 6/6/1905 1 to 9 1.0 1.0 1.0 1.0
4 1,111 5/31/2019 12/4/1904 1 to 9 1.0 1.0 1.0 1.0
5 3,033 3/31/2015 5/18/1908 3 to 7 NaN NaN 1.0 1.0
6 3,033 3/31/2016 11/24/1905 3 to 7 NaN NaN 1.0 1.0
7 3,033 3/31/2017 1/20/1906 3 to 7 NaN NaN 1.0 1.0
8 3,033 3/31/2018 1/8/1906 2 to 7 NaN 1.0 1.0 1.0
9 3,033 4/4/2019 2200,0 2 to 8 NaN 1.0 1.0 1.0
Type 5 Type 6 Type 7 Type 8 Type 9 Type 10
0 1.0 1.0 1.0 1.0 1.0 1.0
1 1.0 1.0 1.0 1.0 1.0 1.0
2 1.0 1.0 1.0 1.0 1.0 1.0
3 1.0 1.0 1.0 1.0 1.0 NaN
4 1.0 1.0 1.0 1.0 1.0 NaN
5 1.0 1.0 1.0 NaN NaN NaN
6 1.0 1.0 1.0 NaN NaN NaN
7 1.0 1.0 1.0 NaN NaN NaN
8 1.0 1.0 1.0 NaN NaN NaN
9 1.0 1.0 1.0 1.0 NaN NaN
Аналогично:
def f(x):
a = list(map(int, x.split(' to ')))
return pd.Series(1, index= np.arange(a[0], a[1] + 1))
df = df.join(df['active'].apply(f).add_prefix('Type ').fillna(0).astype(int))
print (df)
ID start_date end_date active Type 1 Type 2 Type 3 Type 4 \
0 1,111 6/30/2015 8/6/1904 1 to 10 1 1 1 1
1 1,111 6/28/2016 3/30/1905 1 to 10 1 1 1 1
2 1,111 7/31/2017 6/6/1905 1 to 10 1 1 1 1
3 1,111 7/31/2018 6/6/1905 1 to 9 1 1 1 1
4 1,111 5/31/2019 12/4/1904 1 to 9 1 1 1 1
5 3,033 3/31/2015 5/18/1908 3 to 7 0 0 1 1
6 3,033 3/31/2016 11/24/1905 3 to 7 0 0 1 1
7 3,033 3/31/2017 1/20/1906 3 to 7 0 0 1 1
8 3,033 3/31/2018 1/8/1906 2 to 7 0 1 1 1
9 3,033 4/4/2019 2200,0 2 to 8 0 1 1 1
Type 5 Type 6 Type 7 Type 8 Type 9 Type 10
0 1 1 1 1 1 1
1 1 1 1 1 1 1
2 1 1 1 1 1 1
3 1 1 1 1 1 0
4 1 1 1 1 1 0
5 1 1 1 0 0 0
6 1 1 1 0 0 0
7 1 1 1 0 0 0
8 1 1 1 0 0 0
9 1 1 1 1 0 0
Другое решение без петель - идея состоит в том, чтобы удалять дубликаты, создавать новые строки с get_dummies
, reindex
для добавления отсутствующих столбцов и последним добавлением 1
несколькими cumsum
ed значения:
df1 = (df.set_index('active', drop=False)
.pop('active')
.drop_duplicates()
.str.get_dummies(' to '))
df1.columns = df1.columns.astype(int)
df1 = df1.reindex(columns=np.arange(df1.columns.min(),df1.columns.max() + 1), fill_value=0)
df1 = (df1.cumsum(axis=1) * df1.iloc[:, ::-1].cumsum(axis=1)).clip_upper(1)
print (df1)
1 2 3 4 5 6 7 8 9 10
active
1 to 10 1 1 1 1 1 1 1 1 1 1
1 to 9 1 1 1 1 1 1 1 1 1 0
3 to 7 0 0 1 1 1 1 1 0 0 0
2 to 7 0 1 1 1 1 1 1 0 0 0
2 to 8 0 1 1 1 1 1 1 1 0 0
df = df.join(df1.add_prefix('Type '), on='active')
print (df)
ID start_date end_date active Type 1 Type 2 Type 3 Type 4 \
0 1,111 6/30/2015 8/6/1904 1 to 10 1 1 1 1
1 1,111 6/28/2016 3/30/1905 1 to 10 1 1 1 1
2 1,111 7/31/2017 6/6/1905 1 to 10 1 1 1 1
3 1,111 7/31/2018 6/6/1905 1 to 9 1 1 1 1
4 1,111 5/31/2019 12/4/1904 1 to 9 1 1 1 1
5 3,033 3/31/2015 5/18/1908 3 to 7 0 0 1 1
6 3,033 3/31/2016 11/24/1905 3 to 7 0 0 1 1
7 3,033 3/31/2017 1/20/1906 3 to 7 0 0 1 1
8 3,033 3/31/2018 1/8/1906 2 to 7 0 1 1 1
9 3,033 4/4/2019 2200,0 2 to 8 0 1 1 1
Type 5 Type 6 Type 7 Type 8 Type 9 Type 10
0 1 1 1 1 1 1
1 1 1 1 1 1 1
2 1 1 1 1 1 1
3 1 1 1 1 1 0
4 1 1 1 1 1 0
5 1 1 1 0 0 0
6 1 1 1 0 0 0
7 1 1 1 0 0 0
8 1 1 1 0 0 0
9 1 1 1 1 0 0