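Here is a fast method that uses NumPy's isin, repeat and concatenate together with list comprehensions. It also allows empty ancestor positions to be empty strings, None, or any other placeholder you add to the list passed to np.isin. It does assume that each row's non-empty cells are contiguous from the left, with no gaps between ancestors.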
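One caveat on the placeholder claim: np.isin matches cells by equality, and NaN never compares equal to anything, so NaN cells (what pandas typically produces for missing values, e.g. after read_csv) are counted as filled. The sketch below uses a small made-up array to show this; the pd.notna-based mask is my own alternative, not part of the method.

```python
import numpy as np
import pandas as pd

# Made-up 2x3 object array: row 0 ends in '', row 1 holds None and NaN
vals = np.array([['A', 'A1', ''],
                 ['B', None, np.nan]], dtype=object)

# The method's test: a cell counts as filled unless it equals '' or None
isin_mask = ~np.isin(vals, ['', None])

# Alternative (an assumption, not from the method): pd.notna treats both
# None and NaN as missing, and the != '' check still catches empty strings
notna_mask = pd.notna(vals) & (vals != '')

print(isin_mask)   # the NaN cell at [1, 2] is still counted as filled
print(notna_mask)
```

If your frame may contain NaN, swapping the second mask into the repeats line is one way to keep the counts right.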
import numpy as np
import pandas as pd

df_vals = df.to_numpy()
# count the sub-ancestors in each row: non-empty cells minus one (the oldest ancestor)
repeats = (~np.isin(df_vals, ['', None])).sum(axis=1) - 1
# find the oldest ancestor in each row
oldest_ancestors = np.array([df_vals[row, col] for row, col in enumerate(repeats)])
# make the oldest column by repeating each oldest ancestor once per sub-ancestor
oldest = np.repeat(oldest_ancestors, repeats)
# make the plant column by getting all the sub-ancestors from each row and concatenating
plant = np.concatenate([df_vals[row][:col] for row, col in enumerate(repeats)])
df2 = pd.DataFrame({'plant': plant, 'oldest': oldest})
print(df2)
   plant oldest
0     XX    XX5
1    XX1    XX5
2    XX2    XX5
3    XX3    XX5
4    XX4    XX5
5     YY    YY4
6    YY1    YY4
7    YY2    YY4
8    YY3    YY4
9     ZY    YY4
10   ZZ1    YY4
11   ZZ2    YY4
12   YY2    YY4
13   YY3    YY4
14   SS1    SS3
15   SS2    SS3
15 SS2 SS3
Setup DataFrame:
df = pd.DataFrame({'plant': ['XX', 'YY', 'ZY', 'SS1'],
                   'ancestor1': ['XX1', 'YY1', 'ZZ1', 'SS2'],
                   'ancestor2': ['XX2', 'YY2', 'ZZ2', 'SS3'],
                   'ancestor3': ['XX3', 'YY3', 'YY2', None],
                   'ancestor4': ['XX4', 'YY4', 'YY3', None],
                   'ancestor5': ['XX5', None, 'YY4', None]})
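To run everything end to end, here is the same code reordered into one self-contained script, with the setup frame first. The only change is the dtype=object argument, which is my addition: it keeps missing cells as None (the placeholder np.isin looks for) even on pandas builds that infer a dedicated string dtype, such as with the future.infer_string option enabled, where missing values would otherwise become NaN.

```python
import numpy as np
import pandas as pd

# Setup frame; dtype=object (an addition) keeps missing cells as None
df = pd.DataFrame({'plant': ['XX', 'YY', 'ZY', 'SS1'],
                   'ancestor1': ['XX1', 'YY1', 'ZZ1', 'SS2'],
                   'ancestor2': ['XX2', 'YY2', 'ZZ2', 'SS3'],
                   'ancestor3': ['XX3', 'YY3', 'YY2', None],
                   'ancestor4': ['XX4', 'YY4', 'YY3', None],
                   'ancestor5': ['XX5', None, 'YY4', None]}, dtype=object)

df_vals = df.to_numpy()
# count the sub-ancestors in each row: non-empty cells minus one
repeats = (~np.isin(df_vals, ['', None])).sum(axis=1) - 1
# the oldest ancestor is the last non-empty cell of each row
oldest_ancestors = np.array([df_vals[row, col] for row, col in enumerate(repeats)])
# repeat each oldest ancestor once per sub-ancestor
oldest = np.repeat(oldest_ancestors, repeats)
# collect every sub-ancestor (all cells before the oldest) row by row
plant = np.concatenate([df_vals[row][:col] for row, col in enumerate(repeats)])
df2 = pd.DataFrame({'plant': plant, 'oldest': oldest})
print(df2)
```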