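Use merge_asof with DataFrame.sort_values on both inputs (merge_asof requires each frame to be sorted by its "on" key), then a final DataFrame.sort_values to restore the original order by id: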
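import pandas as pd

# default direction='backward': take the last df1 row with u_g <= u_g_1 within the same group
df = (pd.merge_asof(df2.sort_values('u_g_1'),
                    df1.sort_values('u_g'),
                    left_on='u_g_1',
                    left_by='type',
                    right_on='u_g',
                    right_by='spec')
        .sort_values('id', ignore_index=True))  # restore the original row order
print(df)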
    id  type  u_g_1  spec  u_g  target
0    1   WG2    1.4   WG2  1.4    0.71
1    2   WG2    1.4   WG2  1.4    0.71
2    3   WG2    1.0   WG2  1.0    0.52
3    4    G1    4.8    G1  4.8    0.88
4    5    G1    4.9    G1  4.8    0.88
5    6    G2    2.1    G2  2.1    0.76
6    7  SWG3    0.7  SWG3  0.7    0.31
7    8   WG3    0.8   WG3  0.8    0.65
8    9   WG3    0.7   WG3  0.7    0.53
9   10   WG2    1.1   WG2  1.0    0.52
10  11   NaN    0.0   NaN  NaN     NaN
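For anyone who wants to run this without the original data, here is a minimal sketch. The two frames below are hypothetical stand-ins that only reuse the column names and a few values visible in the output above; they are not the real df1 and df2 from the question.

```python
import pandas as pd

# Hypothetical lookup table (stand-in for df1).
df1 = pd.DataFrame({
    "spec":   ["WG2", "WG2", "WG2", "G1"],
    "u_g":    [1.0, 1.2, 1.4, 4.8],
    "target": [0.52, 0.68, 0.71, 0.88],
})
# Hypothetical rows to enrich (stand-in for df2).
df2 = pd.DataFrame({
    "id":    [1, 2, 3, 4],
    "type":  ["WG2", "G1", "WG2", "G1"],
    "u_g_1": [1.4, 4.9, 1.1, 4.8],
})

merged = (
    pd.merge_asof(
        df2.sort_values("u_g_1"),   # both inputs must be sorted by their "on" key
        df1.sort_values("u_g"),
        left_on="u_g_1", right_on="u_g",
        left_by="type", right_by="spec",  # match only within the same group
    )
    .sort_values("id", ignore_index=True)  # back to the original row order
)
print(merged)
```

With the default direction='backward', each row takes the largest u_g in its group that does not exceed u_g_1, so id 2 (G1, 4.9) falls back to 4.8 and id 3 (WG2, 1.1) falls back to 1.0.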
EDIT: The same solution, but with the default direction='backward' changed to direction='forward':
df = (pd.merge_asof(df2.sort_values('u_g_1'),
                    df1.sort_values('u_g'),
                    left_on='u_g_1',
                    left_by='type',
                    right_on='u_g',
                    right_by='spec',
                    direction='forward')
        .sort_values('id', ignore_index=True))
print(df)
    id  type  u_g_1  spec  u_g  target
0    1   WG2    1.4   WG2  1.4    0.71
1    2   WG2    1.4   WG2  1.4    0.71
2    3   WG2    1.0   WG2  1.0    0.52
3    4    G1    4.8    G1  4.8    0.88
4    5    G1    4.9   NaN  NaN     NaN  <- 4.9 is greater than 4.8, so no forward match (NaN)
5    6    G2    2.1    G2  2.1    0.76
6    7  SWG3    0.7  SWG3  0.7    0.31
7    8   WG3    0.8   WG3  0.8    0.65
8    9   WG3    0.7   WG3  0.7    0.53
9   10   WG2    1.1   WG2  1.2    0.68  <- 1.1 is less than 1.2, so 1.2 is matched
10  11   NaN    0.0   NaN  NaN     NaN
Another idea is direction='nearest':
df = (pd.merge_asof(df2.sort_values('u_g_1'),
                    df1.sort_values('u_g'),
                    left_on='u_g_1',
                    left_by='type',
                    right_on='u_g',
                    right_by='spec',
                    direction='nearest')
        .sort_values('id', ignore_index=True))
print(df)
    id  type  u_g_1  spec  u_g  target
0    1   WG2    1.4   WG2  1.4    0.71
1    2   WG2    1.4   WG2  1.4    0.71
2    3   WG2    1.0   WG2  1.0    0.52
3    4    G1    4.8    G1  4.8    0.88
4    5    G1    4.9    G1  4.8    0.88
5    6    G2    2.1    G2  2.1    0.76
6    7  SWG3    0.7  SWG3  0.7    0.31
7    8   WG3    0.8   WG3  0.8    0.65
8    9   WG3    0.7   WG3  0.7    0.53
9   10   WG2    1.1   WG2  1.2    0.68
10  11   NaN    0.0   NaN  NaN     NaN
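To see the three direction values side by side, here is a small sketch on made-up numbers. The frames lookup and queries are hypothetical, and the by-keys are left out to keep it short. The query values are chosen so that none sits exactly halfway between two lookup keys, which avoids floating-point tie effects with 'nearest'.

```python
import pandas as pd

# Hypothetical data, already sorted by the "on" keys.
lookup = pd.DataFrame({"u_g": [1.0, 1.2, 4.8], "target": [0.52, 0.68, 0.88]})
queries = pd.DataFrame({"u_g_1": [1.05, 1.15, 4.9]})

# Matched u_g for every query value, per direction.
matches = {}
for direction in ("backward", "forward", "nearest"):
    result = pd.merge_asof(queries, lookup,
                           left_on="u_g_1", right_on="u_g",
                           direction=direction)
    matches[direction] = result["u_g"].tolist()

for direction, values in matches.items():
    print(direction, values)
```

Here 'backward' picks the closest key at or below the query, 'forward' picks the closest key at or above it (and leaves NaN for 4.9, because nothing is larger), and 'nearest' picks whichever key is closer in absolute distance.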
EDIT2: First match with direction='forward', then fill the values that are still missing from a direction='backward' merge:
# fallback: default direction='backward', indexed by id so it can be aligned later
df0 = (pd.merge_asof(df2.sort_values('u_g_1'),
                     df1.sort_values('u_g'),
                     left_on='u_g_1',
                     left_by='type',
                     right_on='u_g',
                     right_by='spec').set_index('id'))
print(df0)
    type  u_g_1  spec  u_g  target
id
11   NaN    0.0   NaN  NaN     NaN
7   SWG3    0.7  SWG3  0.7    0.31
9    WG3    0.7   WG3  0.7    0.53
8    WG3    0.8   WG3  0.8    0.65
3    WG2    1.0   WG2  1.0    0.52
10   WG2    1.1   WG2  1.0    0.52
1    WG2    1.4   WG2  1.4    0.71
2    WG2    1.4   WG2  1.4    0.71
6     G2    2.1    G2  2.1    0.76
4     G1    4.8    G1  4.8    0.88
5     G1    4.9    G1  4.8    0.88
df = (pd.merge_asof(df2.sort_values('u_g_1'),
                    df1.sort_values('u_g'),
                    left_on='u_g_1',
                    left_by='type',
                    right_on='u_g',
                    right_by='spec',
                    direction='forward')
        .set_index('id')
        .combine_first(df0)   # fill NaN from the backward result, aligned on id
        .sort_index()
        .reset_index())
print(df)
    id  type  u_g_1  spec  u_g  target
0    1   WG2    1.4   WG2  1.4    0.71
1    2   WG2    1.4   WG2  1.4    0.71
2    3   WG2    1.0   WG2  1.0    0.52
3    4    G1    4.8    G1  4.8    0.88
4    5    G1    4.9    G1  4.8    0.88
5    6    G2    2.1    G2  2.1    0.76
6    7  SWG3    0.7  SWG3  0.7    0.31
7    8   WG3    0.8   WG3  0.8    0.65
8    9   WG3    0.7   WG3  0.7    0.53
9   10   WG2    1.1   WG2  1.2    0.68
10  11   NaN    0.0   NaN  NaN     NaN
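The reason for set_index('id') in EDIT2 is that combine_first aligns the two frames on their index and only fills cells that are NaN in the calling frame. A minimal sketch of that forward-then-backward fallback, again on hypothetical data (lookup, queries, and the asof helper are illustrative names, and the by-keys are omitted):

```python
import pandas as pd

# Hypothetical data; queries must be sorted by the "on" key for merge_asof.
lookup = pd.DataFrame({"u_g": [1.0, 1.2, 4.8], "target": [0.52, 0.68, 0.88]})
queries = (pd.DataFrame({"id": [1, 2, 3], "u_g_1": [4.9, 1.1, 1.0]})
           .sort_values("u_g_1"))

def asof(direction):
    """Run merge_asof in one direction and index the result by id."""
    return pd.merge_asof(queries, lookup,
                         left_on="u_g_1", right_on="u_g",
                         direction=direction).set_index("id")

forward = asof("forward")    # id 1 (4.9) has no key >= 4.9, so it stays NaN
backward = asof("backward")  # id 1 matches 4.8 here

# Keep forward matches where they exist, otherwise take the backward match.
filled = forward.combine_first(backward).sort_index().reset_index()
print(filled)
```

In this sketch id 2 keeps its forward match (1.2), while id 1 is filled from the backward result (4.8), which mirrors rows 9 and 4 in the output above.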