Good afternoon all,
I have three large DataFrames, each the result of a groupby. Simplified versions are shown below. The first df holds each client's total RFQ count and total RFQ volume, with no breakdown by product or currency.
import pandas as pd

df1 = [('Year_Month', ['2017-11', '2017-12', '2018-01', '2018-02', '2018-05', '2018-06', '2018-07', '2018-08',]),
('Client', ['RBMI', 'RBMI', 'RBMI', 'RBMI', 'QCBO', 'QCBO', 'QCBO', 'QCBO',]),
('Total_RFQ_per_Client', [1, 2, 3, 4, 10, 20, 30, 40,]),
('Total_RFQ_Volume_per_Client', ['1000', '2000', '3000', '4000', '10000', '20000', '30000', '40000',]),
]
# create pandas df (DataFrame.from_items was removed in pandas 1.0; a dict keeps the column order)
df1 = pd.DataFrame(dict(df1))
df1['Total_RFQ_per_Client']=df1.Total_RFQ_per_Client.astype('int64')
df1['Total_RFQ_Volume_per_Client']=df1.Total_RFQ_Volume_per_Client.astype('int64')
print(df1)
# df1.info()
print("")
Year_Month Client Total_RFQ_per_Client Total_RFQ_Volume_per_Client
0 2017-11 RBMI 1 1000
1 2017-12 RBMI 2 2000
2 2018-01 RBMI 3 3000
3 2018-02 RBMI 4 4000
4 2018-05 QCBO 10 10000
5 2018-06 QCBO 20 20000
6 2018-07 QCBO 30 30000
7 2018-08 QCBO 40 40000
The second df holds each client's RFQs and RFQ volumes that were done, with Product and Currency columns added.
print("All Clients - Done RFQ's - Done RFQ Volume - Broken down into Product and Currency", end='\n')
df2 = [('Year_Month', ['2017-11', '2018-01', '2018-01', '2018-02', '2018-05', '2018-07', '2018-08',]),
('Client', ['RBMI', 'RBMI', 'RBMI', 'RBMI', 'QCBO', 'QCBO', 'QCBO',]),
('Product', ['GOVT', 'GOVT', 'CORP', 'GOVT', 'GOVT', 'GOVT', 'GOVT',]),
('currency_str', ['USD', 'USD', 'GBP', 'USD', 'USD', 'USD', 'USD',]),
('Done_RFQ', [1, 1, 1, 1, 10, 20, 20,]),
('Done_RFQ_Volume', [1000, 500, 500, 1000, 10000, 20000, 20000,]),
]
# create pandas df (DataFrame.from_items was removed in pandas 1.0; a dict keeps the column order)
df2 = pd.DataFrame(dict(df2))
df2['Done_RFQ']=df2.Done_RFQ.astype('int64')
df2['Done_RFQ_Volume']=df2.Done_RFQ_Volume.astype('int64')
print(df2)
# df2.info()
print("")
Year_Month Client Product currency_str Done_RFQ Done_RFQ_Volume
0 2017-11 RBMI GOVT USD 1 1000
1 2018-01 RBMI GOVT USD 1 500
2 2018-01 RBMI CORP GBP 1 500
3 2018-02 RBMI GOVT USD 1 1000
4 2018-05 QCBO GOVT USD 10 10000
5 2018-07 QCBO GOVT USD 20 20000
6 2018-08 QCBO GOVT USD 20 20000
The third df holds each client's RFQs and RFQ volumes that were NOT done, with Product and Currency columns added.
df3 = [('Year_Month', ['2017-12', '2018-01', '2018-02', '2018-06', '2018-07', '2018-08',]),
('Client', ['RBMI', 'RBMI', 'RBMI', 'QCBO', 'QCBO', 'QCBO',]),
('Product', ['GOVT', 'CORP', 'GOVT', 'GOVT', 'GOVT', 'CORP',]),
('currency_str', ['USD', 'GBP', 'USD', 'USD', 'USD', 'CAD',]),
('Not_Done_RFQ', [2, 1, 3, 20, 10, 20,]),
('Not_Done_RFQ_Volume', [2000, 2000, 3000, 20000, 10000, 20000,]),
]
# create pandas df (DataFrame.from_items was removed in pandas 1.0; a dict keeps the column order)
df3 = pd.DataFrame(dict(df3))
df3['Not_Done_RFQ']=df3.Not_Done_RFQ.astype('int64')
df3['Not_Done_RFQ_Volume']=df3.Not_Done_RFQ_Volume.astype('int64')
print(df3)
# df3.info()
print("")
Year_Month Client Product currency_str Not_Done_RFQ Not_Done_RFQ_Volume
0 2017-12 RBMI GOVT USD 2 2000
1 2018-01 RBMI CORP GBP 1 2000
2 2018-02 RBMI GOVT USD 3 3000
3 2018-06 QCBO GOVT USD 20 20000
4 2018-07 QCBO GOVT USD 10 10000
5 2018-08 QCBO CORP CAD 20 20000
I would like to merge or join all three into one, such that the result looks as follows:
![enter image description here](https://i.stack.imgur.com/dVwEO.png)
Key points here:
Total_RFQ = Done_RFQ + Not_Done_RFQ
`Total_RFQ_per_Client` is the column from `df1`, i.e. it reflects the total RFQs with product and currency removed
Total_RFQ_Volume = Done_RFQ_Volume + Not_Done_RFQ_Volume
`Total_RFQ_Volume_per_Client` is the column from `df1`, i.e. it reflects the total RFQ volume with product and currency removed
Note that for 2018-01, client RBMI has two Product/Currency pairs, GOVT/USD and CORP/GBP, so `Total_RFQ_per_Client` will display 3 on each row, as this is the total for 2018-01/RBMI in `df1`. The same principle applies to `Total_RFQ_Volume_per_Client`.
Likewise, the same situation exists for 2018-08/QCBO, i.e. GOVT/USD and CORP/CAD.
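To make the key points concrete, here is a minimal sketch of the relationships I am after. It assumes the done and not-done frames should line up on all four of Year_Month, Client, Product and currency_str, which is my reading of the target layout rather than something I have confirmed. The names `keys`, `count_cols` and `wanted` are illustrative only, and the sample frames are rebuilt from plain dicts so the block runs on its own.

```python
import pandas as pd

# Same sample data as df1, df2 and df3 above, rebuilt from plain dicts.
df1 = pd.DataFrame({
    'Year_Month': ['2017-11', '2017-12', '2018-01', '2018-02',
                   '2018-05', '2018-06', '2018-07', '2018-08'],
    'Client': ['RBMI'] * 4 + ['QCBO'] * 4,
    'Total_RFQ_per_Client': [1, 2, 3, 4, 10, 20, 30, 40],
    'Total_RFQ_Volume_per_Client': [1000, 2000, 3000, 4000,
                                    10000, 20000, 30000, 40000],
})
df2 = pd.DataFrame({
    'Year_Month': ['2017-11', '2018-01', '2018-01', '2018-02',
                   '2018-05', '2018-07', '2018-08'],
    'Client': ['RBMI'] * 4 + ['QCBO'] * 3,
    'Product': ['GOVT', 'GOVT', 'CORP', 'GOVT', 'GOVT', 'GOVT', 'GOVT'],
    'currency_str': ['USD', 'USD', 'GBP', 'USD', 'USD', 'USD', 'USD'],
    'Done_RFQ': [1, 1, 1, 1, 10, 20, 20],
    'Done_RFQ_Volume': [1000, 500, 500, 1000, 10000, 20000, 20000],
})
df3 = pd.DataFrame({
    'Year_Month': ['2017-12', '2018-01', '2018-02',
                   '2018-06', '2018-07', '2018-08'],
    'Client': ['RBMI'] * 3 + ['QCBO'] * 3,
    'Product': ['GOVT', 'CORP', 'GOVT', 'GOVT', 'GOVT', 'CORP'],
    'currency_str': ['USD', 'GBP', 'USD', 'USD', 'USD', 'CAD'],
    'Not_Done_RFQ': [2, 1, 3, 20, 10, 20],
    'Not_Done_RFQ_Volume': [2000, 2000, 3000, 20000, 10000, 20000],
})

# Assumption: a row is identified by all four of these columns.
keys = ['Year_Month', 'Client', 'Product', 'currency_str']
count_cols = ['Done_RFQ', 'Not_Done_RFQ',
              'Done_RFQ_Volume', 'Not_Done_RFQ_Volume']

# Outer merge keeps combinations that appear in only one of the two frames.
wanted = df2.merge(df3, how='outer', on=keys)

# A combination missing on one side means zero RFQs there, not "unknown".
wanted[count_cols] = wanted[count_cols].fillna(0).astype('int64')

# Total_RFQ = Done_RFQ + Not_Done_RFQ (and the same for volume).
wanted['Total_RFQ'] = wanted['Done_RFQ'] + wanted['Not_Done_RFQ']
wanted['Total_RFQ_Volume'] = (wanted['Done_RFQ_Volume']
                              + wanted['Not_Done_RFQ_Volume'])

# Per-client totals come from df1 and repeat on every Product/Currency row.
wanted = wanted.merge(df1, how='left', on=['Year_Month', 'Client'])
wanted = wanted.sort_values(['Client', 'Year_Month'],
                            ascending=[False, True]).reset_index(drop=True)
print(wanted)
```

With this sample data, the per-row `Total_RFQ` values summed over each Year_Month/Client pair should agree with `Total_RFQ_per_Client`, which is the consistency I expect in the final table.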
My code so far is below, but I am having trouble getting the merge statement right:
print("Join Done Trades with not Done trades", end='\n')
dfTemp = pd.merge(df2, df3, how='outer', left_on=['Year_Month','Client'], right_on = ['Year_Month','Client'])
dfTemp = dfTemp.sort_values(['Client','Year_Month'], ascending=[False, True])
dfTemp = dfTemp.fillna(0)
display(dfTemp)
print("Join Done Trades/Not Done trades with Client Totals", end='\n')
df_Client_Product_Ccy_Hit_Rate_Volumes = pd.merge(dfTemp, df1, how='inner', left_on=['Year_Month','Client'], right_on = ['Year_Month','Client'])
# The merge leaves NaN where one side has no match, so replace missing values with 0 - the column sums were returning NaN because 500 + NaN is NaN
df_Client_Product_Ccy_Hit_Rate_Volumes = df_Client_Product_Ccy_Hit_Rate_Volumes.fillna(0)
print("Create additional calculated columns", end='\n')
df_Client_Product_Ccy_Hit_Rate_Volumes['Total_RFQ'] = df_Client_Product_Ccy_Hit_Rate_Volumes['Done_RFQ'] + df_Client_Product_Ccy_Hit_Rate_Volumes['Not_Done_RFQ']
df_Client_Product_Ccy_Hit_Rate_Volumes['Total_RFQ_Volume'] = (df_Client_Product_Ccy_Hit_Rate_Volumes['Done_RFQ_Volume']) + df_Client_Product_Ccy_Hit_Rate_Volumes['Not_Done_RFQ_Volume']
df_Client_Product_Ccy_Hit_Rate_Volumes = df_Client_Product_Ccy_Hit_Rate_Volumes.fillna(0)
# display(df_Client_Product_Ccy_Hit_Rate_Volumes)
# Select and Order the columns of interest
df_Client_Product_Ccy_Hit_Rate_Volumes = df_Client_Product_Ccy_Hit_Rate_Volumes[['Year_Month',
'Client',
'Product_x',
'currency_str_x',
'Done_RFQ',
'Not_Done_RFQ',
'Total_RFQ',
'Total_RFQ_per_Client',
'Done_RFQ_Volume',
'Not_Done_RFQ_Volume',
'Total_RFQ_Volume',
'Total_RFQ_Volume_per_Client'
]]
# Sort
dfTemp = df_Client_Product_Ccy_Hit_Rate_Volumes.sort_values(['Client', 'Year_Month'], ascending=[False, True])
display(dfTemp)
print("", end='\n')
It produces incorrect totals, and the 2018-08/QCBO/CORP/CAD row is also missing.
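Here is a minimal reproduction of the missing row, using only the 2018-08/QCBO data. My suspicion, which I have not confirmed, is that the join keys are the cause: with only Year_Month and Client as keys, the GOVT/USD done row and the CORP/CAD not-done row are paired into a single row. The names `done`, `not_done`, `two_keys` and `four_keys` are made up for this sketch.

```python
import pandas as pd

# One month of the sample data: 2018-08 / QCBO.
done = pd.DataFrame({
    'Year_Month': ['2018-08'], 'Client': ['QCBO'],
    'Product': ['GOVT'], 'currency_str': ['USD'],
    'Done_RFQ': [20],
})
not_done = pd.DataFrame({
    'Year_Month': ['2018-08'], 'Client': ['QCBO'],
    'Product': ['CORP'], 'currency_str': ['CAD'],
    'Not_Done_RFQ': [20],
})

# Keys as in my code: the two rows match each other, so pandas emits one
# row with Product_x/currency_str_x and Product_y/currency_str_y columns.
two_keys = pd.merge(done, not_done, how='outer', on=['Year_Month', 'Client'])
print(two_keys)

# Including Product and currency_str in the keys keeps the rows apart.
four_keys = pd.merge(done, not_done, how='outer',
                     on=['Year_Month', 'Client', 'Product', 'currency_str'])
print(four_keys)
```

In the two-key result, selecting only `Product_x` and `currency_str_x` afterwards would drop CORP/CAD and attribute its not-done count to GOVT/USD, which would be consistent with both symptoms I am seeing.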
Any guidance on the merge would be appreciated.