I have three DataFrames that I merge together and then remove duplicates from. But when I remove the duplicates from my last three columns, I end up with NaN values in the upper rows of the DataFrame. I want to get rid of them, but I can't seem to find a way to do it.
Here is my code:
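import pandas as pd
from functools import reduce

#bRaw, pRaw and mRaw are the three CSV sources, defined earlier
bDF=pd.read_csv(bRaw)
pDF=pd.read_csv(pRaw)
mDF=pd.read_csv(mRaw)
del bRaw,pRaw,mRaw
#Merge Together DataFrames on the Value Role Name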
dfs=[bDF,pDF,mDF]
df_merged = reduce(lambda left,right: pd.merge(left,right,on=['R1'],
how='outer'), dfs)
del bDF,pDF,mDF,dfs
#Rearrange Columns
cols=df_merged.columns.tolist()
cols=cols[0:1]+cols[-3:]+cols[1:5]
df_merged=df_merged[cols]
Output after the merge:
+------+-----+------+----+--------+--------+--------+--------+
| R | C | D | JC | R | PM | Nme | Vle |
+------+-----+------+----+--------+--------+--------+--------+
| JMAC | 305 | 3302 | I6 | Cofow | Value1 | Value1 | Value1 |
| JMAC | 305 | 3915 | R6 | Cofow | Value1 | Value1 | Value1 |
| JMAC | 301 | 3302 | I6 | Cofow | Value1 | Value1 | Value1 |
| JMAC | 301 | 3915 | R6 | Cofow | Value1 | Value1 | Value1 |
| JMAC | 305 | 3302 | I6 | Cofow | Value2 | Value2 | Value2 |
| JMAC | 305 | 3915 | R6 | Cofow | Value2 | Value2 | Value2 |
| JMAC | 301 | 3302 | I6 | Cofow | Value2 | Value2 | Value2 |
| JMAC | 301 | 3915 | R6 | Cofow | Value2 | Value2 | Value2 |
| JMAC | 305 | 3302 | I6 | Cofow | Value3 | Value3 | Value3 |
| JMAC | 305 | 3915 | R6 | Cofow | Value3 | Value3 | Value3 |
| JMAC | 301 | 3302 | I6 | Cofow | Value3 | Value3 | Value3 |
| JMAC | 301 | 3915 | R6 | Cofow | Value3 | Value3 | Value3 |
| JMAC | 305 | 3302 | I6 | Cofow | Value4 | Value4 | Value4 |
| JMAC | 305 | 3915 | R6 | Cofow | Value4 | Value4 | Value4 |
| JMAC | 301 | 3302 | I6 | Cofow | Value4 | Value4 | Value4 |
| JMAC | 301 | 3915 | R6 | Cofow | Value4 | Value4 | Value4 |
| JMAP | 301 | 3315 | I6 | Cofowd | Value6 | Value6 | Value6 |
| JMAP | 301 | 3916 | R6 | Cofowd | Value6 | Value6 | Value6 |
| JMAP | 305 | 3314 | I6 | Cofowd | Value6 | Value6 | Value6 |
| JMAP | 305 | 3315 | R6 | Cofowd | Value6 | Value6 | Value6 |
| JMAP | 305 | 3916 | R6 | Cofowd | Value6 | Value6 | Value6 |
| JMAP | 301 | 3315 | I6 | Cofowd | Value7 | Value7 | Value7 |
| JMAP | 301 | 3916 | R6 | Cofowd | Value7 | Value7 | Value7 |
| JMAP | 305 | 3314 | I6 | Cofowd | Value7 | Value7 | Value7 |
| JMAP | 305 | 3315 | R6 | Cofowd | Value7 | Value7 | Value7 |
| JMAP | 305 | 3916 | R6 | Cofowd | Value7 | Value7 | Value7 |
| JMAP | 301 | 3315 | I6 | Cofowd | Value8 | Value8 | Value8 |
| JMAP | 301 | 3916 | R6 | Cofowd | Value8 | Value8 | Value8 |
| JMAP | 305 | 3314 | I6 | Cofowd | Value8 | Value8 | Value8 |
| JMAP | 305 | 3315 | R6 | Cofowd | Value8 | Value8 | Value8 |
| JMAP | 305 | 3916 | R6 | Cofowd | Value8 | Value8 | Value8 |
| JMAP | 301 | 3315 | I6 | Cofowd | Value9 | Value9 | Value9 |
| JMAP | 301 | 3916 | R6 | Cofowd | Value9 | Value9 | Value9 |
| JMAP | 305 | 3314 | I6 | Cofowd | Value9 | Value9 | Value9 |
| JMAP | 305 | 3315 | R6 | Cofowd | Value9 | Value9 | Value9 |
| JMAP | 305 | 3916 | R6 | Cofowd | Value9 | Value9 | Value9 |
+------+-----+------+----+--------+--------+--------+--------+
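The repeated combinations in the table above are what an outer merge on a non-unique key typically produces: every row of one frame is paired with every matching row of the next. A minimal sketch with made-up miniature frames (the frames `b`, `p`, `m` and the column name `Role` are illustrative assumptions, not the real CSV contents):

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for the three CSV files, all sharing the key "R1"
b = pd.DataFrame({"R1": ["JMAC", "JMAC"], "C": [305, 301]})
p = pd.DataFrame({"R1": ["JMAC", "JMAC"], "PM": ["Value1", "Value2"]})
m = pd.DataFrame({"R1": ["JMAC"], "Role": ["Cofow"]})

# Same reduce/merge pattern as in the question
merged = reduce(
    lambda left, right: pd.merge(left, right, on="R1", how="outer"),
    [b, p, m],
)

# 2 rows in b x 2 rows in p x 1 row in m = 4 rows for the key "JMAC"
print(merged)
```

This is why each C value appears next to each PM value, and why some de-duplication step is needed afterwards.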
Then I remove the duplicates from the first 4 columns, then from the last three columns, and finally from the middle column:
#Remove Duplicate Values
df_merged[cols[0:-3]]=df_merged[cols[0:-3]].mask(df_merged[cols[:-3]].duplicated())
df_merged[cols[-3:]]=df_merged[cols[-3:]].mask(df_merged[cols[-3:]].duplicated())
df_merged[cols[4:5]]=df_merged[cols[4:5]].mask(df_merged[cols[4:5]].duplicated())
df_merged=df_merged.dropna(how='all')
My output is close to its final form:
+------+-----+------+----+-------+---------+---------+---------+
| R | C | D | JC | R | PM | Nme | Vle |
+------+-----+------+----+-------+---------+---------+---------+
| JMAC | 305 | 3302 | I6 | Cofow | Value1 | Value1 | Value1 |
| JMAC | 305 | 3915 | R6 | | NaN | NaN | NaN |
| JMAC | 301 | 3302 | I6 | | NaN | NaN | NaN |
| JMAC | 301 | 3915 | R6 | | NaN | NaN | NaN |
| | | | | | Value2 | Value2 | Value2 |
| | | | | | Value3 | Value3 | Value3 |
| | | | | | Value4 | Value4 | Value4 |
| | | | | | Value6 | Value6 | Value6 |
| | | | | | Value7 | Value7 | Value7 |
| JMAP | 301 | 3315 | I6 | Cofow | Value8 | Value8 | Value8 |
| JMAP | 301 | 3916 | R6 | | NaN | NaN | NaN |
| JMAP | 305 | 3314 | I6 | | NaN | NaN | NaN |
| JMAP | 305 | 3315 | R6 | | NaN | NaN | NaN |
| JMAP | 305 | 3916 | R6 | | NaN | NaN | NaN |
| | | | | | Value9 | Value9 | Value9 |
| | | | | | Value10 | Value10 | Value10 |
| | | | | | Value11 | Value11 | Value11 |
| | | | | | Value12 | Value12 | Value12 |
| | | | | | Value13 | Value13 | Value13 |
+------+-----+------+----+-------+---------+---------+---------+
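The gaps in the table above follow from how `mask` works: it replaces duplicated entries with NaN but leaves every surviving value in its original row, so the two column blocks no longer line up. A small sketch of the effect, using a made-up two-column frame rather than the real data:

```python
import pandas as pd

# Hypothetical miniature of the merged frame
df = pd.DataFrame({
    "C": [305, 301, 305, 301],
    "PM": ["Value1", "Value1", "Value2", "Value2"],
})

# Mask duplicates in each column independently, as in the question
out = pd.concat(
    [
        df["C"].mask(df["C"].duplicated()),
        df["PM"].mask(df["PM"].duplicated()),
    ],
    axis=1,
)

# dropna(how="all") only removes rows in which every column is NaN
trimmed = out.dropna(how="all")
print(trimmed)
```

"Value2" stays in the third row even though the second row of `PM` is now empty, which is the same pattern as the NaN rows sitting above Value2, Value3 and Value4 in the output.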
My problem is that I want to get rid of the NaN values and shift the remaining values up. So I want my final result to look something like this:
+------+-----+------+----+-------+---------+---------+---------+
| R | C | D | JC | R | PM | Nme | Vle |
+------+-----+------+----+-------+---------+---------+---------+
| JMAC | 305 | 3302 | I6 | Cofow | Value1 | Value1 | Value1 |
| JMAC | 305 | 3915 | R6 | | Value2 | Value2 | Value2 |
| JMAC | 301 | 3302 | I6 | | Value3 | Value3 | Value3 |
| JMAC | 301 | 3915 | R6 | | Value4 | Value4 | Value4 |
| | | | | | Value6 | Value6 | Value6 |
| | | | | | Value7 | Value7 | Value7 |
| JMAP | 301 | 3315 | I6 | Cofow | Value8 | Value8 | Value8 |
| JMAP | 301 | 3916 | R6 | | Value9 | Value9 | Value9 |
| JMAP | 305 | 3314 | I6 | | Value10 | Value10 | Value10 |
| JMAP | 305 | 3315 | R6 | | Value11 | Value11 | Value11 |
| JMAP | 305 | 3916 | R6 | | Value12 | Value12 | Value12 |
| | | | | | Value13 | Value13 | Value13 |
+------+-----+------+----+-------+---------+---------+---------+
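One possible way to get a layout like the one above is to de-duplicate the left and right column blocks separately inside each group, renumber both from zero, and only then put them side by side. The sketch below is an assumption about what is wanted, not a confirmed solution; the frame, the column names and the helper `compact` are hypothetical:

```python
import pandas as pd

# Hypothetical miniature of the merged frame
df = pd.DataFrame({
    "R": ["JMAC"] * 4 + ["JMAP"] * 3,
    "C": [305, 301, 305, 301, 301, 301, 301],
    "PM": ["Value1", "Value1", "Value2", "Value2",
           "Value6", "Value7", "Value8"],
})

def compact(group):
    """Place the unique left-hand rows next to the unique right-hand values."""
    left = group[["R", "C"]].drop_duplicates().reset_index(drop=True)
    right = group[["PM"]].drop_duplicates().reset_index(drop=True)
    # Both blocks now start at index 0, so the shorter one is padded
    # with NaN at the bottom instead of leaving gaps in the middle
    return pd.concat([left, right], axis=1)

result = pd.concat(
    [compact(g) for _, g in df.groupby("R", sort=False)],
    ignore_index=True,
)
print(result)
```

Within each group the values are pushed to the top, and any leftover cells end up as NaN at the bottom of the shorter block.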
I tried splitting the columns into two separate DataFrames, dropping the NaN values and then concatenating them back together, but my data ended up misaligned because of the index:
df3=pd.concat([df2,df1], axis=1, ignore_index=False)
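The misalignment with `pd.concat(..., axis=1)` is likely an index issue: the frames are aligned on their row labels, and `dropna` keeps the original labels. A minimal sketch of the effect and of resetting the index first, with `df1` and `df2` as made-up stand-ins for the two halves:

```python
import pandas as pd

# Hypothetical halves after dropna: the second one has a gap in its index
df1 = pd.DataFrame({"C": [305, 301]}, index=[0, 1])
df2 = pd.DataFrame({"PM": ["Value1", "Value2"]}, index=[0, 2])

# Aligns on labels 0, 1, 2, so the gaps come back as NaN
misaligned = pd.concat([df1, df2], axis=1)

# Renumbering both halves from 0 lines the rows up by position instead
aligned = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)],
    axis=1,
)
print(aligned)
```

Note that `ignore_index=False` is already the default, and with `axis=1` the `ignore_index` option affects the column labels rather than the row index.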
Any help or ideas would be great!
Many thanks,