I have a DataFrame with a two-level column index.
Reproducible dataset:
    import pandas as pd

    df = pd.DataFrame(
        [['Gaz', 'Gaz', 'Gaz', 'Gaz'],
         ['X', 'X', 'X', 'X'],
         ['Y', 'Y', 'Y', 'Y'],
         ['Z', 'Z', 'Z', 'Z']],
        columns=pd.MultiIndex.from_arrays([['A', 'A', 'C', 'D'],
                                           ['Name', 'Name', 'Company', 'Company']]))
![df1](https://i.stack.imgur.com/ytkpi.png)
I want to rename duplicated MultiIndex columns, but only when the combination of level 0 and level 1 is duplicated, by appending a numeric suffix to the name, like this:
![df2](https://i.stack.imgur.com/PIloj.png)
Below is a solution I found, but it only works for a single-level column index.
    class renamer():
        def __init__(self):
            # Maps each label to the number of times it has been seen again
            self.d = dict()

        def __call__(self, x):
            if x not in self.d:
                self.d[x] = 0
                return x
            else:
                self.d[x] += 1
                return "%s_%d" % (x, self.d[x])

    df = df.rename(columns=renamer())
I think the above method could be modified to support the multi-level case, but I am too new to pandas/Python to do it myself.
Thanks in advance.
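For reference, here is a rough sketch of how the counting idea from `renamer` might carry over to a MultiIndex: count whole column tuples instead of single labels, then rebuild the index. The helper name `dedupe_multiindex` is made up for illustration, and putting the suffix on the last level only is an assumption about the desired output, not something confirmed above.

```python
import pandas as pd

df = pd.DataFrame(
    [['Gaz', 'Gaz', 'Gaz', 'Gaz'],
     ['X', 'X', 'X', 'X'],
     ['Y', 'Y', 'Y', 'Y'],
     ['Z', 'Z', 'Z', 'Z']],
    columns=pd.MultiIndex.from_arrays([['A', 'A', 'C', 'D'],
                                       ['Name', 'Name', 'Company', 'Company']]))


def dedupe_multiindex(columns):
    """Hypothetical helper: suffix repeated column tuples with _1, _2, ...

    Assumption: only the last level receives the suffix.
    """
    seen = {}          # full column tuple -> how many times it has appeared
    new_tuples = []
    for tup in columns:            # iterating a MultiIndex yields tuples
        count = seen.get(tup, 0)
        seen[tup] = count + 1
        if count:                  # second or later occurrence of this tuple
            tup = tup[:-1] + ("%s_%d" % (tup[-1], count),)
        new_tuples.append(tup)
    return pd.MultiIndex.from_tuples(new_tuples, names=columns.names)


df.columns = dedupe_multiindex(df.columns)
print(df.columns.tolist())
```

Because the dictionary is keyed on the whole tuple, ('C', 'Company') and ('D', 'Company') are treated as different columns and stay unchanged; only the second ('A', 'Name') is renamed.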
@Datanovice
To clarify the output I need, here is the snippet I am using:
    import pandas as pd
    import numpy as np

    df = pd.DataFrame(
        [['Gaz', 'Gaz', 'Gaz', 'Gaz'],
         ['X', 'X', 'X', 'X'],
         ['Y', 'Y', 'Y', 'Y'],
         ['Z', 'Z', 'Z', 'Z']],
        columns=pd.MultiIndex.from_arrays([
            ['A', 'A', 'C', 'A'],
            ['A', 'A', 'C', 'A'],
            ['Company', 'Company', 'Company', 'Name']]))

    s = pd.DataFrame(df.columns.tolist())
    cond = s.groupby(0).cumcount()
    s = [np.where(cond.gt(0), s[i] + '_' + cond.astype(str), s[i])
         for i in range(df.columns.nlevels)]
    s = pd.DataFrame(s)
    #print(s)
    df.columns = pd.MultiIndex.from_arrays(s.values.tolist())
    print(df)
The current result is:
[image: current output]
I need the last column not to be counted as a duplicate, because "AA-Name" is not the same as the first two columns.
Thanks again.
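One possible adjustment, offered as a sketch rather than a definitive fix: the snippet above calls `groupby(0)`, which counts repeats using the first level only, so ('A', 'A', 'Name') gets numbered as a repeat of ('A', 'A', 'Company'). Grouping on every level at once should count only fully identical column tuples. The variable name `new_levels` is illustrative, and suffixing every level simply mirrors what the snippet above already does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['Gaz', 'Gaz', 'Gaz', 'Gaz'],
     ['X', 'X', 'X', 'X'],
     ['Y', 'Y', 'Y', 'Y'],
     ['Z', 'Z', 'Z', 'Z']],
    columns=pd.MultiIndex.from_arrays([
        ['A', 'A', 'C', 'A'],
        ['A', 'A', 'C', 'A'],
        ['Company', 'Company', 'Company', 'Name']]))

# One row per column, one field per index level
s = pd.DataFrame(df.columns.tolist())

# Count repeats of the full tuple (all levels), not just level 0
cond = s.groupby(list(s.columns)).cumcount()

# Append the counter to each level, but only where the tuple is a repeat
new_levels = [
    np.where(cond.gt(0), s[i] + '_' + cond.astype(str), s[i])
    for i in range(df.columns.nlevels)
]

df.columns = pd.MultiIndex.from_arrays(new_levels)
print(df.columns.tolist())
```

With this grouping, only the second ('A', 'A', 'Company') column picks up the `_1` suffix; the ('A', 'A', 'Name') column keeps its original labels.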