Use boolean indexing:
df1 = df[df.index.get_level_values(0).duplicated()]
print (df1)
                 Amount
client     Date
0000000001 date2   val2
           date3   val3
           date4   val4
           date5   val5
0000000002 date4   val7
0000000003 date2   val9
0000000004 date3  val11
           date4  val12
           date5  val13
Details:
First, get the values of the first level with get_level_values:
print (df.index.get_level_values(0))
Index(['0000000001', '0000000001', '0000000001', '0000000001', '0000000001',
       '0000000002', '0000000002', '0000000003', '0000000003', '0000000004',
       '0000000004', '0000000004', '0000000004'],
      dtype='object', name='client')
Then flag every value except the first occurrence with duplicated:
print (df.index.get_level_values(0).duplicated())
[False True True True True False True False True False True True
True]
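The steps above can be put together in one self-contained sketch. The sample frame is reconstructed from the printed output, so the construction code (from_arrays, the generated val1..val13 strings) is an assumption rather than the original data source.

```python
import pandas as pd

# Hypothetical sample data rebuilt from the printed output above.
clients = (["0000000001"] * 5 + ["0000000002"] * 2
           + ["0000000003"] * 2 + ["0000000004"] * 4)
dates = ["date1", "date2", "date3", "date4", "date5",
         "date2", "date4",
         "date1", "date2",
         "date2", "date3", "date4", "date5"]
idx = pd.MultiIndex.from_arrays([clients, dates], names=["client", "Date"])
df = pd.DataFrame({"Amount": [f"val{i}" for i in range(1, 14)]}, index=idx)

# True for every row whose client value has already appeared earlier.
mask = df.index.get_level_values(0).duplicated()

# Boolean indexing keeps only the flagged rows,
# which drops the first row of each client.
df1 = df[mask]
print(df1)
```

duplicated uses keep='first' by default, which is why only the first row per client is left unflagged.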
If duplicated groups are possible, i.e. the same client can appear in more than one separate block of rows:
print (df)
                 Amount
client     Date
0000000001 date1   val1
           date2   val2
           date3   val3
           date4   val4
           date5   val5
0000000002 date2   val6
           date4   val7
0000000003 date1   val8
           date2   val9
0000000001 date2  val10
           date3  val11
           date4  val12
           date5  val13
s = df.index.get_level_values(0).to_series()
df1 = df[s.ne(s.shift()).cumsum().duplicated().values]
print (df1)
                 Amount
client     Date
0000000001 date2   val2
           date3   val3
           date4   val4
           date5   val5
0000000002 date4   val7
0000000003 date2   val9
0000000001 date3  val11
           date4  val12
           date5  val13
Detail:
print (s.ne(s.shift()).cumsum())
client
0000000001    1
0000000001    1
0000000001    1
0000000001    1
0000000001    1
0000000002    2
0000000002    2
0000000003    3
0000000003    3
0000000001    4
0000000001    4
0000000001    4
0000000001    4
Name: client, dtype: int32
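As a runnable sketch of the consecutive-group variant, the snippet below rebuilds the second sample frame (again an assumption based on the printed output) and compares the plain duplicated mask with the one based on shift and cumsum. The names group_id, naive and mask are illustrative only.

```python
import pandas as pd

# Hypothetical sample data: client 0000000001 appears in two separate blocks.
clients = (["0000000001"] * 5 + ["0000000002"] * 2
           + ["0000000003"] * 2 + ["0000000001"] * 4)
dates = ["date1", "date2", "date3", "date4", "date5",
         "date2", "date4",
         "date1", "date2",
         "date2", "date3", "date4", "date5"]
idx = pd.MultiIndex.from_arrays([clients, dates], names=["client", "Date"])
df = pd.DataFrame({"Amount": [f"val{i}" for i in range(1, 14)]}, index=idx)

s = df.index.get_level_values(0).to_series()

# A new group starts wherever the client differs from the previous row;
# the cumulative sum turns those change points into group ids 1, 2, 3, 4.
group_id = s.ne(s.shift()).cumsum()

# Plain duplicated() treats the second block of 0000000001 as all repeats,
# so it would keep val10 as well.
naive = df.index.get_level_values(0).duplicated()

# duplicated() on the group ids drops the first row of every block instead.
mask = group_id.duplicated().to_numpy()
df1 = df[mask]
print(df1)
```

The mask is converted to a plain array before indexing because s carries repeated client labels as its index, and positional filtering avoids any label alignment.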