IIUC, use pandas.Index.groupby (df.index is an Index, so it is called as df.index.groupby).
With a dummy DataFrame df (note that I added the last three rows for demonstration):
print(df)
cit2ref reference _id
0 NaN All about depression: Diagnosis. (2013). Retri... Y17-1020
0 NaN American Psychological Association. (2016). Ce... Y17-1020
0 NaN American Psychological Association. (2016). Pa... Y17-1020
0 NaN Beattie, G.S. (2005, November). Social Causes ... Y17-1020
0 NaN Burton (2012) Burton, N. (2012, June 5). D... Y17-1020
0 NaN Clark, P., Niblett, T. (1988, October 25). The... Y17-1020
0 NaN Choudhury, 2014 De Choudhury, M., Counts, ... Y17-1020
0 NaN De Choudhury, M., Gamon, M., Couns, S., %27 Ho... Y17-1020
0 NaN Gotlib and Joormann (2010) Gotlib IH, Kasch K... Y17-1020
0 NaN Gotlib, I. H., %27 Hammen, C. L. (1992). Psych... Y17-1020
0 NaN Gotlib IH, Joormann J. Cognition and depressio... Y17-1020
0 NaN Hu, Quan, Ang Li, Fei Heng, Jianpeng Li, and T... Y17-102
1 NaN All about depression: Diagnosis. (2013). Retri... Y17-1020
1 NaN American Psychological Association. (2016). Ce... Y17-1020
1 NaN StackOverflow. Not to be grouped-by Y17-102
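The printout above is truncated, so here is a minimal sketch for reproducing the setup. It assumes shortened reference strings taken from the visible prefixes (the full strings are unknown) and uses duplicate index labels 0 and 1, as in the answer's frame.

```python
import numpy as np
import pandas as pd

# Shortened stand-ins for the truncated reference strings shown above (hypothetical data).
references = [
    "All about depression: Diagnosis. (2013).",
    "American Psychological Association. (2016).",
    "Beattie, G.S. (2005, November).",
    "All about depression: Diagnosis. (2013).",
    "American Psychological Association. (2016).",
    "StackOverflow. Not to be grouped-by",
]

# Index labels repeat on purpose: 0 for the first document's rows, 1 for the second's.
df = pd.DataFrame(
    {
        "cit2ref": np.nan,  # scalar is broadcast to every row
        "reference": references,
        "_id": ["Y17-1020"] * 5 + ["Y17-102"],
    },
    index=[0, 0, 0, 1, 1, 1],
)
print(df)
```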
Then group the index labels by the reference column:
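import pandas as pd

df.index.groupby(df['reference'])  # returns a dict: reference -> Index of row labels
# or, for output that looks prettier:
d = {k: list(v) for k, v in df.index.groupby(df['reference']).items()}
# pd.Series keeps each list in a single cell; DataFrame.from_dict(d, orient='index')
# would instead spread the lists over separate columns 0 and 1
new_df = pd.Series(d).reset_index()
print(new_df)
Output: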
index 0
0 All about depression: Diagnosis. (2013). Retri... [0, 1]
1 American Psychological Association. (2016). Ce... [0, 1]
2 American Psychological Association. (2016). Pa... [0]
3 Beattie, G.S. (2005, November). Social Causes ... [0]
4 Burton (2012) Burton, N. (2012, June 5). D... [0]
5 Choudhury, 2014 De Choudhury, M., Counts, ... [0]
6 Clark, P., Niblett, T. (1988, October 25). The... [0]
7 De Choudhury, M., Gamon, M., Couns, S., %27 Ho... [0]
8 Gotlib IH, Joormann J. Cognition and depressio... [0]
9 Gotlib and Joormann (2010) Gotlib IH, Kasch K... [0]
10 Gotlib, I. H., %27 Hammen, C. L. (1992). Psych... [0]
11 Hu, Quan, Ang Li, Fei Heng, Jianpeng Li, and T... [0]
12 StackOverflow. Not to be grouped-by [1]
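A self-contained sketch of the same idea on a tiny made-up frame (the single-letter references are hypothetical). It shows the dict that Index.groupby returns, the two-column layout you get by wrapping that dict in a Series, and, as an alternative I am swapping in rather than part of the answer above, the same lists built with DataFrame.groupby and agg(list).

```python
import pandas as pd

# Hypothetical frame with repeated index labels, mimicking the answer's df.
df = pd.DataFrame(
    {"reference": ["A", "B", "C", "A", "B", "D"]},
    index=[0, 0, 0, 1, 1, 1],
)

# Index.groupby: reference -> Index of the labels it appears under.
as_lists = {k: list(v) for k, v in df.index.groupby(df["reference"]).items()}
print(as_lists)

# Wrapping the dict in a Series gives the "index" / 0 layout shown above.
new_df = pd.Series(as_lists).reset_index()
print(new_df)

# Alternative: move the labels into a column, then collect them per reference.
labels = df.reset_index().groupby("reference")["index"].agg(list)
print(labels)
```

The groupby route can be handy if you later want other aggregations per reference alongside the label lists.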
This shows which reference appears under which index labels. If you want counts instead, use len in place of list:
d = {k: len(v) for k, v in df.index.groupby(df['reference']).items()}
new_df = pd.DataFrame.from_dict(d, orient='index').reset_index()
print(new_df)
Output:
index 0
0 All about depression: Diagnosis. (2013). Retri... 2
1 American Psychological Association. (2016). Ce... 2
2 American Psychological Association. (2016). Pa... 1
3 Beattie, G.S. (2005, November). Social Causes ... 1
4 Burton (2012) Burton, N. (2012, June 5). D... 1
5 Choudhury, 2014 De Choudhury, M., Counts, ... 1
6 Clark, P., Niblett, T. (1988, October 25). The... 1
7 De Choudhury, M., Gamon, M., Couns, S., %27 Ho... 1
8 Gotlib IH, Joormann J. Cognition and depressio... 1
9 Gotlib and Joormann (2010) Gotlib IH, Kasch K... 1
10 Gotlib, I. H., %27 Hammen, C. L. (1992). Psych... 1
11 Hu, Quan, Ang Li, Fei Heng, Jianpeng Li, and T... 1
12 StackOverflow. Not to be grouped-by 1
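If you only need the counts, you do not need the label lists at all. Here is a sketch on a hypothetical frame that compares the len approach with two standard pandas shortcuts. groupby(...).size() sorts by reference, while value_counts() sorts by frequency, highest first.

```python
import pandas as pd

# Hypothetical frame with repeated index labels.
df = pd.DataFrame(
    {"reference": ["A", "B", "C", "A", "B", "D"]},
    index=[0, 0, 0, 1, 1, 1],
)

# Counts via Index.groupby, as in the answer.
counts_dict = {k: len(v) for k, v in df.index.groupby(df["reference"]).items()}

# Same counts, indexed by reference in sorted order.
counts = df.groupby("reference").size()

# Same counts again, ordered by frequency instead of by reference.
freq = df["reference"].value_counts()

print(counts)
print(freq)
```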