You can build a dictionary that maps every column except the grouping keys (string1 and theme) to the function first, add count for string1, pass the dictionary to GroupBy.agg, and finally rename the aggregated column:
d = dict.fromkeys(df.columns.difference(['string1','theme']), 'first')
d['string1'] = 'count'
df_topics = (df.groupby(['string1','theme'], sort=False)
               .agg(d)
               .rename(columns={'string1':'count'})
               .reset_index())
print(df_topics)
  string1  theme   site   tool type  count
0  houses  white    web  phone    A      3
1  houses  black  cloud    NaN    B      1
Details:
print (d)
{'site': 'first', 'tool': 'first', 'type': 'first', 'string1': 'count'}
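The question's DataFrame is not shown here, so the following is only a minimal end-to-end sketch of the dictionary approach. The sample data is hypothetical and was chosen just to match the printed output; the names keys and d follow the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (an assumption, not the asker's real frame)
df = pd.DataFrame({
    "string1": ["houses", "houses", "houses", "houses"],
    "theme": ["white", "white", "black", "white"],
    "type": ["A", "A", "B", "A"],
    "tool": ["phone", "phone", np.nan, "phone"],
    "site": ["web", "web", "cloud", "web"],
})

keys = ["string1", "theme"]

# Every non-key column gets 'first'; the key column string1 is counted
d = dict.fromkeys(df.columns.difference(keys), "first")
d["string1"] = "count"

df_topics = (
    df.groupby(keys, sort=False)
      .agg(d)
      # Rename before reset_index so the counted column does not
      # clash with the string1 level coming back from the index
      .rename(columns={"string1": "count"})
      .reset_index()
)
print(df_topics)
```

The rename has to happen before reset_index, because at that point the aggregated column and the index level would otherwise share the name string1.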
Or use named aggregation:
df_topics = (df.groupby(['string1','theme'], sort=False)
               .agg(type=('type','first'),
                    tool=('tool','first'),
                    site=('site', 'first'),
                    count=('string1','count'))
               .reset_index())
print(df_topics)
  string1  theme type   tool   site  count
0  houses  white    A  phone    web      3
1  houses  black    B    NaN  cloud      1
The same approach, with the named-aggregation tuples generated dynamically:
d = {x: (x, 'first') for x in df.columns.difference(['string1','theme'])}
d['count'] = ('string1','count')
df_topics = (df.groupby(['string1','theme'], sort=False)
               .agg(**d)
               .reset_index())
print(df_topics)
  string1  theme   site   tool type  count
0  houses  white    web  phone    A      3
1  houses  black  cloud    NaN    B      1
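If this pattern is needed for several frames, the dynamic version can be wrapped in a small function. The sketch below is one possible way to do that; the helper name first_with_count, its parameters, and the sample data are all hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

def first_with_count(df, keys, count_name="count"):
    """Hypothetical helper: first value of each non-key column plus a row count per group."""
    # One (column, 'first') tuple per non-key column
    named = {col: (col, "first") for col in df.columns.difference(keys)}
    # Counting a key column gives the group size, because rows with
    # missing keys are dropped by groupby by default
    named[count_name] = (keys[0], "count")
    return df.groupby(keys, sort=False).agg(**named).reset_index()

# Hypothetical sample data (an assumption, chosen to match the output above)
df = pd.DataFrame({
    "string1": ["houses", "houses", "houses", "houses"],
    "theme": ["white", "white", "black", "white"],
    "type": ["A", "A", "B", "A"],
    "tool": ["phone", "phone", np.nan, "phone"],
    "site": ["web", "web", "cloud", "web"],
})

out = first_with_count(df, ["string1", "theme"])
print(out)
```

Note that count_name must not match an existing non-key column, otherwise the later dictionary entry would silently replace the earlier one.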
EDIT1: an alternative that combines GroupBy.size with GroupBy.first:
g = df.groupby(['string1','theme'], sort=False)
df1 = g.size()
df_topics = g.first()
df_topics = pd.concat([df_topics, df1.rename("count")], axis=1, sort=False).reset_index()
print (df_topics)
  string1  theme type   tool   site  count
0  houses  white    A  phone    web      3
1  houses  black    B    NaN  cloud      1
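The EDIT1 variant uses size rather than count. The two differ only when the counted column contains missing values: size counts all rows in a group, while count counts non-null values. A minimal sketch of that difference, using hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in 'tool'
df = pd.DataFrame({
    "string1": ["houses", "houses", "houses"],
    "theme": ["white", "white", "black"],
    "tool": ["phone", np.nan, np.nan],
})

g = df.groupby(["string1", "theme"], sort=False)

sizes = g.size()              # rows per group, NaN rows included
non_null = g["tool"].count()  # non-null 'tool' values per group

print(sizes.tolist())     # [2, 1]
print(non_null.tolist())  # [1, 0]
```

Counting string1, as in the earlier snippets, gives the same numbers as size here, since a grouping key has no missing values inside a group under the default groupby settings.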