In pure pandas, without iterating or converting to a list.
First, join data to df so that each headline is "replicated" once for every company name being compared. A temporary 'key' column is used to make this join possible.
In [60]: data_df = data.to_frame()
In [61]: data_df['key'] = 1
In [63]: df['key'] = 1
In [65]: merged = pd.merge(df, data_df, how='outer', on='key').drop('key', axis=1)
merged will look like the following. As you can see, it holds one row per headline/company pair (3 headlines x 5 companies = 15 rows here), so depending on the size of data this method can produce a huge DataFrame.
In [66]: merged
Out[66]:
    headline                                           source            company
0   targets is making better stars in the bucks        target news       targets
1   targets is making better stars in the bucks        target news       stars in the bucks
2   targets is making better stars in the bucks        target news       wallymarty
3   targets is making better stars in the bucks        target news       velocity global
4   targets is making better stars in the bucks        target news       diamond in the rough
5   more diamonds than rocks in saturn rings           wishful thinking  targets
6   more diamonds than rocks in saturn rings           wishful thinking  stars in the bucks
7   more diamonds than rocks in saturn rings           wishful thinking  wallymarty
8   more diamonds than rocks in saturn rings           wishful thinking  velocity global
9   more diamonds than rocks in saturn rings           wishful thinking  diamond in the rough
10  diamond in the rough employees take too many naps  refresh sleep     targets
11  diamond in the rough employees take too many naps  refresh sleep     stars in the bucks
12  diamond in the rough employees take too many naps  refresh sleep     wallymarty
13  diamond in the rough employees take too many naps  refresh sleep     velocity global
14  diamond in the rough employees take too many naps  refresh sleep     diamond in the rough
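As a side note, newer pandas versions (1.2 and later) support how='cross' in merge, which builds the same headline/company pairs without the temporary 'key' column. Below is a minimal sketch of that variant. The df and data inputs are an assumption, reconstructed from the output above, since the original inputs are not shown here.

```python
import pandas as pd

# Sample inputs reconstructed from the output shown above (an assumption:
# the original df and data are not shown in this answer).
df = pd.DataFrame({
    "headline": [
        "targets is making better stars in the bucks",
        "more diamonds than rocks in saturn rings",
        "diamond in the rough employees take too many naps",
    ],
    "source": ["target news", "wishful thinking", "refresh sleep"],
})
data = pd.Series(
    ["targets", "stars in the bucks", "wallymarty",
     "velocity global", "diamond in the rough"],
    name="company",
)

# how="cross" (pandas >= 1.2) pairs every row of df with every row of data,
# so no temporary "key" column has to be added and dropped.
merged = df.merge(data.to_frame(), how="cross")
print(merged.shape)
```

The row count is still len(df) * len(data), so the size caveat above applies to this variant as well.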
Next, search for the company text in the headline. If it is found, put True in a new 'found' column, otherwise False.
In [67]: merged['found'] = merged.apply(lambda x: x['company'] in x['headline'], axis=1)
Then drop the rows where no match was found:
In [68]: found_df = merged.drop(merged[merged['found']==False].index)
In [69]: found_df
Out[69]:
    headline                                           source            company               found
0   targets is making better stars in the bucks        target news       targets               True
1   targets is making better stars in the bucks        target news       stars in the bucks    True
14  diamond in the rough employees take too many naps  refresh sleep     diamond in the rough  True
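The same filtering can also be written with a boolean mask instead of drop(...index). This is only a sketch of an equivalent form, not part of the steps above. The small merged frame here is a hypothetical stand-in so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for `merged`, reduced to three rows for illustration.
merged = pd.DataFrame({
    "headline": ["targets is making better stars in the bucks"] * 3,
    "company": ["targets", "stars in the bucks", "wallymarty"],
})

# Same row-wise substring test as in the step above.
merged["found"] = merged.apply(lambda x: x["company"] in x["headline"], axis=1)

# Boolean indexing keeps only the rows where 'found' is True;
# the original index labels are preserved, as with drop().
found_df = merged[merged["found"]]
print(found_df)
```

Selecting by mask avoids building the list of index labels to drop, and it behaves predictably even if the index contains duplicate labels.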
Optionally, show only the headline and company columns:
In [70]: found_df[['headline', 'company']]
Out[70]:
    headline                                           company
0   targets is making better stars in the bucks        targets
1   targets is making better stars in the bucks        stars in the bucks
14  diamond in the rough employees take too many naps  diamond in the rough
Shortcut: steps 67 through the end can be combined into this single command:
merged.drop(merged[merged.apply(lambda x: x['company'] in x['headline'], axis=1) == False].index)[['headline', 'company']]