У меня есть два df's
df1
date League teams
0 201902272215 brazil cup foz do iguacu fcceara ce
1 201902272300 colombia primera a deportes tolimaatletico bucaramanga
2 201902272300 brazil campeonato gaucho 2nd division ypiranga rsuniao frederiquense
3 201902272300 brazil campeonato gaucho 2nd division esportivo rstupi rs
4 201902272300 brazil campeonato gaucho 2nd division sao paulo rsgremio esportivo bage
14 201902280000 four nations women tournament (in usa) usa (w)japan (w)
25 201902280030 bolivia professional football league real potosibolivar
df2
date league teams
0 201902280000 womens international usa womenjapan women
1 201902280000 brazil amazonense sul america ecrio negro am
2 201902280030 bolivia apertura real potosibolivar
3 201902280030 brazil campeonato paulista palmeirasituano
4 201902280030 copa sudamericana racing clubcorinthians
В результате я хотел бы получить все строки из df2, которые почти совпадают с df1
date league teams near_match
0 201902280000 womens international usa womenjapan women 1
1 201902280000 brazil amazonense sul america ecrio negro am 0
2 201902280030 bolivia apertura real potosibolivar 1
3 201902280030 brazil campeonato paulista palmeirasituano 0
4 201902280030 copa sudamericana racing clubcorinthians 0
Я пытался использовать вариацию a для l oop, используя SequenceMatcher
и устанавливая порог для совпадения выше 0,8, но мне не повезло.
df_1['merge_teams'] = df_1['teams'] # we will use these as the merge keys
df_1['merge_date'] = df_1['date']
# df_1['merge_league'] = df_1['league']
for teams_1, date_1, league_1 in df_1[['teams','date']].values:
for ixb, (teams_1, teams_2) in enumerate(df_2[['teams','date']].values):
if difflib.SequenceMatcher(None,teams_1,teams_2).ratio() > .8:
df_2.ix[ixb,'merge_teams'] = teams_1 # creates a merge key in df_2
if difflib.SequenceMatcher(None,date_1, date_2).ratio() > .8:
df_2.ix[ixb,'merge_date'] = date_1 # creates a merge key in df_2
# This should rturn all rows where teams,date and league all match by over 80%
# This is just for teams and date, I want to include league as well
Буду признателен за любые советы или рекомендации.