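Look up the Levenshtein distance: it measures "text similarity" as the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another.
Source of the Levenshtein implementation: https://en.wikibooks.org/wiki/Algorithm_Implementation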
def levenshtein(s1, s2):
    # source: https://en.wikibooks.org/wiki/Algorithm_Implementation
    #         /Strings/Levenshtein_distance#Python
    if len(s1) < len(s2):
        return levenshtein(s2, s1)
    # len(s1) >= len(s2)
    if len(s2) == 0:
        return len(s1)
    previous_row = range(len(s2) + 1)
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            # j + 1 instead of j: both rows are one element longer than s2
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (c1 != c2)
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]
Applied to your problem:
skus = ["235 DSKTP 10LB", "222840 MSE 2oz"]
full = ["Desktop", "Mouse", "potkseD"]
# go over all skus
for sku in skus:
    name = sku.split()[1].lower()  # extract the abbreviated name
    dist = []
    for f in full:  # calculate the Levenshtein distance to every full name
        # you could shorten this by only using those
        # where the 1st character is identical
        dist.append((levenshtein(name, f.lower()), name, f))
    print(dist)
    # get the minimal distance (beware of ties: min() returns the first minimum)
    print(min(dist, key=lambda x: x[0]))
Output:
# distances
[(2, 'dsktp', 'Desktop'), (5, 'dsktp', 'Mouse'), (6, 'dsktp', 'potkseD')]
# minimal one
(2, 'dsktp', 'Desktop')
# distances
[(6, 'mse', 'Desktop'), (2, 'mse', 'Mouse'), (5, 'mse', 'potkseD')]
# minimal one
(2, 'mse', 'Mouse')
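The two caveats in the code comments (skipping names whose first character differs, and equal distances) can be sketched roughly as follows. This is only an illustration, not part of the original answer: `best_matches` is a hypothetical helper name, the fallback to all names when nothing shares the first character is an assumption, and the block carries its own compact copy of the distance function so that it runs on its own.

```python
def levenshtein(s1, s2):
    """Two-row Levenshtein distance, same recurrence as the Wikibooks version."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    previous_row = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            current_row.append(min(previous_row[j + 1] + 1,        # insertion
                                   current_row[j] + 1,             # deletion
                                   previous_row[j] + (c1 != c2)))  # substitution
        previous_row = current_row
    return previous_row[-1]


def best_matches(abbrev, full_names):
    """Return (distance, names) for the closest full names.

    More than one name in the result means the minimal distance is tied.
    Hypothetical helper, written only for illustration.
    """
    abbrev = abbrev.lower()
    # Assumed shortcut: compare only names that share the first character,
    # and fall back to every name if none does.
    candidates = [f for f in full_names if f[:1].lower() == abbrev[:1]]
    if not candidates:
        candidates = list(full_names)
    scored = [(levenshtein(abbrev, f.lower()), f) for f in candidates]
    if not scored:
        return None, []
    best = min(d for d, _ in scored)
    return best, [f for d, f in scored if d == best]


print(best_matches("dsktp", ["Desktop", "Mouse", "potkseD"]))
print(best_matches("mse", ["Mouse", "Morse"]))  # both names are equally close
```

Returning every name at the minimal distance leaves the decision about ties to the caller instead of silently taking the first one, which is what a plain `min()` does.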
If you have a fixed mapping, sit down and create the mapping dict by hand once, and you are golden until new SKUs come up.
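A hand-built mapping could look something like the sketch below. The names `SKU_NAMES` and `full_name` are hypothetical, and the code assumes, as in the examples above, that the abbreviation is the second whitespace-separated token of the SKU.

```python
# Hypothetical hand-maintained mapping: SKU abbreviation -> full product name.
SKU_NAMES = {
    "DSKTP": "Desktop",
    "MSE": "Mouse",
}


def full_name(sku):
    """Return the full name for a SKU such as '235 DSKTP 10LB', or None if unmapped."""
    parts = sku.split()
    if len(parts) < 2:
        return None  # SKU does not have the assumed "<id> <abbrev> <size>" shape
    return SKU_NAMES.get(parts[1].upper())


print(full_name("235 DSKTP 10LB"))
print(full_name("1 KBD 1LB"))  # not in the mapping yet, so None
```

A `None` result flags a SKU that still needs an entry, which is the point at which the dict has to be extended by hand.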