Я изменил VBA-функцию расстояния Левенштейна, найденную на этой записи , чтобы использовать одномерный массив.Это работает намного быстрее.
'Calculate the Levenshtein Distance between two strings (the number of insertions,
'deletions, and substitutions needed to transform the first string into the second)
Public Function LevenshteinDistance2(ByRef s1 As String, ByRef s2 As String) As Long
Dim L1 As Long, L2 As Long, D() As Long, LD As Long 'Length of input strings and distance matrix
Dim i As Long, j As Long, ss2 As Long, ssL As Long, cost As Long 'loop counters, loop step, loop start, and cost of substitution for current letter
Dim cI As Long, cD As Long, cS As Long 'cost of next Insertion, Deletion and Substitution
Dim L1p1 As Long, L1p2 As Long 'Length of S1 + 1, Length of S1 + 2
L1 = Len(s1): L2 = Len(s2)
L1p1 = L1 + 1
L1p2 = L1 + 2
LD = (((L1 + 1) * (L2 + 1))) - 1
ReDim D(0 To LD)
ss2 = L1 + 1
For i = 0 To L1 Step 1: D(i) = i: Next i 'setup array positions 0,1,2,3,4,...
For j = 0 To LD Step ss2: D(j) = j / ss2: Next j 'setup array positions 0,1,2,3,4,...
For j = 1 To L2
ssL = (L1 + 1) * j
For i = (ssL + 1) To (ssL + L1)
If Mid$(s1, i Mod ssL, 1) <> Mid$(s2, j, 1) Then cost = 1 Else cost = 0
cI = D(i - 1) + 1
cD = D(i - L1p1) + 1
cS = D(i - L1p2) + cost
If cI <= cD Then 'Insertion or Substitution
If cI <= cS Then D(i) = cI Else D(i) = cS
Else 'Deletion or Substitution
If cD <= cS Then D(i) = cD Else D(i) = cS
End If
Next i
Next j
LevenshteinDistance2 = D(LD)
End Function
Я проверил эту функцию со строкой 's1' длиной 11,304 и 's2' длиной 5665 (> 64 миллиона сравнений символов).При использовании вышеуказанной версии функции с одним измерением время выполнения на моей машине составляет ~ 24 секунды.Исходная двумерная функция, на которую я ссылался в приведенной выше ссылке, требует ~ 37 секунд для тех же строк.Я дополнительно оптимизировал одномерную функцию, как показано ниже, и для тех же строк требуется ~ 10 секунд.
'Calculate the Levenshtein Distance between two strings (the number of insertions,
'deletions, and substitutions needed to transform the first string into the second)
Public Function LevenshteinDistance(ByRef s1 As String, ByRef s2 As String) As Long
Dim L1 As Long, L2 As Long, D() As Long, LD As Long 'Length of input strings and distance matrix
Dim i As Long, j As Long, ss2 As Long 'loop counters, loop step
Dim ssL As Long, cost As Long 'loop start, and cost of substitution for current letter
Dim cI As Long, cD As Long, cS As Long 'cost of next Insertion, Deletion and Substitution
Dim L1p1 As Long, L1p2 As Long 'Length of S1 + 1, Length of S1 + 2
Dim sss1() As String, sss2() As String 'Character arrays for string S1 & S2
L1 = Len(s1): L2 = Len(s2)
L1p1 = L1 + 1
L1p2 = L1 + 2
LD = (((L1 + 1) * (L2 + 1))) - 1
ReDim D(0 To LD)
ss2 = L1 + 1
For i = 0 To L1 Step 1: D(i) = i: Next i 'setup array positions 0,1,2,3,4,...
For j = 0 To LD Step ss2: D(j) = j / ss2: Next j 'setup array positions 0,1,2,3,4,...
ReDim sss1(1 To L1) 'Size character array S1
ReDim sss2(1 To L2) 'Size character array S2
For i = 1 To L1 Step 1: sss1(i) = Mid$(s1, i, 1): Next i 'Fill S1 character array
For i = 1 To L2 Step 1: sss2(i) = Mid$(s2, i, 1): Next i 'Fill S2 character array
For j = 1 To L2
ssL = (L1 + 1) * j
For i = (ssL + 1) To (ssL + L1)
If sss1(i Mod ssL) <> sss2(j) Then cost = 1 Else cost = 0
cI = D(i - 1) + 1
cD = D(i - L1p1) + 1
cS = D(i - L1p2) + cost
If cI <= cD Then 'Insertion or Substitution
If cI <= cS Then D(i) = cI Else D(i) = cS
Else 'Deletion or Substitution
If cD <= cS Then D(i) = cD Else D(i) = cS
End If
Next i
Next j
LevenshteinDistance = D(LD)
End Function