Как я могу получить точное общее значение "max.distance" для сопоставления нечеткой строки, используя agrep? - PullRequest
0 голосов
/ 11 сентября 2018

Я пытаюсь определить наилучшую точность для нечеткого совпадения строк между двумя именами строк, используя agrep.

Однако мне нужно будет выбрать одну точность "max.distance", чтобы применить ее ко всем строкам, которые я пытаюсь сопоставить, так как количество строк огромно. Невозможно выбрать наилучшее значение точности "max.distance" для каждой строки, которую я пытаюсь сопоставить.

Например, допустим, я использую точность «max.distance» как «0,2», «0,1» и «0,05» для каждого «BANK OF AMERICA CORP» и «1st Capital Bank».

Во-первых, ниже приведено значение для «BANK OF AMERICA CORP» для «max.distance», равного «0,2», «0,1» и «0,05»:

    > agrep("BANK OF AMERICA CORP",C1999_0[,2],ignore.case = TRUE, value = TRUE,fixed = TRUE,max.distance =0.2)
     [1] "BANK OF AMERICA/PRIVATE BANK WEST"   "BANK OF AMERICA SECURITIES"         
     [3] "BANK OF AMERICA SEC LLC"             "BANK OF AMERICA SECURITIES LLC"     
     [5] "BANK OF AMERICA NT & SA"             "BANK OF AMERICA CORP"               
     [7] "ALLIANZ OF AMERICA CORP"             "Bank of America Securities/Vice Pre"
     [9] "Bank of America Securities/Investme" "Bank of America/President"          
    [11] "Bank of America Securities LLC/Prin" "Bank of America Securities LLC/Mana"
    [13] "Bank of America Securities LLC/Inve" "Bank of America Securities/Principa"
    [15] "Bank of America Securities LLC/Bank" "Bank of America Sec/Investment Bank"
    [17] "Bank Of America Securities/Managing" "Bank of America/Chairman--Midwest A"
    [19] "Bank of America Securities LLC/Vice" "Bank of America Corporation/Sales C"
    [21] "Bank of America Securities/Broker"   "Bank of America Corporation/Banker" 
    [23] "Bank of America Corporation/Senior"  "Bank of America Securities/Equity R"
    [25] "Bank of America Corporation/Vice Ch" "BANK OF AMERICA CORPORATION"        
    [27] "BANK OF AMERICA HEADQUARTERS"        "BANK OF AMERICA ADMINISTRATION"     
    [29] "BANK OF AMERICA N A"                 "Bank of America/Commercial Banking" 
    [31] "Bank of America Sec./Investment Ban"
    > 
    > agrep("BANK OF AMERICA CORP",C1999_0[,2],ignore.case = TRUE, value = TRUE,fixed = TRUE,max.distance =0.1)
    [1] "BANK OF AMERICA CORP"                "ALLIANZ OF AMERICA CORP"            
    [3] "Bank of America Corporation/Sales C" "Bank of America Corporation/Banker" 
    [5] "Bank of America Corporation/Senior"  "Bank of America Corporation/Vice Ch"
    [7] "BANK OF AMERICA CORPORATION"        
    > 
    > agrep("BANK OF AMERICA CORP",C1999_0[,2],ignore.case = TRUE, value = TRUE,fixed = TRUE,max.distance =0.05)
    [1] "BANK OF AMERICA CORP"                "Bank of America Corporation/Sales C"
    [3] "Bank of America Corporation/Banker"  "Bank of America Corporation/Senior" 
    [5] "Bank of America Corporation/Vice Ch" "BANK OF AMERICA CORPORATION"        

Затем ниже «1-й банк капитала» для «max.distance», равный «0,2», «0,1» и «0,05»:

    > agrep("1st Capital Bank",C1999_0[,2],ignore.case = TRUE, value = TRUE,fixed = TRUE,max.distance =0.2)
      [1] "HURST CAPITAL PARTNERS"             
      [2] "SOY CAPITAL BANK"                   
      [3] "FIRST CAPITOL BANK OF VICTOR"       
      [4] "OSTERWEIS CAPITAL MANAGEMENT"       
      [5] "1ST NATIONAL BANK"                  
      [6] "FIRST CAPITAL BANK"                 
      [7] "SEATTLE 1ST NAT'L BANK"             
      [8] "FIELD POINT CAPITAL MANAGEMENT"     
      [9] "SUMMERSET CAPITAL MANAGEMENT"       
     [10] "AMERIQUEST CAPITAL ASSOC"           
     [11] "BB&T CAPITAL MARKETS"               
     [12] "HUGHES CAPITAL MANAGEMENT"          
     [13] "WELLS CAPITAL MANAGEMENT"           
     [14] "SUPERIOR ST CAPITAL ADVISORS"       
     [15] "ORMES CAPITAL MARKETS INC"          
     [16] "1ST NAT'L BANK OF IL"               
     [17] "ADVENT CAPITAL MANAGEMENT"          
     [18] "1ST CAPITOL BANK"                   
     [19] "BIONDI REISS CAPITAL MANAGEMENT"    
     [20] "CCYBYS CAPITAL MARKETS"             
     [21] "SEACOAST CAPITAL PARTNERS"          
     [22] "DOUGLAS CAPITAL MANAGEMENT"         
     [23] "HIGHFIELDS CAPITAL MANAGEMENT"      
     [24] "PRECEPT CAPITAL MANAGEMENT LP"      
     [25] "AUGUST CAPITAL MANAGEMENT"          
     [26] "SAKSA CAPITAL MANAGEMENT"           
     [27] "IMS CAPITAL MANAGEMENT"             
     [28] "TRENT CAPITAL MANAGEMENT"           
     [29] "Ormes Capital Management"           
     [30] "GARNET CAPITAL MANAGEMENT LLC"      
     [31] "INTERFASE CAPITAL MANAGERS"         
     [32] "RJS CAPITAL MANAGEMENT INC"         
     [33] "1ST NATIONAL BANK OF DE KALB"       
     [34] "1ST NAT'L BANK OF PHILLIPS CO"      
     [35] "1ST NAT'L BANK OF OKLAHOMA"         
     [36] "PROGRESS CAPITAL MANAGEMENT INC"    
     [37] "CAPITAL BANK & TRUST"               
     [38] "1ST NATL BANK"                      
     [39] "ASB Capital Management/Real Estate" 
     [40] "Sears Capital Management"           
     [41] "Osterweis Capital Management/Invest"
     [42] "Cerberus Capital Management LP/Asse"
     [43] "LVS Capital Management/President"   
     [44] "1st Central Bank/Banker"            
     [45] "Summit Capital Management"          
     [46] "Orwes Capital Markets/Stockbroker"  
     [47] "Ormes Capital Management/Investment"
     [48] "Nevis Capital Management/Investment"
     [49] "Duncan Hurst Capital Management"    
     [50] "Progress Capital Management/Preside"
     [51] "Cerberus Capital Management LP"     
     [52] "Wit Capital/Banker"                 
     [53] "Ormes Capital Markets Inc."         
     [54] "Ormes Capital Markets/President & C"
     [55] "Berents & Hess Capital Management"  
     [56] "Progress Capital Management/Venture"
     [57] "First Capital Bank of KY"           
     [58] "Foothill Capital/Banker"            
     [59] "Pequot Capital Management/Equity Re"
     [60] "First Dominion Capital/Banking"     
     [61] "Greenwhich Capital/Banker"          
     [62] "Veritas Capital Management/Banker"  
     [63] "Veritas Capital Management/Investme"
     [64] "Lesese Capital Management/Investmen"
     [65] "Douglas Capital Management/Investme"
     [66] "FIRST NATINAL BANK OF AMARILLO"     
     [67] "NEVIS CAPITAL MANAGEMENT"           
     [68] "VERITAS CAPITAL MANAGEMENT"         
     [69] "SIEBERT CAPITAL MARKETS"            
     [70] "HOURGLASS CAPITAL MANAGEMENT"       
     [71] "1ST NATIONAL BANK DALHART"          
     [72] "TEXAS CAPITAL BANK"                 
     [73] "NICHOLAS CAPITAL MANAGEMENT"        
     [74] "CERBUS CAPITAL MANAGEMENT"          
     [75] "CROESUS CAPITAL MANAGEMENT"         
     [76] "EAST WEST CAPITAL ASSOCIATES INC"   
     [77] "PRENDERGAST CAPITAL MANAGEMENT"     
     [78] "NANTUCKET CAPITAL MANAGEMENT"       
     [79] "1ST NATIONAL BANK TEMPLE"           
     [80] "ENTRUST CAPITAL INC"                
     [81] "1ST NATIONAL BANK OF IL"            
     [82] "SIMMS CAPITAL MANAGEMENT"           
     [83] "FIRST CAPITAL ADVISORS"             
     [84] "FIRST CAPITAL MANAGEMENT LTD"       
     [85] "1ST NATIONAL BANK & TRUST"          
     [86] "PENTECOST CAPITAL MANAGEMENT INC"   
     [87] "EAST-WEST CAPITAL ASSOCIATES"       
     [88] "1ST NAT'L BANK OF JOLIET"           
     [89] "FIRST CAPITOL BANK OF VICTO"        
     [90] "FIRST CAPITAL FINANCIAL"            
     [91] "PACIFIC COAST CAPITAL PARTNERS"     
     [92] "FIRST CAPITOL BANK"                 
     [93] "FIRST CAPITAL ENGINEERING"          
     [94] "MIDWEST CAPITOL MANAGEMENT"         
     [95] "PEQUOT CAPITAL MANAGEMENT"          
     [96] "AGGOTT CAPITAL MANAGEMENT"          
     [97] "SIMMS CAPITAL MANAGEMENT INC"       
     [98] "PHILLIPS CAPITAL MANAGEMENT LLC"    
     [99] "1ST NATIONAL BANK OF COLD SP"       
    [100] "SOY CAPITOL BANK"                   
    > 
    > agrep("1st Capital Bank",C1999_0[,2],ignore.case = TRUE, value = TRUE,fixed = TRUE,max.distance =0.1)
    [1] "FIRST CAPITOL BANK OF VICTOR" "FIRST CAPITAL BANK"          
    [3] "1ST CAPITOL BANK"             "First Capital Bank of KY"    
    [5] "TEXAS CAPITAL BANK"           "FIRST CAPITOL BANK OF VICTO" 
    [7] "FIRST CAPITOL BANK"          
    > 
    > agrep("1st Capital Bank",C1999_0[,2],ignore.case = TRUE, value = TRUE,fixed = TRUE,max.distance =0.05)
    [1] "FIRST CAPITAL BANK"       "1ST CAPITOL BANK"        
    [3] "First Capital Bank of KY"

Как видите, действительно трудно найти значение общей точности для "max.distance", которое будет применяться к каждой строке, например, "BANK OF AMERICA CORP" и "1st Capital Bank". Кроме того, у меня гораздо больше фирменных наименований, кроме этих двух, поэтому я испытываю трудности с поиском значения общей точности и команды для сопоставления нечеткой строки.

Исходный файл данных для C1999_0 слишком велик, чтобы его можно было прикрепить, поэтому я думаю, что для его репликации будет достаточно использовать только выходные значения, как показано выше.

Я знаю, что есть несколько подкатегорий для манипулирования, таких как стоимость, замены, вставка и т. Д., Но они не имели большого значения только при изменении самого значения "max.distance".

Буду очень признателен, если смогу помочь с этим!

Ответы [ 2 ]

0 голосов
/ 26 сентября 2018

Кажется возможным, что это неразрешимая проблема, как указано, что не существует ни одного max.distance, которое будет хорошо работать для всех входных строк.

Возможно, стоит попробовать метод, подобный tf-idf , чтобы определить необычность ваших строк и масштабировать максимальное расстояние до этого. Таким образом, «Ziggurat Mutual» может быть предоставлено больше возможностей для вариаций, чем «First Bank National», который является более общим.

Вы можете также рассмотреть возможность использования пакета fuzzyjoin, который предлагает несколько быстрых способов попробовать разные варианты. Например, вы можете попробовать:

df <- c("HURST CAPITAL PARTNERS", "SOY CAPITAL BANK", "FIRST CAPITOL BANK OF VICTOR", "OSTERWEIS CAPITAL MANAGEMENT", "1ST NATIONAL BANK", "FIRST CAPITAL BANK", "SEATTLE 1ST NAT'L BANK", "FIELD POINT CAPITAL MANAGEMENT", "SUMMERSET CAPITAL MANAGEMENT", "AMERIQUEST CAPITAL ASSOC", "BB&T CAPITAL MARKETS", "HUGHES CAPITAL MANAGEMENT", "WELLS CAPITAL MANAGEMENT", "SUPERIOR ST CAPITAL ADVISORS", "ORMES CAPITAL MARKETS INC", "1ST NAT'L BANK OF IL", "ADVENT CAPITAL MANAGEMENT", "1ST CAPITOL BANK", "BIONDI REISS CAPITAL MANAGEMENT", "CCYBYS CAPITAL MARKETS", "SEACOAST CAPITAL PARTNERS", "DOUGLAS CAPITAL MANAGEMENT", "HIGHFIELDS CAPITAL MANAGEMENT", "PRECEPT CAPITAL MANAGEMENT LP", "AUGUST CAPITAL MANAGEMENT", "SAKSA CAPITAL MANAGEMENT", "IMS CAPITAL MANAGEMENT", "TRENT CAPITAL MANAGEMENT", "Ormes Capital Management", "GARNET CAPITAL MANAGEMENT LLC", "INTERFASE CAPITAL MANAGERS", "RJS CAPITAL MANAGEMENT INC", "1ST NATIONAL BANK OF DE KALB", "1ST NAT'L BANK OF PHILLIPS CO", "1ST NAT'L BANK OF OKLAHOMA", "PROGRESS CAPITAL MANAGEMENT INC", "CAPITAL BANK & TRUST", "1ST NATL BANK", "ASB Capital Management/Real Estate", "Sears Capital Management", "Osterweis Capital Management/Invest", "Cerberus Capital Management LP/Asse", "LVS Capital Management/President", "1st Central Bank/Banker", "Summit Capital Management", "Orwes Capital Markets/Stockbroker", "Ormes Capital Management/Investment", "Nevis Capital Management/Investment", "Duncan Hurst Capital Management", "Progress Capital Management/Preside", "Cerberus Capital Management LP", "Wit Capital/Banker", "Ormes Capital Markets Inc.", "Ormes Capital Markets/President & C", "Berents & Hess Capital Management", "Progress Capital Management/Venture", "First Capital Bank of KY", "Foothill Capital/Banker", "Pequot Capital Management/Equity Re", "First Dominion Capital/Banking", "Greenwhich Capital/Banker", "Veritas Capital Management/Banker", "Veritas Capital Management/Investme", "Lesese Capital Management/Investmen", "Douglas Capital Management/Investme", "FIRST NATINAL BANK OF AMARILLO", "NEVIS CAPITAL MANAGEMENT", "VERITAS CAPITAL MANAGEMENT", "SIEBERT CAPITAL MARKETS", "HOURGLASS CAPITAL MANAGEMENT", "1ST NATIONAL BANK DALHART", "TEXAS CAPITAL BANK", "NICHOLAS CAPITAL MANAGEMENT", "CERBUS CAPITAL MANAGEMENT", "CROESUS CAPITAL MANAGEMENT", "EAST WEST CAPITAL ASSOCIATES INC", "PRENDERGAST CAPITAL MANAGEMENT", "NANTUCKET CAPITAL MANAGEMENT", "1ST NATIONAL BANK TEMPLE", "ENTRUST CAPITAL INC", "1ST NATIONAL BANK OF IL", "SIMMS CAPITAL MANAGEMENT", "FIRST CAPITAL ADVISORS", "FIRST CAPITAL MANAGEMENT LTD", "1ST NATIONAL BANK & TRUST", "PENTECOST CAPITAL MANAGEMENT INC", "EAST-WEST CAPITAL ASSOCIATES", "1ST NAT'L BANK OF JOLIET", "FIRST CAPITOL BANK OF VICTO", "FIRST CAPITAL FINANCIAL", "PACIFIC COAST CAPITAL PARTNERS", "FIRST CAPITOL BANK", "FIRST CAPITAL ENGINEERING", "MIDWEST CAPITOL MANAGEMENT", "PEQUOT CAPITAL MANAGEMENT", "AGGOTT CAPITAL MANAGEMENT", "SIMMS CAPITAL MANAGEMENT INC", "PHILLIPS CAPITAL MANAGEMENT LLC", "1ST NATIONAL BANK OF COLD SP", "SOY CAPITOL BANK")

library(dplyr); library(fuzzyjoin)
df <- df %>% as_data_frame()

df %>%
  # Allowable methods include osa, lv, dl, hamming, lcs, qgram, 
  #    cosine, jaccard, jw, soundex
  fuzzyjoin::stringdist_inner_join(df, method = "lv", distance_col = "distance", max_dist = 4) %>%
  filter(distance > 0)

Joining by: "value"
# A tibble: 70 x 3
   value.x                      value.y                     distance
   <chr>                        <chr>                          <dbl>
 1 SOY CAPITAL BANK             1ST CAPITOL BANK                   4
 2 SOY CAPITAL BANK             SOY CAPITOL BANK                   1
 3 FIRST CAPITOL BANK OF VICTOR FIRST CAPITOL BANK OF VICTO        1
 4 1ST NATIONAL BANK            1ST NATL BANK                      4
 5 FIRST CAPITAL BANK           1ST CAPITOL BANK                   4
 6 FIRST CAPITAL BANK           FIRST CAPITOL BANK                 1
 7 HUGHES CAPITAL MANAGEMENT    DOUGLAS CAPITAL MANAGEMENT         4
 8 HUGHES CAPITAL MANAGEMENT    AUGUST CAPITAL MANAGEMENT          4
 9 WELLS CAPITAL MANAGEMENT     IMS CAPITAL MANAGEMENT             4
10 WELLS CAPITAL MANAGEMENT     NEVIS CAPITAL MANAGEMENT           3

... чтобы поэкспериментировать с потенциальными неточными совпадениями в вашем списке примеров.

0 голосов
/ 22 сентября 2018

Проблема с agrep заключается в том, что это похоже на grep, как описано в help("grep")

Поскольку кто-то, кто небрежно прочитал описание, даже подал отчет об ошибке, обратите внимание, чтоэто соответствует подстроке каждого элемента x (так же, как grep), а не целым элементам.См. Также adist в пакете utils , который при необходимости возвращает смещения совпадающих подстрок.

Кажется, это проблема в вашем последнем примере, поскольку у вас много именкоторый содержит «Капитал» или «Банк» или оба.То, что я хотел бы сделать, это использовать для вычисления расстояние Левенштейна (это то, что делает agrep или обобщенная версия и только для подстрок ) и взять те, которые имеют наименьшее расстояние.Например,

C1999 <- c("HURST CAPITAL PARTNERS", "SOY CAPITAL BANK", "FIRST CAPITOL BANK OF VICTOR", "OSTERWEIS CAPITAL MANAGEMENT", "1ST NATIONAL BANK", "FIRST CAPITAL BANK", "SEATTLE 1ST NAT'L BANK", "FIELD POINT CAPITAL MANAGEMENT", "SUMMERSET CAPITAL MANAGEMENT", "AMERIQUEST CAPITAL ASSOC", "BB&T CAPITAL MARKETS", "HUGHES CAPITAL MANAGEMENT", "WELLS CAPITAL MANAGEMENT", "SUPERIOR ST CAPITAL ADVISORS", "ORMES CAPITAL MARKETS INC", "1ST NAT'L BANK OF IL", "ADVENT CAPITAL MANAGEMENT", "1ST CAPITOL BANK", "BIONDI REISS CAPITAL MANAGEMENT", "CCYBYS CAPITAL MARKETS", "SEACOAST CAPITAL PARTNERS", "DOUGLAS CAPITAL MANAGEMENT", "HIGHFIELDS CAPITAL MANAGEMENT", "PRECEPT CAPITAL MANAGEMENT LP", "AUGUST CAPITAL MANAGEMENT", "SAKSA CAPITAL MANAGEMENT", "IMS CAPITAL MANAGEMENT", "TRENT CAPITAL MANAGEMENT", "Ormes Capital Management", "GARNET CAPITAL MANAGEMENT LLC", "INTERFASE CAPITAL MANAGERS", "RJS CAPITAL MANAGEMENT INC", "1ST NATIONAL BANK OF DE KALB", "1ST NAT'L BANK OF PHILLIPS CO", "1ST NAT'L BANK OF OKLAHOMA", "PROGRESS CAPITAL MANAGEMENT INC", "CAPITAL BANK & TRUST", "1ST NATL BANK", "ASB Capital Management/Real Estate", "Sears Capital Management", "Osterweis Capital Management/Invest", "Cerberus Capital Management LP/Asse", "LVS Capital Management/President", "1st Central Bank/Banker", "Summit Capital Management", "Orwes Capital Markets/Stockbroker", "Ormes Capital Management/Investment", "Nevis Capital Management/Investment", "Duncan Hurst Capital Management", "Progress Capital Management/Preside", "Cerberus Capital Management LP", "Wit Capital/Banker", "Ormes Capital Markets Inc.", "Ormes Capital Markets/President & C", "Berents & Hess Capital Management", "Progress Capital Management/Venture", "First Capital Bank of KY", "Foothill Capital/Banker", "Pequot Capital Management/Equity Re", "First Dominion Capital/Banking", "Greenwhich Capital/Banker", "Veritas Capital Management/Banker", "Veritas Capital Management/Investme", "Lesese Capital Management/Investmen", "Douglas Capital Management/Investme", "FIRST NATINAL BANK OF AMARILLO", "NEVIS CAPITAL MANAGEMENT", "VERITAS CAPITAL MANAGEMENT", "SIEBERT CAPITAL MARKETS", "HOURGLASS CAPITAL MANAGEMENT", "1ST NATIONAL BANK DALHART", "TEXAS CAPITAL BANK", "NICHOLAS CAPITAL MANAGEMENT", "CERBUS CAPITAL MANAGEMENT", "CROESUS CAPITAL MANAGEMENT", "EAST WEST CAPITAL ASSOCIATES INC", "PRENDERGAST CAPITAL MANAGEMENT", "NANTUCKET CAPITAL MANAGEMENT", "1ST NATIONAL BANK TEMPLE", "ENTRUST CAPITAL INC", "1ST NATIONAL BANK OF IL", "SIMMS CAPITAL MANAGEMENT", "FIRST CAPITAL ADVISORS", "FIRST CAPITAL MANAGEMENT LTD", "1ST NATIONAL BANK & TRUST", "PENTECOST CAPITAL MANAGEMENT INC", "EAST-WEST CAPITAL ASSOCIATES", "1ST NAT'L BANK OF JOLIET", "FIRST CAPITOL BANK OF VICTO", "FIRST CAPITAL FINANCIAL", "PACIFIC COAST CAPITAL PARTNERS", "FIRST CAPITOL BANK", "FIRST CAPITAL ENGINEERING", "MIDWEST CAPITOL MANAGEMENT", "PEQUOT CAPITAL MANAGEMENT", "AGGOTT CAPITAL MANAGEMENT", "SIMMS CAPITAL MANAGEMENT INC", "PHILLIPS CAPITAL MANAGEMENT LLC", "1ST NATIONAL BANK OF COLD SP", "SOY CAPITOL BANK")

func <- function(x, y, tol = 0L){
  require(stringdist)
  dista <- stringdist::stringdist(x, y, method = "lv")
  min_dista <- min(dista)
  y[dista <= min_dista + tol]
}
func("1st Capital Bank", C1999)
#R [1] "Wit Capital/Banker"
func("1st Capital Bank", C1999, 4L)
#R [1] "Wit Capital/Banker"       "First Capital Bank of KY"
func("1st Capital Bank", C1999, 10L)
#R  [1] "SOY CAPITAL BANK"           "1ST NATIONAL BANK"         
#R  [3] "FIRST CAPITAL BANK"         "1ST CAPITOL BANK"          
#R  [5] "Ormes Capital Management"   "1ST NATL BANK"             
#R  [7] "Sears Capital Management"   "1st Central Bank/Banker"   
#R  [9] "Summit Capital Management"  "Wit Capital/Banker"        
#R [11] "Ormes Capital Markets Inc." "First Capital Bank of KY"  
#R [13] "Foothill Capital/Banker"    "Greenwhich Capital/Banker" 
#R [15] "TEXAS CAPITAL BANK"         "FIRST CAPITOL BANK"        
#R [17] "SOY CAPITOL BANK" 

# ignoring cases
func <- function(x, y, tol = 0L){
  require(stringdist)
  dista <- stringdist::stringdist(tolower(x), tolower(y), method = "lv")
  min_dista <- min(dista)
  y[dista <= min_dista + tol]
}
func("1st Capital Bank", C1999, 0L)
#R [1] "1ST CAPITOL BANK"

Параметр tol в func контролирует, если вы хотите включить примеры, которые находятся на tol дальше от минимального расстояния Левенштейна.Я вижу, что я не ответил точно, что вы просили ( Как я могу получить точное общее значение "max.distance" для сопоставления нечеткой строки, используя agrep? ), но я думаю, что мой ответ может быть такимВы ищете.

Я использую stringdist::stringdist вместо adist, так как первый, кажется, быстрее.Это все еще может быть немного медленным, и я хочу, чтобы там был пакет R, где вы могли бы установить максимальное расстояние, но я не встречал такого пакета.Это может значительно ускорить вычисление (затем ограниченное) расстояния Левенштейна.

...