Error: "Can't convert a logical vector to function" when using a fuzzy join on text strings in R
October 17, 2019
    df1=structure(list(NAME = structure(c(8L, 1L, 14L, 10L, 16L, 15L, 
9L, 12L, 13L, 11L, 5L, 3L, 2L, 17L, 7L, 4L, 6L), .Label = c("2 passes of preliminary light signaling at night", 
"AT due to non-confirmation by the driver of operability at the request of TSKBM", 
"AT due to the lack of confirmation of vigilance when following a prohibiting signal without passage", 
"AT in case of violation of the sequence of traffic lights with a prohibition indication with CLUB-U (UP), BLOCK in / and", 
"AT in situations not specified in the classifier", "AT in the absence of registration of pressing RB, RBS", 
"AT in the parking lot", "Incorrect actions when a malfunction is detected - \"push\" \"on the way", 
"Incorrect input of train characteristics in efficiency, KLUB-U (UP), BLOCK at / and the driver at the beginning of the trip", 
"Reversing with a completely unused shunting route", "Setting the toggle switch DZ to the position WITHOUT ALSN in the coded area or untimely setting in ALS", 
"Stop on limiting lift with critical weight without the need for an auxiliary locomotive", 
"Stops of trains on a stage with an enable signal", "The inclusion of the Saut-Ts (TsM) KIO-Saut at a pressure in the TM, other than the charging", 
"Violation of driving conditions for trains with increased weight and increased length", 
"Violation of the algorithm for a single or periodic check of vigilance SAUT-Ts (TsM), KIO-SAUT", 
"АТ due to excess of speed controlled by a safety device when following a prohibiting signal without passage"
), class = "factor")), .Names = "NAME", class = "data.frame", row.names = c(NA, 
-17L))

and here is the second dataset:

df2=structure(list(NAME = structure(c(9L, 2L, 15L, 11L, 16L, 1L, 
10L, 13L, 14L, 12L, 6L, 4L, 3L, 17L, 8L, 5L, 7L), .Label = c(" of driving conditions for trains with increased weight and increased length", 
"2 passes of preliminary light signaling at night", "AT due to non-confirmation by the driver of operability at the request of TSKBM", 
"AT due to the lack of confirmation of vigilance when following a prohibiting signal without passage", 
"AT in case of violation of the sequence of traffic lights with a prohibition indication with CLUB-U (UP), BLOCK in / and something more", 
"AT in situations not specified in the classifier", "AT in the absence of registration of pressing RB, RBS TGV", 
"AT in the parking lot", "Incorrect actions when a malfunction is detected - \"push\" \"on the way --", 
"Incorrect input of train characteristics in efficiency, KLUB-U (UP), BLOCK at / and the driver at the beginning of the trip", 
"Reversing with a completely unused shunting route", "Setting the toggle switch DZ to the position WITHOUT ALSN in the coded area or untimely setting in ALS  400-500", 
"Stop on limiting lift with critical weight without the need for an auxiliary locomotive", 
"Stops of trains where  on a stage with an enable signal was error", 
"The inclusion of the Saut-Ts (TsM) KIO-Saut at a pressure in the TM, other than the charging", 
"Violation of the algorithm for a single or periodic check of vigilance SAUT-Ts (TsM), KIO-SAUT", 
"АТ due to excess of speed controlled by a safety  1000 device when following a prohibiting signal without passage"
), class = "factor"), idspnar = 1:17), .Names = c("NAME", "idspnar"
), class = "data.frame", row.names = c(NA, -17L))

I want to join these two datasets on the text strings (inner join only):

library(fuzzyjoin)
rpairs_jar <- fuzzy_inner_join(df1, df2,
                              by = c("NAME"),match_fun=TRUE)

I don't need a plain merge. I need fuzzyjoin because the texts can differ slightly: sometimes the text is similar but a few digits or punctuation marks have been added. Of course, completely different texts such as "cat" and "dogs" should not be joined :) The call above fails with this error:

Error: Can't convert a logical vector to function
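
The message points at `match_fun = TRUE`: `fuzzy_inner_join()` expects `match_fun` to be a vectorized function that takes two equal-length vectors and returns a logical vector, not a logical value. A minimal sketch of a corrected call follows. The toy data frames `a` and `b`, the helper name `close_enough`, the Jaro-Winkler method, and the 0.15 threshold are all illustrative assumptions, not values taken from the question, and the threshold would need tuning on the real data.

```r
library(fuzzyjoin)
library(stringdist)

# Toy data standing in for df1 / df2 (hypothetical, kept small on purpose).
a <- data.frame(NAME = c("AT in the parking lot", "cat"),
                stringsAsFactors = FALSE)
b <- data.frame(NAME = c("AT in the parking lot --", "dogs"),
                idspnar = 1:2,
                stringsAsFactors = FALSE)

# match_fun must be a function: it gets two equal-length character vectors
# and returns TRUE where the pair counts as a match.
# Assumption: Jaro-Winkler distance below 0.15 means "similar enough".
close_enough <- function(x, y) {
  stringdist(x, y, method = "jw") < 0.15
}

res <- fuzzy_inner_join(a, b, by = "NAME", match_fun = close_enough)
print(res)
```

With the real data, converting the factor columns first (`df1$NAME <- as.character(df1$NAME)`, and the same for `df2`) avoids surprises from factor levels. fuzzyjoin also ships `stringdist_inner_join()`, which wraps the same idea behind the `method` and `max_dist` arguments.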

How can I join these two datasets on the text strings? The desired output is:

                                 NAME
1                                                        Incorrect actions when a malfunction is detected - "push" "on the way
2                                                                             2 passes of preliminary light signaling at night
3                                 The inclusion of the Saut-Ts (TsM) KIO-Saut at a pressure in the TM, other than the charging
4                                                                            Reversing with a completely unused shunting route
5                               Violation of the algorithm for a single or periodic check of vigilance SAUT-Ts (TsM), KIO-SAUT
6                                        Violation of driving conditions for trains with increased weight and increased length
7  Incorrect input of train characteristics in efficiency, KLUB-U (UP), BLOCK at / and the driver at the beginning of the trip
8                                      Stop on limiting lift with critical weight without the need for an auxiliary locomotive
9                                                                             Stops of trains on a stage with an enable signal
10                      Setting the toggle switch DZ to the position WITHOUT ALSN in the coded area or untimely setting in ALS
11                                                                            AT in situations not specified in the classifier
12                         AT due to the lack of confirmation of vigilance when following a prohibiting signal without passage
13                                             AT due to non-confirmation by the driver of operability at the request of TSKBM
14                 АТ due to excess of speed controlled by a safety device when following a prohibiting signal without passage
15                                                                                                       AT in the parking lot
16    AT in case of violation of the sequence of traffic lights with a prohibition indication with CLUB-U (UP), BLOCK in / and
17                                                                       AT in the absence of registration of pressing RB, RBS
                                                                                                                                     NAME.1
1                                                                  Incorrect actions when a malfunction is detected - "push" "on the way --
2                                                                                          2 passes of preliminary light signaling at night
3                                              The inclusion of the Saut-Ts (TsM) KIO-Saut at a pressure in the TM, other than the charging
4                                                                                         Reversing with a completely unused shunting route
5                                            Violation of the algorithm for a single or periodic check of vigilance SAUT-Ts (TsM), KIO-SAUT
6                                                               of driving conditions for trains with increased weight and increased length
7               Incorrect input of train characteristics in efficiency, KLUB-U (UP), BLOCK at / and the driver at the beginning of the trip
8                                                   Stop on limiting lift with critical weight without the need for an auxiliary locomotive
9                                                                         Stops of trains where  on a stage with an enable signal was error
10                          Setting the toggle switch DZ to the position WITHOUT ALSN in the coded area or untimely setting in ALS  400-500
11                                                                                         AT in situations not specified in the classifier
12                                      AT due to the lack of confirmation of vigilance when following a prohibiting signal without passage
13                                                          AT due to non-confirmation by the driver of operability at the request of TSKBM
14                        АТ due to excess of speed controlled by a safety  1000 device when following a prohibiting signal without passage
15                                                                                                                    AT in the parking lot
16  AT in case of violation of the sequence of traffic lights with a prohibition indication with CLUB-U (UP), BLOCK in / and something more
17                                                                                AT in the absence of registration of pressing RB, RBS TGV
   idspnar
1        1
2        2
3        3
4        4
5        5
6        6
7        7
8        8
9        9
10      10
11      11
12      12
13      13
14      14
15      15
16      16
17      17

How can I fix the join of the two datasets? The datasets can differ in size: one may have 20 rows and the other 10,000, so I need a fuzzy inner join that matches similar strings. The fuzzyjoin library looks like the simplest option, but if you think another approach would help, I'd be glad to hear it.

compare.linkage() from RecordLinkage doesn't work for me, because it requires the two datasets to have the same number of rows and otherwise raises an error saying that their number is not equal. In my case the difference in row counts can be very large: for example, 2 rows in one dataset and 101010101 rows in the other. I'm joining these datasets because I need to get the idspnar column.
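
Since the two tables can differ in size, one package-free option is to look up, for every row of the smaller table, its closest string in the other table and keep the pair only if it is close enough. The sketch below uses base R's `adist()` (generalized Levenshtein distance). The function name `nearest_match_join`, the `max_rel_dist` parameter, and its default of 0.25 are hypothetical choices for illustration. Note that `adist()` builds a full distance matrix, so for a very large second table this would have to be run in chunks.

```r
# Join each row of x to its single closest row in y, by edit distance.
nearest_match_join <- function(x, y, by = "NAME", max_rel_dist = 0.25) {
  xs <- as.character(x[[by]])
  ys <- as.character(y[[by]])

  # length(xs) x length(ys) matrix of Levenshtein distances
  d <- utils::adist(xs, ys)

  # Normalise by the longer string of each pair so that short, unrelated
  # strings (e.g. "cat" vs "dogs") are not treated as near matches.
  longest <- pmax(outer(nchar(xs), nchar(ys), pmax), 1)
  rel <- d / longest

  best <- apply(rel, 1, which.min)                      # closest y row per x row
  keep <- rel[cbind(seq_along(xs), best)] <= max_rel_dist

  out <- cbind(x[keep, , drop = FALSE], y[best[keep], , drop = FALSE])
  names(out) <- make.unique(names(out))                 # NAME, NAME.1, idspnar
  rownames(out) <- NULL
  out
}

# Toy example (hypothetical data, different numbers of rows on purpose).
x <- data.frame(NAME = c("AT in the parking lot", "cat"),
                stringsAsFactors = FALSE)
y <- data.frame(NAME = c("dogs", "AT in the parking lot --", "AT in the parking"),
                idspnar = 1:3,
                stringsAsFactors = FALSE)

out <- nearest_match_join(x, y)
print(out)
```

Unlike a threshold-only fuzzy join, this returns at most one match per row of `x`, which keeps the result from multiplying when several rows of `y` are similar to each other.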

...