How to remove citation parts that do not contain parentheses - PullRequest
0 votes
/ February 19, 2019

DATA

mystring1 <- "Other work has shown that, in addition to language-general features such as a decreased speaking rate and an expanded pitch range, clear speech production involves the enhancement of the acoustic-phonetic distance between phonologically contrastive categories ?e.g., Ferguson and Kewley-Port, 2002; Krause and Braida, 2004, Picheny et al, 1986; Smiljanic and Bradlow, 2005, 2007?."

mystring2 <- "Other work has shown that, in addition to language-general features such as a decreased speaking rate and an expanded pitch range, clear speech production involves the enhancement of the acoustic-phonetic distance between phonologically contrastive categories ?e.g., Ferguson and Kewley-Port, 2002; Krause and Braida, 2004, Picheny et al, 1986; Smiljanic and Bradlow, 2005, 2007?. Therefore, reduced sensitivity to any or all of the language-specific acoustic-phonetic dimensions of contrast and clear speech enhancement would yield a diminished clear speech benefit for non-native listeners. This may appear somewhat surprising given that clear speech production was elicited in our studies by instructing the talkers to speak clearly for the sake of listeners with either a hearing impairment or from a different native language background. However, as discussed further in Bradlow and Bent ?2002?, the limits of clear speech as a means of enhancing non-native speech perception likely reflect the “mistuning” that characterizes spoken language communication between native and non-native speakers."

I would like some help with a regular expression. I have some text data, and I want to remove the citation parts that appear between the last word of a sentence and the period. However, the parentheses are somehow missing (they show up as ? in the data). mystring1 is an example of this: here I want to remove e.g., Ferguson and Kewley-Port, 2002; Krause and Braida, 2004, Picheny et al, 1986; Smiljanic and Bradlow, 2005, 2007?.

That sentence is only one of the sentences in a paragraph, though. mystring2 contains three more sentences after mystring1. My goal is to remove the citation part from mystring2, but I have not succeeded: my pattern removes more text than I want. How should I revise the regex pattern? Thanks in advance for your help.

# This works for mystring1.
gsub(x = mystring1, pattern = "e\\.g\\.,.*[0-9]{4}(?=.)", replacement = "", perl = TRUE)

[1] "Other work has shown that, in addition to language-general features such as a 
     decreased speaking rate and an expanded pitch range, clear speech production involves
     the enhancement of the acoustic-phonetic distance between phonologically contrastive
     categories ??."

# But this pattern does not work for mystring2; gsub() removes more text than I want.
gsub(x = mystring2, pattern = "e\\.g\\.,.*[0-9]{4}(?=.)", replacement = "", perl = TRUE)

[1] "Other work has shown that, in addition to language-general features such as a decreased
     speaking rate and an expanded pitch range, clear speech production involves the
     enhancement of the acoustic-phonetic distance between phonologically contrastive
     categories ??, the limits of clear speech ... (I trimmed texts here) speakers."

1 Answer

0 votes
/ February 19, 2019

I suggest using

\be\.g\.,.*?[0-9]{4}[^\w.]*(?=\.)

See the regex demo.

Details

  • \be\.g\. - the whole word e.g. (\b is a word boundary)
  • , - a comma
  • .*? - any 0+ characters other than line break characters, as few as possible (add (?s) at the start of the pattern to make it match line breaks as well)
  • [0-9]{4} - four digits
  • [^\w.]* - 0+ characters other than word characters and a dot
  • (?=\.) - a positive lookahead that requires a . immediately to the right of the current location.
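
To see why the lazy quantifier matters, here is a minimal sketch that compares what the question's greedy pattern and the suggested pattern actually match. The string demo is made up for illustration and is not part of the question's data; regexpr() and regmatches() are base R functions.

```r
# Hypothetical short input, invented for illustration only.
demo <- "Claim one ?e.g., Smith, 2001; Jones, 2005?. Claim two, see Lee ?2010?, ends here."

greedy <- "e\\.g\\.,.*[0-9]{4}(?=.)"                    # pattern from the question
lazy   <- "\\be\\.g\\.,.*?[0-9]{4}[^\\w.]*(?=\\.)"      # pattern suggested above

# regexpr() locates the first match; regmatches() extracts the matched text.
greedy_hit <- regmatches(demo, regexpr(greedy, demo, perl = TRUE))
lazy_hit   <- regmatches(demo, regexpr(lazy, demo, perl = TRUE))

print(greedy_hit)  # runs on to the last four-digit number in the string
print(lazy_hit)    # stops at the first year that is followed by a period
```

The greedy `.*` backtracks from the end of the string, so it stops at the last year it can find, while `.*?` grows the match one character at a time and stops at the first year that the lookahead accepts.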

R demo:

rx <- "\\be\\.g\\.,.*?[0-9]{4}[^\\w.]*(?=\\.)"
gsub(x = mystring1, pattern = rx, replacement = "", perl = TRUE)
## => [1] "Other work has shown that, in addition to language-general features such as a decreased speaking rate and an expanded pitch range, clear speech production involves the enhancement of the acoustic-phonetic distance between phonologically contrastive categories ?."
gsub(x = mystring2, pattern = rx, replacement = "", perl = TRUE)
## => [1] "Other work has shown that, in addition to language-general features such as a decreased speaking rate and an expanded pitch range, clear speech production involves the enhancement of the acoustic-phonetic distance between phonologically contrastive categories ?. Therefore, reduced sensitivity to any or all of the language-specific acoustic-phonetic dimensions of contrast and clear speech enhancement would yield a diminished clear speech benefit for non-native listeners. This may appear somewhat surprising given that clear speech production was elicited in our studies by instructing the talkers to speak clearly for the sake of listeners with either a hearing impairment or from a different native language background. However, as discussed further in Bradlow and Bent ?2002?, the limits of clear speech as a means of enhancing non-native speech perception likely reflect the “mistuning” that characterizes spoken language communication between native and non-native speakers."
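
If a citation happens to be split across lines, the pattern as written will probably not match, because `.` does not match line breaks by default. The sketch below shows the (?s) modifier mentioned in the details; the string wrapped is a hypothetical example and does not come from the question.

```r
# Hypothetical input with a line break inside the citation.
wrapped <- "contrastive categories ?e.g., Smith, 2001;\nJones, 2005?. Next sentence."

rx        <- "\\be\\.g\\.,.*?[0-9]{4}[^\\w.]*(?=\\.)"
rx_dotall <- paste0("(?s)", rx)   # (?s) lets . match line breaks too

# Without (?s) the .*? part cannot cross the newline, so nothing is removed.
out_plain  <- gsub(rx, "", wrapped, perl = TRUE)

# With (?s) the match can span the line break and the citation is removed.
out_dotall <- gsub(rx_dotall, "", wrapped, perl = TRUE)

print(out_plain)
print(out_dotall)
```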