R str_subset: problem with a string pattern in R - PullRequest
0 votes
/ 05 May 2018

The `sentences` dataset in the stringr package contains 720 sentences. The command has_colour <- str_subset(sentences, colour_match), shown below, is an attempt to extract only the sentences that mention a colour. But it doesn't do that: it returns 57 of the 720 sentences, and many of them contain no colour at all. What am I doing wrong?

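library(tidyverse)
library(stringr)   # also attached by tidyverse; provides str_c(), str_subset() and `sentences`
colours <- c("red", "orange", "yellow", "green", "blue", "purple")
# Join the colours into one alternation: "red|orange|yellow|green|blue|purple"
colour_match <- str_c(colours, collapse = "|")
# Keep every sentence in which any alternative matches somewhere
has_colour <- str_subset(sentences, colour_match)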

Result; note that [3], [4], etc. contain no mention of a colour:

 [1] "Glue the sheet to the dark blue background."       
 [2] "Two blue fish swam in the tank."                   
 [3] "The colt reared and threw the tall rider."         
 [4] "The wide road shimmered in the hot sun."           
 [5] "See the cat glaring at the scared mouse."          
 [6] "A wisp of cloud hung in the blue air."             
 [7] "Leaves turn brown and yellow in the fall."         
 [8] "He ordered peach pie with ice cream."              
 [9] "Pure bred poodles have curls."                     
[10] "The spot on the blotter was made by green ink."    
[11] "Mud was spattered on the front of his white shirt."
[12] "The sofa cushion is red and of light weight."      
[13] "The sky that morning was clear and bright blue."   
[14] "Torn scraps littered the stone floor."             
[15] "The doctor cured him with these pills."            
[16] "The new girl was fired today at noon."             
[17] "The third act was dull and tired the players."     
[18] "A blue crane is a tall wading bird."               
[19] "Lire wires should be kept covered."                
[20] "It is hard to erase blue or red ink."              
[21] "The wreck occurred by the bank on Main Street."    
[22] "The lamp shone with a steady green flame."         
[23] "The box is held by a bright red snapper."          
[24] "The prince ordered his head chopped off."          
[25] "The houses are built of red clay bricks."          
[26] "The red tape bound the smuggled food."             
[27] "Nine men were hired to dig the ruins."             
[28] "The flint sputtered and lit a pine torch."         
[29] "Hedge apples may stain your hands green."          
[30] "The old pan was covered with hard fudge."          
[31] "The plant grew large and green in the window."     
[32] "The store walls were lined with colored frocks."   
[33] "The purple tie was ten years old."                 
[34] "Bathe and relax in the cool green grass."          
[35] "The clan gathered on each dull night."             
[36] "The lake sparkled in the red hot sun."             
[37] "Mark the spot with a sign painted red."            
[38] "Smoke poured out of every crack."                  
[39] "Serve the hot rum to the tired heroes."            
[40] "The couch cover and hall drapes were blue."        
[41] "He offered proof in the form of a lsrge chart."    
[42] "A man in a blue sweater sat at the desk."          
[43] "The sip of tea revives his tired friend."          
[44] "The door was barred, locked, and bolted as well."  
[45] "A thick coat of black paint covered all."          
[46] "The small red neon lamp went out."                 
[47] "Paint the sockets in the wall dull green."         
[48] "Wake and rise, and step into the green outdoors."  
[49] "The green light in the brown box flickered."       
[50] "He put his last cartridge into the gun and fired." 
[51] "The ram scared the school children off."           
[52] "Tear a thin sheet from the yellow pad."            
[53] "Dimes showered down from all sides."               
[54] "The sky in the west is tinged with orange red."    
[55] "The red paper brightened the dim stage."           
[56] "The hail pattered on the burnt brown grass."       
[57] "The big red apple fell to the ground."
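The false positives come from "red" matching as a substring of longer words. A minimal diagnostic sketch, assuming only stringr and its built-in `sentences` data (the names `word_pattern` and `matched_words` are illustrative), extracts the whole word around each match so you can see which word triggered it:

```r
library(stringr)

colours <- c("red", "orange", "yellow", "green", "blue", "purple")
colour_match <- str_c(colours, collapse = "|")
has_colour <- str_subset(sentences, colour_match)

# Extend each match to the surrounding word: \\w* on both sides
# pulls in any word characters around the colour alternation.
word_pattern <- str_c("\\w*(", colour_match, ")\\w*")
matched_words <- str_extract(has_colour, word_pattern)

# Show the offending word next to its sentence
head(data.frame(word = matched_words, sentence = has_colour), 10)
```

Words such as "reared", "bred" or "tired" show up in the `word` column, which points straight at the missing word boundaries.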

1 Answer

0 votes
/ 05 May 2018

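Your pattern matches "red" anywhere, including inside words like "reared", "bred" or "tired". You need to wrap each colour in word boundaries (\\b) before pasting them together: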

str_subset(sentences, paste0("\\b", colours, "\\b", collapse = "|"))
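A small sketch on hand-picked sentences (the vector `test_sentences` is illustrative) shows the effect of the boundaries. `\\b` matches only between a word character and a non-word character, so `\\bred\\b` accepts "red" as a standalone word but rejects it inside "reared" or "bred". Here `str_c()` is used in place of `paste0()`; both build the same pattern.

```r
library(stringr)

colours <- c("red", "orange", "yellow", "green", "blue", "purple")

# Without boundaries: "red" also matches inside "reared", "bred", "tired", ...
loose <- str_c(colours, collapse = "|")
# With boundaries around each alternative: only whole words match
strict <- str_c("\\b", colours, "\\b", collapse = "|")

test_sentences <- c(
  "The colt reared and threw the tall rider.",  # "red" only inside "reared"
  "Pure bred poodles have curls.",              # "red" only inside "bred"
  "Two blue fish swam in the tank."             # a genuine colour word
)

str_subset(test_sentences, loose)   # all three sentences
str_subset(test_sentences, strict)  # only the "blue" sentence
```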

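However, this is still case-sensitive, so a capitalised colour (for example "Red" at the start of a sentence) will not match. To ignore case, wrap the pattern in regex() and turn on case-insensitive matching: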

str_subset(sentences, regex(paste0("\\b", colours, "\\b", collapse = "|"), ignore_case = TRUE))
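A sketch of the case-insensitive variant, with the `ignore_case` argument of stringr's `regex()` named explicitly. The two example sentences are made up for illustration; they check that "Red" is now found while "fired" is still rejected by the word boundaries.

```r
library(stringr)

colours <- c("red", "orange", "yellow", "green", "blue", "purple")
strict <- str_c("\\b", colours, "\\b", collapse = "|")

test_sentences <- c(
  "Red sky at night.",             # capitalised colour at sentence start
  "The new girl was fired today."  # "red" only inside "fired"
)

# Case-sensitive: "Red" is missed, so nothing is returned
str_subset(test_sentences, strict)
# Case-insensitive: "Red" is found, "fired" is still rejected
str_subset(test_sentences, regex(strict, ignore_case = TRUE))
```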
...