SAS SQL is the wrong tool for joining rows into a concatenated result (a CSV string).
SQL can be used to find the matching items that need to be concatenated, and a DATA step DOW loop can then do the concatenation:
proc sql;
  create view matched_animals as
  select narrative, animal
  from narratives left join animals
    on narrative contains trim(animal)
  order by narrative, animal;
quit;

data want;
  length animal_found $2000;
  * DOW loop: one output row per narrative;
  do until (last.narrative);
    set matched_animals;
    by narrative;
    animal_found = catx(',', animal_found, animal);
  end;
run;
This will work, but it may run short of resources depending on the cardinality of the narratives and animals tables and on the matching rate, because a join on contains has to compare every narrative with every animal.
A DATA step approach can use a hash object together with countw and scan, or with findw. Two ways are shown here; way 2 is probably the better fit for the most typical use case.
* Thanks Reeza for sample data;
data narratives;
infile cards;
input narrative $100.;
cards;
This is some random text with words that are weirhd such as cat, dog frogs, and any other weird names
This is a notehr rnaodm text with word ssuch as bird and cat
This has nothing in it
This is another phrages with elephants
;
run;
data animals;
input animal $20.;
cards;
cat
dog
frog
bird
elephant
;;;;
run;
data want;
  set narratives;
  length animals_found_way1 animals_found_way2 $2000;

  if _n_ = 1 then do;
    if 0 then set animals(keep=animal); * prep pdv;
    declare hash animals(dataset:'animals');
    animals.defineKey('animal');
    animals.defineDone();
    declare hiter animals_iter('animals');
  end;

  * check each word of narrative for animal match;
  * way 1 use case: narratives shorter than animals list;
  do _n_ = 1 to countw(narrative);
    token = scan(narrative, _n_);
    if animals.find(key:token) = 0 then
      animals_found_way1 = catx(',', animals_found_way1, token);
    loopcount_way1 = sum(loopcount_way1, 1);
  end;

  * check each animal for match;
  * way 2 use case: animal list shorter than narratives;
  do while (animals_iter.next() = 0);
    if findw(narrative, trim(animal)) then
      animals_found_way2 = catx(',', animals_found_way2, animal);
    loopcount_way2 = sum(loopcount_way2, 1);
  end;

  put;
  drop token animal;
run;
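One difference from the SQL version is worth keeping in mind: contains matches substrings, while both DATA step ways match whole words only, so a narrative containing "elephants" is not matched to the animal "elephant". The following is a minimal sketch of that difference, not part of the solution above. The variable names whole_word and substring are made up for the illustration, and swapping findw for find with the 'i' (ignore case) modifier is only one possible way to get substring matching in way 2, assuming that is what you want.

```sas
data _null_;
  narrative = 'This is another phrase with elephants';
  animal    = 'elephant';

  /* findw looks for a whole word: 'elephants' is a different word, so this is 0 */
  whole_word = findw(narrative, trim(animal));

  /* find looks for a substring (case-insensitive with 'i'): returns a position > 0 */
  substring = find(narrative, trim(animal), 'i');

  put whole_word= substring=;
run;
```

If substring matching is preferred, the findw call in way 2 could be replaced by the find call shown here; the trade-off is that short animal names such as "cat" would then also match inside unrelated words.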