Using NLP or spaCy, how can we extract contextual data from a text, given an entity as input?
2 votes
/ March 22, 2019

For example, suppose we have a text (a document) along with a person's name, "John". We need to extract every sentence from the text that mentions John, whether by name or otherwise.

1 Answer

0 votes
/ March 22, 2019

Have you tried using NLTK for entity extraction? I did something similar, shown below:

import nltk

# The tokenizer, POS tagger, and NE chunker data must be installed first via nltk.download().

sample = """Michael Joseph Jackson was born in Gary, Indiana, near Chicago, on August 29, 1958.
He was the eighth of ten children in the Jackson family, a working-class African-American family living in a two-bedroom house on Jackson Street.
His mother, Katherine Esther Jackson (née Scruse), left the Baptist tradition in 1963 to become a devout Jehovah's Witness. She played clarinet and piano and had aspired to be a country-and-western performer; she worked part-time at Sears to support the family.
His father, Joseph Walter 'Joe' Jackson, a former boxer, was a steelworker at U.S. Steel.
Joe played guitar with a local rhythm and blues band, the Falcons, to supplement the family's income.
Despite being a convinced Lutheran, Joe followed his wife's faith, as did all their children.
His father's great-grandfather, July 'Jack' Gale, was a Native American medicine man and US Army scout.
Michael grew up with three sisters (Rebbie, La Toya, and Janet) and five brothers (Jackie, Tito, Jermaine, Marlon, and Randy).
A sixth brother, Marlon's twin Brandon, died shortly after birth."""

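# Split the text into sentences, tokenize each one, then POS-tag the tokens.
sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]

# Run the named-entity chunker on each tagged sentence and print the resulting tree.
for tagged in tagged_sentences:
    print(nltk.ne_chunk(tagged))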

This prints the following (start of the output for the first sentence):

(S (PERSON Michael/NNP Joseph/NNP Jackson/NNP) was/VBD born/VBN in/IN (GPE Gary/NNP) ,/, (GPE Indiana/NNP) ,/, near/IN (GPE Chicago/NNP) ,/, on/IN August/NNP 29/CD ,/, 1958/CD
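To get from the chunker output to the sentences the question asks for, one possible next step is to keep only the sentences whose PERSON entities contain the given name. The sketch below is an illustration, not part of the original answer. It assumes each chunked sentence has first been flattened with `nltk.chunk.tree2conlltags`, which yields `(word, pos, iob)` triples such as `("Michael", "NNP", "B-PERSON")`. The helper names `person_mentions` and `sentences_mentioning` are hypothetical, and the `example` data is hand-written in that format so the snippet runs without any NLTK models.

```python
def person_mentions(conll_sentence):
    """Collect PERSON entity strings from one sentence of (word, pos, iob) triples."""
    names, current = [], []
    for word, _pos, iob in conll_sentence:
        if iob == "B-PERSON":
            # A new entity starts; flush the previous one if it is still open.
            if current:
                names.append(" ".join(current))
            current = [word]
        elif iob == "I-PERSON" and current:
            current.append(word)
        else:
            # Any other tag closes the entity currently being built.
            if current:
                names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return names


def sentences_mentioning(conll_sentences, name):
    """Return indices of sentences with a PERSON entity that has `name` as a token."""
    target = name.lower()
    hits = []
    for idx, sent in enumerate(conll_sentences):
        if any(target in person.lower().split() for person in person_mentions(sent)):
            hits.append(idx)
    return hits


# Hand-written triples in the tree2conlltags format (illustrative data only).
example = [
    [("Michael", "NNP", "B-PERSON"), ("Joseph", "NNP", "I-PERSON"),
     ("Jackson", "NNP", "I-PERSON"), ("was", "VBD", "O"), ("born", "VBN", "O")],
    [("He", "PRP", "O"), ("was", "VBD", "O"), ("the", "DT", "O"), ("eighth", "JJ", "O")],
    [("Joe", "NNP", "B-PERSON"), ("played", "VBD", "O"), ("guitar", "NN", "O")],
]

print(sentences_mentioning(example, "Michael"))  # [0]
```

Note that the second example sentence ("He was the eighth ...") is not returned even though "He" refers to Michael. Matching on entity names only covers mentions by name; the "or otherwise" part of the question (pronouns, nicknames) would need coreference resolution on top of this.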
