Removing all noun phrases from text with TextBlob - PullRequest
0 votes
/ April 6, 2020

I need to remove all nouns from a text. result is a DataFrame. I am using TextBlob. My code is below.

from textblob import TextBlob

strings = []
for col in result:
    for i in range(result.shape[0]):
        text = result[col][i]
        Txtblob = TextBlob(text)

        for word, pos in Txtblob.noun_phrases:
            print (word, pos)
            if tag != 'NNP'
               print(' '.join(edited_sentence))

It only recognizes a single NNP.

1 Answer

1 vote
/ April 6, 2020

To remove all words tagged 'NNP' from the following text (taken from the documentation), you can do the following:

from textblob import TextBlob

# Sample text
text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
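describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''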

text = TextBlob(text)

# Create a list of words that are tagged with 'NNP'
# In this case it will only be 'Blob'
words_to_remove = [word for word, tag in text.tags if tag == 'NNP']

# Remove those words from the sentence, using words_to_remove
edited_sentence = ' '.join([word for word in text.split(' ') if word not in words_to_remove])

# Show the result
print(edited_sentence)

Output

# Notice the lack of the word 'Blob'
'\nThe titular threat of The has always struck me as the ultimate
 movie\nmonster: an insatiably hungry, amoeba-like mass able to 
 penetrate\nvirtually any safeguard, capable of--as a doomed doctor 
 chillingly\ndescribes it--"assimilating flesh on contact.\nSnide 
 comparisons to gelatin be damned, it\'s a concept with the 
 most\ndevastating of potential consequences, not unlike the grey goo 
 scenario\nproposed by technological theorists fearful of\nartificial 
 intelligence run rampant.\n'
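One caveat with the split(' ') approach: a token is removed only if it is exactly equal to the tagged word, so 'Blob,' or 'Blob!' (punctuation attached) would survive. Below is a minimal sketch of a more forgiving variant. It assumes the (word, tag) pairs have the same shape as the ones text.tags returns. remove_tagged_words is a hypothetical helper name, not part of TextBlob, and the tagged list is typed by hand so the snippet runs without TextBlob installed.

```python
import re

def remove_tagged_words(text, tagged_words, tags_to_remove=("NNP",)):
    """Remove whole-word occurrences of every word whose tag is in tags_to_remove."""
    targets = {word for word, tag in tagged_words if tag in tags_to_remove}
    if not targets:
        return text
    # \b keeps 'Blob' from matching inside a longer word such as 'Blobs'.
    alternatives = "|".join(re.escape(w) for w in sorted(targets, key=len, reverse=True))
    without = re.sub(r"\b(?:" + alternatives + r")\b", "", text)
    # Collapse the double spaces left behind, but keep newlines intact.
    return re.sub(r"[ \t]{2,}", " ", without)

# Hand-written stand-in for TextBlob(sample).tags (an assumption for this demo)
sample = "The Blob ate it.\nNobody stopped the Blob!"
tagged = [("The", "DT"), ("Blob", "NNP"), ("ate", "VBD"), ("it", "PRP"),
          ("Nobody", "NN"), ("stopped", "VBD"), ("the", "DT"), ("Blob", "NNP")]

cleaned = remove_tagged_words(sample, tagged)
print(repr(cleaned))
```

With real data you would pass TextBlob(text).tags as the second argument. Words that end in a non-word character (for example an abbreviation with a trailing period) may not match the \b anchors, so treat this as a starting point.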

Comments on your sample

from textblob import TextBlob

strings = [] # This variable is not used anywhere
for col in result:
    for i in range(result.shape[0]):
        text = result[col][i]
        txt_blob = TextBlob(text)

        # txt_blob.noun_phrases returns a list of noun phrases.
        # To get the position of each phrase, use the function 'enumerate',
        # which yields (position, phrase) pairs, like this:
        for pos, phrase in enumerate(txt_blob.noun_phrases):

            # Now you can print the position and the phrase
            print(pos, phrase)
            # This will give you something like the following:
            # 0 titular threat
            # 1 blob
            # 2 ultimate movie monster

            # The following line does not make sense: tag has never been assigned,
            # you are not iterating over the tagged words, and the trailing colon is missing
            if tag != 'NNP'
                # Nothing is ever assigned to edited_sentence, so this would not work either.
                print(' '.join(edited_sentence))
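To make the ordering concrete: enumerate yields (index, item) pairs, index first. Here is a tiny sketch with a hand-written list standing in for txt_blob.noun_phrases (the three phrases are the ones from the sample output in the comments above):

```python
# Stand-in for txt_blob.noun_phrases (an assumption for this demo)
noun_phrases = ["titular threat", "blob", "ultimate movie monster"]

# enumerate gives the position first, then the phrase
for pos, phrase in enumerate(noun_phrases):
    print(pos, phrase)

# The same pairs collected into a list
pairs = list(enumerate(noun_phrases))
```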

Your sample with the new code

from textblob import TextBlob

for col in result:
    for i in range(result.shape[0]):
        text = result[col][i]
        txt_blob = TextBlob(text)

        # Create a list of words that are tagged with 'NNP'
        # (for the sample text above this is only 'Blob')
        words_to_remove = [word for word, tag in txt_blob.tags if tag == 'NNP']

        # Remove those words from the sentence, using words_to_remove
        edited_sentence = ' '.join([word for word in text.split(' ') if word not in words_to_remove])

        # Show the result
        print(edited_sentence)
...
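The code above removes single words tagged 'NNP'. If the goal is what the title asks, removing whole noun phrases, here is a rough sketch. It assumes the phrases come from txt_blob.noun_phrases, which returns them lowercased (note 'blob' in the sample output earlier), so the matching is case-insensitive. remove_noun_phrases is a hypothetical helper, not a TextBlob function, and the phrase list is written by hand so the snippet runs without TextBlob. Phrases that TextBlob normalized in other ways may not match the raw text.

```python
import re

def remove_noun_phrases(text, noun_phrases):
    """Delete each phrase from text, ignoring case and allowing any whitespace between its words."""
    result = text
    # Longest phrases first, so a long phrase is removed before a shorter one that overlaps it.
    for phrase in sorted(set(noun_phrases), key=len, reverse=True):
        words = [re.escape(w) for w in phrase.split()]
        if not words:
            continue
        # \s+ lets a phrase match even when it wraps across a line break.
        pattern = r"\b" + r"\s+".join(words) + r"\b"
        result = re.sub(pattern, "", result, flags=re.IGNORECASE)
    # Collapse the double spaces left behind, but keep newlines intact.
    return re.sub(r"[ \t]{2,}", " ", result)

# Hand-written stand-in for TextBlob(sample).noun_phrases (an assumption for this demo)
sample = "The titular threat of The Blob is the ultimate movie\nmonster."
phrases = ["titular threat", "blob", "ultimate movie monster"]

stripped = remove_noun_phrases(sample, phrases)
print(repr(stripped))
```

Inside the DataFrame loop you would call remove_noun_phrases(text, txt_blob.noun_phrases) for each cell instead of building words_to_remove.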