I want to read a text file and split it into sentences using NLTK's sent_tokenize. How can I do this? I used wordpunct_tokenize and it works well, but sent_tokenize does not.
from collections import defaultdict
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize, wordpunct_tokenize
import re
import os
import sys
from pathlib import Path
def stats():
    while True:
        try:
            file_to_open = Path(input("\nPlease, insert your file path: "))
            with open(file_to_open) as f:
                words = word_tokenize(f.read())
            break
        except FileNotFoundError:
            print("\nFile not found. Better try again")
        except IsADirectoryError:
            print("\nIncorrect directory path. Try again")
    print(words)
    print('\n\nThis text contains', len(words), 'tokens')
def sent_tokenize():
    while True:
        try:
            file_to_open = Path(input("\nPlease, insert your file path: "))
            with open(file_to_open) as f:
                words = sent_tokenize(f.read())
            break
        except FileNotFoundError:
            print("\nFile not found. Better try again")
        except IsADirectoryError:
            print("\nIncorrect directory path. Try again")
    print(words)
    print('\n\nThis text contains', len(words), 'sentences')
stats()
sent_tokenize()
I want to print the text split into sentences, but I get this error:
Traceback (most recent call last):
  File "/Users/nataliaresende/Dropbox/PYTHON/stats.py", line 46, in <module>
    sent_tokenize()
  File "/Users/nataliaresende/Dropbox/PYTHON/stats.py", line 40, in sent_tokenize
    sent=sent_tokenize(words)
TypeError: sent_tokenize() takes 0 positional arguments but 1 was given
Can anyone help?