My dataframe has the following columns:
Unnamed: 0, title, publication, author, year, month, title.1, content, len_article, gensim_summary, split_words, first_100_words
I am trying to run this small snippet of code:
import nltk
from nltk.tokenize import word_tokenize

nltk.download('punkt')

# TOKENIZE
df.first_100_words = df.first_100_words.str.lower()
df['tokenized_first_100'] = df.first_100_words.apply(lambda x: word_tokenize(x, language = 'en'))
The last line of code throws an error. This is the message I get:
Traceback (most recent call last):
File "<ipython-input-129-42381e657774>", line 2, in <module>
df['tokenized_first_100'] = df.first_100_words.apply(lambda x: word_tokenize(x, language = 'en'))
File "C:\Users\ryans\Anaconda3\lib\site-packages\pandas\core\series.py", line 3848, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas\_libs\lib.pyx", line 2329, in pandas._libs.lib.map_infer
File "<ipython-input-129-42381e657774>", line 2, in <lambda>
df['tokenized_first_100'] = df.first_100_words.apply(lambda x: word_tokenize(x, language = 'en'))
File "C:\Users\ryans\Anaconda3\lib\site-packages\nltk\tokenize\__init__.py", line 144, in word_tokenize
sentences = [text] if preserve_line else sent_tokenize(text, language)
File "C:\Users\ryans\Anaconda3\lib\site-packages\nltk\tokenize\__init__.py", line 105, in sent_tokenize
tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
File "C:\Users\ryans\Anaconda3\lib\site-packages\nltk\data.py", line 868, in load
opened_resource = _open(resource_url)
File "C:\Users\ryans\Anaconda3\lib\site-packages\nltk\data.py", line 993, in _open
return find(path_, path + ['']).open()
File "C:\Users\ryans\Anaconda3\lib\site-packages\nltk\data.py", line 701, in find
raise LookupError(resource_not_found)
LookupError:
**********************************************************************
Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
import nltk
nltk.download('punkt')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt/en.pickle
Searched in:
- 'C:\\Users\\ryans/nltk_data'
- 'C:\\Users\\ryans\\Anaconda3\\nltk_data'
- 'C:\\Users\\ryans\\Anaconda3\\share\\nltk_data'
- 'C:\\Users\\ryans\\Anaconda3\\lib\\nltk_data'
- 'C:\\Users\\ryans\\AppData\\Roaming\\nltk_data'
- 'C:\\nltk_data'
- 'D:\\nltk_data'
- 'E:\\nltk_data'
- ''
**********************************************************************
I am fairly new to all things tokenization.
The example code is from this site:
https://github.com/AustinKrause/Mod_5_Text_Summarizer/blob/master/Notebooks/Text_Cleaning_and_KMeans.ipynb