Question

I imported text data into a pandas DataFrame. I would like to apply a vectorizer, so I am using sklearn to compute TF-IDF and so on.

So the first step I took was to clean the text.

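# Creating function
import string
from nltk.corpus import stopwords

def text_process(sms):
    # Drop punctuation characters, then rejoin into a single string
    nonpunc = [char for char in sms if char not in string.punctuation]
    nonpunc = ''.join(nonpunc)
    # Keep only the words that are not English stop words
    return [word for word in nonpunc.split() if word.lower() not in stopwords.words('english')]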

Next:

data['sms'].head(5).apply(text_process)

Next:

from sklearn.feature_extraction.text import CountVectorizer
bow_transformer = CountVectorizer(analyzer=text_process).fit(data['sms'])

I got an error.

  ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-84-f1812582c7e1> in <module>
      1 #Step 1
      2 from sklearn.feature_extraction.text import  CountVectorizer
----> 3 bow_transformer = CountVectorizer(analyzer = text_process).fit(data['sms'])

~\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in fit(self, raw_documents, y)
    976         self
    977         """
--> 978         self.fit_transform(raw_documents)
    979         return self
    980 

~\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
   1010 
   1011         vocabulary, X = self._count_vocab(raw_documents,
-> 1012                                           self.fixed_vocabulary_)
   1013 
   1014         if self.binary:

~\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in _count_vocab(self, raw_documents, fixed_vocab)
    920         for doc in raw_documents:
    921             feature_counter = {}
--> 922             for feature in analyze(doc):
    923                 try:
    924                     feature_idx = vocabulary[feature]

<ipython-input-82-4149ae75d7bf> in text_process(sms)
      3 def text_process(sms):
      4 
----> 5     nonpunc = [char for char in sms if char not in string.punctuation]
      6     nonpunc = ''.join(nonpunc)
      7     return[word for word in nonpunc.split() if word.lower() not in stopwords.words('english')]

TypeError: 'NoneType' object is not iterable

Hani Ihlayyle · Answer 1 · October 24, 2018

My data contained NaN values. I had used regular expressions that ended up removing all of the data.
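A minimal sketch of how such missing rows could be found and dropped before fitting, assuming the column is named sms as in the question. The toy data and the small STOP_WORDS set are assumptions made for illustration; the set stands in for stopwords.words('english') so the sketch runs without the NLTK corpus. The traceback reports 'NoneType', which suggests that at least one row held None rather than a float NaN; pandas isna() flags both.

```python
import string

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for stopwords.words('english'), so the sketch runs
# without downloading the NLTK corpus.
STOP_WORDS = {"i", "a", "the", "is", "to", "you"}


def text_process(sms):
    # Same idea as the function in the question: strip punctuation,
    # then drop stop words.
    nonpunc = "".join(char for char in sms if char not in string.punctuation)
    return [word for word in nonpunc.split() if word.lower() not in STOP_WORDS]


# Toy data (an assumption): one row is None, as an over-eager cleaning step
# might leave behind.
data = pd.DataFrame({"sms": ["Free entry, call now!", None, "Are you home?"]})

# Count the missing rows before fitting; isna() flags both None and NaN.
n_missing = int(data["sms"].isna().sum())

# Drop the missing rows so the analyzer only ever receives strings.
clean = data.dropna(subset=["sms"])

bow_transformer = CountVectorizer(analyzer=text_process).fit(clean["sms"])
vocabulary = sorted(bow_transformer.vocabulary_)
```

If the row count has to stay aligned with other columns, replacing the missing values with empty strings via data['sms'].fillna('') is an alternative to dropping them; which one fits depends on what the later steps expect.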

'NoneType' object is not iterable for sklearn Vectorizer
