'NoneType' object has no attribute 'lower' - error when cleaning text - PullRequest
February 24, 2020

Below is my code, which I'm running in Databricks, followed by the error.

data = d.select("*").toPandas()
train, test = train_test_split(data, test_size = .20, random_state = True)
train['set'] = 'train'
test['set'] = 'test'
data = pd.concat([train,test], ignore_index=True)

def clean_text(text):
  return "".join([c for c in text.lower() if c not in punctuation])

data['text_cleaned'] = data['text'].map(clean_text)

tfidf = TfidfVectorizer()
tfidf.fit(data['text_cleaned'])

Error:

AttributeError: 'NoneType' object has no attribute 'lower'
/local_disk0/tmp/1582551158268-0/PythonShell.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

/local_disk0/tmp/1582551158268-0/PythonShell.py:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
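The error means at least one value in data['text'] is None, which is how a null in a Spark string column typically arrives after toPandas(). Below is a minimal reproduction. It assumes a small hypothetical DataFrame in place of the real Spark data:

```python
import pandas as pd
from string import punctuation

def clean_text(text):
    # Same function as in the question: it assumes every value is a str
    return "".join([c for c in text.lower() if c not in punctuation])

# Hypothetical stand-in for d.select("*").toPandas(): the second row is null.
# dtype=object keeps the None as None rather than converting it to NaN.
data = pd.DataFrame({"text": pd.Series(["Hello, World!", None], dtype=object)})

missing_count = int(data["text"].isna().sum())
print(missing_count)  # 1

error_message = ""
try:
    data["text"].map(clean_text)  # map calls clean_text(None) for the null row
except AttributeError as exc:
    error_message = str(exc)
print(error_message)  # 'NoneType' object has no attribute 'lower'
```

Running data['text'].isna().sum() on the real data is a quick way to see how many rows are affected.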

1 Answer

February 24, 2020

You can filter out the None values:

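import pandas as pd
from string import punctuation
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer

data = d.select("*").toPandas()  # d is the Spark DataFrame from the question
train, test = train_test_split(data, test_size = .20, random_state = True)
# Explicit copies avoid the SettingWithCopyWarning when adding a column
train = train.copy()
test = test.copy()
train['set'] = 'train'
test['set'] = 'test'
data = pd.concat([train, test], ignore_index=True)

def clean_text(text):
    # Check before calling .lower(): the iterable text.lower() is evaluated
    # before the comprehension's if-clause, so a None check inside it is too late
    if not isinstance(text, str):  # None (or NaN) from null rows
        return ""
    return "".join([c for c in text.lower() if c not in punctuation])

data['text_cleaned'] = data['text'].map(clean_text)

tfidf = TfidfVectorizer()
tfidf.fit(data['text_cleaned'])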
...
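If rows without text should be excluded from training rather than kept as empty strings, the Nones can instead be filtered out at the DataFrame level before cleaning. The sketch below uses hypothetical sample data. On the Spark side, d.na.drop(subset=["text"]) before toPandas() should have the same effect, assuming the column is named text as in the question.

```python
import pandas as pd
from string import punctuation

def clean_text(text):
    return "".join([c for c in text.lower() if c not in punctuation])

# Hypothetical sample data with one null row, standing in for the real column
data = pd.DataFrame({"text": pd.Series(["Hello, World!", None, "Good-bye."], dtype=object)})

# Drop rows whose text is missing (None or NaN) before cleaning,
# then renumber the index so it stays contiguous
data = data.dropna(subset=["text"]).reset_index(drop=True)
data["text_cleaned"] = data["text"].map(clean_text)

print(data["text_cleaned"].tolist())  # ['hello world', 'goodbye']
```

Dropping the rows before train_test_split means the 80/20 split is computed only over rows that actually contain text.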