I get the following error when running my code in a Kaggle notebook:

`TypeError: cannot use a string pattern on a bytes-like object`

The same code runs correctly in Spyder.
```python
import nltk
import pandas as pd
import re

messages = pd.read_csv('../input/spam.csv', sep='\t',
                       names=["label", "message"], encoding='latin-1')
print(messages)
```
Output of `print(messages)`:
![](https://i.stack.imgur.com/Yxhtt.png)
```python
# text preprocessing
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import stopwords

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))  # build the stopword set once, not once per word
corpus = []
for i in range(0, len(messages)):
    # replace every non-letter with a space so that words stay separated
    words = re.sub('[^a-zA-Z]', ' ', messages['message'][i])
    words = words.lower()
    words = words.split()
    words = [lemmatizer.lemmatize(word) for word in words if word not in stop_words]
    words = ' '.join(words)  # rejoin the tokens with spaces
    corpus.append(words)
```
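A sketch of the same preprocessing loop as a function that tolerates `bytes` cells and missing values. The stopword set and the lemmatizer are passed in as parameters, so in the notebook you would presumably call it with `set(stopwords.words('english'))` and `WordNetLemmatizer().lemmatize`. `preprocess` is a hypothetical helper name, and decoding bytes as latin-1 is an assumption chosen to match the `encoding='latin-1'` passed to `read_csv`.

```python
import re
from typing import Callable, Iterable, List


def preprocess(messages: Iterable, stop_words: set,
               lemmatize: Callable[[str], str]) -> List[str]:
    """Hypothetical helper: clean, filter and lemmatize each message."""
    corpus = []
    for text in messages:
        if isinstance(text, bytes):
            # assumption: the raw bytes are latin-1, as in the read_csv call
            text = text.decode("latin-1")
        if not isinstance(text, str):
            # e.g. NaN from a mis-parsed row; treated here as an empty message
            text = ""
        # keep letters only, lowercase, and split into tokens
        words = re.sub("[^a-zA-Z]", " ", text).lower().split()
        words = [lemmatize(w) for w in words if w not in stop_words]
        corpus.append(" ".join(words))
    return corpus
```

Usage in the notebook would look like `corpus = preprocess(messages['message'], set(stopwords.words('english')), WordNetLemmatizer().lemmatize)`. Passing the stopwords and lemmatizer in keeps the function testable without downloading NLTK data.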
Error details:
```
TypeError                                 Traceback (most recent call last)
<ipython-input-8-715dc7ef0530> in <module>
     27 
     28 for i in range(0,len(messages)):
---> 29     words = re.sub('[^a-zA-Z]','',messages['message'][i])
     30     words = words.lower()
     31     words = words.split()

/opt/conda/lib/python3.6/re.py in sub(pattern, repl, string, count, flags)
    189     a callable, it's passed the match object and must return
    190     a replacement string to be used."""
--> 191     return _compile(pattern, flags).sub(repl, string, count)
    192 
    193 def subn(pattern, repl, string, count=0, flags=0):

TypeError: cannot use a string pattern on a bytes-like object
```
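The traceback says `re.sub` received a `bytes` value while the pattern `'[^a-zA-Z]'` is a `str`; Python's `re` module does not allow mixing the two. A minimal reproduction, assuming a message cell arrives as `bytes` (the sample text comes from the data shown in the output; why the Kaggle environment produces bytes is not established here), plus two ways to make the types agree:

```python
import re

# a message as bytes, which is what the traceback implies re.sub received
raw = b"Ok lar... Joking wif u oni..."

# str pattern + bytes input: this combination raises the TypeError
try:
    re.sub("[^a-zA-Z]", " ", raw)
except TypeError as exc:
    error_text = str(exc)

# Option 1: decode to str first (latin-1 matches the encoding given to read_csv)
decoded = re.sub("[^a-zA-Z]", " ", raw.decode("latin-1"))

# Option 2: stay in bytes and use a bytes pattern and a bytes replacement
as_bytes = re.sub(rb"[^a-zA-Z]", b" ", raw)
```

Decoding to `str` is usually the more convenient choice here, because the later steps (`lower`, `split`, NLTK stopword comparison) all work on `str` tokens.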
Output of `print(messages)` as text:

```
                                                  label  message
0     v1,v2,,,                                               NaN
1     ham,"Go until jurong point, crazy.. Available ...      NaN
2     ham,Ok lar... Joking wif u oni...,,,                   NaN
3     spam,Free entry in 2 a wkly comp to win FA Cup...      NaN
4     ham,U dun say so early hor... U c already then...      NaN
...                                                 ...      ...
5570  spam,"This is the 2nd time we have tried 2 con...      NaN
5571  ham,Will Ì_ b going to esplanade fr home?,,,           NaN
5572  ham,"Pity, * was in mood for that. So...any ot...      NaN
5573  ham,The guy did some bitching but I acted like...      NaN
5574  ham,Rofl. Its true to its name,,,                      NaN

[5575 rows x 2 columns]
```
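The printed frame points at a second problem: each whole line lands in `label` and `message` is NaN, which is what you would expect if the file is comma-separated but read with `sep='\t'`. A stdlib sketch that parses a few lines copied from that output (shortened; `SAMPLE` and `split_rows` are illustrative names, not part of the original code) with both delimiters:

```python
import csv
import io

# a few lines copied from the printed output above (text shortened)
SAMPLE = (
    'v1,v2,,,\n'
    'ham,"Go until jurong point, crazy..",,,\n'
    'spam,Free entry in 2 a wkly comp,,,\n'
)


def split_rows(text, delimiter):
    """Parse CSV text with the given delimiter and return the list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# tab delimiter: each line stays one single field, like the 'label' column above
tab_rows = split_rows(SAMPLE, "\t")

# comma delimiter: label and message separate, followed by empty trailing columns
comma_rows = split_rows(SAMPLE, ",")
```

If the file is indeed comma-separated, one likely fix is to drop `sep='\t'` and keep only the first two columns, for example with `usecols=[0, 1]` plus `skiprows=1` to skip the `v1,v2` header row. This is an assumption based on the printed output, not something verified against the file.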