I'm loading data from an ES index into a DataFrame that has the following columns: tags, text, and title.
I'm trying to split the data from this DataFrame using the following code:
from sklearn.model_selection import train_test_split
# Get the labels
tags = df.tags
# Get the text
texts = df.text
# Split the dataset
x_train, x_test, y_train, y_test = train_test_split(texts, tags, test_size=0.2, random_state=7)
But it doesn't work. I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-180-b8381ee0d3c2> in <module>
4
5 # Split the dataset
----> 6 x_train,x_test,y_train,y_test = train_test_split(df['text'], tags, test_size = 0.2, random_state = 7)
7
8 # Initialize a TfidfVectorizer
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/_split.py in train_test_split(*arrays, **options)
2116 raise TypeError("Invalid parameters passed: %s" % str(options))
2117
-> 2118 arrays = indexable(*arrays)
2119
2120 n_samples = _num_samples(arrays[0])
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in indexable(*iterables)
246 """
247 result = [_make_indexable(X) for X in iterables]
--> 248 check_consistent_length(*result)
249 return result
250
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
206 """
207
--> 208 lengths = [_num_samples(X) for X in arrays if X is not None]
209 uniques = np.unique(lengths)
210 if len(uniques) > 1:
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in <listcomp>(.0)
206 """
207
--> 208 lengths = [_num_samples(X) for X in arrays if X is not None]
209 uniques = np.unique(lengths)
210 if len(uniques) > 1:
~/opt/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in _num_samples(x)
150 if len(x.shape) == 0:
151 raise TypeError("Singleton array %r cannot be considered"
--> 152 " a valid collection." % x)
153 # Check that shape is returning an integer or default to len
154 # Dask dataframes may not return numeric shape[0] value
TypeError: Singleton array array(kt-rOnMBAC-oqacdW1Q- On Monday night, Donald Trump traveled to West...
k9-rOnMBAC-oqacdW1Q- Donald Trump is very busy right now trying to ...
lN-rOnMBAC-oqacdW1Q- By now, we all know that upon having emergency...
ld-rOnMBAC-oqacdW1Q- Donald Trump s horrible decisions and disgusti...
lt-rOnMBAC-oqacdW1Q- It s tough sometimes to imagine that Donald Tr...
...
Y-CvOnMBAC-oqacdBwEJ BRUSSELS (Reuters) - NATO allies on Tuesday we...
Z-CvOnMBAC-oqacdBwEJ JAKARTA (Reuters) - Indonesia will buy 11 Sukh...
ZOCvOnMBAC-oqacdBwEJ LONDON (Reuters) - LexisNexis, a provider of l...
ZeCvOnMBAC-oqacdBwEJ MINSK (Reuters) - In the shadow of disused Sov...
ZuCvOnMBAC-oqacdBwEJ MOSCOW (Reuters) - Vatican Secretary of State ...
Name: text, Length: 44908, dtype: object, dtype=object) cannot be considered a valid collection.
However, when I check the .shape of texts and tags, both are the same: (44908, 1)
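For context, the check that raises in the traceback (`len(x.shape) == 0`) fires when the input, once converted to a NumPy array, turns out to be 0-dimensional. Below is a minimal sketch of how that can happen, assuming the column object is something NumPy cannot interpret as a sequence. `NotArrayLike` and `diagnose` are hypothetical names used only for illustration; they are not the actual type of `df.text` or part of any library.

```python
import numpy as np


class NotArrayLike:
    """Hypothetical stand-in for an object NumPy cannot treat as a sequence
    (it defines no __len__, __getitem__, or __array__)."""


def diagnose(obj):
    # Convert the way array-validation code typically does, then report
    # the original type name and the resulting array's shape and ndim.
    arr = np.asarray(obj)
    return type(obj).__name__, arr.shape, arr.ndim


# An ordinary list becomes a 1-D array with one entry per element.
print(diagnose(["first text", "second text"]))

# An opaque object is wrapped whole into a 0-D object array,
# which is the "singleton array" condition from the traceback.
print(diagnose(NotArrayLike()))
```

Running the same kind of check on `texts` and `tags` (printing `type(texts)` and `np.asarray(texts).shape`) may show whether they are plain pandas objects or something else that only looks like a DataFrame column.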