I am getting the error below. It looks like the map function is causing the problem, but I am not sure.
nal Accuracy: 0.382
Traceback (most recent call last):
  File "VerbatimFM.py", line 120, in <module>
    X_train, X_val, y_train, y_val = train_test_split(X, target, train_size = 0.75, random_state =42)
  File "C:\Users\Emm\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\model_selection\_split.py", line 2096, in train_test_split
    arrays = indexable(*arrays)
  File "C:\Users\Emm\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\utils\validation.py", line 230, in indexable
    check_consistent_length(*result)
  File "C:\Users\Emm\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\utils\validation.py", line 201, in check_consistent_length
    lengths = [_num_samples(X) for X in arrays if X is not None]
  File "C:\Users\Emmnuelle\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\utils\validation.py", line 201, in <listcomp>
    lengths = [_num_samples(X) for X in arrays if X is not None]
  File "C:\Users\Emm\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sklearn\utils\validation.py", line 146, in _num_samples
    " a valid collection." % x)
TypeError: Singleton array array(<map object at 0x0E9735B0>, dtype=object) cannot be considered a valid collection.
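A note on the last line of the traceback: the `<map object at 0x0E9735B0>` suggests that `X` is still a `map` object when it reaches `train_test_split`. In Python 3, `map` returns a lazy iterator rather than a list, so it has no length for scikit-learn to check. A minimal standard-library sketch of that behavior follows; `count_words` is a hypothetical stand-in for `cv.fit_transform` and is not part of the original code:

```python
# Hypothetical stand-in for cv.fit_transform: any one-argument function
# behaves the same way when passed to map().
def count_words(sentence):
    return len(sentence.split())

sentences = ["on a presque laissé", "ça changera"]

# In Python 3, map() returns a lazy iterator; nothing is computed yet.
lazy = map(count_words, sentences)
has_len = hasattr(lazy, "__len__")  # a map object exposes no length

# Asking for its length fails, which is roughly what the sample-count
# check inside train_test_split runs into.
try:
    len(lazy)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# Wrapping the call in list() forces evaluation and gives a real list.
materialized = list(map(count_words, sentences))
```

Note that turning the `map` into a list would only remove this particular error; calling `fit_transform` once per sentence would still refit the vectorizer each time.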
This is my code:
import treetaggerwrapper
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def lem(corpus):
    sent_lemme_list = []
    tagger = treetaggerwrapper.TreeTagger(TAGLANG='fr')
    for sent in corpus:
        tags = tagger.tag_text(sent)
        #tags2 = treetaggerwrapper.make_tags(tags, allow_extra = True)
        sent_lemme_list.append(tags)
    # keep the last tab-separated field (the lemma) of each tagged token
    sent_lemme_list2 = [[w.split('\t')[-1] if len(w.split('\t')) > 1 else w.split('\t')[0]
                         for w in subl] for subl in sent_lemme_list]
    return sent_lemme_list2  # which is a list of sublists of words

lem_train = lem(sent_train_remove_stop_words)
lem_test = lem(sent_test_remove_stop_words)

cv = CountVectorizer(analyzer='word')
X = map(cv.fit_transform, lem_train)
#X = cv.transform(lem_train)
X_test = map(cv.fit_transform, lem_test)

X_train, X_val, y_train, y_val = train_test_split(X, target, train_size = 0.75, random_state =42)

for c in [0.01, 0.05, 0.25, 0.5, 1]:
    lr = LogisticRegression(C=c)
    lr.fit(X_train, y_train)
    print ("Accuracy for_with_lemma C=%s: %s"
           % (c, accuracy_score(y_val, lr.predict(X_val))))
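One possible way around the error, sketched under assumptions: `CountVectorizer` expects an iterable of strings, so each list of lemmas is joined back into a single string, the vectorizer is fitted once on the whole training set, and the test set only goes through `transform`. The token lists and the `target` labels below are made-up placeholders, not the real data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Placeholder data (assumption): lists of lemmas, shaped like the output of lem().
lem_train = [
    ["dans", "avoir", "certainement", "process", "appel"],
    ["tout", "bien", "tout", "avoir", "comment", "ce", "solution"],
    ["process", "appel", "solution"],
    ["bien", "comment", "certainement"],
]
lem_test = [["tout", "process", "bien"]]
target = [0, 1, 0, 1]  # placeholder labels, one per training sentence

# Join each token list into one string per sentence.
train_docs = [" ".join(tokens) for tokens in lem_train]
test_docs = [" ".join(tokens) for tokens in lem_test]

cv = CountVectorizer(analyzer="word")
X = cv.fit_transform(train_docs)  # learn the vocabulary once, on all training docs
X_test = cv.transform(test_docs)  # reuse that vocabulary for the test docs

# X is now a sparse matrix with one row per sentence, so its length matches target.
X_train, X_val, y_train, y_val = train_test_split(
    X, target, train_size=0.75, random_state=42
)
```

After this, `X_train` and `X_val` are sparse matrices that `LogisticRegression.fit` and `predict` accept directly. Keep in mind that the default token pattern of `CountVectorizer` drops one-character tokens such as `a`.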
This is what sent_train_remove_stop_words and sent_test_remove_stop_words look like:
['on a presque laissé côté moins moins besoin', 'ça changera a juste partie devant faut changer quand']
['très', 'on part haut agir quelques éléments mais sais trop passe mais ça disais ça existe déjà va agir différentes il a déjà automates peu partout via automates va va aller pêche infos agir les']
=> sent_lemme_list (the raw tagger output) looks like this:
[['dans\tPRP\tdans', 'a\tVER:pres\tavoir', 'certainement\tADV\tcertainement', 'process\tNOM\tprocess', 'appel\tNOM\tappel'], ['tout\tADV\ttout', 'bien\tADV\tbien', 'tout\tADV\ttout', 'a\tVER:pres\tavoir', 'comment\tADV\tcomment', 'cette\tPRO:DEM\tce', 'solution\tNOM\tsolution']]
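To make the tagger output format concrete: each entry has the form `token\tPOS\tlemma`, so the lemma is the last tab-separated field. Here is a small sketch of the extraction on the sample above. `lemmas_of` is a hypothetical helper name, and joining the lemmas with spaces is an assumption about what the vectorizer should receive:

```python
# Sample tagger output copied from the post: one list of
# "token<TAB>POS<TAB>lemma" strings per sentence.
tagged = [
    ['dans\tPRP\tdans', 'a\tVER:pres\tavoir', 'certainement\tADV\tcertainement',
     'process\tNOM\tprocess', 'appel\tNOM\tappel'],
    ['tout\tADV\ttout', 'bien\tADV\tbien', 'tout\tADV\ttout', 'a\tVER:pres\tavoir',
     'comment\tADV\tcomment', 'cette\tPRO:DEM\tce', 'solution\tNOM\tsolution'],
]

def lemmas_of(tagged_sentence):
    """Return the lemma (last tab-separated field) of every tagged token."""
    # For an entry without tabs, split() yields one field, so [-1] is the
    # token itself; this matches the fallback branch in the original code.
    return [entry.split('\t')[-1] for entry in tagged_sentence]

lemma_lists = [lemmas_of(sentence) for sentence in tagged]

# One space-separated string per sentence (assumed input format for a vectorizer).
docs = [" ".join(lemmas) for lemmas in lemma_lists]
```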
Any help figuring out why I get this error would be appreciated. I have tried many things, but it still does not work.