MALLET: Unable to restore instance list
January 26, 2019

I'm trying to train a MALLET topic model on a file created with import-file, but I get an error saying that MALLET was unable to restore the instance list. I get the same error when loading a completely different model built from another relatively large dataset. However, train-topics works fine on a model built from a smaller dataset. In this case the text data is about 20 GB, while the output file is only 14 MB. The model was created with:

mallet import-file --input corpus.dat  --output topics.mallet

Here is the error I get when running train-topics on that file:

Mallet LDA: 10 topics, 4 topic bits, 1111 topic mask
java.io.EOFException
        at java.io.ObjectInputStream$PeekInputStream.readFully(Unknown Source)
        at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(Unknown Source)
        at java.io.ObjectInputStream$BlockDataInputStream.readUTF(Unknown Source)
        at java.io.ObjectInputStream.readString(Unknown Source)
        at java.io.ObjectInputStream.readObject0(Unknown Source)
        at java.io.ObjectInputStream.readObject(Unknown Source)
        at cc.mallet.types.Alphabet.readObject(Alphabet.java:345)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
        at java.io.ObjectInputStream.readSerialData(Unknown Source)
        at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
        at java.io.ObjectInputStream.readObject0(Unknown Source)
        at java.io.ObjectInputStream.readObject(Unknown Source)
        at cc.mallet.types.FeatureVector.readObject(FeatureVector.java:445)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
        at java.io.ObjectInputStream.readSerialData(Unknown Source)
        at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
        at java.io.ObjectInputStream.readObject0(Unknown Source)
        at java.io.ObjectInputStream.readObject(Unknown Source)
        at cc.mallet.types.Instance.readObject(Instance.java:228)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
        at java.io.ObjectInputStream.readSerialData(Unknown Source)
        at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
        at java.io.ObjectInputStream.readObject0(Unknown Source)
        at java.io.ObjectInputStream.readObject(Unknown Source)
        at java.util.ArrayList.readObject(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
        at java.io.ObjectInputStream.readSerialData(Unknown Source)
        at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
        at java.io.ObjectInputStream.readObject0(Unknown Source)
        at java.io.ObjectInputStream.readObject(Unknown Source)
        at cc.mallet.types.InstanceList.load(InstanceList.java:841)
        at cc.mallet.topics.tui.TopicTrainer.main(TopicTrainer.java:199)
Unable to restore instance list topics.mallet: java.lang.IllegalArgumentException: Couldn't read InstanceList from file topics.mallet
...
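One plausible reading of the trace, offered as an assumption rather than a confirmed diagnosis: the stack shows InstanceList.load reading topics.mallet through standard Java serialization (ObjectInputStream), and a java.io.EOFException from ObjectInputStream means the stream ended in the middle of an object. That is what happens when a serialized file is truncated or was never completely written. The self-contained sketch below does not use MALLET at all; the class and method names are hypothetical. It serializes a list of strings, cuts the bytes in half, and reads both versions back to reproduce the same exception type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical demo class: not part of MALLET, it only mimics the
// Java serialization round trip that InstanceList.load relies on.
class TruncatedSerializationDemo {

    // Build a list of distinct strings that stands in for a serialized instance list.
    static ArrayList<String> sampleList(int size) {
        ArrayList<String> list = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            list.add("token-" + i);
        }
        return list;
    }

    // Serialize an object into a byte array with ObjectOutputStream.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Try to read one object back; return "ok" or the simple name of the exception.
    static String tryRead(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return "ok";
        } catch (EOFException e) {
            // The stream ended before the object was complete (truncated data).
            return "EOFException";
        } catch (IOException | ClassNotFoundException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        byte[] full = serialize(sampleList(1000));
        // Simulate a file that was only half written.
        byte[] truncated = Arrays.copyOf(full, full.length / 2);

        System.out.println("full data:      " + tryRead(full));
        System.out.println("truncated data: " + tryRead(truncated));
    }
}
```

Running it should print ok for the full data and EOFException for the truncated copy. If the real topics.mallet fails the same way, the 14 MB output produced from roughly 20 GB of text would be consistent with an import that did not finish writing, though the question itself does not confirm this.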