How do I use HuggingFace Transformers pipelines?
3 votes
/ February 13, 2020

I'm trying to build a simple text classification project with Transformers. I want to use the pipeline feature added in v2.3, but there is almost no documentation.

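import pandas as pd
from sklearn.preprocessing import LabelEncoder
from transformers import (
    FlaubertForSequenceClassification,
    FlaubertTokenizer,
    TextClassificationPipeline,
)

data = pd.read_csv("data.csv")
FLAUBERT_NAME = "flaubert-base-cased"

# Encode the string categories as integer class ids
encoder = LabelEncoder()
target = encoder.fit_transform(data["category"])
y = target
X = data["text"]

# Load the pretrained model and its tokenizer, then wrap them in a pipeline
model = FlaubertForSequenceClassification.from_pretrained(FLAUBERT_NAME)
tokenizer = FlaubertTokenizer.from_pretrained(FLAUBERT_NAME)
pipe = TextClassificationPipeline(model, tokenizer, device=-1)  # device=-1 -> use CPU only

print("Test #1: pipe('Bonjour le monde')=", pipe(['Bonjour le monde']))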

Traceback (most recent call last):
  File "C:/Users/PLHT09191/Documents/work/dev/Classif_Annonces/src/classif_annonce.py", line 33, in <module>
    model = FlaubertForSequenceClassification.from_pretrained(FLAUBERT_NAME)
  File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_utils.py", line 463, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_flaubert.py", line 343, in __init__
    super(FlaubertForSequenceClassification, self).__init__(config)
  File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_xlm.py", line 733, in __init__
    self.transformer = XLMModel(config)
  File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_xlm.py", line 382, in __init__
    self.ffns.append(TransformerFFN(self.dim, self.hidden_dim, self.dim, config=config))
  File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\transformers-2.4.1-py3.5.egg\transformers\modeling_xlm.py", line 203, in __init__
    self.lin2 = nn.Linear(dim_hidden, out_dim)
  File "C:\Users\Myself\Documents\work\dev\Classif_Annonces\venv\lib\site-packages\torch\nn\modules\linear.py", line 72, in __init__
    self.weight = Parameter(torch.Tensor(out_features, in_features))
RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 9437184 bytes. Buy new RAM!


Process finished with exit code 1
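
For scale, the allocation that fails in the traceback is small. The check below is a rough sketch that assumes the base FlauBERT configuration uses a hidden size of 768 and a feed-forward size of 3072, and that weights are stored as 4-byte float32 values. Those numbers are assumptions, not something the traceback states. The helper name `linear_weight_bytes` is made up for this illustration.

```python
def linear_weight_bytes(out_features, in_features, bytes_per_element=4):
    """Size in bytes of the weight matrix of one linear layer.

    bytes_per_element=4 assumes float32 storage.
    """
    return out_features * in_features * bytes_per_element


# Assumed sizes for the second feed-forward projection (self.lin2):
# dim_hidden=3072 inputs, out_dim=768 outputs.
n = linear_weight_bytes(out_features=768, in_features=3072)
print(f"{n} bytes = {n / 2**20} MiB")  # 9437184 bytes = 9.0 MiB
```

Under those assumptions the result matches the 9437184 bytes in the error message, which suggests that a single 9 MiB layer could not be allocated, so the process was probably already close to its memory limit before the model was built.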

How can I use my pipeline with my X and y data?

...