Question

I am trying to compute BERT embeddings using multiple processes.

I tried the code below, but could not get it to work.

import subprocess
from bert_serving.client import BertClient

bert_command = 'bert-serving-start -model_dir uncased_L-12_H-768_A-12 -num_worker 40'
process = subprocess.Popen(bert_command.split(), stdout=subprocess.PIPE)

and then using

import concurrent.futures

to run the embedding in multiple processes:

def embedding_dic(file_list):
    dic={}
    with concurrent.futures.ProcessPoolExecutor(max_workers=20) as executor:
        for file, e in zip(file_list, executor.map(embedding_file, file_list)):
            dic[file]=e
    return dic

def embedding_file(file):
    # Read the file and drop empty or whitespace-only lines.
    with open(form_path + file, 'r') as file_obj:
        file_read = file_obj.readlines()
    file_read = [i.rstrip() for i in file_read if i and not i.isspace()]
    # Join the remaining lines into a single input string.
    file_read = [' |||'.join(file_read)]
    # A new client is created on every call, inside the worker process.
    bc = BertClient(check_length=False)
    try:
        embedding = bc.encode(file_read)
    except ValueError:
        embedding = None
    return embedding
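
To rule out the preprocessing as the source of the problem, the filtering and joining step of embedding_file can be pulled out into a standalone function and checked without a running server. This is a minimal sketch, not part of the original code; the function name `join_nonblank_lines` and its `separator` parameter are hypothetical.

```python
def join_nonblank_lines(lines, separator=" |||"):
    """Mirror the preprocessing in embedding_file.

    Strips trailing whitespace from each line, drops empty and
    whitespace-only lines, and joins the rest into a one-element list.
    The name and the `separator` parameter are illustrative only.
    """
    kept = [line.rstrip() for line in lines if line and not line.isspace()]
    return [separator.join(kept)]
```

Note that a file containing only blank lines produces a list holding a single empty string, which is what would then be passed to bc.encode.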

However, execution hangs at

embedding = bc.encode(file_read)

Any help is greatly appreciated.

Machine configuration:

System information

Debian
TensorFlow version: 1.13
Python version: 3.6
bert-as-service version: 1.8.9
CPU model and memory: 48-core Docker machine with 60 GB of memory
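
One factor that may be worth ruling out (an assumption, not a confirmed diagnosis): the server is started with stdout=subprocess.PIPE, but nothing in the code above ever reads from that pipe. The Python subprocess documentation warns that a child process can block once the OS pipe buffer fills up. The sketch below starts a child process with its output redirected to a log file instead. The helper name `start_logged` and the log path are hypothetical.

```python
import subprocess
import sys


def start_logged(command, log_path):
    """Start `command` with stdout and stderr appended to `log_path`.

    `command` is a list of arguments, as accepted by subprocess.Popen.
    Writing to a file means the child never waits on an unread pipe.
    """
    log_file = open(log_path, "ab")
    try:
        process = subprocess.Popen(
            command, stdout=log_file, stderr=subprocess.STDOUT
        )
    finally:
        # The child has its own copy of the file descriptor,
        # so the parent can close its handle right away.
        log_file.close()
    return process
```

For the server command above, the call would look like start_logged(bert_command.split(), 'bert_server.log'), where the log file name is only an example. Whether this changes the behavior of bc.encode has not been verified here.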

BERT & Multiprocessing
