I think you should pass a word2vec-format file as the input to this function; you can also look at changing the `encoding` argument to whatever suits your data.
```python
def load_word2vec_format(cls, fname, fvocab=None, binary=False, encoding='utf8',
                         unicode_errors='strict', limit=None, datatype=REAL):
    """Load the input-hidden weight matrix from the original C word2vec-tool format.

    Note that the information stored in the file is incomplete (the binary tree is
    missing), so while you can query for word similarity etc., you cannot continue
    training with a model loaded this way.

    Parameters
    ----------
    fname : str
        The file path to the saved word2vec-format file.
    fvocab : str, optional
        Optional file path to the vocabulary. Word counts are read from `fvocab`,
        if set (this is the file generated by the `-save-vocab` flag of the
        original C tool).
    binary : bool
        If True, the data is in binary word2vec format.
    encoding : str
        If you trained the C model using a non-utf8 encoding for words, specify
        that encoding here.
    unicode_errors : str
        Default 'strict'; a string suitable to be passed as the `errors` argument
        to the unicode() (Python 2.x) or str() (Python 3.x) function. If your
        source file may include word tokens truncated in the middle of a multibyte
        unicode character (as is common from the original word2vec.c tool),
        'ignore' or 'replace' may help.
    limit : int, optional
        Sets a maximum number of word-vectors to read from the file. The default,
        None, means read all.
    datatype : :class:`numpy.float*`
        (Experimental) Can coerce dimensions to a non-default float type (such as
        np.float16) to save memory. (Such types may result in much slower bulk
        operations or incompatibility with optimized routines.)
    """
```