Question

Я пытаюсь ускорить код Python, на котором я сейчас нахожусь.Код направлен на сжатие социального графа.Пример кода работает так, как задумано для моих данных, основной проблемой является время, которое требуется для этого.Без какого-либо распараллеливания код занимает около 4000 секунд в Slashdot0902 (https://snap.stanford.edu/data/soc-Slashdot0902.html) и 3800 в soc-epinions1 (https://snap.stanford.edu/data/soc-Epinions1.html)) для получения сжатого представления, которое намного выше, чем предполагалось.

Я профилировалкод и большая часть времени, как и ожидалось, тратятся в основном теле сжатия. Я ищу способы распараллелить этот код, то есть, по сути, запустить алгоритм сжатия для каждого элемента одновременно (сжатие между различными элементами совершенно не связано).

Попытка добиться этого ускорения с помощью многопроцессорной установки не приведет к ожидаемому увеличению скорости. Я пытался добиться распараллеливания, используя numba безуспешно. Я понимаю, что я пытаюсь передать матрицу, двумерный массив в качестве ввода некак это должно быть сделано, но я застрял на этом этапе.

Вот пример кода. (Полный код здесь https://pastebin.com/8yihNs2Y)

# Import everything

#Load adjacency matrix into into a variable called matrix
# Creates a list to store the final output values before writing into a file
list1 = []

# Finds the parent of current node
@cuda.jit(device=True)
def parent(index):
    return int((index-1) / 2)

# Finds the sibling of current node left or right sibling based on index
@cuda.jit(device=True)
def sibling(index):
    if(index % 2 == 1):
        return index+1
    else:
        return index-1

len_row = matrix.shape[0]
# Find the height of the binary tree and use it to find n i.e the no.of elements in the array
height = int(math.log2(len_row)) + 1
n = (2 ** height) - 1
start_index = n-len_row
temp_array = np.full(n, -1, dtype=np.int8)


@vectorize(['int32(int32,int32,int32,int32,int32)'], target='cuda')
def compress(input_array, temp_array, n, start_index, len_row):
    for i in range(len_row):
        if(input_array[i] == 1):
            current_index = start_index+i
            dcn_reached = False
            while dcn_reached == False:
                temp_array[current_index] = 1
                if(temp_array[parent(current_index)] != 1):
                    temp_array[parent(current_index)] = 1
                    temp_array[sibling(current_index)] = 0
                    current_index = parent(current_index)
                else:
                    dcn_reached = True
    return temp_array


list1.append(compress(matrix, temp_array, n, start_index, len_row))

Я ожидал бы список сзначения temp_array, если он работает корректно. Тогда мне придется манипулироватьp_array для получения окончательного массива, так как np.where(temp_array != -1) не будет работать в GPU.

Это основная часть непараллельного кода (полный код здесь https://pastebin.com/BNvVUzV3) (Обратите внимание, чтобы код работал, вам может потребоваться изменить sep=' ' в pd.read_csv на основефайл)

# Load edgelist into a 2d list called matrix
list1 = []

# Two functions parent and sibling to return the index of parent and sibling node in a binary tree

start = time.time()
len_row = matrix.shape[0]
# Find the height of the binary tree and use it to find n i.e the no.of elements in the array
height = int(math.log2(len_row)) + 1
n = (2 ** height) - 1

# Loop for lal the elements in row i of the matrix to generate compressed format of that row
for i in range(len_row):
    print("Element "+str(i))
    # Input_array = i'th row of matrix array
    input_array = matrix[i]
    # Initilaize temp array with -1 values
    temp_array = np.full(n, -1, dtype=np.int8)
    start_index = n-len(input_array)
    for i in range(len_row):
        if(input_array[i] == 1):
            current_index = start_index+i
            dcn_reached = False
            while dcn_reached == False:
                temp_array[current_index] = 1
                if(temp_array[parent(current_index)] != 1):
                    temp_array[parent(current_index)] = 1
                    temp_array[sibling(current_index)] = 0
                    current_index = parent(current_index)
                else:
                    dcn_reached = True
    i = np.where(temp_array != -1)
    output = temp_array[i]
    list1.append(output)

# Pass this list through bz2 compression and pickle dump it to a bin file

Я хотел бы достичь хотя бы одного из следующих

Получить работающий код numba

Предложения для любых других альтернативных подходовраспараллелить приведенный выше код

Предложения по дальнейшей оптимизации

Ускоренный код Python для сжатия графов с помощью Numba

Пожалуйста, войдите или зарегистрируйтесь чтобы ответить на этот вопрос.

Ответы [ 0 ]

Ускоренный код Python для сжатия графов с помощью Numba

Пожалуйста, войдите или зарегистрируйтесь чтобы ответить на этот вопрос.

Ответы [ 0 ]

Нет похожих вопросов