Apologies in advance for my rather basic question!
Given:
import cupy as cp

def dist_gpu(a, b):
    # Pairwise distances between rows of a and rows of b,
    # via a broadcasted subtraction of shape (len(a), len(b), c).
    d = cp.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    d = cp.transpose(d)  # rows now correspond to b
    sorted_d = cp.sort(d)
    sorted_ind = cp.argsort(d)
    return sorted_d, sorted_ind
def compare_dist_methods():
    r, c = 10 ** 5, 10 ** 0
    a = cp.random.uniform(-1000, 1000, (r, c)).astype('f')
    b = cp.random.uniform(-1000, 1000, (r, c)).astype('f')
    print(a.nbytes * 1e-6)  # 0.4 -->> GB <<--- ???!!!!!
    cp.cuda.Stream.null.synchronize()  # wait for pending GPU work to finish
    d_gpu, ix_gpu = dist_gpu(a, b)

if __name__ == "__main__":
    compare_dist_methods()
I get the following error:
Traceback (most recent call last):
  File "distance_err.py", line 44, in <module>
    compare_dist_methods()
  File "distance_err.py", line 38, in compare_dist_methods
    d_gpu, ix_gpu = dist_gpu(a, b)
  File "distance_err.py", line 27, in dist_gpu
    d = cp.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
  File "cupy/core/core.pyx", line 934, in cupy.core.core.ndarray.__sub__
  File "cupy/core/_kernel.pyx", line 836, in cupy.core._kernel.ufunc.__call__
  File "cupy/core/_kernel.pyx", line 340, in cupy.core._kernel._get_out_args
  File "cupy/core/core.pyx", line 134, in cupy.core.core.ndarray.__init__
  File "cupy/cuda/memory.pyx", line 518, in cupy.cuda.memory.alloc
  File "cupy/cuda/memory.pyx", line 1085, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1106, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 934, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 949, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
  File "cupy/cuda/memory.pyx", line 697, in cupy.cuda.memory._try_malloc
cupy.cuda.memory.OutOfMemoryError: out of memory to allocate 40000000000 bytes (total 40000800768 bytes)
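The 40000000000 bytes in the error appears to match the intermediate array created by the broadcasted subtraction a[:, None, :] - b[None, :, :], which materializes a full (r, r, c) array before norm reduces it over axis 2. A back-of-the-envelope check (a sketch, assuming float32, i.e. 4 bytes per element):

r, c = 10 ** 5, 10 ** 0
itemsize = 4  # float32

# (r, 1, c) broadcast against (1, r, c) yields a full (r, r, c) array:
intermediate_bytes = r * r * c * itemsize
print(intermediate_bytes)  # 40000000000 -- the exact figure in the traceback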
Meanwhile, $ nvidia-smi shows both GPUs idle:
Wed Apr  1 20:35:55 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:03:00.0 Off |                    0 |
| N/A   36C    P0    25W / 250W |      0MiB / 16280MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  On   | 00000000:83:00.0 Off |                    0 |
| N/A   34C    P0    27W / 250W |      0MiB / 16280MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Questions:
Is print(a.nbytes * 1e-6) the right way to estimate how much GPU memory my machine needs to run this code successfully? In this case I get 0.4, which I assume is in GB?
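For reference, a minimal check of what nbytes reports for these inputs (it counts only the named array itself, not any temporaries created during the computation):

import cupy as cp

a = cp.random.uniform(-1000, 1000, (10 ** 5, 1)).astype('f')
print(a.nbytes)         # 400000: 10**5 elements * 4 bytes (float32)
print(a.nbytes * 1e-6)  # 0.4 -- megabytes, since 1e-6 converts bytes to MB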
What is the difference between 40000000000 and 40000800768?
Is there another efficient way, or another Python library, with which I could carry out such a huge computation? (See the sketch below for one direction I am wondering about.)
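For illustration, a chunked variant that processes b in blocks, so the broadcasted intermediate stays around 0.4 GB instead of 40 GB. This is a sketch, not a tested solution: dist_gpu_chunked is a hypothetical helper, and keeping only the k nearest neighbours per row is an assumption -- the full sorted (10**5, 10**5) result would itself be ~40 GB and could not fit on a 16 GB card however it was computed:

import cupy as cp

def dist_gpu_chunked(a, b, chunk=1024, k=10):
    # Sorted distances from each row of b to its k nearest rows of a,
    # computed chunk rows of b at a time to bound peak memory.
    out_d = cp.empty((b.shape[0], k), dtype=a.dtype)
    out_i = cp.empty((b.shape[0], k), dtype=cp.int64)
    for start in range(0, b.shape[0], chunk):
        stop = min(start + chunk, b.shape[0])
        # Broadcasted block is (chunk, len(a), c): ~0.4 GB for chunk=1024
        d = cp.linalg.norm(b[start:stop, None, :] - a[None, :, :], axis=2)
        out_d[start:stop] = cp.sort(d, axis=1)[:, :k]
        out_i[start:stop] = cp.argsort(d, axis=1)[:, :k]
    return out_d, out_i

a = cp.random.uniform(-1000, 1000, (10 ** 5, 1)).astype('f')
b = cp.random.uniform(-1000, 1000, (10 ** 5, 1)).astype('f')
d_gpu, ix_gpu = dist_gpu_chunked(a, b)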