Euclidean distance using NumPy - PullRequest
0 votes
/ October 23, 2019

I am trying to compute the Euclidean distance between two pieces of binary data (images) using numpy, but I get nan as the result.

def eculideanDistance(features, predict, dist):
    dist += (float(features[0]) - float(predict[0]))
    return math.sqrt(dist)

Output

This is the binary data I am using:

train_set = {
    0: [
        ["0000000000000111100000000000000000000000000011111110000000000000000000000011111111110000000000000000000111111111111110000000000000000001111111011111100000000000000000111111100000111100000000000000001111111000000011100000000000000011111110000000111100000000000000111111100000000111000000000000001111111000000001110000000000000011111100000000011110000000000000111111000000000011100000000000001111110000000000111000000000000001111110000000000111000000000000011111100000000001110000000000000111111000000000011100000000000001111110000000000111000000000000111111100000000011110000000000001111011000000000111100000000000011110000000000011110000000000000011110000000000011110000000000000111100000000001111100000000000001111000000000111110000000000000011110000000011111000000000000000011100000011111100000000000000000111100011111110000000000000000001111111111111100000000000000000001111111111111000000000000000000011111111111100000000000000000000011111111100000000000000000000000011111000000000000000000000000000011000000000000000000"],
        ["0000000000011111000000000000000000000000001111111000000000000000000000000111111111000000000000000000000011111111111000000000000000000001111111111111000000000000000000111111101111111000000000000000001111110001111111000000000000000011111100001111110000000000000001111111000001111110000000000000011111110000001111100000000000000011111100000001111110000000000001111111000000001111110000000000011111100000000001111100000000000111111000000000011111100000000001111110000000000111111000000000011111100000000000111110000000000111111000000000001111100000000001111110000000000011111000000000011111100000000000111110000000000111111000000000001111100000000000111110000000000011111000000000001111110000000001111110000000000011111100000000111111000000000000011111100000001111111000000000000011111000000111111100000000000000111110000011111110000000000000001111110001111111000000000000000011111111111111100000000000000000111111111111110000000000000000000111111111111000000000000000000000111111111100000000000000000000000111111110000000000000"]
    ],
    1: [
        ["0000000000000000111100000000000000000000000000011111111000000000000000000000000111111111000000000000000000000001111111111000000000000000000000011111111110000000000000000000001111111111000000000000000000000011111111110000000000000000000001111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111000000000000000000000011111111110000000000000000000001111111111100000000000000000000111111111110000000000000000000011111111111110000000000000000111111111111111100000000000000011111111111111110000000000000001111111111111111100000000000000011111111111111111000000000000000001111111111111110000000000000000011111110111111100000000000000000011110001111111000000000000000000000000011111110000000000000000000000000111111000000000000000000000000011111111000000000000000000000000011111110000000000000000000000000111111100000000000000000000000001111111100000000000000000000000011111111000000000000000000000000111111110000000000000000000000000011111111000000000000000000000000111111100000000"],
        ["0000000001111100000000000000000000000000001111100000000000000000000000000011111100000000000000000000000000111111100000000000000000000000001111111000000000000000000000000011111111000000000000000000000000111111110000000000000000000000001111111000000000000000000000000001111111000000000000000000000000111111110000000000000000000000001111111110000000000000000000000011111111100000000000000000000001111111110000000000000000000000011111111110000000000000000000001111111111000000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000000111111111100000000000000000000000111111111000000000000000000000000001111111000000000000000000000000011111110000000000000000000000000111111100000000000000000000000000111111100000000000000000000000001111111000000000000000000000000001111111000000000000000000001111111111111111111000000000000111111111111111111111000000000001111111111111111111111000000000011111111111111111111110000000000001111111111111111111100000000000001111111111111111111"],
    ]
}

test_set = ["0000000000000000011000000000000000000000000000011111111000000000000000000000011111111111000000000000000000000011111111111000000000000000000001111111111110000000000000000000011111111111100000000000000000011111111111110000000000000000000111111111111100000000000000000001111111111111000000000000000000111111111111110000000000000000111111111111111100000000000000001111111111111111000000000000000001111111111111111000000000000001111111111111111110000000000000111111111111111111100000000000001111111111111111111000000000000001111111111111111110000000000000000010000111111111100000000000000000000001111111110000000000000000000000011111111100000000000000000000000111111111000000000000000000000000111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000011111111111100000000000000000000000111111111100000000000000000000000111111000000000"]

Answers [ 2 ]

0 votes
/ October 24, 2019

This is not binary data. It is a binary image stored as a string, where each pixel is either 0 (black) or 1 (white).

To make things easier, let's convert your data into 32 x 32 numpy arrays and visualize them.

Converting train_set to numpy arrays

import numpy as np

# Turn each 1024-character string into a 32 x 32 array of 0/1 pixels.
train_img = {label: [np.uint8([*sample[0]]).reshape(32, 32)
                     for sample in samples]
             for label, samples in train_set.items()}

Converting test_set to a numpy array

test_img = np.uint8([*test_set[0]]).reshape(32, 32)
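
To make the conversion concrete, here is a minimal sketch on a hypothetical 4 x 4 image string. The name `tiny` and its contents are made up for illustration; the real samples are 1024 characters reshaped to 32 x 32.

```python
import numpy as np

# Hypothetical 16-character image string; each character is one pixel.
tiny = "0110" "1001" "1001" "0110"

# Convert every character to an integer, then fold the flat vector into rows.
tiny_img = np.array([int(c) for c in tiny], dtype=np.uint8).reshape(4, 4)
print(tiny_img)
```

Parsing each character with int() is slower than a vectorised conversion, but it makes the character-to-pixel step explicit.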

From here, computing the Euclidean distance is straightforward with numpy.linalg.norm. The arrays are uint8, so cast one operand to a signed type before subtracting; otherwise every negative difference wraps around to 255 and inflates the result. For example (distances printed to four decimal places):

In [5]: print(f"{np.linalg.norm(test_img.astype(int) - train_img[0][0]):.4f}")
18.6279

In [6]: print(f"{np.linalg.norm(test_img.astype(int) - train_img[0][1]):.4f}")
19.4165

In [7]: print(f"{np.linalg.norm(test_img.astype(int) - train_img[1][0]):.4f}")
11.7473

In [8]: print(f"{np.linalg.norm(test_img.astype(int) - train_img[1][1]):.4f}")
16.2173
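
A pitfall to keep in mind with uint8 arrays: unsigned subtraction cannot represent -1, so it wraps around to 255 and inflates the norm. A minimal sketch with two made-up four-pixel vectors (`a` and `b` are illustrative, not taken from the data above):

```python
import numpy as np

a = np.array([0, 0, 1, 1], dtype=np.uint8)
b = np.array([1, 0, 0, 1], dtype=np.uint8)

# uint8 arithmetic wraps: 0 - 1 becomes 255 instead of -1.
wrapped = a - b
# Casting to a signed type first keeps the true differences -1, 0 and 1.
signed = a.astype(np.int16) - b.astype(np.int16)

bad = np.linalg.norm(wrapped)   # sqrt(255**2 + 1**2)
good = np.linalg.norm(signed)   # sqrt(2)
print(wrapped, signed, bad, good)
```

Any signed or floating-point dtype would do in place of int16; the only requirement is that it can hold negative values.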

0 votes
/ October 23, 2019

The formula you are using for the Euclidean distance is wrong. You end up taking the square root of a negative number, which is why you get NaN. I think you meant to do something like this:

import math

def euclideanDistance(features, predict, dist):
    diff = float(features[0]) - float(predict[0])
    dist += diff * diff  # accumulate the squared difference
    return math.sqrt(dist)

(I am not sure why you always use index 0, or why dist is a parameter rather than just the return value. I suspect there may be a problem there too, but I do not have enough context to judge.)
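
For comparison, here is a minimal sketch of the same idea extended to every element rather than index 0 alone, with the accumulator kept local instead of passed in. The name `euclidean_distance` and the assumption that both inputs are equal-length strings of '0'/'1' characters are mine, not something the question confirms.

```python
import math

def euclidean_distance(features, predict):
    """Euclidean distance between two equal-length sequences of digits."""
    if len(features) != len(predict):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for f, p in zip(features, predict):
        diff = float(f) - float(p)
        total += diff * diff  # squared differences are never negative
    return math.sqrt(total)

# Two of the four positions differ, so the distance is sqrt(2).
print(euclidean_distance("0011", "1001"))
```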

However, if you encode the images as NumPy arrays instead, NumPy offers a direct way to compute the Euclidean norm:

import numpy

a = numpy.array([0, 0, 1, 1])
b = numpy.array([1, 0, 0, 1])
euclidean_norm = numpy.linalg.norm(a - b)  # sqrt(2), about 1.414
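
For 0/1 data specifically, each squared difference is either 0 or 1, so the squared Euclidean distance equals the number of positions where the two arrays differ (the Hamming distance). A small sketch checking that on the same two vectors; the variable names `differing` and `distance` are illustrative:

```python
import numpy as np

a = np.array([0, 0, 1, 1])
b = np.array([1, 0, 0, 1])

differing = np.count_nonzero(a != b)   # positions where the pixels disagree
distance = np.linalg.norm(a - b)       # Euclidean distance

# For binary arrays the two agree: distance**2 == number of differing positions.
print(differing, distance ** 2)
```

This gives a quick sanity check: the distance between two 32 x 32 binary images can never exceed sqrt(1024) = 32.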