I am trying to use tensorflow-gpu 2.1.0, installed via pip.
Problem: Task Manager on Windows 10 shows almost no GPU usage, between 2% and 5%, while RAM is used at almost 100%. What could cause Task Manager to show that the GPU (GTX 1660 Ti) is not being used?
With nvidia-smi I get a different picture:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 445.87       Driver Version: 445.87       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 166... WDDM  | 00000000:10:00.0  On |                  N/A |
| 79%   64C    P2   109W / 130W |   5964MiB /  6144MiB |     89%      Default |
+-------------------------------+----------------------+----------------------+
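To set the two tools side by side on the same numbers, the nvidia-smi readings can also be collected from a script. The following is only a minimal sketch, not part of my setup: it assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` CSV output; the helper names `parse_gpu_query` and `query_gpu` are made up for illustration.

```python
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY_FIELDS = "utilization.gpu,memory.used,memory.total"


def parse_gpu_query(line: str) -> dict:
    """Parse one CSV line such as '89, 5964, 6144' (percent, MiB, MiB).

    Assumes every field is numeric; a driver that reports a field as
    unsupported would make int() raise ValueError.
    """
    util, used, total = (int(part.strip()) for part in line.split(","))
    return {"util_percent": util, "mem_used_mib": used, "mem_total_mib": total}


def query_gpu() -> list:
    """Run nvidia-smi once and return one dict per GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_query(line) for line in out.splitlines() if line.strip()]
```

Calling `query_gpu()` in a loop while training runs gives readings that can be compared with what Task Manager displays at the same moment.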
I am using CUDA 10.1.
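The `CUDA Version: 11.0` in the nvidia-smi header is the newest CUDA version the installed driver supports, not the toolkit in use; the runtime that TensorFlow actually loads shows up in the DLL names in its log. A small sketch for reading it from the log, assuming the `cudart64_<digits>.dll` naming where the last digit is the minor version; `cuda_runtime_versions` is a hypothetical helper name.

```python
import re

# Matches e.g. "cudart64_101.dll"; the digits encode major and minor version.
_CUDART = re.compile(r"cudart64_(\d{2,})\.dll")


def cuda_runtime_versions(log_text: str) -> set:
    """Return the CUDA runtime versions named in a TensorFlow log."""
    versions = set()
    for digits in _CUDART.findall(log_text):
        # Assumption: the last digit is the minor version, the rest the major.
        versions.add(f"{digits[:-1]}.{digits[-1]}")
    return versions
```

A single version in the result that matches the installed toolkit suggests the version mismatch in the nvidia-smi header is not itself a problem.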
TensorFlow log output and warnings:
2020-04-16 21:07:55.541837: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-16 21:07:58.416796: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-04-16 21:07:58.450054: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:10:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.845GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2020-04-16 21:07:58.450406: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-16 21:07:58.455452: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-16 21:07:58.459642: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-16 21:07:58.461515: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-16 21:07:58.466455: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-16 21:07:58.469085: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-16 21:07:58.479479: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-16 21:07:58.480206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-16 21:07:58.480629: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-04-16 21:07:58.482300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:10:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.845GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2020-04-16 21:07:58.482677: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-16 21:07:58.482875: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-16 21:07:58.483040: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-16 21:07:58.483203: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-16 21:07:58.483355: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-16 21:07:58.483529: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-16 21:07:58.483712: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-16 21:07:58.484448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-16 21:07:59.249742: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-16 21:07:59.250043: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-04-16 21:07:59.250203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-04-16 21:07:59.251187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4625 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:10:00.0, compute capability: 7.5)
Found 5338 images belonging to 4 classes.
Found 3554 images belonging to 4 classes.
WARNING:tensorflow:sample_weight modes were coerced from
...
to
['...']
WARNING:tensorflow:sample_weight modes were coerced from
...
to
['...']
2020-04-16 21:08:11.464027: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-16 21:08:12.081246: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-16 21:08:13.727563: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-04-16 21:08:15.806688: I tensorflow/core/profiler/lib/profiler_session.cc:225] Profiler session started.
2020-04-16 21:08:15.806850: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1259] Profiler found 1 GPUs
2020-04-16 21:08:15.808769: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cupti64_101.dll
2020-04-16 21:08:15.909368: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1307] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
2020-04-16 21:08:15.910677: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1346] function cupti_interface_->ActivityRegisterCallbacks( AllocCuptiActivityBuffer, FreeCuptiActivityBuffer)failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
2020-04-16 21:08:16.092575: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1329] function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid)failed with error CUPTI_ERROR_INVALID_PARAMETER
2020-04-16 21:08:16.092946: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:88] GpuTracer has collected 0 callback api events and 0 activity events.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.338369). Check your callbacks.
I want to highlight this error: CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
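To pull such lines out of a long log, the severity letter that TensorFlow prints after the timestamp (`I`, `W`, `E`) can serve as a filter. A minimal sketch, assuming the line layout shown in the log above; `error_lines` and `cupti_error_codes` are illustrative names, not TensorFlow APIs.

```python
import re

# "<date> <time>: <severity> <source file>:<line>] <message>"
_LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: ([IWEF]) (\S+?):(\d+)\] (.*)$"
)
_CUPTI_CODE = re.compile(r"CUPTI_ERROR_[A-Z_]+")


def error_lines(log_text: str, severities: str = "E") -> list:
    """Return (severity, source_file, message) for lines at the given levels."""
    hits = []
    for line in log_text.splitlines():
        match = _LOG_LINE.match(line.strip())
        if match and match.group(1) in severities:
            hits.append((match.group(1), match.group(2), match.group(4)))
    return hits


def cupti_error_codes(log_text: str) -> list:
    """Return the distinct CUPTI error codes in order of first appearance."""
    seen = []
    for code in _CUPTI_CODE.findall(log_text):
        if code not in seen:
            seen.append(code)
    return seen
```

Passing `severities="WE"` would include the `W` lines as well; lines that do not start with a timestamp, such as the `WARNING:tensorflow:` messages, are skipped by this pattern.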
The code that currently exhibits the problem:
import argparse
from datetime import datetime
import itertools
from six.moves import range
import io
import matplotlib.pyplot as plt
import numpy as np
import sklearn.metrics
import tensorflow as tf
from tensorflow.keras import applications
from tensorflow.keras.callbacks import TensorBoard, ReduceLROnPlateau, ModelCheckpoint, EarlyStopping, LambdaCallback
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator


def plot_confusion_matrix(cm, class_names):
    """
    Returns a matplotlib figure containing the plotted confusion matrix.

    Args:
        cm (array, shape = [n, n]): a confusion matrix of integer classes
        class_names (array, shape = [n]): String names of the integer classes
    """
    figure = plt.figure(figsize=(8, 8))
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
    plt.title("Confusion matrix")
    plt.colorbar()
    tick_marks = np.arange(len(class_names))
    plt.xticks(tick_marks, class_names, rotation=45)
    plt.yticks(tick_marks, class_names)
    # Normalize the confusion matrix.
    cm = np.around(cm.astype('float') / cm.sum(axis=1)[:, np.newaxis], decimals=2)
    # Use white text if squares are dark; otherwise black.
    threshold = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        color = "white" if cm[i, j] > threshold else "black"
        plt.text(j, i, cm[i, j], horizontalalignment="center", color=color)
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    return figure


def create_resnet50(img_h: int, img_w: int, num_classes: int):
    # Build a ResNet50 backbone (randomly initialised, no top) with a softmax head.
    base_model = applications.resnet50.ResNet50(weights=None, include_top=False, input_shape=(img_h, img_w, 3))
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dropout(rate=0.3)(x)
    predictions = Dense(num_classes, activation='softmax')(x)
    mdl = Model(inputs=base_model.input, outputs=predictions)
    return mdl


def plot_to_image(figure):
    """Converts the matplotlib plot specified by 'figure' to a PNG image and
    returns it. The supplied figure is closed and inaccessible after this call."""
    # Save the plot to a PNG in memory.
    buf = io.BytesIO()
    plt.savefig(buf, format='png')
    # Closing the figure prevents it from being displayed directly inside
    # the notebook.
    plt.close(figure)
    buf.seek(0)
    # Convert PNG buffer to TF image
    image = tf.image.decode_png(buf.getvalue(), channels=4)
    # Add the batch dimension
    image = tf.expand_dims(image, 0)
    return image


def log_confusion_matrix(epoch, logs):
    # Relies on the module-level names model, val_gen, bch_size and file_writer_cm.
    # Use the model to predict the values from the validation dataset.
    # Collect up to 256 images and labels (256 // bch_size batches).
    itx = 256 // bch_size
    test_images, test_labels_raw = [], []
    for i in range(itx):
        tmp_img, tmp_lbs = next(val_gen)
        test_images.extend(tmp_img)
        test_labels_raw.extend(tmp_lbs)
    test_pred_raw = model.predict(np.array(test_images))
    test_pred = np.argmax(test_pred_raw, axis=1)
    test_labels = np.argmax(test_labels_raw, axis=1)
    # Calculate the confusion matrix.
    cm = sklearn.metrics.confusion_matrix(test_labels, test_pred)
    # Plot the confusion matrix and convert the figure to an image tensor.
    figure = plot_confusion_matrix(cm, class_names=[x for x in val_gen.class_indices.values()])
    cm_image = plot_to_image(figure)
    # Log the confusion matrix as an image summary.
    with file_writer_cm.as_default():
        tf.summary.image("Confusion Matrix", cm_image, step=epoch)


def run(train_generator, test_generator, epcs: int, mdl: Model, opt):
    # train the model
    mdl.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy', 'mse', ])
    stopper = EarlyStopping(monitor='val_loss', patience=min(epcs / 16, 10), mode='auto',
                            restore_best_weights=True)
    checker = ModelCheckpoint(monitor='val_loss', filepath='weights.{epoch:03d}.hdf5',
                              save_best_only=True, save_freq='epoch')
    shower = TensorBoard(histogram_freq=1)
    reducer = ReduceLROnPlateau(factor=0.6, patience=10, min_delta=1e-4, cooldown=10)
    cm_callback = LambdaCallback(on_epoch_end=log_confusion_matrix)
    # Fit the model passed in as mdl rather than the module-level `model`.
    history = mdl.fit(train_generator, epochs=epcs, verbose=0,
                      validation_data=test_generator,
                      callbacks=[stopper, checker, shower, reducer, cm_callback]
                      )
    return history


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('train_path', type=str, help='Path to the train main folder of files.')
    parser.add_argument('test_path', type=str, help='Path to the test main folder of files.')
    # type=bool would turn every non-empty string (even "False") into True,
    # so the value is parsed explicitly.
    parser.add_argument('new_model', type=lambda s: s.lower() in ('true', '1', 'yes'),
                        help='Create new model (true/false), or load from file.')
    parser.add_argument('-m', '--model_path', type=str, help='path to model.')
    args = parser.parse_args()
    train_p = args.train_path
    test_p = args.test_path
    is_new = args.new_model
    model_path = args.model_path
    img_height, img_width = 214, 214
    file_writer_cm = tf.summary.create_file_writer('logs/cm')
    model = create_resnet50(img_height, img_width, num_classes=4) if is_new else load_model(model_path)
    adam = Adam(learning_rate=0.0001)
    train_datagen = ImageDataGenerator(
        rescale=1. / 255,
        horizontal_flip=True,
        vertical_flip=True,
        rotation_range=90,
        width_shift_range=0.1,
        height_shift_range=0.1,
        zoom_range=0.2
    )
    # Same scaling as for training; the original value 1.255 was a typo for 1. / 255.
    validation_datagen = ImageDataGenerator(
        rescale=1. / 255
    )
    bch_size = 16
    train_gen = train_datagen.flow_from_directory(directory=train_p, target_size=(img_height, img_width),
                                                  batch_size=bch_size)
    val_gen = validation_datagen.flow_from_directory(directory=test_p, target_size=(img_height, img_width),
                                                     batch_size=bch_size)
    h = run(train_gen, val_gen, 100, model, adam)
    m_name = 'Model_resnet50_epoch{}_score{:3.2f}.hdf5'.format(100, min(h.history['val_loss']))
    model.save(m_name)
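The `Profiler session started` and CUPTI lines in the log come from the profiler, which the Keras `TensorBoard` callback in this TensorFlow version starts on its own because its `profile_batch` argument defaults to 2. Whether that has anything to do with the low reading in Task Manager is not established here; the sketch below only shows how the profiler could be switched off so that those errors no longer appear. `tensorboard_kwargs` and `make_tensorboard_callback` are hypothetical helper names.

```python
def tensorboard_kwargs(enable_profiler: bool = False) -> dict:
    """Build keyword arguments for tf.keras.callbacks.TensorBoard."""
    kwargs = {"histogram_freq": 1}  # same setting as in run()
    if not enable_profiler:
        # profile_batch=0 disables batch profiling, so no CUPTI session is opened.
        kwargs["profile_batch"] = 0
    return kwargs


def make_tensorboard_callback(enable_profiler: bool = False):
    """Create the callback; TensorFlow is imported lazily, only when called."""
    from tensorflow.keras.callbacks import TensorBoard
    return TensorBoard(**tensorboard_kwargs(enable_profiler))
```

In `run`, `shower = make_tensorboard_callback()` would take the place of the existing `TensorBoard(histogram_freq=1)` line.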
Thank you very much in advance. I really appreciate it!