I couldn't get it to work, and the documentation is a bit hard to read.
I tried the command below, and the output shows n/a for both metrics.
root@teja:~/Projs/CUDA/05-Profiling# nv-nsight-cu-cli --device 0 --metrics gst_throughput,gld_throughput ./run 0
==PROF== Connected to process 28170 (/root/Projs/CUDA/05-Profiling/run)
==PROF== Profiling "Init" - 1: 0%....50%....100% - 1 pass
==PROF== Profiling "Transpose_rowRead_colWrite" - 2: 0%....50%....100% - 1 pass
==PROF== Disconnected from process 28170
[28170] run@127.0.0.1
Init(mat<int>,mat<int>), 2020-May-01 14:35:43, Context 1, Stream 7
Section: Command line profiler metrics
---------------------------------------------------------------------- --------------- ------------------------------
gld_throughput (!) n/a
gst_throughput (!) n/a
---------------------------------------------------------------------- --------------- ------------------------------
Transpose_rowRead_colWrite(mat<int>,mat<int>), 2020-May-01 14:35:43, Context 1, Stream 7
Section: Command line profiler metrics
---------------------------------------------------------------------- --------------- ------------------------------
gld_throughput (!) n/a
gst_throughput (!) n/a
---------------------------------------------------------------------- --------------- ------------------------------