How do I fix a segmentation fault (core dumped) in the 2 InternalThreadTest tests when running make runtest for SegNet (a modified version of Caffe)?
30 April 2019

I want to import caffe successfully in Python 3.x, in order to study SegNet, a deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling that is built on a modified version of Caffe.

Setup details:

https://github.com/navganti/caffe-segnet-cudnn7

OS: Ubuntu 18.04

Graphics card (only one): NVIDIA GeForce MX150

CUDA version: 10.1
CUDA toolkit: 10.0
cuDNN: 7.5

Anaconda3 in use: Yes

Python: 3.6.7. I compiled OpenCV 3.4.4 and Boost 1.65.1 for this interpreter, because the OpenCV shipped with Anaconda did not work for compiling.

Python environments in Anaconda:
  • py36_machine (Python 3.6.8), where I installed the required libraries such as OpenCV and NumPy
  • base (Python 3.6.8)

gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
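
With a Python 3.6.7 and two Anaconda environments in play, it can help to confirm which interpreter and header directory a given shell actually resolves. The following is a minimal sketch using only the standard library; the function name `describe_python` is hypothetical, and the NumPy lookup is optional and reported only if NumPy is importable.

```python
import sys
import sysconfig


def describe_python():
    """Collect the interpreter path, version, and C header directory."""
    info = {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Directory that contains Python.h for this interpreter.
        "include": sysconfig.get_paths()["include"],
    }
    try:
        import numpy  # optional: only reported if NumPy is importable here
        info["numpy_include"] = numpy.get_include()
    except ImportError:
        info["numpy_include"] = None
    return info


if __name__ == "__main__":
    for key, value in describe_python().items():
        print(f"{key}: {value}")
```

Running it once with each interpreter (the 3.6.7 one, base, and py36_machine) shows which paths would have to appear in PYTHON_INCLUDE for that interpreter.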

  • I installed all the dependencies and libraries listed in the guides I could find.
  • It already compiles without a single error.
  • With cmake, I can even pass the tests. However, the SegNet developer notes that the cmake build does not yet support building the Python wrapper, so I have to use make to build it.

Here is my Makefile.config:


    ## Refer to http://caffe.berkeleyvision.org/installation.html
    # Contributions simplifying and improving our build system are welcome!

    # cuDNN acceleration switch (uncomment to build with cuDNN).
    USE_CUDNN := 1

    # CPU-only switch (uncomment to build without GPU support).
    # CPU_ONLY := 1

    # uncomment to disable IO dependencies and corresponding data layers
    USE_OPENCV := 0
    # USE_LEVELDB := 0
    # USE_LMDB := 0

    # uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary)
    #   You should not set this flag if you will be reading LMDBs with any
    #   possibility of simultaneous read and write
    # ALLOW_LMDB_NOLOCK := 1

    # Uncomment if you're using OpenCV 3
    OPENCV_VERSION := 3

    # To customize your choice of compiler, uncomment and set the following.
    # N.B. the default for Linux is g++ and the default for OSX is clang++
    # CUSTOM_CXX := g++

    # CUDA directory contains bin/ and lib/ directories that we need.
    CUDA_DIR := /usr/local/cuda-10.0
    # On Ubuntu 14.04, if cuda tools are installed via
    # "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
    # CUDA_DIR := /usr

    # CUDA architecture setting: going with all of them.
    # For CUDA < 6.0, comment the *_50 lines for compatibility.
    # For CUDA < 8.0, comment the *_60 and *_61 lines for compatibility.
    # For CUDA >= 9.0, comment the *_20 and *_21 lines for compatibility.
    CUDA_ARCH :=
    # -gencode arch=compute_20,code=sm_20 \
    #       -gencode arch=compute_20,code=sm_21 \
        -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_35,code=sm_35 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_50,code=compute_50

    # BLAS choice:
    # atlas for ATLAS (default)
    # mkl for MKL
    # open for OpenBlas
    BLAS := atlas
    # Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
    # Leave commented to accept the defaults for your choice of BLAS
    # (which should work)!
    # BLAS_INCLUDE := /path/to/your/blas
    # BLAS_LIB := /path/to/your/blas

    # Homebrew puts openblas in a directory that is not on the standard search path
    # BLAS_INCLUDE := $(shell brew --prefix openblas)/include
    # BLAS_LIB := $(shell brew --prefix openblas)/lib

    # This is required only if you will compile the matlab interface.
    # MATLAB directory should contain the mex binary in /bin.
    # MATLAB_DIR := /usr/local
    # MATLAB_DIR := /Applications/MATLAB_R2012b.app

    # NOTE: this is required only if you will compile the python interface.
    # We need to be able to find Python.h and numpy/arrayobject.h.
    #PYTHON_INCLUDE := /usr/include/python2.7 \
    #       /usr/lib/python2.7/dist-packages/numpy/core/include
    # Anaconda Python distribution is quite popular. Include path:
    # Verify anaconda location, sometimes it's in root.
    ANACONDA_HOME := $(ANACONDA_HOME)
    PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
                $(ANACONDA_HOME)/envs/py36_machine/include/python3.6m \
                $(ANACONDA_HOME)/envs/py36_machine/lib/python3.6/site-packages/numpy/core/include/numpy

    # error fix
    #LDFLAGS += -Wl,-rpath,$(ANACONDA_HOME)/envs/py36_machine/lib

    # Uncomment to use Python 3 (default is Python 2)
    PYTHON_LIBRARIES :=boost_python-py36 python3.6m
    PYTHON_INCLUDE :=/usr/include/python3.6m \
                 /usr/lib/python3.6/dist-packages/numpy/core/include

    # We need to be able to find libpythonX.X.so or .dylib.
    #PYTHON_LIB :=/usr/lib /usr/lib/x86_64-linux-gnu
    PYTHON_LIB := $(ANACONDA_HOME)/lib


    # Homebrew installs numpy in a non standard path (keg only)
    # PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
    # PYTHON_LIB += $(shell brew --prefix numpy)/lib

    # Uncomment to support layers written in Python (will link against Python libs)
    WITH_PYTHON_LAYER := 1

    # Whatever else you find you need goes here.
    INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include \
                $(ANACONDA_HOME)/envs/py36_machine/include     \
                /usr/include/hdf5/serial/ \
                /usr/include/opencv  \
                /usr/share/opencv
    LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib  \
                /usr/lib \
                /usr/lib/x86_64-linux-gnu \
                /usr/lib/x86_64-linux-gnu/hdf5/serial \
                $(ANACONDA_HOME)/envs/py36_machine/lib \
                /home/decuple/opencv/opencv-3.4.4/build/lib \
                $(ANACONDA_HOME)/envs/py36_machine/share/OpenCV/3rdparty/lib \
                /usr/share/opencv

    # If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
    # INCLUDE_DIRS += $(shell brew --prefix)/include
    # LIBRARY_DIRS += $(shell brew --prefix)/lib

    # Uncomment to use `pkg-config` to specify OpenCV library paths.
    # (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
    USE_PKG_CONFIG := 1

    # N.B. both build and distribute dirs are cleared on `make clean`
    BUILD_DIR := build
    DISTRIBUTE_DIR := distribute

    # Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
    # DEBUG := 1

    # The ID of the GPU that 'make runtest' will use to run unit tests.
    TEST_GPUID := 0

    # enable pretty build (comment to see full commands)
    Q ?= @
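
The config above assigns PYTHON_INCLUDE twice. In GNU make a later `:=` assignment replaces an earlier one, so only the second value is in effect when INCLUDE_DIRS is expanded. As a rough aid for spotting this kind of thing, here is a minimal sketch that lists variables assigned more than once; the function name `find_reassigned` is hypothetical, and the scanner is deliberately simple (line-based, ignores comment lines, does not interpret `+=` or `?=`).

```python
import re
from collections import defaultdict

# Matches "NAME := value" or "NAME = value"; "+=" and "?=" are deliberately excluded.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*:?=")


def find_reassigned(text):
    """Return {variable: [line numbers]} for variables assigned more than once."""
    seen = defaultdict(list)
    for number, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = ASSIGNMENT.match(line)
        if match:
            seen[match.group(1)].append(number)
    return {name: lines for name, lines in seen.items() if len(lines) > 1}


# Shortened excerpt in the shape of the config above, for illustration.
sample = """\
ANACONDA_HOME := $(ANACONDA_HOME)
PYTHON_INCLUDE := $(ANACONDA_HOME)/include
PYTHON_LIBRARIES := boost_python-py36 python3.6m
PYTHON_INCLUDE := /usr/include/python3.6m
PYTHON_LIB := $(ANACONDA_HOME)/lib
"""
print(find_reassigned(sample))  # {'PYTHON_INCLUDE': [2, 4]}
```

To scan the real file, the same function could be called as `find_reassigned(open("Makefile.config").read())`.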

Also, here is the cmake output, although I won't be using cmake.

    -- The C compiler identification is GNU 7.3.0
    -- The CXX compiler identification is GNU 7.3.0
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info   
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Using 'Release' build type as CMAKE_BUILD_TYPE is not set
    CMake Warning (dev) at cmake/Misc.cmake:27 (set):
      implicitly converting 'BOOLEAN' to 'STRING' type.
    Call Stack (most recent call first):
      CMakeLists.txt:29 (include)
    This warning is for project developers.  Use -Wno-dev to suppress it.

    -- Looking for pthread.h
    -- Looking for pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE      
    -- Boost version: 1.65.1
    -- Found the following Boost libraries:
    --   system
    --   thread
    --   filesystem
    --   chrono
    --   date_time
    --   atomic
    -- Found GFlags: /usr/include  
    -- Found gflags  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libgflags.so)
    -- Found Glog: /usr/include  
    -- Found glog    (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libglog.so)
    -- Found Protobuf: /usr/lib/x86_64-linux-gnu/libprotobuf.so;-lpthread (found version "3.6.1")
    -- Found PROTOBUF Compiler: /usr/bin/protoc
    -- HDF5: Using hdf5 compiler wrapper to determine C configuration
    -- HDF5: Using hdf5 compiler wrapper to determine CXX configuration
    -- Found HDF5: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5_cpp.so;/usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;/usr/lib/x86_64-linux-gnu/libpthread.so;/usr/lib/x86_64-linux-gnu/libsz.so;/usr/lib/x86_64-linux-gnu/libz.so;/usr/lib/x86_64-linux-gnu/libdl.so;/usr/lib/x86_64-linux-gnu/libm.so (found version "1.10.0.1") found components:  HL
    -- Found LMDB: /usr/include  
    -- Found lmdb    (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/liblmdb.so)
    -- Found LevelDB: /usr/include  
    -- Found LevelDB (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libleveldb.so)
    -- Found Snappy: /usr/include  
    -- Found Snappy  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libsnappy.so)
    -- CUDA detected: 10.0
    -- Found cuDNN: ver. 7.5.0 found (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
    -- Added CUDA NVCC flags for: sm_61
    -- OpenCV found (/usr/local/share/OpenCV)
    -- Found Atlas: /usr/include/x86_64-linux-gnu  
    -- Found Atlas (include: /usr/include/x86_64-linux-gnu, library: /usr/lib/x86_64-linux-gnu/libatlas.so)
    -- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) 
    -- Python interface is disabled or not all required dependencies found. Building without it...
    -- Found Git: /usr/bin/git (found version "2.17.1") 
    -- 
    -- ******************* Caffe Configuration Summary *******************
    -- General:
    --   Version           :   1.0.0-rc3
    --   Git               :   unknown
    --   System            :   Linux
    --   C++ compiler      :   /usr/bin/c++
    --   Release CXX flags :   -O3 -DNDEBUG -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
    --   Debug CXX flags   :   -g -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
    --   Build type        :   Release
    -- 
    --   BUILD_SHARED_LIBS :   ON
    --   BUILD_python      :   OFF
    --   BUILD_matlab      :   OFF
    --   BUILD_docs        :   ON
    --   CPU_ONLY          :   OFF
    --   USE_OPENCV        :   ON
    --   USE_LEVELDB       :   ON
    --   USE_LMDB          :   ON
    --   ALLOW_LMDB_NOLOCK :   OFF
    -- 
    -- Dependencies:
    --   BLAS              :   Yes (Atlas)
    --   Boost             :   Yes (ver. 1.65)
    --   glog              :   Yes
    --   gflags            :   Yes
    --   protobuf          :   Yes (ver. 3.6.1)
    --   lmdb              :   Yes (ver. 0.9.21)
    --   LevelDB           :   Yes (ver. 1.20)
    --   Snappy            :   Yes (ver. ..)
    --   OpenCV            :   Yes (ver. 3.4.4)
    --   CUDA              :   Yes (ver. 10.0)
    -- 
    -- NVIDIA CUDA:
    --   Target GPU(s)     :   Auto
    --   GPU arch(s)       :   sm_61
    --   cuDNN             :   Yes (ver. 7.5.0)
    -- 
    -- Documentation:
    --   Doxygen           :   No
    --   config_file       :   
    -- 
    -- Install:
    --   Install path      :   /usr/local
    -- 
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /SegNet/caffe-segnet/build

After working through several errors, I compiled it successfully, and make test also succeeded. However, when I ran make runtest, the following problem occurred.

    [----------] 2 tests from InternalThreadTest
    [ RUN      ] InternalThreadTest.TestRandomSeed
    *** Aborted at 1556589426 (unix time) try "date -d @1556589426" if you are using GNU date ***
    PC: @     0x7f61aaf6f5af __pthread_cond_broadcast
    *** SIGSEGV (@0x10000002b) received by PID 2941 (TID 0x7f61b1c50c80) from PID 43; stack trace: ***
    @     0x7f61aaf73890 (unknown)
    @     0x7f61aaf6f5af __pthread_cond_broadcast
    @     0x7f61ac2308ca boost::thread::interrupt()
    @     0x7f61ab8c3fc9 caffe::InternalThread::StopInternalThread()
    @     0x5574bd65f99f caffe::InternalThreadTest_TestRandomSeed_Test::TestBody()
    @     0x5574bd686a2a testing::internal::HandleExceptionsInMethodIfSupported<>()
    @     0x5574bd67fb3a testing::Test::Run()
    @     0x5574bd67fc1c testing::TestInfo::Run()
    @     0x5574bd67fd55 testing::TestCase::Run()
    @     0x5574bd680210 testing::internal::UnitTestImpl::RunAllTests()
    @     0x5574bd680357 testing::UnitTest::Run()
    @     0x5574bd234d11 main
    @     0x7f61aab91b97 __libc_start_main
    @     0x5574bd23ba9a _start
    Makefile:528: recipe for target 'runtest' failed
    make: *** [runtest] Segmentation fault (core dumped)
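
The trace ends inside boost::thread::interrupt(), and LIBRARY_DIRS in the config lists both system and Anaconda directories, so one thing that may be worth checking is which copy of each shared library the test binary resolves at run time. The sketch below only parses text in the format printed by `ldd`; the function name `resolved_libraries` is hypothetical, and the sample text is made up for illustration, not output from my machine.

```python
import re

# One "ldd" line looks like: "libfoo.so.1 => /path/to/libfoo.so.1 (0x...)".
# The trailing load address is optional, so "not found" entries are kept too.
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(.*?)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$")


def resolved_libraries(ldd_output, keyword):
    """Map library name -> resolved path for names containing `keyword`."""
    result = {}
    for line in ldd_output.splitlines():
        match = LDD_LINE.match(line)
        if match and keyword in match.group(1):
            result[match.group(1)] = match.group(2)
    return result


# Illustrative text only; these paths are invented for the example.
sample = """\
    libboost_thread.so.1.65.1 => /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.65.1 (0x00007f0000000000)
    libboost_system.so.1.65.1 => /opt/example/lib/libboost_system.so.1.65.1 (0x00007f0000100000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000200000)
"""
for name, path in resolved_libraries(sample, "boost").items():
    print(name, "->", path)
```

The real input would be the output of `ldd` on the test binary, which I assume is at build/test/test_all.testbin given BUILD_DIR := build. Boost libraries resolving to different installations would be a hint worth following up, though this alone does not prove the cause.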

A segmentation fault here is often attributed to having multiple GPUs, but I have only one. So I tried the following, and neither attempt worked:

1. export CUDA_VISIBLE_DEVICES=0
This did not work.

2. Re-installing & compiling OpenCV
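
To retry quickly without the full suite, the failing group can be run on its own with googletest's `--gtest_filter` option, with CUDA_VISIBLE_DEVICES set only for the child process. This is a minimal sketch under two assumptions: that the test binary is at build/test/test_all.testbin, and that its first positional argument is the GPU ID, as make runtest passes TEST_GPUID. The function names are hypothetical.

```python
import os
import subprocess


def build_test_command(binary, gpu_id=0, test_filter="InternalThreadTest.*"):
    """Build the argument list for running one googletest group."""
    return [binary, str(gpu_id), "--gtest_filter=" + test_filter]


def run_filtered(binary, visible_devices="0"):
    """Run the filtered tests with CUDA_VISIBLE_DEVICES set for the child only."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=visible_devices)
    return subprocess.run(build_test_command(binary), env=env).returncode


# Assumed location of the test binary; adjust to the actual build tree.
print(build_test_command("build/test/test_all.testbin"))
```

On POSIX systems, a negative return code from `run_filtered` means the process was terminated by a signal, which is how a segmentation fault would show up.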

Moreover, I don't know where this error comes from. I researched it for several hours, and there seems to be no solution other than the first one above, which I already tried and which does not work on my machine.

The only other thing I could find was a solution based on the HIP compiler. However, since my laptop does not have an AMD GPU, I don't think it is useful for me. Since it fixed the problem there, I'll leave the link here: https://github.com/ROCmSoftwarePlatform/hipCaffe/issues/8

Could you suggest a solution?

...