AMD Vega 64 crashes on kernels newer than 4.15
22 May 2019

When I try to run kernel 4.19.39, 5.0.13 or 5.1, the system freezes a few seconds after I start Steam or Overwatch (through the Battle.net client). I am currently back on 4.15, which works fine and is stable.

What I have tried so far:

  • Added idle=nomwait to the kernel command line: GRUB_CMDLINE_LINUX_DEFAULT="splash idle=nomwait"
  • Set the power option to "typical"
  • Updated the BIOS (from AGESA 1.0.0.4 to 1.0.0.6)
  • Updated the OS (Ubuntu 18.04)
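The GRUB change in the first item can be scripted so that it is repeatable across kernel tests. Below is a minimal sketch, assuming Ubuntu's standard /etc/default/grub with a single-line, double-quoted GRUB_CMDLINE_LINUX_DEFAULT; the function name add_kernel_param is hypothetical. It appends a parameter only if it is not already present, and parameters containing / or & would need extra escaping for sed.

```shell
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# in the given file (default: /etc/default/grub) unless it is already there.
add_kernel_param() {
    param=$1
    file=${2:-/etc/default/grub}
    # Already present inside the quoted value? Then leave the file untouched.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing quote, then drop the
    # leading space that appears if the value was previously empty.
    sed -i \
        -e "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" \
        -e 's/^GRUB_CMDLINE_LINUX_DEFAULT=" /GRUB_CMDLINE_LINUX_DEFAULT="/' \
        "$file"
}
```

Editing the file alone changes nothing at boot: `sudo update-grub` has to regenerate the GRUB configuration and the machine has to be rebooted, after which `cat /proc/cmdline` shows the parameters the running kernel actually received.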

Hardware

AMD Ryzen 7 2700X Wraith Boxed
Asus Vega 64 Strix
Gigabyte X470 AORUS ULTRA GAMING (AGESA 1.0.0.6)
G.Skill Ripjaws V DDR4 3200MHz, 64GB (4 x 16GB)
Corsair CX850M 850W ATX power supply unit

~$ screenfetch -n

OS: Ubuntu 18.04 bionic
 Kernel: x86_64 Linux 4.15.0-48-generic
 Uptime: 1h 29m
 Packages: 3497
 Shell: bash 4.4.19
 Resolution: 3840x2160
 DE: GNOME 
 WM: GNOME Shell
 WM Theme: Adwaita
 GTK Theme: Ambiance [GTK2/3]
 Icon Theme: ubuntu-mono-dark
 Font: Ubuntu 11
 CPU: AMD Ryzen 7 2700X Eight-Core @ 16x 3.7GHz [36.3°C]
 GPU: Radeon RX Vega (VEGA10, DRM 3.23.0, 4.15.0-48-generic, LLVM 9.0.0)
 RAM: 6208MiB / 64432MiB

Drivers and additional information

~$ glxinfo | grep "OpenGL version"
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.2.0-devel - padoka PPA

~$ cat /etc/apt/sources.list.d/paulo-miguel-dias-ubuntu-mesa-bionic.list
deb http://ppa.launchpad.net/paulo-miguel-dias/mesa/ubuntu bionic main
# deb-src http://ppa.launchpad.net/paulo-miguel-dias/mesa/ubuntu bionic main
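Because the Mesa stack here comes from the padoka PPA, it can help to list exactly which installed packages were pulled from it, for example when reporting the bug or before reverting to Ubuntu's stock packages. A minimal sketch, assuming (as the version strings in this post suggest) that every package from this PPA carries "padoka" in its version; the helper name padoka_packages is hypothetical, while `dpkg-query -W -f` is a standard dpkg option.

```shell
# Hypothetical helper: read "package version" pairs on stdin and print the
# names of packages whose version string contains "padoka".
padoka_packages() {
    awk '$2 ~ /padoka/ { print $1 }'
}

# Feed it the installed package list:
#   dpkg-query -W -f='${Package} ${Version}\n' | padoka_packages
```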

~$ sudo lspci -v | grep -i vga -A 10
0c:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon RX Vega 64] (rev c1) (prog-if 00 [VGA controller])
    Subsystem: ASUSTeK Computer Inc. Vega 10 XT [Radeon RX Vega 64]
    Flags: bus master, fast devsel, latency 0, IRQ 114
    Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Memory at f0000000 (64-bit, prefetchable) [size=2M]
    I/O ports at e000 [size=256]
    Memory at fcc00000 (32-bit, non-prefetchable) [size=512K]
    Expansion ROM at 000c0000 [disabled] [size=128K]
    Capabilities: [48] Vendor Specific Information: Len=08 <?>
    Capabilities: [50] Power Management version 3
    Capabilities: 

    ...
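The lspci excerpt above stops before the line that names the kernel driver bound to the card. `lspci -k` adds that line, and `-s 0c:00.0` restricts the output to the GPU's slot from the listing. A minimal sketch that pulls out just the driver name; the helper name driver_in_use is hypothetical.

```shell
# Hypothetical helper: print the value of the "Kernel driver in use:" line
# from lspci -k output read on stdin.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Example against the slot shown above:
#   lspci -k -s 0c:00.0 | driver_in_use
```

For a Vega 10 card the expected value is amdgpu, since the older radeon driver does not support Vega.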

~$ apt show libdrm-amdgpu1 -a
Package: libdrm-amdgpu1
Version: 2.4.98+git1905192304.922d929~b~padoka0
Priority: optional
Section: libs
Source: libdrm
Maintainer: Debian X Strike Force <debian-x@lists.debian.org>
Installed-Size: 76,8 kB
Depends: libc6 (>= 2.17), libdrm2 (>= 2.4.82)
Download-Size: 26,9 kB
APT-Manual-Installed: yes
APT-Sources: http://ppa.launchpad.net/paulo-miguel-dias/mesa/ubuntu bionic/main amd64 Packages
Description: Userspace interface to amdgpu-specific kernel DRM services -- runtime
 This library implements the userspace interface to the kernel DRM
 services.  DRM stands for "Direct Rendering Manager", which is the
 kernelspace portion of the "Direct Rendering Infrastructure" (DRI).
 The DRI is currently used on Linux to provide hardware-accelerated

I found the following in the kernel log while testing with kernel 5.1:

May 22 18:46:31 [HOST] kernel: [  256.354386] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354390] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354391] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0050153D
May 22 18:46:31 [HOST] kernel: [  256.354395] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354397] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354398] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:31 [HOST] kernel: [  256.354404] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354405] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354407] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:31 [HOST] kernel: [  256.354411] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354412] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354413] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:31 [HOST] kernel: [  256.354418] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354419] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354420] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:31 [HOST] kernel: [  256.354424] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354426] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354427] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:31 [HOST] kernel: [  256.354430] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354432] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354433] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:31 [HOST] kernel: [  256.354437] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354438] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354439] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:31 [HOST] kernel: [  256.354443] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354444] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354445] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:31 [HOST] kernel: [  256.354449] amdgpu 0000:0c:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:5 pasid:32780, for process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575)
May 22 18:46:31 [HOST] kernel: [  256.354450] amdgpu 0000:0c:00.0:   in page starting at address 0x0000000000400000 from 27
May 22 18:46:31 [HOST] kernel: [  256.354451] amdgpu 0000:0c:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
May 22 18:46:41 [HOST] kernel: [  261.469953] [drm:amdgpu_dm_commit_planes.isra.43 [amdgpu]] *ERROR* Waiting for fences timed out.
May 22 18:46:41 [HOST] kernel: [  266.593840] [drm:amdgpu_dm_commit_planes.isra.43 [amdgpu]] *ERROR* Waiting for fences timed out.
May 22 18:46:41 [HOST] kernel: [  266.599848] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=18098, emitted seq=18100
May 22 18:46:41 [HOST] kernel: [  266.599914] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Battle.net.exe pid 10384 thread Battle.net:cs0 pid 10575
May 22 18:46:41 [HOST] kernel: [  266.599918] amdgpu 0000:0c:00.0: GPU reset begin!
May 22 18:46:47 [HOST] kernel: [  271.709694] [drm:amdgpu_dm_commit_planes.isra.43 [amdgpu]] *ERROR* Waiting for fences timed out.
May 22 18:46:47 [HOST] kernel: [  272.165625] amdgpu 0000:0c:00.0: GPU BACO reset
May 22 18:46:47 [HOST] kernel: [  272.643907] amdgpu 0000:0c:00.0: GPU reset succeeded, trying to resume
May 22 18:46:47 [HOST] kernel: [  272.644035] [drm] PCIE GART of 512M enabled (table at 0x000000F400900000).
May 22 18:46:47 [HOST] kernel: [  272.644126] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
May 22 18:46:47 [HOST] kernel: [  272.644277] [drm] PSP is resuming...
May 22 18:46:47 [HOST] kernel: [  272.790964] [drm] reserve 0x400000 from 0xf400d00000 for PSP TMR SIZE
May 22 18:46:47 [HOST] kernel: [  272.801714] amdgpu: [powerplay] Failed to send message: 0x46, ret value: 0xffffffff
May 22 18:46:47 [HOST] kernel: [  272.801830] amdgpu: [powerplay] Failed to send message: 0x61, ret value: 0xffffffff
May 22 18:46:48 [HOST] kernel: [  273.172332] [drm] UVD and UVD ENC initialized successfully.
May 22 18:46:48 [HOST] kernel: [  273.271995] [drm] VCE initialized successfully.
May 22 18:46:48 [HOST] kernel: [  273.273190] [drm] recover vram bo from shadow start
May 22 18:46:48 [HOST] kernel: [  273.279784] [drm] recover vram bo from shadow done
May 22 18:46:48 [HOST] kernel: [  273.279787] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279789] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279823] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279831] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279833] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279838] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279844] amdgpu 0000:0c:00.0: GPU reset(2) succeeded!
May 22 18:46:48 [HOST] kernel: [  273.279844] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279848] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279853] [drm] Skip scheduling IBs!
May 22 18:46:48 [HOST] kernel: [  273.279855] [drm] Skip scheduling IBs!
...
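A log like this is easier to compare across kernels (4.19.39, 5.0.13, 5.1) once it is reduced to its key events: how many VM page faults each process triggered at which address, then the ring timeout and the reset sequence. A minimal sketch, assuming the syslog line format shown above; both helper names, summarize_vm_faults and gpu_reset_timeline, are hypothetical.

```shell
# Hypothetical helper: count "no-retry page fault" entries on stdin per
# (process name, faulting page address), pairing each fault line with the
# "in page starting at address" line that follows it.
summarize_vm_faults() {
    awk '
        /no-retry page fault/ {
            proc = $0
            sub(/.*for process /, "", proc)   # keep text after "for process "
            sub(/ pid.*/, "", proc)           # cut at the first " pid"
            pending = 1
            next
        }
        pending && /in page starting at address/ {
            addr = $0
            sub(/.*address /, "", addr)
            sub(/ .*/, "", addr)              # keep only the hex address
            count[proc " " addr]++
            pending = 0
        }
        END { for (k in count) print count[k], k }
    '
}

# Hypothetical helper: keep only the hang/recovery milestones (ring timeouts,
# GPU reset steps, VRAM loss) and strip the syslog prefix up to "kernel: ".
gpu_reset_timeline() {
    grep -E 'ring [a-z0-9_.]+ timeout|GPU (BACO )?reset|VRAM is lost' |
        sed 's/^.*kernel: //'
}
```

For example, `summarize_vm_faults < /var/log/kern.log`, or `journalctl -k -b -1 | gpu_reset_timeline` to read the kernel messages from the previous (crashed) boot, provided the journal is stored persistently.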