Почему безопасная точка Java никогда не достигалась? Зависание потока, тайм-аут безопасной точки - PullRequest
0 голосов
/ 20 января 2019

Поведение такое же, как у этого вопроса , JVM застряла на неопределенный срок, пытаясь добраться до безопасной точки, но после jstack -F она продолжает выполнение.Наша версия сервера Linux - CentOS 7.5.1804 с ядром 3.10.0-123.9.3.el7.x86_64, похоже, не в уязвимых версиях.

В последнее время это поведение появлялось дважды, раньше оно было нормальным… Я не могу найти причину.

Журнал GC: * ​​1008 *

2019-01-19T07:36:30.567+0800: 141192.890: Application time: 1.2990830 seconds
2019-01-19T07:36:30.569+0800: 141192.893: Total time for which application threads were stopped: 0.0022680 seconds
2019-01-19T07:36:30.570+0800: 141192.894: Application time: 0.0008920 seconds
2019-01-19T07:36:30.571+0800: 141192.895: Total time for which application threads were stopped: 0.0015040 seconds
2019-01-19T07:36:35.078+0800: 141197.402: Application time: 4.5068430 seconds
############ After jstack -F ###########
2019-01-19T13:47:33.187+0800: 163455.511: Total time for which application threads were stopped: 22258.1091890 seconds
2019-01-19T13:47:33.188+0800: 163455.512: Application time: 0.0005320 seconds
2019-01-19T13:47:33.189+0800: 163455.513: Total time for which application threads were stopped: 0.0014990 seconds
2019-01-19T13:47:33.189+0800: 163455.513: Application time: 0.0000790 seconds
2019-01-19T13:47:33.191+0800: 163455.515: [Full GC2019-01-19T13:47:33.191+0800: 163455.515: [CMS: 446763K->442340K(7707072K), 1.0334000 secs] 587810K->442340K(8320512K), [CMS Perm : 88142K->88142K(131072K)], 1.0341410 secs] [Times: user=1.14 sys=0.00, real=1.04 secs]
2019-01-19T13:47:34.225+0800: 163456.549: Total time for which application threads were stopped: 1.0358560 seconds

Журнал статистики безопасных точек:

# SafepointSynchronize::begin: Timeout detected:
# SafepointSynchronize::begin: Timed out while waiting for threads to stop.
# SafepointSynchronize::begin: Threads which did not reach the safepoint:
# "NettyClientWorkerThread_2" daemon prio=10 tid=0x00007fec701fb000 nid=0x1d0c runnable [0x00007feb48a52000]
   java.lang.Thread.State: RUNNABLE

# "NioSocketAcceptor-34" prio=10 tid=0x00007fec50014800 nid=0x1bac runnable [0x00007feb5c58c000]
   java.lang.Thread.State: RUNNABLE

# "TP-Processor4" daemon prio=10 tid=0x00007fec987ac000 nid=0x1aba runnable [0x00007feb6a3ce000]
   java.lang.Thread.State: RUNNABLE

# "http-8050-Acceptor-0" daemon prio=10 tid=0x00007fec9867b000 nid=0x1ab1 runnable [0x00007fec4416a000]
   java.lang.Thread.State: RUNNABLE

# "MasterListener-cmpp3-[192.168.11.11:26383]" daemon prio=10 tid=0x00007fec99b06000 nid=0x1a83 runnable [0x00007feb6d696000]
   java.lang.Thread.State: RUNNABLE
...... (lots of jredis thread and memcached thread.)


 ############ After jstack -F ###########
         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
141189.062: RevokeBias                       [    1421          0              0    ]      [     0     0     0     1     0    ]  0
141189.609: RevokeBias                       [    1421          0              0    ]      [     0     0     0     1     0    ]  0
141191.594: RevokeBias                       [    1421          0              0    ]      [     0     0     0     1     0    ]  0
141192.891: RevokeBias                       [    1421          0              0    ]      [     0     0     0     1     0    ]  0
141192.891: RevokeBias                       [    1421          0              0    ]      [     0     0     0     1     0    ]  0
141197.406: RevokeBias                       [    1421          0              1    ]      [     0 22258106 22258107     1     0    ]  0
163455.516: RevokeBias                       [    1421          0              0    ]      [     0     0     0     1     0    ]  0
163455.516: GenCollectFull                   [    1421          0              2    ]      [     0     0     0     0  1034    ]  0
163456.547: RevokeBias                       [    1421          0              0    ]      [     0     0     0     1     0    ]  0
163456.547: RevokeBias                       [    1421          0              1    ]      [     0     0     0     1     0    ]  0

Я обнаружил, что весь поток тайм-аута безопасной точки - IN_NATIVE (jstack -F).Например:

Thread 6787: (state = IN_NATIVE)
 - java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) @bci=0 (Compiled frame; information may be imprecise)
 - java.net.SocketInputStream.read(byte[], int, int, int) @bci=87, line=152 (Compiled frame)
 - java.net.SocketInputStream.read(byte[], int, int) @bci=11, line=122 (Compiled frame)
 - java.net.SocketInputStream.read(byte[]) @bci=5, line=108 (Compiled frame)
 - redis.clients.util.RedisInputStream.ensureFill() @bci=20, line=196 (Compiled frame)
 - redis.clients.util.RedisInputStream.readByte() @bci=1, line=40 (Compiled frame)
 - redis.clients.jedis.Protocol.process(redis.clients.util.RedisInputStream) @bci=1, line=151 (Compiled frame)
 - redis.clients.jedis.Protocol.read(redis.clients.util.RedisInputStream) @bci=1, line=215 (Interpreted frame)
 - redis.clients.jedis.Connection.readProtocolWithCheckingBroken() @bci=4, line=340 (Interpreted frame)
 - redis.clients.jedis.Connection.getRawObjectMultiBulkReply() @bci=1, line=285 (Interpreted frame)
 - redis.clients.jedis.JedisPubSub.process(redis.clients.jedis.Client) @bci=1, line=121 (Interpreted frame)
 - redis.clients.jedis.JedisPubSub.proceed(redis.clients.jedis.Client, java.lang.String[]) @bci=16, line=115 (Interpreted frame)
 - redis.clients.jedis.Jedis.subscribe(redis.clients.jedis.JedisPubSub, java.lang.String[]) @bci=13, line=2680 (Interpreted frame)
 - redis.clients.jedis.JedisSentinelPool$MasterListener.run() @bci=78, line=291 (Interpreted frame)

Но, насколько я знаю, поток виртуальных машин не ожидает потока, который находится в native для блокировки.

1 Ответ

0 голосов
/ 21 января 2019

Я скачал файл исходного кода ядра (kernel-3.10.0-123.9.3.el7.src.rpm) из: http://vault.centos.org/7.0.1406/updates/Source/SPackages/ и обнаружил, что там есть ошибка futex_wait.

static void get_futex_key_refs(union futex_key *key)
{
    if (!key->both.ptr)
        return;

    switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
    case FUT_OFF_INODE:
        ihold(key->shared.inode); /* implies MB (B) */
        break;
    case FUT_OFF_MMSHARED:
        futex_get_mm(key); /* implies MB (B) */
        break;
    }
}

https://access.redhat.com/solutions/1386323

Фикс для исправления: https://github.com/torvalds/linux/commit/76835b0ebf8a7fe85beb03c75121419a7dec52f0#diff-178aaa4a4c77d4ef11e1c5e43d004717

...