Question

I deployed a Spring Boot application with a database-backed job queue to App Service.

Yesterday, while the application was running, I performed several scale-out and scale-in operations to see how it would behave.

At some point (not necessarily related to the scaling operations), the application started throwing Hikari errors:

com.zaxxer.hikari.pool.PoolBase          : HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@1ae66f34 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
com.zaxxer.hikari.pool.ProxyConnection   : HikariPool-1 - Connection org.postgresql.jdbc.PgConnection@1ef85079 marked as broken because of SQLSTATE(08006), ErrorCode(0)
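The second log line says the connection was marked as broken because of SQLSTATE 08006. In the SQL standard, class 08 means "connection exception", and in PostgreSQL 08006 is connection_failure. In other words, the driver is reporting that the connection itself failed, not that a query was wrong. A simplified sketch of that kind of classification follows (not HikariCP's actual code). `ConnectionErrors` is a hypothetical name, and treating PostgreSQL's shutdown codes 57P01, 57P02 and 57P03 as fatal is an assumption of this sketch.

```java
import java.sql.SQLException;
import java.util.Set;

// Hypothetical helper: a simplified sketch, not HikariCP's implementation.
final class ConnectionErrors {
    // PostgreSQL server shutdown/startup codes, assumed fatal for this sketch:
    // 57P01 admin_shutdown, 57P02 crash_shutdown, 57P03 cannot_connect_now
    private static final Set<String> SHUTDOWN_STATES = Set.of("57P01", "57P02", "57P03");

    private ConnectionErrors() {}

    // Returns true if the SQLSTATE says the physical connection is no longer usable
    static boolean isConnectionFatal(SQLException e) {
        // Drivers may chain several SQLExceptions via getNextException()
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            String state = cur.getSQLState();
            if (state == null) {
                continue;
            }
            // SQL standard class 08 = connection exception (e.g. 08006 connection_failure)
            if (state.startsWith("08") || SHUTDOWN_STATES.contains(state)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isConnectionFatal(new SQLException("I/O error", "08006")));    // true
        System.out.println(isConnectionFatal(new SQLException("syntax error", "42601"))); // false
    }
}
```

A pool that evicts only on connection-class errors keeps healthy connections around when a query merely fails on bad SQL.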

Below are the stack traces from my Spring scheduled job, along with some other information:

org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
Caused by: javax.net.ssl.SSLException: Connection reset by peer (Write failed)
Suppressed: java.net.SocketException: Broken pipe (Write failed)
Caused by: java.net.SocketException: Connection reset by peer (Write failed)
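In this trace, the outer PSQLException wraps an SSLException whose cause is "Connection reset by peer", with a suppressed "Broken pipe". Both mean the other end closed the TCP connection while the driver was writing. When logging such failures it can help to print every layer. Here is a minimal JDK-only sketch. `ThrowableChain` is a hypothetical helper, and a plain `Exception` stands in for PSQLException so the example needs no driver.

```java
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import javax.net.ssl.SSLException;

// Hypothetical helper: lists a throwable, its suppressed exceptions and its causes in the
// same order a printed stack trace shows them ("Suppressed:" before "Caused by:").
final class ThrowableChain {
    private ThrowableChain() {}

    static List<String> describe(Throwable root) {
        List<String> lines = new ArrayList<>();
        // Identity-based set so a cyclic cause chain cannot loop forever
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        walk(root, "", lines, seen);
        return lines;
    }

    private static void walk(Throwable t, String label, List<String> lines, Set<Throwable> seen) {
        if (t == null || !seen.add(t)) {
            return;
        }
        lines.add(label + t.getClass().getSimpleName() + ": " + t.getMessage());
        for (Throwable suppressed : t.getSuppressed()) {
            walk(suppressed, "Suppressed: ", lines, seen);
        }
        walk(t.getCause(), "Caused by: ", lines, seen);
    }

    public static void main(String[] args) {
        // Rebuild the shape of the trace above (Exception stands in for PSQLException)
        SocketException reset = new SocketException("Connection reset by peer (Write failed)");
        SSLException ssl = new SSLException("Connection reset by peer (Write failed)", reset);
        ssl.addSuppressed(new SocketException("Broken pipe (Write failed)"));
        Exception top = new Exception("An I/O error occurred while sending to the backend.", ssl);
        describe(top).forEach(System.out::println);
    }
}
```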

This is followed by another error stack:

WARN 1 --- [   scheduling-1] com.zaxxer.hikari.pool.PoolBase          : HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@48d0d6da (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
org.springframework.jdbc.support.MetaDataAccessException: Error while extracting DatabaseMetaData; nested exception is java.sql.SQLException: Connection is closed
Caused by: java.sql.SQLException: Connection is closed

Here is the code that is called periodically, every 500 milliseconds:

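@Scheduled(fixedDelayString = "${worker.delay}") // worker.delay = 500 (ms)
@Transactional
public void execute() {
    // Take the next pending job of jobClass from the queue, if any, and handle it
    jobManager.next(jobClass).ifPresent(this::handleJob);
}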
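With fixedDelay, Spring waits the configured delay after one invocation finishes before starting the next one. With worker.delay at 500 ms this method therefore runs about twice per second, and since the queue lives in the database, each poll presumably runs at least one query and borrows a connection from the Hikari pool, even when the queue is empty. The JDK's `ScheduledExecutorService.scheduleWithFixedDelay` has the same fixed-delay semantics. The sketch below uses it as a stand-in for Spring's scheduler, with a hypothetical in-memory queue in place of `jobManager`, just to show the shape of the polling loop.

```java
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: fixed-delay polling with the JDK scheduler, standing in for @Scheduled(fixedDelay...).
// The in-memory queue is a hypothetical stand-in for the database-backed jobManager.
final class FixedDelayPoller {
    private final Queue<String> jobs = new ConcurrentLinkedQueue<>();
    private final AtomicInteger handled = new AtomicInteger();
    private final AtomicInteger emptyPolls = new AtomicInteger();

    void submit(String job) {
        jobs.add(job);
    }

    // One poll, shaped like jobManager.next(jobClass).ifPresent(this::handleJob)
    void execute() {
        Optional<String> next = Optional.ofNullable(jobs.poll());
        if (next.isPresent()) {
            handled.incrementAndGet(); // a real worker would process the job here
        } else {
            emptyPolls.incrementAndGet(); // the common case when there is no traffic
        }
    }

    // Runs `polls` iterations; each starts delayMs after the previous one has finished
    void run(int polls, long delayMs) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch remaining = new CountDownLatch(polls);
        scheduler.scheduleWithFixedDelay(() -> {
            if (remaining.getCount() > 0) {
                execute();
                remaining.countDown();
            }
        }, 0, delayMs, TimeUnit.MILLISECONDS);
        remaining.await();
        scheduler.shutdownNow();
    }

    int handled() { return handled.get(); }

    int emptyPolls() { return emptyPolls.get(); }

    public static void main(String[] args) throws InterruptedException {
        FixedDelayPoller poller = new FixedDelayPoller();
        poller.submit("job-1");
        poller.submit("job-2");
        poller.run(5, 500); // same 500 ms delay as worker.delay in the question
        System.out.println(poller.handled() + " handled, " + poller.emptyPolls() + " empty polls");
        // prints "2 handled, 3 empty polls"
    }
}
```

By contrast, a fixed rate would start runs on a fixed schedule regardless of how long each run took.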

Update: The code above does nothing almost all of the time, because the site had no traffic.

Update 2: I checked the Postgres logs and found the following:

2020-07-11 22:48:09 UTC-5f0866f0.f0-LOG:  checkpoint starting: immediate force wait
2020-07-11 22:48:10 UTC-5f0866f0.f0-LOG:  checkpoint complete (240): wrote 30 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.046 s, sync=0.046 s, total=0.437 s; sync files=13, longest=0.009 s, average=0.003 s; distance=163 kB, estimate=13180 kB
2020-07-11 22:48:10 UTC-5f0866ee.68-LOG:  received immediate shutdown request
2020-07-11 22:48:10 UTC-5f0a3f41.8914-WARNING:  terminating connection because of crash of another server process
2020-07-11 22:48:10 UTC-5f0a3f41.8914-DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
// Same text about 10 times
2020-07-11 22:48:10 UTC-5f0866f2.7c-HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-07-11 22:48:10 UTC-5f0866ee.68-LOG:  src/port/kill.c(84): Process (272) exited OOB of pgkill.
2020-07-11 22:48:10 UTC-5f0866f1.fc-WARNING:  terminating connection because of crash of another server process
2020-07-11 22:48:10 UTC-5f0866f1.fc-DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-07-11 22:48:10 UTC-5f0866f1.fc-HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-07-11 22:48:10 UTC-5f0866ee.68-LOG:  archiver process (PID 256) exited with exit code 1
2020-07-11 22:48:11 UTC-5f0866ee.68-LOG:  database system is shut down
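Reading this log: the server received an "immediate shutdown request", terminated every backend process, and then reported "database system is shut down". Any connections Hikari held at that moment were closed on the server side, which fits the "This connection has been closed" and SQLSTATE 08006 errors. The HINT lines say you should be able to reconnect in a moment and repeat the command. For work that is safe to repeat, such as a single queue poll, a bounded retry with backoff is one way to ride out a restart like this. The sketch below assumes the work is idempotent. `Retry`, `SqlWork` and `withRetry` are hypothetical names, and only SQLSTATE class 08 (connection exception) is treated as retryable.

```java
import java.sql.SQLException;

// Hypothetical retry helper for idempotent database work (a sketch, not a library API).
final class Retry {
    @FunctionalInterface
    interface SqlWork<T> {
        T run() throws SQLException;
    }

    private Retry() {}

    // True for SQL standard class 08 (connection exception), e.g. PostgreSQL 08006
    static boolean isTransient(SQLException e) {
        String state = e.getSQLState();
        return state != null && state.startsWith("08");
    }

    // Runs work up to maxAttempts times with exponential backoff between attempts.
    // Only connection-class failures are retried; any other SQLException propagates at once.
    static <T> T withRetry(SqlWork<T> work, int maxAttempts, long initialBackoffMs)
            throws SQLException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return work.run();
            } catch (SQLException e) {
                if (!isTransient(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoff);
                backoff *= 2; // double the wait before the next attempt
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new SQLException("Connection is closed", "08003"); // simulated restart
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
    }
}
```

In the setup from the question, the scheduler already calls execute() again 500 ms later, so an explicit retry like this mostly matters for work that must not simply be skipped until the next poll.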

It looks like this is a problem with the Azure PostgreSQL server, and it shut down. Am I reading this correctly?

SKumar · Answer 1 · July 12, 2020

As your logs suggest, have you tried setting the maxLifetime property for HikariCP? I think setting this property should resolve the issue.

From the HikariCP documentation (https://github.com/brettwooldridge/HikariCP):

maxLifetime: This property controls the maximum lifetime of a connection in the pool. An in-use connection will never be retired, only when it is closed will it then be removed. On a connection-by-connection basis, minor negative attenuation is applied to avoid mass-extinction in the pool. We strongly recommend setting this value, and it should be several seconds shorter than any database or infrastructure imposed connection time limit. A value of 0 indicates no maximum lifetime (infinite lifetime), subject of course to the idleTimeout setting. The minimum allowed value is 30000ms (30 seconds). Default: 1800000 (30 minutes)
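The quoted documentation gives three rules: maxLifetime should be several seconds shorter than any connection time limit imposed by the database or infrastructure, the minimum is 30000 ms, and the default is 1800000 ms. Here is a small arithmetic sketch of picking a value from a known limit. The limit and the safety margin are placeholders you would have to look up for your own environment (the question does not state any Azure limit), and `MaxLifetime.choose` is a hypothetical helper. In Spring Boot, the resulting value would typically go into the `spring.datasource.hikari.max-lifetime` property (in milliseconds).

```java
// Sketch: pick a HikariCP maxLifetime (in ms) from a connection time limit imposed by the
// database or infrastructure. The limit and margin are inputs you must look up yourself.
final class MaxLifetime {
    static final long MIN_ALLOWED_MS = 30_000;       // documented minimum (30 seconds)
    static final long HIKARI_DEFAULT_MS = 1_800_000; // documented default (30 minutes)

    private MaxLifetime() {}

    // limitMs: external connection time limit, or 0 if none is known
    // marginMs: how many ms shorter than that limit maxLifetime should be
    static long choose(long limitMs, long marginMs) {
        if (limitMs < 0 || marginMs < 0) {
            throw new IllegalArgumentException("limit and margin must be non-negative");
        }
        if (limitMs == 0) {
            return HIKARI_DEFAULT_MS; // no external limit known: keep Hikari's default
        }
        long candidate = limitMs - marginMs;
        if (candidate < MIN_ALLOWED_MS) {
            // Values below the documented minimum are not allowed, so the limit cannot be met
            throw new IllegalArgumentException("limit too short for maxLifetime: " + limitMs + " ms");
        }
        // If the default is already shorter than the limit, there is no need to go higher
        return Math.min(candidate, HIKARI_DEFAULT_MS);
    }

    public static void main(String[] args) {
        // Hypothetical 10-minute limit with a 30-second margin
        System.out.println(choose(600_000, 30_000)); // 570000
    }
}
```

Capping the result at the default is a design choice of this sketch: a longer lifetime than 30 minutes gains little, while a shorter one only matters when something external cuts connections sooner.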

Azure App Service - Spring Boot - Hikari Errors
