Cassandra Reaper: repair repeatedly postponed and stuck
1 vote
/ March 11, 2020

Cassandra-Reaper v2.0.3, Cassandra v3.11.5.1

Every day I run a repair on one keyspace, and for the past few weeks the repair has never finished. Below is the information table taken from the Reaper dashboard:


ID | 00000000-0000-0177-0000-000000000000
-- | --
Owner | g
Cause | g
Last event | postponed repair segment 00000000-0000-c696-0000-000000000000 because one of the hosts (xx.xx.xx.xx) was already involved in a repair
Start time | March 9, 2020 10:45 AM
End time |  
Pause time |  
Duration | 22 hours 17 minutes 10 seconds
Segment count | 136
Segment repaired | 67
Intensity | 0.8999999761581421
Repair parallelism | PARALLEL
Incremental repair | false
Repair threads | 1
Nodes |  
Datacenters | DC1
Blacklist |  
Creation time | March 9, 2020 10:45 AM
Available metrics (can require a full run before appearing) | io.cassandrareaper.service.RepairRunner.repairProgress. mycluster.mkphistory.00000000000000070000000000000000, io.cassandrareaper.service.RepairRunner.segmentsDone. mycluster.mkphistory.00000000000000070000000000000000, io.cassandrareaper.service.RepairRunner.segmentsTotal. mycluster.mkphistory.00000000000000070000000000000000, io.cassandrareaper.service.RepairRunner.millisSinceLastRepair. mycluster.mkphistory.00000000000000070000000000000000

I also noticed the same message repeated endlessly in the Reaper log:

INFO   [ mycluster:00000000-0000-0177-0000-000000000000:00000000-0000-c696-0000-000000000000] i.c.s.RepairRunner - postponed repair segment 00000000-0000-c696-0000-000000000000 because one of the hosts (xx.xx.xx.xx) was already involved in a repair
INFO   [ mycluster:00000000-0000-0177-0000-000000000000:00000000-0000-c696-0000-000000000000] i.c.s.RepairRunner - postponed repair segment 00000000-0000-c696-0000-000000000000 because one of the hosts (xx.xx.xx.xx) was already involved in a repair
INFO   [ mycluster:00000000-0000-0177-0000-000000000000:00000000-0000-c696-0000-000000000000] i.c.s.RepairRunner - postponed repair segment 00000000-0000-c696-0000-000000000000 because one of the hosts (xx.xx.xx.xx) was already involved in a repair
INFO   [ mycluster:00000000-0000-0177-0000-000000000000:00000000-0000-c696-0000-000000000000] i.c.s.RepairRunner - postponed repair segment 00000000-0000-c696-0000-000000000000 because one of the hosts (xx.xx.xx.xx) was already involved in a repair
INFO   [ mycluster:00000000-0000-0177-0000-000000000000:00000000-0000-c696-0000-000000000000] i.c.s.RepairRunner - postponed repair segment 00000000-0000-c696-0000-000000000000 because one of the hosts (xx.xx.xx.xx) was already involved in a repair
INFO   [ mycluster:00000000-0000-0177-0000-000000000000:00000000-0000-c696-0000-000000000000] i.c.s.RepairRunner - postponed repair segment 00000000-0000-c696-0000-000000000000 because one of the hosts (xx.xx.xx.xx) was already involved in a repair
INFO   [ mycluster:00000000-0000-0177-0000-000000000000:00000000-0000-c696-0000-000000000000] i.c.s.RepairRunner - postponed repair segment 00000000-0000-c696-0000-000000000000 because one of the hosts (xx.xx.xx.xx) was already involved in a repair
INFO   [ mycluster:00000000-0000-0177-0000-000000000000:00000000-0000-c696-0000-000000000000] i.c.s.RepairRunner - postponed repair segment 00000000-0000-c696-0000-000000000000 because one of the hosts (xx.xx.xx.xx) was already involved in a repair
INFO   [ mycluster:00000000-0000-0177-0000-000000000000:00000000-0000-c696-0000-000000000000] i.c.s.RepairRunner - postponed repair segment 00000000-0000-c696-0000-000000000000 because one of the hosts (xx.xx.xx.xx) was already involved in a repair
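
To see which segments and hosts the postponements concentrate on, the messages can be tallied with a short script. This is only a minimal sketch: it assumes the log lines have exactly the format shown above, and the function name `count_postponements` and the inline sample are made up for illustration.

```python
import re
from collections import Counter

# Pattern for the "postponed repair segment" message, based on the excerpt above.
POSTPONED = re.compile(
    r"postponed repair segment (?P<segment>[0-9a-f-]+) "
    r"because one of the hosts \((?P<host>[^)]+)\) was already involved in a repair"
)


def count_postponements(lines):
    """Count postponement messages per (segment id, host) pair.

    `lines` is any iterable of log lines, e.g. an open file object.
    """
    counts = Counter()
    for line in lines:
        match = POSTPONED.search(line)
        if match:
            counts[(match.group("segment"), match.group("host"))] += 1
    return counts


# Illustrative input: two copies of the message from the log plus an unrelated line.
sample_lines = [
    "INFO   [ mycluster:00000000-0000-0177-0000-000000000000:"
    "00000000-0000-c696-0000-000000000000] i.c.s.RepairRunner - "
    "postponed repair segment 00000000-0000-c696-0000-000000000000 "
    "because one of the hosts (xx.xx.xx.xx) was already involved in a repair",
] * 2 + ["INFO   [main] some unrelated line"]

for (segment, host), n in count_postponements(sample_lines).most_common():
    print(f"{n} postponements: segment={segment} host={host}")
```

Passing an open log file instead of `sample_lines` would give the same tally for a real log. If a single host accounts for almost all postponements, that host is the one to check for a repair that is still running or was left behind.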

A few weeks ago this repair took only a couple of hours when run with 4 threads. I tried reducing the number of threads used for the repair, but the result did not change and the repair is still stuck.

I also tried restarting the repair several times (and I restarted Reaper as well), but without success.

Do you have any idea what could be causing this behavior?

...