Cassandra shards repeatedly become corrupted: IndexOutOfBounds exception
18 January 2019

We have a Cassandra 3.11 cluster with many different tables, some of which hold a large amount of text (news articles, web posts, etc.). For some reason these article tables regularly become corrupted and cannot be repaired. The odd part is that the table does not fail to repair across our whole 3-node cluster: only one shard of one article table on one node is corrupted. Running the usual repair commands generally does not work, so we are forced to replace that node's copy of the table with the copy from another node where it is not corrupted.

Example stack trace:

WARN  [ReadStage-2] 2017-10-05 17:05:11,853 AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-2,5,main]: {}
java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
        at java.util.ArrayList.rangeCheck(ArrayList.java:653) ~[na:1.8.0_141]
        at java.util.ArrayList.get(ArrayList.java:429) ~[na:1.8.0_141]
        at org.apache.cassandra.db.marshal.TupleType.compareCustom(TupleType.java:114) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.marshal.AbstractType.compare(AbstractType.java:160) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.config.ColumnDefinition$1.compare(ColumnDefinition.java:200) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.config.ColumnDefinition$1.compare(ColumnDefinition.java:186) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.Cell.lambda$static$0(Cell.java:52) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.MergeIterator$Candidate.compareTo(MergeIterator.java:384) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.replaceAndSink(MergeIterator.java:263) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:765) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:695) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:217) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:156) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.Row$Merger.merge(Row.java:672) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:554) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:518) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:217) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:156) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:500) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:360) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:136) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:92) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:79) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:315) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:145) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:138) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:134) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:333) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:50) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_141]
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) ~[apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) [apache-cassandra-3.11.0.jar:3.11.0]
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [apache-cassandra-3.11.0.jar:3.11.0]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_141]

I have also seen it complain about Index: 6, Size: 6 and other values. And although this is logged as a warning in system.log, if I run a repair on the table, the repair fails with the error:

ERROR [ValidationExecutor:30] 2019-01-17 21:05:12,583 Validator.java:268 - Failed creating a merkle tree for [repair #7f300fa0-1a9b-11e9-a3f3-27b394804448 on core_main_application/article_by_date_posted..., /172.31.0.77 (see log for details)

Unfortunately, no "details" are to be found in the log. However, a trace taken during the repair gave:

[2019-01-17 21:05:12,574] Partition index with 4 entries found for sstable 9123
[2019-01-17 21:05:12,576] /172.31.2.189: Partition index with 2 entries found for sstable 8990
[2019-01-17 21:05:12,576] /172.31.2.189: Partition index with 3 entries found for sstable 8968
[2019-01-17 21:05:12,578] /172.31.2.189: Partition index with 4 entries found for sstable 8985
[2019-01-17 21:05:12,578] /172.31.2.189: Partition index with 0 entries found for sstable 8995
[2019-01-17 21:05:12,587] REPAIR_MESSAGE message received from /172.31.2.189
[2019-01-17 21:05:12,589] /172.31.2.189: Sending REPAIR_MESSAGE message to /172.31.0.77
[2019-01-17 21:05:12,591] Received merkle tree for article_by_date_posted from /172.31.2.189
[2019-01-17 21:05:12,600] Requesting merkle trees for person_by_user_by_date_modified (to [/172.31.2.189, /172.31.10.37, /172.31.0.77])
[2019-01-17 21:05:12,600] Sending REPAIR_MESSAGE message to /172.31.0.77
[2019-01-17 21:05:12,600] REPAIR_MESSAGE message received from /172.31.0.77
[2019-01-17 21:05:12,600] Parsing UPDATE system_distributed.repair_history SET status = 'FAILED', finished_at = toTimestamp(now()), exception_message=?, exception_stacktrace=? WHERE keyspace_name = 'core_main_application' AND columnfamily_name = 'article_by_date_posted' AND id = 7f300fa0-1a9b-11e9-a3f3-27b394804448
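
Since the Index/Size values in the exception differ between occurrences, it may help to tally which variants show up in system.log. Below is a minimal sketch, assuming the exception lines keep the format shown in the stack trace above; the function name `tally_index_errors` and the sample lines are hypothetical, made up for illustration only.

```python
import re
from collections import Counter

# Matches the exception line format seen in the stack trace above.
IOOBE = re.compile(r"IndexOutOfBoundsException: Index: (\d+), Size: (\d+)")


def tally_index_errors(lines):
    """Count each (index, size) pair reported by IndexOutOfBoundsException lines."""
    counts = Counter()
    for line in lines:
        match = IOOBE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


# Hypothetical sample lines in the same shape as the log excerpt.
sample = [
    "java.lang.IndexOutOfBoundsException: Index: 3, Size: 3",
    "        at java.util.ArrayList.rangeCheck(ArrayList.java:653) ~[na:1.8.0_141]",
    "java.lang.IndexOutOfBoundsException: Index: 6, Size: 6",
    "java.lang.IndexOutOfBoundsException: Index: 3, Size: 3",
]

for (index, size), count in sorted(tally_index_errors(sample).items()):
    print(f"Index: {index}, Size: {size} seen {count} time(s)")
```

To run it against a real log, pass an open file object instead of `sample`; the location of system.log depends on the installation, so the path is left to the reader.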

Has anyone ever run into this issue?
