Puzzling Elasticsearch behavior - PullRequest
06 August 2020

I'm running a two-node Elasticsearch cluster that I monitor, and I've noticed some interesting graphs that I can't explain. The first is a graph of merge operations over time:

[Graph: merge operations over time]

As far as I can tell, it grows roughly linearly, so I assume there is some explanation for it, but I can't find any information. More interesting still, at 00:00 it drops to zero. Can someone explain what is causing this?
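One thing worth checking is which merge counter the graph actually plots. In the node stats API (`GET _nodes/stats/indices/merges`), `merges.current` is the number of merges in progress at that moment, while `merges.total` counts merge operations performed, so a steady climb would be expected if the chart shows the latter. A minimal sketch, assuming the response has already been fetched and parsed into a Python dict; the node name and numbers in the sample are made up.

```python
from typing import Any, Dict


def merge_summary(nodes_stats: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """Extract per-node merge counters from a parsed GET _nodes/stats/indices/merges response."""
    summary: Dict[str, Dict[str, int]] = {}
    for node in nodes_stats.get("nodes", {}).values():
        merges = node.get("indices", {}).get("merges", {})
        summary[node.get("name", "?")] = {
            # merges running at the moment the stats were taken
            "current": merges.get("current", 0),
            # running count of merge operations performed
            "total": merges.get("total", 0),
            "total_time_in_millis": merges.get("total_time_in_millis", 0),
        }
    return summary


# Made-up sample response for illustration only.
sample = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "indices": {"merges": {"current": 1, "total": 5210, "total_time_in_millis": 98000}},
        }
    }
}
print(merge_summary(sample))
# {'node-1': {'current': 1, 'total': 5210, 'total_time_in_millis': 98000}}
```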

The second graph looks much like the first, but it shows the heap used by the cluster:

[Graph: heap used by the cluster over time]

This looks like a memory leak to me, and, again, the heap usage resets at around 00:00.
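A heap graph that climbs steadily and then falls sharply can also be the ordinary pattern of JVM garbage collection rather than a leak. One way to tell is to check whether the drop lines up with collections reported in `jvm.gc.collectors` (the `young` and `old` collectors). A minimal sketch, assuming two parsed `GET _nodes/stats/jvm` snapshots taken before and after the drop; the snapshot helper, node name and numbers are hypothetical.

```python
from typing import Any, Dict


def gc_deltas(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Compare two parsed GET _nodes/stats/jvm responses node by node."""
    result: Dict[str, Dict[str, Any]] = {}
    prev_nodes = before.get("nodes", {})
    for node_id, node in after.get("nodes", {}).items():
        prev = prev_nodes.get(node_id)
        if prev is None:
            continue  # node joined between snapshots, so there is no baseline
        now_gc = node["jvm"]["gc"]["collectors"]
        old_gc = prev["jvm"]["gc"]["collectors"]
        result[node.get("name", node_id)] = {
            # heap usage (percent of max heap) in the first and second snapshot
            "heap_used_percent": (
                prev["jvm"]["mem"]["heap_used_percent"],
                node["jvm"]["mem"]["heap_used_percent"],
            ),
            # collections each collector ran in between; counters restart when
            # the node restarts, which would show up as a negative value
            "collections": {
                name: stats["collection_count"] - old_gc.get(name, {}).get("collection_count", 0)
                for name, stats in now_gc.items()
            },
        }
    return result


def snap(heap_pct: int, young: int, old: int) -> Dict[str, Any]:
    """Hypothetical single-node snapshot used only for this example."""
    return {"nodes": {"n1": {"name": "node-1", "jvm": {
        "mem": {"heap_used_percent": heap_pct},
        "gc": {"collectors": {"young": {"collection_count": young},
                              "old": {"collection_count": old}}}}}}}


print(gc_deltas(snap(85, 1200, 3), snap(40, 1215, 4)))
# {'node-1': {'heap_used_percent': (85, 40), 'collections': {'young': 15, 'old': 1}}}
```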

Here are graphs of the Elasticsearch operations (indexing rate and search rate). As they show, there is almost no indexing, and search requests peak at 40 per second, which I don't think is much load.
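Rates like these are usually derived from cumulative counters in the node stats. A minimal sketch of that calculation, assuming two parsed `GET _nodes/stats/indices` snapshots taken `interval_s` seconds apart, using `indices.indexing.index_total` and `indices.search.query_total`. As far as I understand, `query_total` is counted per shard, so one search request that hits several shards increments it several times. The snapshot helper and numbers are hypothetical.

```python
from typing import Any, Dict


def _counter(node: Dict[str, Any], section: str, field: str) -> int:
    # e.g. node["indices"]["search"]["query_total"]
    return node["indices"][section][field]


def rates_per_second(before: Dict[str, Any], after: Dict[str, Any],
                     interval_s: float) -> Dict[str, Dict[str, float]]:
    """Per-node indexing and query rates between two node-stats snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates: Dict[str, Dict[str, float]] = {}
    prev_nodes = before.get("nodes", {})
    for node_id, node in after.get("nodes", {}).items():
        prev = prev_nodes.get(node_id)
        if prev is None:
            continue  # node joined between snapshots, so there is no baseline
        indexed = _counter(node, "indexing", "index_total") - _counter(prev, "indexing", "index_total")
        queried = _counter(node, "search", "query_total") - _counter(prev, "search", "query_total")
        rates[node.get("name", node_id)] = {
            "index_per_s": indexed / interval_s,
            "query_per_s": queried / interval_s,
        }
    return rates


def snapshot(index_total: int, query_total: int) -> Dict[str, Any]:
    """Hypothetical single-node snapshot used only for this example."""
    return {"nodes": {"n1": {"name": "node-1", "indices": {
        "indexing": {"index_total": index_total},
        "search": {"query_total": query_total}}}}}


print(rates_per_second(snapshot(100, 1000), snapshot(110, 1400), 10))
# {'node-1': {'index_per_s': 1.0, 'query_per_s': 40.0}}
```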

[Graphs: indexing rate; search rate]

The problem is that the peaks of all these graphs coincide with my application's 'rush hour', and the system becomes unresponsive.

Here's some information about the setup of the cluster:

I have two virtual machines, and on each of them a node runs in a separate Docker container.

Node 1 hardware (the green graph):

  • 8-core Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
  • a spinning hard drive
  • 16 GB of heap allocated

Node 2 hardware (the yellow graph):

  • 8-core Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
  • a spinning hard drive
  • 16 GB of heap allocated

The database cluster also runs on these virtual machines, so the CPU is shared between the DB cluster and the Elasticsearch cluster.
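The post doesn't say how much RAM each VM has. Elastic's general guidance is to give the heap no more than half of the machine's memory, because Lucene relies on the OS file-system cache for much of the rest; with a database on the same VM, that remainder is shared further. A minimal sketch of that arithmetic; the RAM and database figures in the example are hypothetical, not taken from the post.

```python
from typing import List, Tuple

GIB = 1024 ** 3


def check_memory(ram_bytes: int, heap_bytes: int, other_bytes: int = 0) -> Tuple[int, List[str]]:
    """Return (bytes left for the OS file-system cache, warnings).

    other_bytes is whatever else lives on the VM, e.g. the database.
    """
    warnings: List[str] = []
    # General Elastic guidance: heap at most half of the available memory.
    if heap_bytes * 2 > ram_bytes:
        warnings.append("heap is more than 50% of RAM")
    # Memory not used by the heap or other processes is what the OS can cache.
    cache_bytes = ram_bytes - heap_bytes - other_bytes
    if cache_bytes <= 0:
        warnings.append("no memory left for the file-system cache")
    return max(cache_bytes, 0), warnings


# Hypothetical VM sizes: the post states neither the RAM nor the DB footprint.
print(check_memory(32 * GIB, 16 * GIB, 8 * GIB))
# (8589934592, [])
print(check_memory(24 * GIB, 16 * GIB, 8 * GIB))
# (0, ['heap is more than 50% of RAM', 'no memory left for the file-system cache'])
```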

Another thing worth mentioning: because of the spinning disks, I tried this recommendation https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-merge.html, but it didn't change anything. I applied the setting at the index level without restarting the cluster, since I found in several places that it is a runtime (dynamic) setting.
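The post doesn't name the exact setting, but the spinning-disk advice on the linked page concerns, as far as I can tell, `index.merge.scheduler.max_thread_count`, which it suggests lowering to 1; treat that as an assumption. A minimal sketch of applying it through the update-index-settings API (`PUT /<index>/_settings`) with only the Python standard library: it builds the request without sending it, and the index name and URL are placeholders. A setting applied this way only affects the index it is sent to; indices created later would need it too (for example through an index template), and `GET /<index>/_settings` shows whether it took effect.

```python
import json
import urllib.request


def merge_threads_request(base_url: str, index: str, max_threads: int = 1) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index>/_settings request.

    Assumes the spinning-disk advice refers to index.merge.scheduler.max_thread_count.
    """
    body = {"index.merge.scheduler.max_thread_count": max_threads}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# "my-index" and the localhost URL are placeholders.
req = merge_threads_request("http://localhost:9200", "my-index")
print(req.get_method(), req.full_url)
# PUT http://localhost:9200/my-index/_settings
# urllib.request.urlopen(req) would actually send it to the cluster.
```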

...