AWS Elasticsearch Service cluster in red state: too many open files on a node - PullRequest
20 May 2019

I have an AWS Elasticsearch cluster with the following settings:

 curl -s 'https://***..es.amazonaws.com/_cluster/settings' | jq
{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation": {
          "cluster_concurrent_rebalance": "2",
          "node_concurrent_recoveries": "2",
          "disk": {
            "watermark": {
              "low": "15.0gb",
              "flood_stage": "5.0gb",
              "high": "10.0gb"
            }
          },
          "node_initial_primaries_recoveries": "4"
        }
      },
      "blocks": {
        "create_index": "false"
      }
    },
    "indices": {
      "recovery": {
        "max_bytes_per_sec": "40mb"
      }
    }
  },
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "cluster_concurrent_rebalance": "2",
          "node_concurrent_recoveries": "2",
          "disk": {
            "watermark": {
              "low": "15.0gb",
              "flood_stage": "5.0gb",
              "high": "10.0gb"
            }
          },
          "exclude": {
            "_ip": "10.212.32.62,10.212.31.186"
          },
          "node_initial_primaries_recoveries": "4",
          "awareness": {}
        }
      }
    },
    "indices": {
      "recovery": {
        "max_bytes_per_sec": "40mb"
      }
    }
  }
}
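To see at a glance how the persistent and transient scopes differ, the two trees can be flattened into dotted keys and compared. This is a minimal Python sketch that works on a hand-pasted copy of the response above; the SETTINGS literal and the flatten helper are illustrative, not part of any Elasticsearch client. Transient settings take precedence over persistent ones, so keys that exist only in the transient scope are the ones currently overriding the persistent configuration.

```python
"""Compare persistent vs transient cluster settings (sketch on pasted data)."""

# Hand-pasted copy of the _cluster/settings response, not fetched live.
SETTINGS = {
    "persistent": {
        "cluster": {
            "routing": {"allocation": {
                "cluster_concurrent_rebalance": "2",
                "node_concurrent_recoveries": "2",
                "disk": {"watermark": {"low": "15.0gb", "flood_stage": "5.0gb", "high": "10.0gb"}},
                "node_initial_primaries_recoveries": "4",
            }},
            "blocks": {"create_index": "false"},
        },
        "indices": {"recovery": {"max_bytes_per_sec": "40mb"}},
    },
    "transient": {
        "cluster": {
            "routing": {"allocation": {
                "cluster_concurrent_rebalance": "2",
                "node_concurrent_recoveries": "2",
                "disk": {"watermark": {"low": "15.0gb", "flood_stage": "5.0gb", "high": "10.0gb"}},
                "exclude": {"_ip": "10.212.32.62,10.212.31.186"},
                "node_initial_primaries_recoveries": "4",
                "awareness": {},
            }},
        },
        "indices": {"recovery": {"max_bytes_per_sec": "40mb"}},
    },
}


def flatten(tree, prefix=""):
    """Turn nested dicts into {"a.b.c": value}; an empty dict is kept as a value."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


persistent = flatten(SETTINGS["persistent"])
transient = flatten(SETTINGS["transient"])

# Keys present in only one scope.
only_transient = sorted(transient.keys() - persistent.keys())
only_persistent = sorted(persistent.keys() - transient.keys())

for key in only_transient:
    print("transient only:", key, "=", transient[key])
for key in only_persistent:
    print("persistent only:", key, "=", persistent[key])

# The allocation exclude filter is a comma-separated list of IPs.
excluded = transient.get("cluster.routing.allocation.exclude._ip", "")
excluded_ips = [ip for ip in excluded.split(",") if ip]
print("IPs excluded from shard allocation:", excluded_ips)
```

On the response above, this reports the allocation exclude list (10.212.32.62 and 10.212.31.186) and the empty awareness object as transient-only, and cluster.blocks.create_index as persistent-only.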

The cluster health check returns:

 curl -s 'https://***..es.amazonaws.com/_cluster/health?pretty' | jq
{
  "cluster_name": "***",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 13,
  "number_of_data_nodes": 10,
  "active_primary_shards": 3116,
  "active_shards": 3562,
  "relocating_shards": 0,
  "initializing_shards": 16,
  "unassigned_shards": 9214,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 83,
  "number_of_in_flight_fetch": 1974,
  "task_max_waiting_in_queue_millis": 49831498,
  "active_shards_percent_as_number": 27.84552845528455
}
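The health counters can be cross-checked with a little arithmetic: active, initializing and unassigned shard copies add up to the total the cluster is trying to place, and active_shards_percent_as_number is the active share of that total (relocating_shards is 0 here, so how relocating copies are counted does not affect the result). The Python sketch below uses numbers pasted from the response above and also computes a rough average of shard copies per data node. Treat that average only as an indication, since shards are not spread evenly and two IPs are excluded from allocation in the transient settings.

```python
# Values pasted from the _cluster/health response above.
HEALTH = {
    "number_of_data_nodes": 10,
    "active_shards": 3562,
    "relocating_shards": 0,
    "initializing_shards": 16,
    "unassigned_shards": 9214,
    "active_shards_percent_as_number": 27.84552845528455,
}


def shard_summary(health):
    """Return (total shard copies, % active, average copies per data node)."""
    total = (health["active_shards"]
             + health["initializing_shards"]
             + health["unassigned_shards"])
    percent_active = 100.0 * health["active_shards"] / total
    per_data_node = total / health["number_of_data_nodes"]
    return total, percent_active, per_data_node


total, percent_active, per_data_node = shard_summary(HEALTH)
print(f"total shard copies: {total}")
print(f"active: {percent_active:.2f}% "
      f"(reported {HEALTH['active_shards_percent_as_number']:.2f}%)")
print(f"average shard copies per data node: {per_data_node:.0f}")
```

For this cluster that gives 12792 shard copies in total, about 27.85% active (matching the reported value) and roughly 1279 copies per data node on average.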

Digging further, I get:

 curl -s 'https://***..es.amazonaws.com/_nodes/stats' | jq
{
  "_nodes": {
    "total": 13,
    "successful": 12,
    "failed": 1,
    "failures": [
      {
        "type": "failed_node_exception",
        "reason": "Failed node [o3Fb21UVQx2rwwm2ZiVu7w]",
        "caused_by": {
          "type": "exception",
          "reason": "failed to refresh store stats",
          "caused_by": {
            "type": "file_system_exception",
            "reason": "/hdd1/mnt/env/root/apollo/env/swift-eu-west-1-prod-ES_6_3AMI-ES2-p001/var/es/data/nodes/0/indices/wvMTt2eiSfSDFVQbuGDEeQ/1/index: Too many open files"
          }
        }
      }
    ]
  },
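The useful part of this failure is nested two levels down in caused_by. A small helper that follows the chain to the innermost error makes the root cause explicit. This is a Python sketch over a hand-pasted copy of the _nodes section above; root_cause is a hypothetical helper, not an Elasticsearch API.

```python
# Hand-pasted copy of the "_nodes" section of the _nodes/stats response.
NODES = {
    "total": 13,
    "successful": 12,
    "failed": 1,
    "failures": [
        {
            "type": "failed_node_exception",
            "reason": "Failed node [o3Fb21UVQx2rwwm2ZiVu7w]",
            "caused_by": {
                "type": "exception",
                "reason": "failed to refresh store stats",
                "caused_by": {
                    "type": "file_system_exception",
                    "reason": "/hdd1/mnt/env/root/apollo/env/swift-eu-west-1-prod-ES_6_3AMI-ES2-p001/var/es/data/nodes/0/indices/wvMTt2eiSfSDFVQbuGDEeQ/1/index: Too many open files",
                },
            },
        }
    ],
}


def root_cause(error):
    """Follow the nested caused_by chain down to the innermost error."""
    while "caused_by" in error:
        error = error["caused_by"]
    return error


for failure in NODES["failures"]:
    cause = root_cause(failure)
    print(failure["reason"], "->", cause["type"] + ":", cause["reason"])
```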

And the open file descriptors per node (fdc = currently open, fdm = maximum allowed):

 curl -s -XGET 'https://***..es.amazonaws.com/_cat/nodes?v&h=ip,fdc,fdm'
ip         fdc     fdm
x.x.x.x  70014  128000
x.x.x.x    950  128000
x.x.x.x    915  128000
x.x.x.x    949  128000
x.x.x.x    950  128000
x.x.x.x    954  128000
x.x.x.x   9124  128000
x.x.x.x
x.x.x.x  36916  128000
x.x.x.x    951  128000
x.x.x.x    919  128000
x.x.x.x    948  128000
x.x.x.x    950  128000
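Since fdc is the number of open file descriptors and fdm the per-node limit, their ratio shows how close each node is to the limit. The IPs are redacted in the output above, so this Python sketch identifies nodes by row position (0-based). parse_cat_nodes and fd_usage are illustrative helpers, and the table text is pasted in by hand rather than fetched from the cluster. One row came back without values; the sketch lists it separately instead of guessing.

```python
# Hand-pasted _cat/nodes?v&h=ip,fdc,fdm output (IPs redacted in the original).
CAT_NODES = """\
ip         fdc     fdm
x.x.x.x  70014  128000
x.x.x.x    950  128000
x.x.x.x    915  128000
x.x.x.x    949  128000
x.x.x.x    950  128000
x.x.x.x    954  128000
x.x.x.x   9124  128000
x.x.x.x
x.x.x.x  36916  128000
x.x.x.x    951  128000
x.x.x.x    919  128000
x.x.x.x    948  128000
x.x.x.x    950  128000
"""


def parse_cat_nodes(text):
    """Split whitespace-aligned _cat output into one dict per row."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    # zip stops at the shorter list, so a row with only an IP yields {"ip": ...}.
    return [dict(zip(header, line.split())) for line in lines[1:]]


def fd_usage(rows):
    """Return (row index, percent of fdm in use), or None when stats are missing."""
    report = []
    for index, row in enumerate(rows):
        if "fdc" not in row or "fdm" not in row:
            report.append((index, None))
        else:
            report.append((index, 100.0 * int(row["fdc"]) / int(row["fdm"])))
    return report


rows = parse_cat_nodes(CAT_NODES)
usage = fd_usage(rows)
missing = [index for index, pct in usage if pct is None]
ranked = sorted((item for item in usage if item[1] is not None),
                key=lambda item: item[1], reverse=True)

for index, pct in ranked[:3]:
    print(f"row {index}: {rows[index]['fdc']} of {rows[index]['fdm']} descriptors ({pct:.1f}%)")
print("rows without stats:", missing)
```

On the output above, this ranks row 0 first (70014 of 128000, about 54.7%), followed by row 8 (about 28.8%) and row 6 (about 7.1%). The other reporting nodes sit below 1%, and row 7 is the one without stats.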

Any advice on how to resolve this issue is highly appreciated.

...