The correct way to increase vm.max_map_count - PullRequest
04 August 2020

I am struggling to increase vm.max_map_count (to run Elasticsearch on my ECS cluster) through the user data of my EC2 instance. I found that /etc/sysctl.conf is owned by root, so I tried to run the commands as root in my user data script.

I use Terraform for my deployment, and the EC2 instance running the ECS cluster is configured with the following user data:

#!/bin/bash
sudo -s
echo 'vm.max_map_count=524288' >> /etc/sysctl.conf
sysctl -p /etc/sysctl.conf

# ECS config
{
  echo "ECS_CLUSTER=${cluster_name}"
} >> /etc/ecs/ecs.config

start ecs

echo "Done"

However, Elasticsearch fails to start and reports the error:

STOPPED (OutOfMemoryError: Container killed due to memory u)

CloudWatch does not log anything either.

My EC2 instance has no public IP, and I cannot SSH into it to check whether the script updated the conf file.
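One way to see what the user data script did without SSH is to read the instance's console output, since cloud-init prints the script's output there. A minimal sketch, assuming the AWS CLI is configured and allowed to call `ec2 get-console-output`; the helper name `filter_user_data_lines` and the instance ID are placeholders:

```shell
#!/bin/bash
# Hypothetical helper: keep only the console lines written by cloud-init,
# which is where output from the user data script appears.
filter_user_data_lines() {
  grep -F 'cloud-init['
}

# Usage (not run here; i-0123456789abcdef0 is a placeholder instance ID):
#   aws ec2 get-console-output --instance-id i-0123456789abcdef0 --output text \
#     | filter_user_data_lines
```

Lines such as `vm.max_map_count = 524288` or `command not found` in the filtered output indicate which commands in the script succeeded or failed.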

So, I would like to know two things:

  1. Is the script above the right way to set the user data of my EC2 instance?
  2. How can I troubleshoot problems in my script (no connection to the EC2 instance, no CloudWatch logs)?
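Regarding the first point, user data scripts run as root, so `sudo -s` is not needed, and `start` is an Upstart command that Amazon Linux 2 (which uses systemd) does not provide. A sketch of how the sysctl step could be written so that it replaces an existing entry rather than appending a duplicate; the helper name `set_sysctl_entry` is hypothetical, while the file path and value are the ones from the script above:

```shell
#!/bin/bash
# Hypothetical helper: set key=value in a sysctl-style file, replacing any
# existing entry for the same key instead of appending a duplicate.
set_sysctl_entry() {
  local file="$1" key="$2" value="$3" tmp
  touch "$file"
  tmp=$(mktemp)
  # Copy every line whose key (text before "=", whitespace removed) differs.
  awk -v k="$key" -F= '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' \
    "$file" > "$tmp"
  echo "${key}=${value}" >> "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Intended use inside user data (already running as root):
#   set_sysctl_entry /etc/sysctl.conf vm.max_map_count 524288
#   sysctl -p /etc/sysctl.conf
```

Whether the ECS agent needs an explicit start command at all depends on the AMI; that part is left out of the sketch.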

EDIT2: here is some output that may be useful for troubleshooting and recommendations:

Amazon Linux 2
Kernel 4.14.186-146.268.amzn2.x86_64 on an x86_64

ip-10-1-1-237 login: [   44.762892] cloud-init[3432]: One of the configured repositories failed (Unknown),
[   44.764564] cloud-init[3432]: and yum doesn't have enough cached data to continue. At this point the only
[   44.765930] cloud-init[3432]: safe thing yum can do is fail. There are a few ways to work "fix" this:
[   44.766119] cloud-init[3432]: 1. Contact the upstream for the repository and get them to fix the problem.
[   44.766330] cloud-init[3432]: 2. Reconfigure the baseurl/etc. for the repository, to point to a working
[   44.766544] cloud-init[3432]: upstream. This is most often useful if you are using a newer
[   44.766775] cloud-init[3432]: distribution release than is supported by the repository (and the
[   44.766994] cloud-init[3432]: packages for the previous distribution release still work).
[   44.767210] cloud-init[3432]: 3. Run the command with the repository temporarily disabled
[   44.767437] cloud-init[3432]: yum --disablerepo=<repoid> ...
[   44.767656] cloud-init[3432]: 4. Disable the repository permanently, so yum won't use it by default. Yum
[   44.767889] cloud-init[3432]: will then just ignore the repository until you permanently enable it
[   44.768126] cloud-init[3432]: again or use --enablerepo for temporary usage:
[   44.768323] cloud-init[3432]: yum-config-manager --disable <repoid>
[   44.768539] cloud-init[3432]: or
[   44.768766] cloud-init[3432]: subscription-manager repos --disable=<repoid>
[   44.768999] cloud-init[3432]: 5. Configure the failing repository to be skipped, if it is unavailable.
[   44.769217] cloud-init[3432]: Note that yum will try to contact the repo. when it runs most commands,
[   44.769449] cloud-init[3432]: so will have to try and fail each time (and thus. yum will be be much
[   44.769677] cloud-init[3432]: slower). If it is a very temporary problem though, this is often a nice
[   44.769904] cloud-init[3432]: compromise:
[   44.770127] cloud-init[3432]: yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
[   44.770345] cloud-init[3432]: Cannot find a valid baseurl for repo: amzn2-core/2/x86_64
[   44.770593] cloud-init[3432]: Could not retrieve mirrorlist http://amazonlinux.us-east-1.amazonaws.com/2/core/latest/x86_64/mirror.list error was
[   44.770805] cloud-init[3432]: 12: Timeout on http://amazonlinux.us-east-1.amazonaws.com/2/core/latest/x86_64/mirror.list: (28, 'Connection timed out after 5000 milliseconds')
[   44.779066] cloud-init[3432]: Aug 04 06:38:22 cloud-init[3432]: util.py[WARNING]: Package upgrade failed
[   44.782591] cloud-init[3432]: Aug 04 06:38:22 cloud-init[3432]: cc_package_update_upgrade_install.py[WARNING]: 1 failed with exceptions, re-raising the last one
[   44.782892] cloud-init[3432]: Aug 04 06:38:22 cloud-init[3432]: util.py[WARNING]: Running module package-update-upgrade-install (<module 'cloudinit.config.cc_package_update_upgrade_install' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_package_update_upgrade_install.pyc'>) failed
[   45.080534] cloud-init[4041]: Cloud-init v. 19.3-3.amzn2 running 'modules:final' at Tue, 04 Aug 2020 06:38:22 +0000. Up 45.03 seconds.
[   45.120997] cloud-init[4041]: vm.max_map_count = 524288
[   45.123170] cloud-init[4041]: /var/lib/cloud/instance/scripts/part-001: line 11: start: command not found
[   45.126606] cloud-initci-info: no authorized ssh keys fingerprints found for user ec2-user.
[4041]: Done
[   45.129719] cloud-init[4041]: ci-info: no authorized ssh keys fingerprints found for user ec2-user.
<14>Aug  4 06:38:22 ec2: 
<14>Aug  4 06:38:22 ec2: #############################################################
<14>Aug  4 06:38:22 ec2: -----BEGIN SSH HOST KEY FINGERPRINTS-----
<14>Aug  4 06:38:22 ec2: 256 SHA256:blah no comment (ECDSA)
<14>Aug  4 06:38:22 ec2: 256 SHA256:blah no comment (ED25519)
<14>Aug  4 06:38:22 ec2: 2048 SHA256:blah no comment (RSA)
<14>Aug  4 06:38:22 ec2: -----END SSH HOST KEY FINGERPRINTS-----
<14>Aug  4 06:38:22 ec2: #############################################################
-----BEGIN SSH HOST KEY KEYS-----
ecdsa-sha2-nistp256 blah
-----END SSH HOST KEY KEYS-----
[   45.168565] cloud-init[4041]: Cloud-init v. 19.3-3.amzn2 finished at Tue, 04 Aug 2020 06:38:22 +0000. Datasource DataSourceEc2.  Up 45.16 seconds
[   46.787445] cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation
[  213.284563] docker0: port 1(veth4eeea3f) entered blocking state
[  213.287486] docker0: port 1(veth4eeea3f) entered disabled state
[  213.290275] device veth4eeea3f entered promiscuous mode
[  213.295028] IPv6: ADDRCONF(NETDEV_UP): veth4eeea3f: link is not ready
[  213.303068] docker0: port 2(veth574a47c) entered blocking state
[  213.305816] docker0: port 2(veth574a47c) entered disabled state
[  213.308706] device veth574a47c entered promiscuous mode
[  213.311335] IPv6: ADDRCONF(NETDEV_UP): veth574a47c: link is not ready
[  213.314368] docker0: port 2(veth574a47c) entered blocking state
[  213.317107] docker0: port 2(veth574a47c) entered forwarding state
[  213.320128] docker0: port 2(veth574a47c) entered disabled state
[  213.591599] eth0: renamed from vethbbd83de
[  213.615542] IPv6: ADDRCONF(NETDEV_CHANGE): veth4eeea3f: link becomes ready
[  213.618597] docker0: port 1(veth4eeea3f) entered blocking state
[  213.621235] docker0: port 1(veth4eeea3f) entered forwarding state
[  213.639479] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
[  213.642670] eth0: renamed from veth72cfad6
[  213.663511] IPv6: ADDRCONF(NETDEV_CHANGE): veth574a47c: link becomes ready
[  213.666718] docker0: port 2(veth574a47c) entered blocking state
[  213.670963] docker0: port 2(veth574a47c) entered forwarding state
[  214.519487] GC Thread#0 invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=0
[  214.528057] GC Thread#0 cpuset=f3341677e0c2154fdfa25d039e44f44504e63f6614a19ae73739ecd88c9ef631 mems_allowed=0
[  214.535557] CPU: 3 PID: 5784 Comm: GC Thread#0 Not tainted 4.14.186-146.268.amzn2.x86_64 #1
[  214.542162] Hardware name: Amazon EC2 m5.xlarge/, BIOS 1.0 10/16/2017
[  214.546533] Call Trace:
[  214.549262]  dump_stack+0x66/0x82
[  214.552505]  dump_header+0x94/0x229
[  214.555750]  oom_kill_process+0x223/0x420
[  214.559158]  out_of_memory+0x112/0x4d0
[  214.562402]  mem_cgroup_out_of_memory+0x49/0x80
[  214.565953]  mem_cgroup_oom_synchronize+0x2ed/0x330
[  214.569722]  ? mem_cgroup_css_reset+0xd0/0xd0
[  214.573266]  pagefault_out_of_memory+0x32/0x77
[  214.576840]  __do_page_fault+0x4b4/0x4c0
[  214.580166]  ? async_page_fault+0x2f/0x50
[  214.583504]  async_page_fault+0x45/0x50
[  214.586757] RIP: 40000000:          (null)
[  214.590132] RSP: 28b74848:00007fb42415ae10 EFLAGS: 7fb428b74830
[  214.590157] Task in /ecs/721a7f6e-b42c-4480-8372-f0a96dd1620b/f3341677e0c2154fdfa25d039e44f44504e63f6614a19ae73739ecd88c9ef631 killed as a result of limit of /ecs/721a7f6e-b42c-4480-8372-f0a96dd1620b/f3341677e0c2154fdfa25d039e44f44504e63f6614a19ae73739ecd88c9ef631
[  214.610391] memory: usage 524288kB, limit 524288kB, failcnt 36
[  214.614549] memory+swap: usage 524288kB, limit 1048576kB, failcnt 0
[  214.618892] kmem: usage 3080kB, limit 9007199254740988kB, failcnt 0
[  214.623197] Memory cgroup stats for /ecs/721a7f6e-b42c-4480-8372-f0a96dd1620b/f3341677e0c2154fdfa25d039e44f44504e63f6614a19ae73739ecd88c9ef631: cache:60KB rss:521148KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:36KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:521148KB inactive_file:36KB active_file:24KB unevictable:0KB
[  214.643223] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[  214.650058] [ 5572]  1000  5572   324803   133307     296       5        0             0 java
[  214.656896] Memory cgroup out of memory: Kill process 5572 (java) score 1019 or sacrifice child
[  214.663758] Killed process 5572 (java) total-vm:1299212kB, anon-rss:520708kB, file-rss:12520kB, shmem-rss:0kB
[  214.712611] oom_reaper: reaped process 5572 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  214.849576] docker0: port 1(veth4eeea3f) entered disabled state
[  214.854045] vethbbd83de: renamed from eth0
[  214.905347] docker0: port 1(veth4eeea3f) entered disabled state
[  214.911787] device veth4eeea3f left promiscuous mode
[  214.915771] docker0: port 1(veth4eeea3f) entered disabled state
[  245.243122] docker0: port 2(veth574a47c) entered disabled state
[  245.247371] veth72cfad6: renamed from eth0
[  245.306870] docker0: port 2(veth574a47c) entered disabled state
[  245.312623] device veth574a47c left promiscuous mode
[  245.316512] docker0: port 2(veth574a47c) entered disabled state
[  286.703369] docker0: port 1(veth5479e5d) entered blocking state
[  286.707827] docker0: port 1(veth5479e5d) entered disabled state
[  286.712364] device veth5479e5d entered promiscuous mode
[  286.716682] IPv6: ADDRCONF(NETDEV_UP): veth5479e5d: link is not ready
[  286.754304] docker0: port 2(veth80e3f1f) entered blocking state
[  286.758872] docker0: port 2(veth80e3f1f) entered disabled state
[  286.763349] device veth80e3f1f entered promiscuous mode
[  286.767539] IPv6: ADDRCONF(NETDEV_UP): veth80e3f1f: link is not ready
[  286.772363] docker0: port 2(veth80e3f1f) entered blocking state
[  286.776828] docker0: port 2(veth80e3f1f) entered forwarding state
[  286.781290] docker0: port 2(veth80e3f1f) entered disabled state
[  286.987198] eth0: renamed from veth1e329df
[  287.015121] IPv6: ADDRCONF(NETDEV_CHANGE): veth80e3f1f: link becomes ready
[  287.019969] docker0: port 2(veth80e3f1f) entered blocking state
[  287.024262] docker0: port 2(veth80e3f1f) entered forwarding state
[  287.055151] eth0: renamed from veth25eba88
[  287.071059] IPv6: ADDRCONF(NETDEV_CHANGE): veth5479e5d: link becomes ready
[  287.075711] docker0: port 1(veth5479e5d) entered blocking state
[  287.080014] docker0: port 1(veth5479e5d) entered forwarding state
[  287.677695] docker0: port 3(veth5a3ad7f) entered blocking state
[  287.682701] docker0: port 3(veth5a3ad7f) entered disabled state
[  287.687853] device veth5a3ad7f entered promiscuous mode
[  287.693259] IPv6: ADDRCONF(NETDEV_UP): veth5a3ad7f: link is not ready
[  287.697902] docker0: port 3(veth5a3ad7f) entered blocking state
[  287.702447] docker0: port 3(veth5a3ad7f) entered forwarding state
[  287.706783] docker0: port 3(veth5a3ad7f) entered disabled state
[  287.970277] GC Thread#0 invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null),  order=0, oom_score_adj=0
[  287.978532] GC Thread#0 cpuset=19e40fc2ee06f0ad703989236fd185d5792eb56fb5864fcfdb973cb68963107a mems_allowed=0
[  287.986153] CPU: 1 PID: 6405 Comm: GC Thread#0 Not tainted 4.14.186-146.268.amzn2.x86_64 #1
[  287.993112] Hardware name: Amazon EC2 m5.xlarge/, BIOS 1.0 10/16/2017
[  287.997542] Call Trace:
[  288.000237]  dump_stack+0x66/0x82
[  288.003277]  dump_header+0x94/0x229
[  288.006388]  oom_kill_process+0x223/0x420
[  288.009690]  out_of_memory+0x112/0x4d0
[  288.012947]  mem_cgroup_out_of_memory+0x49/0x80
[  288.016494]  mem_cgroup_oom_synchronize+0x2ed/0x330
[  288.020237]  ? mem_cgroup_css_reset+0xd0/0xd0
[  288.023694]  pagefault_out_of_memory+0x32/0x77
[  288.027229]  __do_page_fault+0x4b4/0x4c0
[  288.030548]  ? async_page_fault+0x2f/0x50
[  288.033918]  async_page_fault+0x45/0x50
[  288.037227] RIP: 40000000:          (null)
[  288.040644] RSP: b8f29848:00007f96b450fe10 EFLAGS: 7f96b8f29830
[  288.040681] Task in /ecs/a0a3e809-7368-4858-9d34-0a368eb9874c/19e40fc2ee06f0ad703989236fd185d5792eb56fb5864fcfdb973cb68963107a killed as a result of limit of /ecs/a0a3e809-7368-4858-9d34-0a368eb9874c/19e40fc2ee06f0ad703989236fd185d5792eb56fb5864fcfdb973cb68963107a
[  288.060978] memory: usage 524288kB, limit 524288kB, failcnt 30
[  288.065088] memory+swap: usage 524288kB, limit 1048576kB, failcnt 0
[  288.069415] kmem: usage 3148kB, limit 9007199254740988kB, failcnt 0
[  288.073680] Memory cgroup stats for /ecs/a0a3e809-7368-4858-9d34-0a368eb9874c/19e40fc2ee06f0ad703989236fd185d5792eb56fb5864fcfdb973cb68963107a: cache:36KB rss:521104KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:521104KB inactive_file:36KB active_file:0KB unevictable:0KB
[  288.094168] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[  288.101220] [ 6235]  1000  6235   324803   133316     297       4        0             0 java
[  288.108312] Memory cgroup out of memory: Kill process 6235 (java) score 1019 or sacrifice child
[  288.115271] Killed process 6235 (java) total-vm:1299212kB, anon-rss:520852kB, file-rss:12412kB, shmem-rss:0kB
[  288.123037] eth0: renamed from veth23c84cd
[  288.147112] IPv6: ADDRCONF(NETDEV_CHANGE): veth5a3ad7f: link becomes ready
[  288.151949] docker0: port 3(veth5a3ad7f) entered blocking state
[  288.156308] docker0: port 3(veth5a3ad7f) entered forwarding state
[  288.265675] docker0: port 1(veth5479e5d) entered disabled state
[  288.270060] veth25eba88: renamed from eth0
[  288.326109] docker0: port 1(veth5479e5d) entered disabled state
[  288.332410] device veth5479e5d left promiscuous mode
[  288.336638] docker0: port 1(veth5479e5d) entered disabled state
[  318.638946] docker0: port 3(veth5a3ad7f) entered disabled state
[  318.643228] veth23c84cd: renamed from eth0
[  318.705867] docker0: port 3(veth5a3ad7f) entered disabled state
[  318.711621] device veth5a3ad7f left promiscuous mode
[  318.715546] docker0: port 3(veth5a3ad7f) entered disabled state
[  319.122960] docker0: port 2(veth80e3f1f) entered disabled state
[  319.127785] veth1e329df: renamed from eth0
[  319.189633] docker0: port 2(veth80e3f1f) entered disabled state
[  319.195271] device veth80e3f1f left promiscuous mode
[  319.199097] docker0: port 2(veth80e3f1f) entered disabled state
...
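The `memory: usage ..., limit ...` lines in the output above report the container's cgroup memory limit in kB. A small sketch for pulling that limit out of saved console text and converting it to MiB; the function name `cgroup_limit_mib` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: read kernel log text on stdin and print the cgroup
# memory limit in MiB for each "memory: usage ..., limit ..." line.
# The "memory+swap:" and "kmem:" lines are intentionally not matched.
cgroup_limit_mib() {
  sed -n 's/.*\] memory: usage [0-9]*kB, limit \([0-9]*\)kB.*/\1/p' |
    while read -r kb; do
      echo $(( kb / 1024 ))
    done
}

# Example:
#   echo '[  214.610391] memory: usage 524288kB, limit 524288kB, failcnt 36' \
#     | cgroup_limit_mib
#   prints 512
```

For the log above this gives 512, meaning the `java` process was killed when the container reached a 512 MiB memory limit. That value comes from the container's memory limit and is separate from the `vm.max_map_count` setting, even though both happen to be 524288 here.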