StormCrawler halting due to Out Of Memory Error
0 votes
/ January 24, 2019

I am running StormCrawler 1.13 with Elasticsearch. Below is my crawler configuration. I am crawling a website with millions of documents. The crawler throws no errors when I restrict the crawl to a specific domain by applying fast.urlfilter.json. When I pointed it at the main domain, with "ignoreOutsideHost": false, "ignoreOutsideDomain": true, it throws java.lang.OutOfMemoryError: Java heap space and "Halting due to Out Of Memory Error...FetcherThread #0". Is there any way to crawl smoothly without memory errors? Click here for the crawler configuration; the detailed logs are included below.

Thanks in advance, and apologies for the huge post.
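
For readers unfamiliar with the two outlink settings quoted above, here is a rough sketch of what they mean and why switching from host scope to domain scope widens the crawl to every subdomain of test.edu. It is not StormCrawler's filter code: the class and method names are hypothetical, and the "domain" is naively taken as the last two labels of the host, which is an assumption that happens to work for test.edu (real crawlers consult a public-suffix list).

```java
import java.net.URI;
import java.util.Locale;

/** Illustrative only: NOT StormCrawler's actual URL filter implementation. */
final class OutlinkScopeSketch {

    /** Lower-cased host of a URL, or "" if the URL has none. */
    static String host(String url) {
        String h = URI.create(url).getHost();
        return h == null ? "" : h.toLowerCase(Locale.ROOT);
    }

    /**
     * Naive "domain": the last two labels of the host.
     * ASSUMPTION: adequate for test.edu only; a public-suffix list is needed in general.
     */
    static String naiveDomain(String host) {
        String[] labels = host.split("\\.");
        int n = labels.length;
        return n <= 2 ? host : labels[n - 2] + "." + labels[n - 1];
    }

    /** Returns true if the outlink survives the two flags. */
    static boolean keep(String sourceUrl, String outlinkUrl,
                        boolean ignoreOutsideHost, boolean ignoreOutsideDomain) {
        String from = host(sourceUrl);
        String to = host(outlinkUrl);
        if (ignoreOutsideHost && !from.equals(to)) {
            return false; // different host, host scope enforced
        }
        if (ignoreOutsideDomain && !naiveDomain(from).equals(naiveDomain(to))) {
            return false; // different domain, domain scope enforced
        }
        return true;
    }

    public static void main(String[] args) {
        String source = "https://www.apply.test.edu/news/testapply-holds-student-research-fair";
        String sibling = "http://portfolios.test.edu/search?tags=Othello";
        // Host scope: the link to another subdomain is dropped.
        System.out.println(keep(source, sibling, true, true));
        // Domain scope (the setting in the question): the same link is kept.
        System.out.println(keep(source, sibling, false, true));
        // Links leaving test.edu are still dropped.
        System.out.println(keep(source, "https://example.org/", false, true));
    }
}
```

With domain scope every subdomain seen in the worker log (arts, portfolios, spiff, campusgroups and so on) is in play, so the crawl meets far more varied content than a single-host crawl does.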

worker.log:

2019-01-22 08:31:51.989 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] [Fetcher #3] Fetched https://arts.test.edu/login/?next=/schools/film-animation/other-school-film-and-animation-festivals-and-awards/test-film-and-animation-awards-1998 with status 200 in msec 107

2019-01-22 08:31:56.815 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] [Fetcher #3] Fetched http://portfolios.test.edu/search?tags=Othello with status 200 in msec 162

2019-01-22 08:32:46.572 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] [Fetcher #3] Fetched http://spiff.test.edu/richmond/testobs/jul25_2013/?C=S;O=A with status 200 in msec 3

2019-01-22 08:32:01.862 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] [Fetcher #3] Fetched https://campusgroups.test.edu/slu/members/ with status 200 in msec 229

2019-01-22 08:32:06.693 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] [Fetcher #3] Fetched http://arts.test.edu/news/16 with status 200 in msec 119

2019-01-22 08:32:11.601 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] Crawl delay for queue: www.apply.test.edu  is set to 10000 as per robots.txt. url: https://www.apply.test.edu/news/testapply-holds-student-research-fair

2019-01-22 08:32:13.765 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] [Fetcher #3] Fetched https://www.apply.test.edu/news/testapply-holds-student-research-fair with status 200 in msec 2164

2019-01-22 08:32:16.616 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] [Fetcher #3] Fetched http://apps.test.edu/cos/scms/equipment/schedules.php?id=25&date=9-21-2019 with status 200 in msec 46

2019-01-22 08:32:21.780 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] [Fetcher #3] Fetched http://edge.test.edu/edge/P19319/public/FILENAME.docx with status 200 in msec 156

2019-01-22 08:32:27.837 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] [Fetcher #3] Fetched http://applywebdev.test.edu/news/booth-biography-selected-national-reading-project?page=6 with status 200 in msec 1231

2019-01-22 08:32:30.075 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] [Fetcher #3] Fetched http://applywebdev.test.edu/news/grant-improve-problem-solving-skills-deaf-and-hard-hearing-students?page=6 with status 200 in msec 1235

2019-01-22 08:32:31.775 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] [Fetcher #3] Fetched http://portfolios.test.edu/search?tags=feedback with status 200 in msec 197

2019-01-22 08:32:36.582 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] Crawl delay for queue: infoguides.test.edu  is set to 10000 as per robots.txt. url: http://infoguides.test.edu/c.php?g=357360&p=4416876

2019-01-22 08:32:36.693 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] [Fetcher #3] Fetched http://infoguides.test.edu/c.php?g=357360&p=4416876 with status 200 in msec 111

2019-01-22 08:32:41.602 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] Crawl delay for queue: www.sic.test.edu  is set to 10000 as per robots.txt. url: https://www.sic.test.edu/news/sic-undergraduate-research-sparks-prestigious-professorship-astronomy?page=10

2019-01-22 08:32:42.455 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] [Fetcher #3] Fetched https://www.sic.test.edu/news/sic-undergraduate-research-sparks-prestigious-professorship-astronomy?page=10 with status 200 in msec 853

2019-01-22 08:32:46.572 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] [Fetcher #3] Fetched http://spiff.test.edu/richmond/testobs/jul25_2013/?C=S;O=A with status 200 in msec 3

2019-01-22 08:32:51.595 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] Crawl delay for queue: www.apply.test.edu  is set to 10000 as per robots.txt. url: https://www.apply.test.edu/news/testapply-students-graduate-accolades

2019-01-22 08:32:53.748 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] [Fetcher #3] Fetched https://www.apply.test.edu/news/testapply-students-graduate-accolades with status 200 in msec 2152

2019-01-22 08:33:01.976 c.d.s.b.FetcherBolt FetcherThread #0 [INFO] [Fetcher #3] Fetched https://inside.test.edu/?date=2023-12-1&t=list with status 200 in msec 355

2019-01-22 08:33:11.957 STDIO FetcherThread #0 [ERROR] Halting due to Out Of Memory Error...FetcherThread #0

2019-01-22 08:33:11.960 STDERR Thread-2 [INFO] java.lang.OutOfMemoryError: Java heap space
2019-01-22 08:33:11.968 STDERR Thread-2 [INFO] Dumping heap to artifacts/heapdump ...
2019-01-22 08:33:11.968 STDERR Thread-2 [INFO] Unable to create artifacts/heapdump: File exists

supervisor.log:

2019-01-22 08:31:40.341 o.a.s.d.s.BasicContainer SLOT_6700 [INFO] Created Worker ID da2944c7-cfd2-409a-856b-84f0a0014f56
2019-01-22 08:31:40.341 o.a.s.d.s.Container SLOT_6700 [INFO] Setting up 164ddb0a-fcba-41e3-9a14-386248370bcf:da2944c7-cfd2-409a-856b-84f0a0014f56
2019-01-22 08:31:40.341 o.a.s.d.s.Container SLOT_6700 [INFO] GET worker-user for da2944c7-cfd2-409a-856b-84f0a0014f56
2019-01-22 08:31:40.341 o.a.s.d.s.Container SLOT_6700 [INFO] SET worker-user da2944c7-cfd2-409a-856b-84f0a0014f56 testweb
2019-01-22 08:31:40.342 o.a.s.d.s.Container SLOT_6700 [INFO] Creating symlinks for worker-id: da2944c7-cfd2-409a-856b-84f0a0014f56 storm-id: www-staging-crawler-4-1548106042 for files(1): [resources]
2019-01-22 08:31:40.342 o.a.s.d.s.BasicContainer SLOT_6700 [INFO] Launching worker with assignment LocalAssignment(topology_id:www-staging-crawler-4-1548106042, executors:[ExecutorInfo(task_start:8, task_end:8), ExecutorInfo(task_start:2, task_end:2), ExecutorInfo(task_start:6, task_end:6), ExecutorInfo(task_start:10, task_end:10), ExecutorInfo(task_start:4, task_end:4), ExecutorInfo(task_start:7, task_end:7), ExecutorInfo(task_start:3, task_end:3), ExecutorInfo(task_start:1, task_end:1), ExecutorInfo(task_start:9, task_end:9), ExecutorInfo(task_start:5, task_end:5)], resources:WorkerResources(mem_on_heap:0.0, mem_off_heap:0.0, cpu:0.0), owner:testweb) for this supervisor 164ddb0a-fcba-41e3-9a14-386248370bcf on port 6700 with id da2944c7-cfd2-409a-856b-84f0a0014f56
2019-01-22 08:31:40.342 o.a.s.d.s.BasicContainer SLOT_6700 [INFO] Launching worker with command: 'java' '-cp' '/home/testweb/apps/crawler/apache-storm-1.2.2/lib/*:/home/testweb/apps/crawler/apache-storm-1.2.2/extlib/*:/home/testweb/crawler/apache-storm-1.2.2/conf:/home/testweb/apps/crawler/apache-storm-1.2.2/storm-local/supervisor/stormdist/www-staging-crawler-4-1548106042/stormjar.jar' '-Xmx64m' '-Dlogging.sensitivity=S3' '-Dlogfile.name=worker.log' '-Dstorm.home=/home/testweb/apps/crawler/apache-storm-1.2.2' '-Dworkers.artifacts=/home/testweb/var/logs/workers-artifacts' '-Dstorm.id=www-staging-crawler-4-1548106042' '-Dworker.id=da2944c7-cfd2-409a-856b-84f0a0014f56' '-Dworker.port=6700' '-Dstorm.log.dir=/home/testweb/var/logs' '-Dlog4j.configurationFile=/home/testweb/apps/crawler/apache-storm-1.2.2/log4j2/worker.xml' '-DLog4jContextSelector=org.apache.logging.log4j.core.selector.BasicContextSelector' '-Dstorm.local.dir=storm-local' 'org.apache.storm.LogWtester' 'java' '-server' '-Dlogging.sensitivity=S3' '-Dlogfile.name=worker.log' '-Dstorm.home=/home/testweb/apps/crawler/apache-storm-1.2.2' '-Dworkers.artifacts=/home/testweb/var/logs/workers-artifacts' '-Dstorm.id=www-staging-crawler-4-1548106042' '-Dworker.id=da2944c7-cfd2-409a-856b-84f0a0014f56' '-Dworker.port=6700' '-Dstorm.log.dir=/home/testweb/var/logs' '-Dlog4j.configurationFile=/home/testweb/apps/crawler/apache-storm-1.2.2/log4j2/worker.xml' '-DLog4jContextSelector=org.apache.logging.log4j.core.selector.BasicContextSelector' '-Dstorm.local.dir=storm-local' '-Xmx2048m' '-XX:+PrintGCDetails' '-Xloggc:artifacts/gc.log' '-XX:+PrintGCDateStamps' '-XX:+PrintGCTimeStamps' '-XX:+UseGCLogFileRotation' '-XX:NumberOfGCLogFiles=10' '-XX:GCLogFileSize=1M' '-XX:+HeapDumpOnOutOfMemoryError' '-XX:HeapDumpPath=artifacts/heapdump' 
'-Djava.library.path=/home/testweb/apps/crawler/apache-storm-1.2.2/storm-local/supervisor/stormdist/www-staging-crawler-4-1548106042/resources/Linux-amd64:/home/testweb/apps/crawler/apache-storm-1.2.2/storm-local/supervisor/stormdist/www-staging-crawler-4-1548106042/resources:/usr/local/lib:/opt/local/lib:/usr/lib' '-Dstorm.conf.file=' '-Dstorm.options=' '-Djava.io.tmpdir=/home/testweb/apps/crawler/apache-storm-1.2.2/storm-local/workers/da2944c7-cfd2-409a-856b-84f0a0014f56/tmp' '-cp' '/home/testweb/apps/crawler/apache-storm-1.2.2/lib/*:/home/testweb/apps/crawler/apache-storm-1.2.2/extlib/*:/home/testweb/crawler/apache-storm-1.2.2/conf:/home/testweb/apps/crawler/apache-storm-1.2.2/storm-local/supervisor/stormdist/www-staging-crawler-4-1548106042/stormjar.jar' 'org.apache.storm.daemon.worker' 'www-staging-crawler-4-1548106042' '164ddb0a-fcba-41e3-9a14-386248370bcf' '6700' 'da2944c7-cfd2-409a-856b-84f0a0014f56'. 
2019-01-22 08:31:40.344 o.a.s.d.s.Slot SLOT_6700 [INFO] STATE KILL_AND_RELAUNCH msInState: 18 topo:www-staging-crawler-4-1548106042 worker:da2944c7-cfd2-409a-856b-84f0a0014f56 -> WAITING_FOR_WORKER_START msInState: 0 topo:www-staging-crawler-4-1548106042 worker:da2944c7-cfd2-409a-856b-84f0a0014f56
2019-01-22 08:31:45.350 o.a.s.d.s.Slot SLOT_6700 [INFO] STATE WAITING_FOR_WORKER_START msInState: 5006 topo:www-staging-crawler-4-1548106042 worker:da2944c7-cfd2-409a-856b-84f0a0014f56 -> RUNNING msInState: 0 topo:www-staging-crawler-4-1548106042 worker:da2944c7-cfd2-409a-856b-84f0a0014f56
2019-01-22 08:33:12.328 o.a.s.d.s.BasicContainer Thread-2505 [INFO] Worker Process da2944c7-cfd2-409a-856b-84f0a0014f56 exited with code: 255
2019-01-22 08:33:12.370 o.a.s.d.s.Slot SLOT_6700 [WARN] SLOT 6700: main process has exited
2019-01-22 08:33:12.370 o.a.s.d.s.Container SLOT_6700 [INFO] Killing 164ddb0a-fcba-41e3-9a14-386248370bcf:da2944c7-cfd2-409a-856b-84f0a0014f56
2019-01-22 08:33:12.380 o.a.s.u.Utils SLOT_6700 [INFO] Error when trying to kill 1554. Process is probably already dead.
2019-01-22 08:33:15.380 o.a.s.d.s.Slot SLOT_6700 [INFO] STATE RUNNING msInState: 90030 topo:www-staging-crawler-4-1548106042 worker:da2944c7-cfd2-409a-856b-84f0a0014f56 -> KILL_AND_RELAUNCH msInState: 0 topo:www-staging-crawler-4-1548106042 worker:da2944c7-cfd2-409a-856b-84f0a0014f56
2019-01-22 08:33:15.381 o.a.s.d.s.Container SLOT_6700 [INFO] GET worker-user for da2944c7-cfd2-409a-856b-84f0a0014f56
2019-01-22 08:33:15.394 o.a.s.d.s.Container SLOT_6700 [INFO] Cleaning up 164ddb0a-fcba-41e3-9a14-386248370bcf:da2944c7-cfd2-409a-856b-84f0a0014f56
2019-01-22 08:33:15.395 o.a.s.d.s.Container SLOT_6700 [INFO] GET worker-user for da2944c7-cfd2-409a-856b-84f0a0014f56
2019-01-22 08:33:15.395 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /home/testweb/apps/crawler/apache-storm-1.2.2/storm-local/workers/da2944c7-cfd2-409a-856b-84f0a0014f56/pids/1554
2019-01-22 08:33:15.395 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /home/testweb/apps/crawler/apache-storm-1.2.2/storm-local/workers/da2944c7-cfd2-409a-856b-84f0a0014f56/heartbeats
2019-01-22 08:33:15.399 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /home/testweb/apps/crawler/apache-storm-1.2.2/storm-local/workers/da2944c7-cfd2-409a-856b-84f0a0014f56/pids
2019-01-22 08:33:15.399 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /home/testweb/apps/crawler/apache-storm-1.2.2/storm-local/workers/da2944c7-cfd2-409a-856b-84f0a0014f56/tmp
2019-01-22 08:33:15.400 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /home/testweb/apps/crawler/apache-storm-1.2.2/storm-local/workers/da2944c7-cfd2-409a-856b-84f0a0014f56
2019-01-22 08:33:15.400 o.a.s.d.s.Container SLOT_6700 [INFO] REMOVE worker-user da2944c7-cfd2-409a-856b-84f0a0014f56
2019-01-22 08:33:15.400 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /home/testweb/apps/crawler/apache-storm-1.2.2/storm-local/workers-users/da2944c7-cfd2-409a-856b-84f0a0014f56
2019-01-22 08:33:15.400 o.a.s.d.s.BasicContainer SLOT_6700 [INFO] Removed Worker ID da2944c7-cfd2-409a-856b-84f0a0014f56

gc.log.0.current:

  Java HotSpot(TM) 64-Bit Server VM (25.191-b26) for linux-amd64 JRE (1.8.0_191-b26), built on Oct  8 2018 13:54:08 by "java_re" with gcc 7.3.0
Memory: 4k page, physical 8168328k(1737328k free), swap 8387580k(8386288k free)
CommandLine flags: -XX:GCLogFileSize=1048576 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump -XX:InitialHeapSize=130693248 -XX:MaxHeapSize=2147483648 -XX:NumberOfGCLogFiles=10 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseGCLogFileRotation -XX:+UseParallelGC 
2019-01-22T08:31:41.541-0500: 1.028: [GC (Allocation Failure) [PSYoungGen: 32768K->5096K(37888K)] 32768K->6882K(123904K), 0.0098372 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2019-01-22T08:31:42.155-0500: 1.642: [GC (Allocation Failure) [PSYoungGen: 37864K->5110K(37888K)] 39650K->10524K(123904K), 0.0104951 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2019-01-22T08:31:42.557-0500: 2.044: [GC (Metadata GC Threshold) [PSYoungGen: 24280K->5094K(37888K)] 29694K->12912K(123904K), 0.0129743 secs] [Times: user=0.03 sys=0.00, real=0.01 secs] 
2019-01-22T08:31:42.570-0500: 2.057: [Full GC (Metadata GC Threshold) [PSYoungGen: 5094K->0K(37888K)] [ParOldGen: 7817K->7345K(64000K)] 12912K->7345K(101888K), [Metaspace: 21023K->21023K(1067008K)], 0.0578299 secs] [Times: user=0.13 sys=0.01, real=0.06 secs] 
2019-01-22T08:31:42.858-0500: 2.344: [GC (Allocation Failure) [PSYoungGen: 32768K->2425K(48128K)] 40113K->9771K(112128K), 0.0039971 secs] [Times: user=0.00 sys=0.01, real=0.01 secs] 
2019-01-22T08:31:43.563-0500: 3.050: [GC (Allocation Failure) [PSYoungGen: 47993K->5099K(68096K)] 55339K->15796K(132096K), 0.0183739 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] 
2019-01-22T08:31:44.248-0500: 3.735: [GC (Metadata GC Threshold) [PSYoungGen: 45605K->9669K(74752K)] 56303K->20375K(138752K), 0.0171562 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] 
2019-01-22T08:31:44.266-0500: 3.752: [Full GC (Metadata GC Threshold) [PSYoungGen: 9669K->0K(74752K)] [ParOldGen: 10705K->14480K(108032K)] 20375K->14480K(182784K), [Metaspace: 34870K->34870K(1079296K)], 0.1069368 secs] [Times: user=0.36 sys=0.01, real=0.11 secs] 
2019-01-22T08:31:45.775-0500: 5.261: [GC (GCLocker Initiated GC) [PSYoungGen: 63488K->8826K(75776K)] 77975K->23321K(183808K), 0.0103824 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
2019-01-22T08:31:46.619-0500: 6.106: [GC (Allocation Failure) [PSYoungGen: 72314K->12264K(90624K)] 86844K->30380K(198656K), 0.0228691 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] 
2019-01-22T08:31:47.414-0500: 6.901: [GC (Allocation Failure) [PSYoungGen: 90600K->15337K(93696K)] 108716K->33992K(201728K), 0.0215458 secs] [Times: user=0.05 sys=0.01, real=0.02 secs] 
2019-01-22T08:31:47.499-0500: 6.986: [GC (Allocation Failure) [PSYoungGen: 93636K->14043K(110080K)] 112291K->32707K(218112K), 0.0191082 secs] [Times: user=0.03 sys=0.01, real=0.02 secs] 
2019-01-22T08:31:47.565-0500: 7.052: [GC (Allocation Failure) [PSYoungGen: 106715K->13585K(111104K)] 125379K->32256K(219136K), 0.0110566 secs] [Times: user=0.03 sys=0.00, real=0.01 secs] 
2019-01-22T08:31:47.975-0500: 7.461: [GC (Allocation Failure) [PSYoungGen: 106257K->9626K(148480K)] 124928K->37589K(256512K), 0.0329521 secs] [Times: user=0.07 sys=0.02, real=0.03 secs] 
2019-01-22T08:31:48.847-0500: 8.334: [GC (Metadata GC Threshold) [PSYoungGen: 120769K->5799K(149504K)] 148732K->123739K(344576K), 0.0346237 secs] [Times: user=0.07 sys=0.02, real=0.04 secs] 
2019-01-22T08:31:48.882-0500: 8.369: [Full GC (Metadata GC Threshold) [PSYoungGen: 5799K->0K(149504K)] [ParOldGen: 117940K->115617K(263680K)] 123739K->115617K(413184K), [Metaspace: 57889K->57857K(1099776K)], 0.2179918 secs] [Times: user=0.66 sys=0.01, real=0.21 secs] 
2019-01-22T08:31:56.805-0500: 16.291: [GC (Allocation Failure) [PSYoungGen: 131072K->4807K(189440K)] 246689K->120432K(453120K), 0.0092119 secs] [Times: user=0.03 sys=0.01, real=0.01 secs] 
2019-01-22T08:32:11.898-0500: 31.385: [GC (Allocation Failure) [PSYoungGen: 181447K->1713K(195072K)] 297072K->120453K(458752K), 0.0062305 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2019-01-22T08:32:26.904-0500: 46.391: [GC (Allocation Failure) [PSYoungGen: 178353K->981K(234496K)] 297093K->120609K(498176K), 0.0048011 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
2019-01-22T08:32:47.815-0500: 67.302: [GC (Allocation Failure) [PSYoungGen: 223701K->1518K(241664K)] 343329K->121154K(505344K), 0.0102639 secs] [Times: user=0.03 sys=0.00, real=0.01 secs] 
2019-01-22T08:33:07.716-0500: 87.203: [GC (Allocation Failure) [PSYoungGen: 194483K->1385K(262144K)] 314119K->121029K(525824K), 0.0059916 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2019-01-22T08:33:11.599-0500: 91.086: [GC (Allocation Failure) [PSYoungGen: 127845K->1390K(268288K)] 247489K->140704K(1666560K), 0.0107712 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
2019-01-22T08:33:11.610-0500: 91.097: [GC (Allocation Failure) [PSYoungGen: 1390K->1401K(294400K)] 140704K->140715K(1692672K), 0.0037587 secs] [Times: user=0.01 sys=0.01, real=0.01 secs] 
2019-01-22T08:33:11.614-0500: 91.100: [Full GC (Allocation Failure) [PSYoungGen: 1401K->0K(294400K)] [ParOldGen: 139314K->51057K(201728K)] 140715K->51057K(496128K), [Metaspace: 60831K->60827K(1101824K)], 0.0966803 secs] [Times: user=0.24 sys=0.01, real=0.09 secs] 
2019-01-22T08:33:11.712-0500: 91.199: [GC (Allocation Failure) [PSYoungGen: 0K->0K(293888K)] 51057K->51057K(1692160K), 0.0100144 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2019-01-22T08:33:11.723-0500: 91.209: [Full GC (Allocation Failure) [PSYoungGen: 0K->0K(293888K)] [ParOldGen: 51057K->48333K(224768K)] 51057K->48333K(518656K), [Metaspace: 60827K->60134K(1101824K)], 0.2302426 secs] [Times: user=0.67 sys=0.01, real=0.23 secs] 
Heap
 PSYoungGen      total 293888K, used 1071K [0x00000000d5580000, 0x00000000ee180000, 0x0000000100000000)
  eden space 275968K, 0% used [0x00000000d5580000,0x00000000d568bfb8,0x00000000e6300000)
  from space 17920K, 0% used [0x00000000e6300000,0x00000000e6300000,0x00000000e7480000)
  to   space 17408K, 0% used [0x00000000ed080000,0x00000000ed080000,0x00000000ee180000)
 ParOldGen       total 1398272K, used 48333K [0x0000000080000000, 0x00000000d5580000, 0x00000000d5580000)
  object space 1398272K, 3% used [0x0000000080000000,0x0000000082f335b0,0x00000000d5580000)
 Metaspace       used 60138K, capacity 60994K, committed 62464K, reserved 1101824K
  class space    used 9379K, capacity 9681K, committed 9984K, reserved 1048576K

worker.log.err:

java.lang.OutOfMemoryError: Java heap space
Dumping heap to artifacts/heapdump ...
Heap dump file created [965011634 bytes in 9.400 secs]
java.lang.OutOfMemoryError: Java heap space
Dumping heap to artifacts/heapdump ...
Unable to create artifacts/heapdump: File exists
java.lang.OutOfMemoryError: Java heap space
Dumping heap to artifacts/heapdump ...
Unable to create artifacts/heapdump: File exists
java.lang.OutOfMemoryError: Java heap space
Dumping heap to artifacts/heapdump ...
.

robots.txt:

User-agent: *
Crawl-delay: 10
# Directories

Answers [ 2 ]

0 votes
/ January 24, 2019

UPDATE: maybe it was http.content.limit? We had it set to -1 because our fetcher was not retrieving the whole page (one of our sites has massive menus at the top of the page). Disabling the limit entirely seems to have been a mistake. We have since set http.content.limit: 5000000 (5 MB) and are letting it run. No errors so far...

=============

What should we look for in the heapdump? (I am a co-worker of an_snatcher.) I downloaded the latest heapdump file to my local machine and ran Eclipse Memory Analyzer against it. I don't know how to export data from Memory Analyzer, so I will post screenshots of what it found, in the hope that you can interpret them. Basically it says that

"com.digitalpebble.stormcrawler.bolt.FetcherBolt$FetcherThread @ 0x8138adb0 FetcherThread #27 Shallow Size: 144 B Retained Size: 709.4 MB"

Here are the images of what Eclipse Memory Analyzer says about the heapdump file:

Eclipse Memory Analyzer image 01

Eclipse Memory Analyzer image 02

Eclipse Memory Analyzer image 03

Eclipse Memory Analyzer image 04

Eclipse Memory Analyzer image 05

Eclipse Memory Analyzer image 06

0 votes
/ January 24, 2019

Have you tried analyzing the heap dump with JHat or VisualVM?

UPDATE: The heap dump above indicates that the memory is filled by content held by the fetcher threads. The fact that you no longer get the error after lowering the content limit would confirm this. Use more memory if you can, or keep limiting the maximum content length; you could also run fewer fetcher threads in parallel.
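
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. The class and method names are hypothetical. The thread count of 28 is an assumption inferred from "FetcherThread #27" in the heap dump (taking numbering to start at 0); the 25% heap share is an arbitrary illustrative budget; the 2048 MB heap and the 5,000,000 byte limit come from the logs and the other answer. Real usage will be higher, since parsing and indexing keep additional copies of the content.

```java
/** Back-of-the-envelope estimate; illustrative names, not part of StormCrawler. */
final class FetchBufferBudget {

    /** Worst-case bytes of raw page content held at once by all fetcher threads. */
    static long worstCaseBytes(int fetcherThreads, long contentLimitBytes) {
        return Math.multiplyExact((long) fetcherThreads, contentLimitBytes);
    }

    /** Largest thread count whose worst case fits in the given share of the heap. */
    static long maxThreads(long heapBytes, double heapShare, long contentLimitBytes) {
        return (long) (heapBytes * heapShare) / contentLimitBytes;
    }

    public static void main(String[] args) {
        long heap = 2048L * 1024 * 1024; // -Xmx2048m from the supervisor log
        long limit = 5_000_000L;         // http.content.limit from the other answer
        int threads = 28;                // ASSUMPTION: inferred from "FetcherThread #27"

        System.out.println("worst-case content bytes: " + worstCaseBytes(threads, limit));
        System.out.println("threads fitting in 25% of heap: " + maxThreads(heap, 0.25, limit));
    }
}
```

With the limit disabled (-1) there is no finite worst case at all, which is consistent with a single fetcher thread retaining 709.4 MB in the heap dump.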

Note: if you hit an endless stream, e.g. radio or video, the default http implementation will simply keep downloading content regardless of the limits set. The okhttp implementation is more robust in that respect.
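
As an illustration of why the cap has to be enforced inside the read loop, here is a minimal sketch of a size-capped read. It is not the code of either protocol implementation; `readCapped` and `EndlessStream` are hypothetical names, and the "negative means no limit" convention simply mirrors the -1 setting mentioned in the other answer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative only: NOT the code of StormCrawler's http or okhttp protocol. */
final class CappedReadSketch {

    /** Reads at most maxBytes from in; a negative maxBytes means "no limit". */
    static byte[] readCapped(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (maxBytes < 0 || out.size() < maxBytes) {
            // Never request more than what is left under the cap.
            int want = maxBytes < 0 ? buf.length : Math.min(buf.length, maxBytes - out.size());
            int n = in.read(buf, 0, want);
            if (n == -1) {
                break; // genuine end of stream
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Stand-in for a radio or video stream: it never signals end of stream. */
    static final class EndlessStream extends InputStream {
        @Override
        public int read() {
            return 'x';
        }
    }

    public static void main(String[] args) throws IOException {
        // The cap stops the read even though the stream never ends.
        byte[] body = readCapped(new EndlessStream(), 5_000_000);
        System.out.println(body.length);
    }
}
```

Calling `readCapped(new EndlessStream(), -1)` would never return and would keep growing the buffer until the heap is exhausted, which is the failure mode described in the note.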
