сборщик метрик ambari - PullRequest
       28

сборщик метрик ambari

0 голосов
/ 26 февраля 2020

у нас есть ambari с hadoop (HDP 2.6.4) кластером

в ambari у нас есть ambari-metrics-collector

, время от времени происходит сбой сборщика метрик, и мы можем см. в журнале следующее примечание

- это происходит в обеих конфигурациях - embedded и distributed

java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at org.apache.helix.controller.stages.ClusterEventBlockingQueue.take(ClusterEventBlockingQueue.java:85)
        at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:594)
2019-11-26 00:57:20,639 INFO org.apache.helix.controller.GenericHelixController: END ClusterEventProcessor thread
2019-11-26 00:57:20,641 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master02.sys67.com/130.21.6.36:61181. Will not attempt to authenticate using SASL (unknown error)
2019-11-26 00:57:20,642 WARN org.apache.zookeeper.ClientCnxn: Session 0x16df2739ea90013 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2019-11-26 00:57:20,686 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:6188
2019-11-26 00:57:20,690 WARN org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR
javax.ws.rs.WebApplicationException: org.apache.phoenix.exception.PhoenixIOException: Interrupted calling coprocessor service org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService for row \x00\x00METRIC_RECORD
        at org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TimelineWebServices.getTimelineMetrics(TimelineWebServices.java:377)
        at sun.reflect.GeneratedMethodAccessor35.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:895)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:843)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:804)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at 

org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
            at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
            at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1409)
            at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
            at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
        ... 48 more
Caused by: java.io.InterruptedIOException: Giving up trying to location region in meta: thread is interrupted.
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1347)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1165)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1149)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1106)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:941)
        at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)

неясно, почему мы получаем

2019-11-26 00:57:20,690 WARN org.apache.hadoop.yarn.webapp.GenericExceptionHandler: INTERNAL_SERVER_ERROR

Я должен сказать, что мы настраиваем параметры коллектора метрик в соответствии с - https://cwiki.apache.org/confluence/display/AMBARI/Configurations+-+Tuning

, поэтому мы не понимаем, почему сбой сборщика метрик ambari

машины с включенным сборщиком метрик ambari следующие обороты

ambari-metrics-monitor-2.6.2.2-1.x86_64
ambari-metrics-common-2.6.2.2-1.noarch
ambari-agent-2.6.2.2-1.x86_64
ambari-metrics-collector-2.6.2.2-1.x86_64
ambari-metrics-hadoop-sink-2.6.2.2-1.x86_64
ambari-server-2.6.2.2-1.x86_64
...