PyArrow 0.15.1 ClassCastException - cannot be cast to org.apache.hadoop.shaded.com.google.protobuf.Message
20 March 2020

I'm having problems with PyArrow 0.15.1 when trying to list the contents of a directory on HDFS.

PyArrow is installed inside an Ubuntu 18.04 Docker image.

I'm using Hadoop 3.2.1 and openjdk-8-jdk.

>>> import pyarrow as pa
>>> pa.__version__
'0.15.1'
>>> fs = pa.hdfs.connect(<ip>, <port>)
>>> fs.ls('/')
hdfsListDirectory(/): FileSystem#listStatus error:
ClassCastException: org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetListingRequestProto cannot be cast to org.apache.hadoop.shaded.com.google.protobuf.Message
java.lang.ClassCastException: org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetListingRequestProto cannot be cast to org.apache.hadoop.shaded.com.google.protobuf.Message
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
        at com.sun.proxy.$Proxy9.getListing(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:674)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy10.getListing(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1647)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1631)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1048)
        at org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131)
        at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1112)
        at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1109)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1119)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/varun.patil/anaconda3/lib/python3.7/site-packages/pyarrow/hdfs.py", line 103, in ls
    return super(HadoopFileSystem, self).ls(path, detail)
  File "pyarrow/io-hdfs.pxi", line 272, in pyarrow.lib.HadoopFileSystem.ls
  File "pyarrow/error.pxi", line 80, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: HDFS list directory failed, errno: 255 (Unknown error 255) Please check that you are connecting to the correct HDFS RPC port

I have set JAVA_HOME and HADOOP_HOME correctly.
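For context, pyarrow's libhdfs driver also reads CLASSPATH (which must contain the Hadoop jars) in addition to JAVA_HOME and HADOOP_HOME. A minimal sketch of the pre-connect check I run; the helper name and the paths are illustrative, not from pyarrow itself:

```python
import os

def hadoop_env_missing(env=os.environ):
    """Return the names of the env vars libhdfs needs that are not set."""
    return [v for v in ("JAVA_HOME", "HADOOP_HOME", "CLASSPATH") if v not in env]

# CLASSPATH can be built from the Hadoop install before starting Python, e.g.:
#   export CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath --glob)"
print(hadoop_env_missing({"JAVA_HOME": "/usr/lib/jvm/java-8-openjdk-amd64"}))
# → ['HADOOP_HOME', 'CLASSPATH']
```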

The same code works correctly with PyArrow 0.11.1, but I need to use 0.15.1.

...