I am trying to read a CSV file from an S3 bucket using Spark (Scala) running locally. I am able to read the file over the http protocol, but I intend to use the s3a protocol.
Below is the configuration set up before the call:
spark.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "Mykey")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "Mysecretkey")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.BasicAWSCredentialsProvider")
spark.sparkContext.hadoopConfiguration.set("com.amazonaws.services.s3.enableV4", "true")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "eu-west-1.amazonaws.com")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.impl.disable.cache", "true")
I am getting the exception below:
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2154)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2580)
My Spark version is: 2.3.1
Scala version: 2.11
aws-java-sdk version: 1.11.336
hadoop-aws version: 2.8.4
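For reference, the dependencies above could be declared in sbt roughly like this (a sketch assuming sbt is the build tool; the artifact coordinates are the standard Maven ones, and the Scala patch version is an assumption):

```scala
// build.sbt (sketch) -- versions taken from the list above
scalaVersion := "2.11.12" // assumed patch release of Scala 2.11

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"    % "2.3.1",
  // hadoop-aws provides org.apache.hadoop.fs.s3a.S3AFileSystem,
  // the class the ClassNotFoundException complains about
  "org.apache.hadoop" %  "hadoop-aws"   % "2.8.4",
  "com.amazonaws"     %  "aws-java-sdk" % "1.11.336"
)
```

If the jars are instead supplied at launch time, they would need to be on the driver classpath (e.g. via `--packages` or `--jars` on `spark-submit`) for the S3A filesystem class to resolve.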