I'm testing something with Hudi on EMR, and I can't figure out why I'm getting this error. I have an existing Hudi dataset that was created earlier. In a new spark-shell session, I tried the following:
import org.apache.hudi.DataSourceWriteOptions
import org.apache.spark.sql.SaveMode
df.write
  .format("org.apache.hudi")
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
  .options(hudiOptions)
  .mode(SaveMode.Append) // appending upserts to the existing table
  .save(basePath)        // basePath: the s3:// location of the existing table
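For reference, hudiOptions is defined roughly like this; the column and table names below are placeholders rather than my exact config:

import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig

val hudiOptions = Map(
  DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "id",            // placeholder record key column
  DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition", // placeholder partition path column
  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "ts",           // placeholder precombine column
  HoodieWriteConfig.TABLE_NAME -> "my_table"                         // placeholder table name
)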
And I get this error:
20/04/29 15:54:49 WARN TaskSetManager: Lost task 1.0 in stage 18.0 (TID 4773, ip-10-0-140-35.ec2.internal, executor 30): java.lang.IllegalArgumentException: Wrong FS: s3://www.facebook.com/1000awesomethings, expected: s3://qadv2p0
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:669)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:487)
at com.amazon.ws.emr.hadoop.fs.staging.DefaultStagingMechanism.isStagingDirectoryPath(DefaultStagingMechanism.java:38)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:842)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1440)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.exists(EmrFileSystem.java:352)
at org.apache.hudi.common.io.storage.HoodieWrapperFileSystem.exists(HoodieWrapperFileSystem.java:459)
at org.apache.hudi.common.util.FSUtils.createPathIfNotExists(FSUtils.java:517)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$5(AbstractTableFileSystemView.java:221)
at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:212)
at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getLatestDataFilesBeforeOrOn(AbstractTableFileSystemView.java:351)
at org.apache.hudi.index.bloom.HoodieBloomIndex.lambda$loadInvolvedFiles$19c2c1bb$1(HoodieBloomIndex.java:247)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:125)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1334)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Any ideas why this is happening?
A separate question: what is the best forum for asking Hudi questions?
Thanks,