когда я пытаюсь использовать массовую загрузку для проверки загрузки данных объема, консоль выдавала сообщение об ошибке:
10:17:01 ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 5.0 (TID 3)
java.util.NoSuchElementException
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:204)
at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoader.getVertexById(BulkLoader.java:116)
at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.lambda$executeInternal$4(BulkLoaderVertexProgram.java:251)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.executeInternal(BulkLoaderVertexProgram.java:249)
at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.execute(BulkLoaderVertexProgram.java:197)
at org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor.lambda$null$5(SparkExecutor.java:118)
at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$3.next(IteratorUtils.java:247)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:189)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
10:17:01 WARN org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 5.0 (TID 3, localhost): java.util.NoSuchElementException
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:204)
at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoader.getVertexById(BulkLoader.java:116)
at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.lambda$executeInternal$4(BulkLoaderVertexProgram.java:251)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.executeInternal(BulkLoaderVertexProgram.java:249)
at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.execute(BulkLoaderVertexProgram.java:197)
at org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor.lambda$null$5(SparkExecutor.java:118)
at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$3.next(IteratorUtils.java:247)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:189)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
10:17:01 ERROR org.apache.spark.scheduler.TaskSetManager - Task 0 in stage 5.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 3, localhost): java.util.NoSuchElementException
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:204)
at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoader.getVertexById(BulkLoader.java:116)
at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.lambda$executeInternal$4(BulkLoaderVertexProgram.java:251)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.executeInternal(BulkLoaderVertexProgram.java:249)
at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.execute(BulkLoaderVertexProgram.java:197)
at org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor.lambda$null$5(SparkExecutor.java:118)
at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$3.next(IteratorUtils.java:247)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:189)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Я настроил hadoop-mine.properties следующим образом:
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.script.ScriptInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=data/tinkerpop-classic-edge.txt
gremlin.hadoop.scriptInputFormat.script=data/script-process-e.groovy
#/user/sera/data/script-process-lines.groovy
gremlin.hadoop.outputLocation=/tmp
#####################################
# GiraphGraphComputer Configuration #
#####################################
##here change workers
giraph.minWorkers=1
giraph.maxWorkers=1
giraph.useOutOfCoreGraph=true
giraph.useOutOfCoreMessages=true
mapred.map.child.java.opts=-Xmx1024m
mapred.reduce.child.java.opts=-Xmx1024m
giraph.numInputThreads=4
giraph.numComputeThreads=4
# giraph.maxPartitionsInMemory=1
# giraph.userPartitionCount=2
####################################
# SparkGraphComputer Configuration #
####################################
spark.master=local[*]
spark.executor.memory=512m
spark.serializer=org.apache.spark.serializer.KryoSerializer
# spark.kryo.registrationRequired=true
# spark.storage.memoryFraction=0.2
# spark.eventLog.enabled=true
# spark.eventLog.dir=/tmp/spark-event-logs
# spark.ui.killEnabled=true
И набор данных Файл tinkerpop-classic-edge.txt выглядит следующим образом:
knows:1:2|11111122222,friend
knows:2:3|22222233333,mate
knows:1:5|11111155555,father
knows:3:2|12314125125,neither
knows:5:1|23124125151,back
knows:2:1|91240251633,back
Функция разбора здесь:
def parse(line,factory){
def part = line.split(/\|/);
def (label, startId, targetId) = part[0].split(/:/).toList()
def v1 = factory.vertex(Long.valueOf(startId),"uid");
def v2 = factory.vertex(Long.valueOf(targetId),"uid");
def edge = factory.edge(v1,v2,label);
if(part.size() == 1){
return v1;
}
def (phoneNum, type) = part[1].split(/,/).toList()
if(phoneNum != null){
edge.property("phoneNum",phoneNum);
}
if(type != null){
edge.property("type",type);
}
return v1;
}
Консольный ввод:
gremlin> graph = GraphFactory.open("conf/hadoop-graph/hadoop-mine.properties")
==>hadoopgraph[scriptinputformat->graphsonoutputformat]
gremlin> blvp = BulkLoaderVertexProgram.build().
bulkLoader(OneTimeBulkLoader).
writeGraph("conf/janusgraph-cassandra-es.properties").
create(graph)
==>BulkLoaderVertexProgram[bulkLoader=OneTimeBulkLoader, vertexIdProperty=null,
userSuppliedIds=false, keepOriginalIds=false, batchSize=0]
gremlin> graph.compute(SparkGraphComputer).program(blvp).submit().get()
Я хочу узнать, что я что-то проигнорировал?и как я могу решить эту проблему!большое спасибо!