I'm trying to implement "Finding Similar Items" with Hadoop MapReduce.
Here is my driver code (for testing):
Path inputPath = new Path(args[0]);
Path outputPath = new Path(args[1]);
Configuration shinglingConf = new Configuration();
Job jobShingling = new Job(shinglingConf, "Shingling");
jobShingling.setJarByClass(FindingSimilarItems.class);
//jobShingling.setMapperClass(Map.class);
jobShingling.setReducerClass(Reduce.class);
jobShingling.setOutputKeyClass(TextInputFormat.class);
jobShingling.setOutputValueClass(TextInputFormat.class);
// Register the inputs 001.txt through 101.txt (names zero-padded to three digits).
for (int i = 1; i <= 101; i++) {
    MultipleInputs.addInputPath(jobShingling, new Path(args[0] + "/" + String.format("%03d", i) + ".txt"), TextInputFormat.class, Map.class);
}
FileOutputFormat.setOutputPath(jobShingling, outputPath);
jobShingling.waitForCompletion(true);
And I got:
Error: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :class org.apache.hadoop.mapreduce.lib.input.TextInputFormat
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:415)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.lang.ClassCastException: class org.apache.hadoop.mapreduce.lib.input.TextInputFormat
at java.lang.Class.asSubclass(Class.java:3404)
at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:881)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1004)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
... 9 more
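As far as I can tell from the trace, the exception is thrown by Class.asSubclass inside JobConf.getOutputKeyComparator, so the configured output key class apparently fails a type check there. Here is a minimal JDK-only sketch of that kind of failure. The class names (AsSubclassDemo, FakeWritableComparable, FakeText, FakeTextInputFormat) are hypothetical stand-ins I made up for illustration, not the real Hadoop types:

```java
public class AsSubclassDemo {
    // Hypothetical stand-ins, NOT the real Hadoop classes.
    interface FakeWritableComparable {}
    static class FakeText implements FakeWritableComparable {}
    static class FakeTextInputFormat {}

    // Returns true when the candidate class passes the asSubclass check,
    // false when asSubclass throws ClassCastException.
    static boolean usableAsKey(Class<?> candidate) {
        try {
            candidate.asSubclass(FakeWritableComparable.class);
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(usableAsKey(FakeText.class));            // prints "true"
        System.out.println(usableAsKey(FakeTextInputFormat.class)); // prints "false"
    }
}
```

If that reading is right, I suspect the setOutputKeyClass and setOutputValueClass calls in my driver should be given a key/value type such as Text instead of TextInputFormat, but I have not verified this.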
Any ideas on how to solve this problem?
Thanks!