I am working on a matrix multiplication example using MapReduce in Hadoop. I want to ask whether Spilled Records should always be equal to the Map input and Map output records.
The Spilled Records I get differ from both the Map input and the Map output records.
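For example, in the output below the first job reports Spilled Records=108 while Map input records=18 and Map output records=54, and the second (local) job reports Spilled Records=18 against Map output records=15. If I am adding the counters correctly, 108 = 54 (Map output records) + 54 (Reduce input records) and 18 = 9 (Combine output records) + 9 (Reduce input records), so each record seems to be counted once when it is spilled on the map side and once more on the reduce side, but I do not know whether that is the expected behaviour.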
Here is the output from one of the tests I ran:
Three by three test
IB = 1
KB = 2
JB = 1
11/12/14 13:16:22 INFO input.FileInputFormat: Total input paths to process : 2
11/12/14 13:16:22 INFO mapred.JobClient: Running job: job_201112141153_0003
11/12/14 13:16:23 INFO mapred.JobClient: map 0% reduce 0%
11/12/14 13:16:32 INFO mapred.JobClient: map 100% reduce 0%
11/12/14 13:16:44 INFO mapred.JobClient: map 100% reduce 100%
11/12/14 13:16:46 INFO mapred.JobClient: Job complete: job_201112141153_0003
11/12/14 13:16:46 INFO mapred.JobClient: Counters: 17
11/12/14 13:16:46 INFO mapred.JobClient: Job Counters
11/12/14 13:16:46 INFO mapred.JobClient: Launched reduce tasks=1
11/12/14 13:16:46 INFO mapred.JobClient: Launched map tasks=2
11/12/14 13:16:46 INFO mapred.JobClient: Data-local map tasks=2
11/12/14 13:16:46 INFO mapred.JobClient: FileSystemCounters
11/12/14 13:16:46 INFO mapred.JobClient: FILE_BYTES_READ=1464
11/12/14 13:16:46 INFO mapred.JobClient: HDFS_BYTES_READ=528
11/12/14 13:16:46 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2998
11/12/14 13:16:46 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=384
11/12/14 13:16:46 INFO mapred.JobClient: Map-Reduce Framework
11/12/14 13:16:46 INFO mapred.JobClient: Reduce input groups=36
11/12/14 13:16:46 INFO mapred.JobClient: Combine output records=0
11/12/14 13:16:46 INFO mapred.JobClient: Map input records=18
11/12/14 13:16:46 INFO mapred.JobClient: Reduce shuffle bytes=735
11/12/14 13:16:46 INFO mapred.JobClient: Reduce output records=15
11/12/14 13:16:46 INFO mapred.JobClient: Spilled Records=108
11/12/14 13:16:46 INFO mapred.JobClient: Map output bytes=1350
11/12/14 13:16:46 INFO mapred.JobClient: Combine input records=0
11/12/14 13:16:46 INFO mapred.JobClient: Map output records=54
11/12/14 13:16:46 INFO mapred.JobClient: Reduce input records=54
11/12/14 13:16:46 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/12/14 13:16:46 INFO input.FileInputFormat: Total input paths to process : 1
11/12/14 13:16:46 INFO mapred.JobClient: Running job: job_local_0001
11/12/14 13:16:46 INFO input.FileInputFormat: Total input paths to process : 1
11/12/14 13:16:46 INFO mapred.MapTask: io.sort.mb = 100
11/12/14 13:16:46 INFO mapred.MapTask: data buffer = 79691776/99614720
11/12/14 13:16:46 INFO mapred.MapTask: record buffer = 262144/327680
11/12/14 13:16:46 INFO mapred.MapTask: Starting flush of map output
11/12/14 13:16:46 INFO mapred.MapTask: Finished spill 0
11/12/14 13:16:46 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
11/12/14 13:16:46 INFO mapred.LocalJobRunner:
11/12/14 13:16:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
11/12/14 13:16:46 INFO mapred.LocalJobRunner:
11/12/14 13:16:46 INFO mapred.Merger: Merging 1 sorted segments
11/12/14 13:16:46 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 128 bytes
11/12/14 13:16:46 INFO mapred.LocalJobRunner:
11/12/14 13:16:46 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
11/12/14 13:16:46 INFO mapred.LocalJobRunner:
11/12/14 13:16:46 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
11/12/14 13:16:46 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://localhost:9000/tmp/MatrixMultiply/out
11/12/14 13:16:46 INFO mapred.LocalJobRunner: reduce > reduce
11/12/14 13:16:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
11/12/14 13:16:47 INFO mapred.JobClient: map 100% reduce 100%
11/12/14 13:16:47 INFO mapred.JobClient: Job complete: job_local_0001
11/12/14 13:16:47 INFO mapred.JobClient: Counters: 14
11/12/14 13:16:47 INFO mapred.JobClient: FileSystemCounters
11/12/14 13:16:47 INFO mapred.JobClient: FILE_BYTES_READ=89412
11/12/14 13:16:47 INFO mapred.JobClient: HDFS_BYTES_READ=37206
11/12/14 13:16:47 INFO mapred.JobClient: FILE_BYTES_WRITTEN=37390
11/12/14 13:16:47 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=164756
11/12/14 13:16:47 INFO mapred.JobClient: Map-Reduce Framework
11/12/14 13:16:47 INFO mapred.JobClient: Reduce input groups=9
11/12/14 13:16:47 INFO mapred.JobClient: Combine output records=9
11/12/14 13:16:47 INFO mapred.JobClient: Map input records=15
11/12/14 13:16:47 INFO mapred.JobClient: Reduce shuffle bytes=0
11/12/14 13:16:47 INFO mapred.JobClient: Reduce output records=9
11/12/14 13:16:47 INFO mapred.JobClient: Spilled Records=18
11/12/14 13:16:47 INFO mapred.JobClient: Map output bytes=180
11/12/14 13:16:47 INFO mapred.JobClient: Combine input records=15
11/12/14 13:16:47 INFO mapred.JobClient: Map output records=15
11/12/14 13:16:47 INFO mapred.JobClient: Reduce input records=9
...........X[0][0]=30, Y[0][0]=9
Bad Answer
...........X[0][1]=36, Y[0][1]=36
...........X[0][2]=42, Y[0][2]=42
...........X[1][0]=66, Y[1][0]=24
Bad Answer
...........X[1][1]=81, Y[1][1]=81
...........X[1][2]=96, Y[1][2]=96
...........X[2][0]=102, Y[2][0]=39
Bad Answer
...........X[2][1]=126, Y[2][1]=126
...........X[2][2]=150, Y[2][2]=150
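Judging by the expected X values above (30 36 42 / 66 81 96 / 102 126 150), I assume the three-by-three test multiplies the matrix {{1,2,3},{4,5,6},{7,8,9}} by itself; a plain single-machine check like the sketch below (my own code, not part of the example) reproduces the X column exactly, so it is the Y values coming out of the MapReduce job that look wrong:

// Single-machine check of the expected 3x3 result.
// Assumption: the test input is the matrix 1..9 multiplied by itself,
// which is what reproduces the X values printed above.
public class Verify3x3 {
    public static void main(String[] args) {
        int n = 3;
        int[][] a = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        int[][] x = new int[n][n];                 // expected product, starts at zero
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    x[i][j] += a[i][k] * a[k][j];
        for (int[] row : x) {                      // prints 30 36 42 / 66 81 96 / 102 126 150
            StringBuilder sb = new StringBuilder();
            for (int v : row) sb.append(v).append(' ');
            System.out.println(sb.toString().trim());
        }
    }
}

Only the first column of Y is wrong, and the bad values look like partial sums (for instance 9 = 1*1 + 2*4, the contribution of the first k-block only), but I may be misreading that.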
This example, together with the code, is described here:
http://www.norstad.org/matrix-multiply/index.html
Could you please tell me what the problem is and how I can fix it? Thanks
WL