I have 3 CSV files, shown below. I'm trying to create an RDD from each and combine them into a single final output that I can apply filters to. I'm not sure where to start
with this. Any suggestions, please?
JavaRDD<String> file1 = sc.textFile("D:\\tmp\\file1.csv");
JavaRDD<String> file2 = sc.textFile("D:\\tmp\\file2.csv");
JavaRDD<String> file3 = sc.textFile("D:\\tmp\\file3.csv");
JavaRDD<String> combRDD = file1.union(file2).union(file3); //doesn't give expected output
csv file1
"user","source_ip","action","type"
"abc","10.0.0.1","login","ONE"
"xyz","10.0.1.1","login","ONE"
"abc","10.0.0.1","playing","ONE"
"def","10.1.0.1","login","ONE"
csv file2
"user","url","type"
"abc","/test","TWO"
"xyz","/wonder","TWO"
csv file3
"user","total_time","type","status"
"abc","5min","THREE","true"
"xyz","2min","THREE","fail"
Final expected output
"user","source_ip","action","type","url","total_time","status"
"abc","10.0.0.1","login","ONE","","",""
"xyz","10.0.1.1","login","ONE","","",""
"abc","10.0.0.1","playing","ONE","","",""
"def","10.1.0.1","login","ONE","","",""
"abc","","","TWO","/test","",""
"xyz","","","TWO","/wonder","",""
"abc","","","THREE","","5min","true"
"xyz","","","THREE","","2min","fail"
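A plain union just concatenates the raw lines, so the header rows stay in the data and the columns of the three files do not line up. One possible way to get the layout above is to re-map each row onto the combined column list before the union. The following is only a minimal sketch in plain Java (no Spark dependency): the class name `CsvAligner` and the methods `parse` and `align` are hypothetical, and the parser assumes every field is double-quoted and contains no embedded commas or quotes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvAligner {
    // Target column order, copied from the expected output above.
    static final List<String> TARGET = Arrays.asList(
        "user", "source_ip", "action", "type", "url", "total_time", "status");

    // Naive parser (assumption): all fields are double-quoted, with no
    // embedded commas or quotes. Real data may need a proper CSV parser.
    static List<String> parse(String line) {
        String trimmed = line.trim();
        if (trimmed.length() < 2) {
            return new ArrayList<>();
        }
        // Drop the outer quotes, then split on the "," separators.
        String inner = trimmed.substring(1, trimmed.length() - 1);
        return Arrays.asList(inner.split("\",\"", -1));
    }

    // Re-maps one data row onto TARGET, using that file's own header line.
    // Columns the file does not have become empty quoted strings.
    static String align(String headerLine, String rowLine) {
        List<String> header = parse(headerLine);
        List<String> row = parse(rowLine);
        List<String> out = new ArrayList<>();
        for (String col : TARGET) {
            int i = header.indexOf(col);
            String value = (i >= 0 && i < row.size()) ? row.get(i) : "";
            out.add("\"" + value + "\"");
        }
        return String.join(",", out);
    }

    public static void main(String[] args) {
        // A row from csv file2, aligned to the combined layout.
        System.out.println(
            align("\"user\",\"url\",\"type\"", "\"abc\",\"/test\",\"TWO\""));
    }
}
```

In a Spark job, a function like `align` could be applied in a `map` over each file's RDD (after filtering out that file's header line), and the three mapped RDDs could then be combined with `union`.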
Each of the CSV files is generated every day in the same format, so I would like to read them from a specific folder using *.csv to build the RDDs.
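Because every file carries its own header, reading a whole folder is easier if each file is handled as a unit: take its first line as the header, then align the remaining rows. The sketch below shows that per-file step in plain Java, without Spark. The class `CsvFileAligner` and its methods are hypothetical names, and it again assumes fully double-quoted fields with no embedded commas, quotes, or line breaks.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvFileAligner {
    // Combined column order taken from the expected output in the question.
    static final String[] TARGET = {
        "user", "source_ip", "action", "type", "url", "total_time", "status"};

    // Naive split (assumption): strips the outer quotes, splits on ",".
    static String[] fields(String line) {
        return line.trim().replaceAll("^\"|\"$", "").split("\",\"", -1);
    }

    // Takes the full text of one CSV file (header first) and returns its
    // data rows re-mapped onto TARGET; the header line itself is dropped.
    static List<String> alignFile(String content) {
        List<String> out = new ArrayList<>();
        String[] lines = content.split("\\r?\\n");
        if (lines.length == 0 || lines[0].trim().isEmpty()) {
            return out;
        }
        String[] header = fields(lines[0]);

        // pos[t] = index of TARGET[t] in this file's header, or -1 if absent.
        int[] pos = new int[TARGET.length];
        for (int t = 0; t < TARGET.length; t++) {
            pos[t] = -1;
            for (int h = 0; h < header.length; h++) {
                if (header[h].equals(TARGET[t])) {
                    pos[t] = h;
                }
            }
        }

        for (int n = 1; n < lines.length; n++) {
            if (lines[n].trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] row = fields(lines[n]);
            StringBuilder sb = new StringBuilder();
            for (int t = 0; t < TARGET.length; t++) {
                if (t > 0) {
                    sb.append(',');
                }
                String v = (pos[t] >= 0 && pos[t] < row.length) ? row[pos[t]] : "";
                sb.append('"').append(v).append('"');
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String file3 = "\"user\",\"total_time\",\"type\",\"status\"\n"
            + "\"abc\",\"5min\",\"THREE\",\"true\"\n"
            + "\"xyz\",\"2min\",\"THREE\",\"fail\"\n";
        alignFile(file3).forEach(System.out::println);
    }
}
```

On the Spark side, `sc.wholeTextFiles(...)` returns (path, content) pairs, so something like `sc.wholeTextFiles("D:\\tmp\\*.csv")` followed by a `flatMap` over `alignFile` on each pair's content might produce the combined RDD directly. That wiring is an untested assumption here, and it only suits files small enough to be held in memory one at a time.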