awk
A solution that works even when the CSV files are not listed in timestamp order, or in any order at all.
INPUT:
$ more csv_list.input
May 1 09:00 ./archive/xxx_cs_app_gmas_reject_MDM_20180501090001.csv 0.000 2 ✔
May 1 17:45 ./archive/xxx_cs_app_gmas_reject_MDM_20180501174500.csv 0.055 185 ✈
May 1 12:00 ./archive/xxx_uvw_ABC_20180501120001.csv 0.000 3 ✒
May 2 12:00 ./archive/xxx_123_20180502120001.csv 0.000 3 ✒
May 1 18:45 ./archive/xxx_uvw_ABC_20180501184500.csv 0.055 135 ✕
May 1 19:45 ./archive/xxx_456_20180501194500.csv 0.055 135 ✕
AWK 1-LINER CMD:
awk '{tmp=$4;gsub(/_[0-9]{14}\.csv/,"",$4);a[$4]+=$6;sub(/\.csv$/,"",tmp); tmp=substr(tmp,length(tmp)-13, length(tmp));if(!timestamp[$4] || tmp>timestamp[$4]){timestamp[$4]=tmp;line1[$4]=$1 OFS $2 OFS $3; line2[$4]=$5; line3[$4]=$7};}END{for(i in a){print line1[i] OFS i"_"timestamp[i]".csv" OFS line2[i] OFS a[i] OFS line3[i]}}' csv_list.input
AWK SCRIPT AND EXPLANATION:
# gawk profile, created Wed May 2 15:00:50 2018
# Rule(s)
{
    tmp = $4
    gsub(/_[0-9]{14}\.csv/, "", $4) # strip "_<14-digit timestamp>.csv" to get the filename without the timestamp
    a[$4] += $6 # sum the 6th column, keyed by the filename without the timestamp
    sub(/\.csv$/, "", tmp) # remove the .csv extension
    tmp = substr(tmp, length(tmp) - 13, length(tmp)) # get the timestamp of the file (its last 14 characters)
    if (! timestamp[$4] || tmp > timestamp[$4]) { # if no timestamp is stored yet, or the new timestamp is later than the stored one
        timestamp[$4] = tmp # save the new timestamp
        line1[$4] = $1 OFS $2 OFS $3 # save the first 3 columns of the latest file
        line2[$4] = $5 # save the 5th column
        line3[$4] = $7 # save the 7th column
    }
}
# END rule(s)
END {
    for (i in a) { # recombine the information to generate the output
        print line1[i] OFS i "_" timestamp[i] ".csv" OFS line2[i] OFS a[i] OFS line3[i]
    }
}
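To see the key step in isolation, here is a minimal sketch that splits a single path into its grouping key and its timestamp, following the same idea as the rule above. The function name `split_name` is made up for illustration, and the regex uses `[0-9]+` with a `$` anchor instead of `{14}`, on the assumption that some awk builds do not accept interval expressions.

```shell
# Hypothetical helper: print "<key> <timestamp>" for one archive path.
split_name() {
    printf '%s\n' "$1" | awk '{
        key = $1
        sub(/_[0-9]+\.csv$/, "", key)      # drop "_<timestamp>.csv" to get the grouping key
        ts = $1
        sub(/\.csv$/, "", ts)              # drop the extension
        ts = substr(ts, length(ts) - 13)   # keep the last 14 characters (the timestamp)
        print key, ts
    }'
}

split_name './archive/xxx_uvw_ABC_20180501184500.csv'
# prints: ./archive/xxx_uvw_ABC 20180501184500
```

Because the timestamps are fixed-width digit strings (YYYYMMDDhhmmss), comparing them with `>` picks the latest one whether awk compares them as numbers or as strings.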
OUTPUT:
May 1 17:45 ./archive/xxx_cs_app_gmas_reject_MDM_20180501174500.csv 0.055 187 ✈
May 2 12:00 ./archive/xxx_123_20180502120001.csv 0.000 3 ✒
May 1 18:45 ./archive/xxx_uvw_ABC_20180501184500.csv 0.055 138 ✕
May 1 19:45 ./archive/xxx_456_20180501194500.csv 0.055 135 ✕
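One caveat: awk does not specify the order in which `for (i in a)` visits the keys, so the rows may come out in a different order on another system. If a stable order matters, one option is to pipe the result through `sort` on the path column. A minimal sketch, where `sort_by_path` is a hypothetical helper name and the sample rows are shortened to plain ASCII:

```shell
# Hypothetical helper: sort summary rows read from stdin by the path column (field 4).
sort_by_path() {
    LC_ALL=C sort -k4,4   # the C locale keeps byte-wise ordering predictable
}

printf '%s\n' \
    'May 1 18:45 ./archive/xxx_uvw_ABC_20180501184500.csv 0.055 138' \
    'May 2 12:00 ./archive/xxx_123_20180502120001.csv 0.000 3' |
    sort_by_path
# the xxx_123 row is printed first, then the xxx_uvw_ABC row
```

In practice the awk command would be placed on the left side of the pipe in place of `printf`.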