Looping over non-tabular files - PullRequest
0 votes
/ June 5, 2019

I have several non-tabular files. I would like to combine them and create a single file with some information from all of them.

I tried the code below, but it doesn't work when I use a for loop.

The source file looks like this:

Warning: Output file '02-MappedReads_HISAT2/sam_folder/SAMPLE01_unsorted_sample.sam' was specified without -S.  This will not work in future HISAT 2 versions.  Please use -S instead.
9437 reads; of these:
  9437 (100.00%) were paired; of these:
    310 (3.28%) aligned concordantly 0 times
    8977 (95.13%) aligned concordantly exactly 1 time
    150 (1.59%) aligned concordantly >1 times
    ----
    310 pairs aligned concordantly 0 times; of these:
      13 (4.19%) aligned discordantly 1 time
    ----
    297 pairs aligned 0 times concordantly or discordantly; of these:
      594 mates make up the pairs; of these:
        306 (51.52%) aligned 0 times
        282 (47.47%) aligned exactly 1 time
        6 (1.01%) aligned >1 times
98.38% overall alignment rate

So I used the read.table function to read the file:

(report_sample <- read.table(paste0(mapping_Folder, '/', 'SAMPLE01_summary.txt'), header = F, as.is = T, fill = TRUE, sep = ' ', skip = 1, blank.lines.skip = TRUE, text = TRUE))

(final <- data.frame('samples' = samples['1',1], 'Input_Read_Pairs' = report_sample[1,1], 'Mapped_reads' = report_sample[2,3], 'Mapped_reads_%' = report_sample[2,4], 'reads_unmapped' = report_sample[3,5], 'reads_unmapped_%' = report_sample[3,6], 'reads_uniquely_mapped' = report_sample[4,5], 'reads_uniquely_mapped_%' = report_sample[4,6]))

So the output is:

samples Input_Read_Pairs Mapped_reads Mapped_reads_. reads_unmapped reads_unmapped_. reads_uniquely_mapped reads_uniquely_mapped_.
1 SAMPLE01             9437         9437      (100.00%)            310          (3.28%)                  8977                (95.13%)

This works fine when I use only one file. With a loop it doesn't work; the code and the resulting report_sample are:

report_sample <- array(dim = 0)
for (i in samples[,1]) {
  report_sample[i] <- read.table(paste0(mapping_Folder, '/', i,'_summary.txt'), header = F, as.is = T, fill = TRUE, sep = ' ', skip = 1, blank.lines.skip = TRUE, text = TRUE, )
}
final <- data.frame('samples' = samples['1',1], 'Input_Read_Pairs' = report_sample[1,1], 'Mapped_reads' = report_sample[2,3], 'Mapped_reads_%' = report_sample[2,4], 'reads_unmapped' = report_sample[3,5], 'reads_unmapped_%' = report_sample[3,6], 'reads_uniquely_mapped' = report_sample[4,5], 'reads_uniquely_mapped_%' = report_sample[4,6])
$SAMPLE01
 [1] "9437"          ""              ""              ""              ""              ""              ""              "these:"       
 [9] ""              "time"          ""              ""              "discordantly;" ""              "pairs;"        ""             
[17] "0"             ""              "exactly"       ""              ">1"            "98.38%"       

$SAMPLE02
 [1] "9437"          ""              ""              ""              ""              ""              ""              "these:"       
 [9] ""              "time"          ""              ""              "discordantly;" ""              "pairs;"        ""             
[17] "0"             ""              "exactly"       ""              ">1"            "98.38%"       

$SAMPLE03
 [1] "9437"          ""              ""              ""              ""              ""              ""              "these:"       
 [9] ""              "time"          ""              ""              "discordantly;" ""              "pairs;"        ""             
[17] "0"             ""              "exactly"       ""              ">1"            "98.43%"       

1 Answer

0 votes
/ June 5, 2019

Your example isn't 100% reproducible (what is samples?), so I approximated it.

# Pick the fields of interest out of one parsed summary table.
TidyTable <- function(x) {
  final <- data.frame('Input_Read_Pairs' = x[1,1], # add your "samples" column before this
                      'Mapped_reads' = x[2,3],
                      'Mapped_reads_%' = x[2,4],
                      'reads_unmapped' = x[3,5],
                      'reads_unmapped_%' = x[3,6],
                      'reads_uniquely_mapped' = x[4,5],
                      'reads_uniquely_mapped_%' = x[4,6])
  return(final)
}

# Read every summary into a list; [[i]] stores the whole data frame.
report_sample <- list()
for (i in 1:3) { # change this to your "samples"
  report_sample[[i]] <- read.table(paste0(mapping_Folder, '/', "output", i, ".txt"),
                             header = FALSE, as.is = TRUE, fill = TRUE, sep = ' ',
                             skip = 1, blank.lines.skip = TRUE)
}

# One single-row data frame per file, stacked into one table.
df <- lapply(report_sample, TidyTable)
do.call("rbind", df)
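
For reference, here is a self-contained variant of the same idea, offered as a sketch rather than a definitive fix. The loop in the question most likely fails because report_sample[i] <- read.table(...) assigns with single brackets, so R keeps only the first column of each data frame (which matches the $SAMPLE01 output shown there); a list filled with [[ ]] keeps the whole table. The temporary folder, sample_names, the extra sample argument of TidyTable, and the _pct column names (used because data.frame turns % into a dot) are assumptions made for illustration and are not part of the original post.

```r
# The HISAT2 summary from the question, written to temporary files so the
# example runs on its own. Folder and sample names are placeholders.
summary_text <- c(
  "Warning: Output file '02-MappedReads_HISAT2/sam_folder/SAMPLE01_unsorted_sample.sam' was specified without -S.  This will not work in future HISAT 2 versions.  Please use -S instead.",
  "9437 reads; of these:",
  "  9437 (100.00%) were paired; of these:",
  "    310 (3.28%) aligned concordantly 0 times",
  "    8977 (95.13%) aligned concordantly exactly 1 time",
  "    150 (1.59%) aligned concordantly >1 times",
  "    ----",
  "    310 pairs aligned concordantly 0 times; of these:",
  "      13 (4.19%) aligned discordantly 1 time",
  "    ----",
  "    297 pairs aligned 0 times concordantly or discordantly; of these:",
  "      594 mates make up the pairs; of these:",
  "        306 (51.52%) aligned 0 times",
  "        282 (47.47%) aligned exactly 1 time",
  "        6 (1.01%) aligned >1 times",
  "98.38% overall alignment rate"
)
mapping_Folder <- tempfile("mapping_")
dir.create(mapping_Folder)
sample_names <- c("SAMPLE01", "SAMPLE02")
for (s in sample_names) {
  writeLines(summary_text, file.path(mapping_Folder, paste0(s, "_summary.txt")))
}

# Same cell positions as in the question, plus the sample name as a column.
TidyTable <- function(x, sample) {
  data.frame(
    samples = sample,
    Input_Read_Pairs = as.integer(x[1, 1]),
    Mapped_reads = as.integer(x[2, 3]),
    Mapped_reads_pct = x[2, 4],
    reads_unmapped = as.integer(x[3, 5]),
    reads_unmapped_pct = x[3, 6],
    reads_uniquely_mapped = as.integer(x[4, 5]),
    reads_uniquely_mapped_pct = x[4, 6],
    stringsAsFactors = FALSE
  )
}

# A named list filled with [[ ]] stores each complete data frame.
report_sample <- list()
for (s in sample_names) {
  report_sample[[s]] <- read.table(
    file.path(mapping_Folder, paste0(s, "_summary.txt")),
    header = FALSE, as.is = TRUE, fill = TRUE, sep = " ", skip = 1
  )
}

# Build one row per sample and stack the rows into a single table.
rows <- Map(TidyTable, report_sample, names(report_sample))
final <- do.call(rbind, rows)
rownames(final) <- NULL
print(final)
```

In real use, drop the temporary-file setup, point mapping_Folder at the actual directory, and take sample_names from samples[, 1]. The cell positions depend on the exact layout of the HISAT2 summary, so they are worth re-checking if the report format changes.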