Я пишу конвейер RNA-seq, используя Snakemake в течение недели. Я до сих пор не знаю рабочий порядок. Версия snakemake - 5.4.4
.
Мой конвейер RNA-seq состоит из пяти частей, поэтому я пишу пять правил (усечение правила, правило alignemnt, правило sort_to_bam, правило fpkm, количество правил). Когда я пишу правило, я проверю его, выполнив его。 И наконец я закончил. И он работает нормально, когда я проверяю каждое правило шаг за шагом. Вот мой Snakefile:
SBT=["wt1","wt2","epcr1","epcr2"]
ruleorder: trim > map > sort2bam > fpkm > count
rule all:
input:
expand("02_clean/{nico}_1.paired.fq.gz", nico=SBT),
expand("02_clean/{nico}_2.paired.fq.gz", nico=SBT),
expand("03_align/{nico}.sam", nico=SBT),
expand("04_exp/{nico}_count.txt", nico=SBT),
expand("05_ft/{nico}_gene.gtf", nico=SBT),
expand("05_ft/{nico}_transcript.gtf", nico=SBT)
rule trim:
input:
"01_raw/{nico}_1.fastq",
"01_raw/{nico}_2.fastq"
output:
"02_clean/{nico}_1.paired.fq.gz",
"02_clean/{nico}_1.unpaired.fq.gz",
"02_clean/{nico}_2.paired.fq.gz",
"02_clean/{nico}_2.unpaired.fq.gz",
threads: 20
shell:
"java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 {input[0]} {input[1]} {output[0]} {output[1]} {output[2]} {output[3]} ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &"
rule map:
input:
"02_clean/{nico}_1.paired.fq.gz",
"02_clean/{nico}_2.paired.fq.gz"
output:
"03_align/{nico}.sam"
log:
"logs/map/{nico}.log"
threads: 40
shell:
"hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 {input[0]} -2 {input[1]} -S {output} >{log} 2>&1 &"
rule sort2bam:
input:
"03_align/{nico}.sam"
output:
"03_align/{nico}.bam"
threads:20
shell:
"samtools sort -@ 20 -m 10G -o {output} {input}"
rule count:
input:
"03_align/{nico}.bam"
output:
"04_exp/{nico}_count.txt"
shell:
"featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o {output} {input}"
rule fpkm:
input:
"03_align/{nico}.bam"
output:
"05_ft/{nico}_gene.gtf",
"05_ft/{nico}_transcript.gtf"
threads: 40
shell:
"stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A {output[0]} -o {output[1]} {input}"
raw_data отображается следующим образом:
(py3) root@SBT:~/s/r/snakemake/my_rnaseq_data/01_raw# tree
.
|-- epcr1_1.fastq
|-- epcr1_2.fastq
|-- epcr2_1.fastq
|-- epcr2_2.fastq
|-- wt1_1.fastq
|-- wt1_2.fastq
|-- wt2_1.fastq
`-- wt2_2.fastq
Затем я хочу протестировать конвейер из raw_data, шаг за шагом удаляя все существующие промежуточные файлы, которые я тестировал ранее. Вот мои результаты dry_run:
Building DAG of jobs...
Job counts:
count jobs
1 all
4 count
4 fpkm
4 map
4 sort2bam
4 trim
21
[Tue Apr 30 03:09:28 2019]
rule trim:
input: 01_raw/epcr1_1.fastq, 01_raw/epcr1_2.fastq
output: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_1.unpaired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr1_2.unpaired.fq.gz
jobid: 3
wildcards: nico=epcr1
java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/epcr1_1.fastq 01_raw/epcr1_2.fastq 02_clean/epcr1_1.paired.fq.gz 02_clean/epcr1_1.unpaired.fq.gz 02_clean/epcr1_2.paired.fq.gz 02_clean/epcr1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &
[Tue Apr 30 03:09:28 2019]
rule trim:
input: 01_raw/epcr2_1.fastq, 01_raw/epcr2_2.fastq
output: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_1.unpaired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 02_clean/epcr2_2.unpaired.fq.gz
jobid: 4
wildcards: nico=epcr2
java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/epcr2_1.fastq 01_raw/epcr2_2.fastq 02_clean/epcr2_1.paired.fq.gz 02_clean/epcr2_1.unpaired.fq.gz 02_clean/epcr2_2.paired.fq.gz 02_clean/epcr2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &
[Tue Apr 30 03:09:28 2019]
rule trim:
input: 01_raw/wt1_1.fastq, 01_raw/wt1_2.fastq
output: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_1.unpaired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt1_2.unpaired.fq.gz
jobid: 1
wildcards: nico=wt1
java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/wt1_1.fastq 01_raw/wt1_2.fastq 02_clean/wt1_1.paired.fq.gz 02_clean/wt1_1.unpaired.fq.gz 02_clean/wt1_2.paired.fq.gz 02_clean/wt1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &
[Tue Apr 30 03:09:28 2019]
rule trim:
input: 01_raw/wt2_1.fastq, 01_raw/wt2_2.fastq
output: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_1.unpaired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/wt2_2.unpaired.fq.gz
jobid: 2
wildcards: nico=wt2
java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/wt2_1.fastq 01_raw/wt2_2.fastq 02_clean/wt2_1.paired.fq.gz 02_clean/wt2_1.unpaired.fq.gz 02_clean/wt2_2.paired.fq.gz 02_clean/wt2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &
[Tue Apr 30 03:09:28 2019]
rule map:
input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz
output: 03_align/wt1.sam
log: logs/map/wt1.log
jobid: 5
wildcards: nico=wt1
hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/wt1_1.paired.fq.gz -2 02_clean/wt1_2.paired.fq.gz -S 03_align/wt1.sam >logs/map/wt1.log 2>&1 &
[Tue Apr 30 03:09:28 2019]
rule map:
input: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz
output: 03_align/epcr2.sam
log: logs/map/epcr2.log
jobid: 8
wildcards: nico=epcr2
hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/epcr2_1.paired.fq.gz -2 02_clean/epcr2_2.paired.fq.gz -S 03_align/epcr2.sam >logs/map/epcr2.log 2>&1 &
[Tue Apr 30 03:09:28 2019]
rule map:
input: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz
output: 03_align/wt2.sam
log: logs/map/wt2.log
jobid: 6
wildcards: nico=wt2
hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/wt2_1.paired.fq.gz -2 02_clean/wt2_2.paired.fq.gz -S 03_align/wt2.sam >logs/map/wt2.log 2>&1 &
[Tue Apr 30 03:09:28 2019]
rule map:
input: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz
output: 03_align/epcr1.sam
log: logs/map/epcr1.log
jobid: 7
wildcards: nico=epcr1
hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/epcr1_1.paired.fq.gz -2 02_clean/epcr1_2.paired.fq.gz -S 03_align/epcr1.sam >logs/map/epcr1.log 2>&1 &
[Tue Apr 30 03:09:28 2019]
rule sort2bam:
input: 03_align/epcr1.sam
output: 03_align/epcr1.bam
jobid: 19
wildcards: nico=epcr1
samtools sort -@ 20 -m 10G -o 03_align/epcr1.bam 03_align/epcr1.sam
[Tue Apr 30 03:09:28 2019]
rule sort2bam:
input: 03_align/epcr2.sam
output: 03_align/epcr2.bam
jobid: 20
wildcards: nico=epcr2
samtools sort -@ 20 -m 10G -o 03_align/epcr2.bam 03_align/epcr2.sam
[Tue Apr 30 03:09:28 2019]
rule sort2bam:
input: 03_align/wt1.sam
output: 03_align/wt1.bam
jobid: 17
wildcards: nico=wt1
samtools sort -@ 20 -m 10G -o 03_align/wt1.bam 03_align/wt1.sam
[Tue Apr 30 03:09:28 2019]
rule sort2bam:
input: 03_align/wt2.sam
output: 03_align/wt2.bam
jobid: 18
wildcards: nico=wt2
samtools sort -@ 20 -m 10G -o 03_align/wt2.bam 03_align/wt2.sam
[Tue Apr 30 03:09:28 2019]
rule count:
input: 03_align/wt2.bam
output: 04_exp/wt2_count.txt
jobid: 10
wildcards: nico=wt2
featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/wt2_count.txt 03_align/wt2.bam
[Tue Apr 30 03:09:28 2019]
rule count:
input: 03_align/wt1.bam
output: 04_exp/wt1_count.txt
jobid: 9
wildcards: nico=wt1
featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/wt1_count.txt 03_align/wt1.bam
[Tue Apr 30 03:09:28 2019]
rule count:
input: 03_align/epcr2.bam
output: 04_exp/epcr2_count.txt
jobid: 12
wildcards: nico=epcr2
featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/epcr2_count.txt 03_align/epcr2.bam
[Tue Apr 30 03:09:28 2019]
rule fpkm:
input: 03_align/wt1.bam
output: 05_ft/wt1_gene.gtf, 05_ft/wt1_transcript.gtf
jobid: 13
wildcards: nico=wt1
stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/wt1_gene.gtf -o 05_ft/wt1_transcript.gtf 03_align/wt1.bam
[Tue Apr 30 03:09:28 2019]
rule fpkm:
input: 03_align/epcr2.bam
output: 05_ft/epcr2_gene.gtf, 05_ft/epcr2_transcript.gtf
jobid: 16
wildcards: nico=epcr2
stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/epcr2_gene.gtf -o 05_ft/epcr2_transcript.gtf 03_align/epcr2.bam
[Tue Apr 30 03:09:28 2019]
rule fpkm:
input: 03_align/wt2.bam
output: 05_ft/wt2_gene.gtf, 05_ft/wt2_transcript.gtf
jobid: 14
wildcards: nico=wt2
stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/wt2_gene.gtf -o 05_ft/wt2_transcript.gtf 03_align/wt2.bam
[Tue Apr 30 03:09:28 2019]
rule count:
input: 03_align/epcr1.bam
output: 04_exp/epcr1_count.txt
jobid: 11
wildcards: nico=epcr1
featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/epcr1_count.txt 03_align/epcr1.bam
[Tue Apr 30 03:09:28 2019]
rule fpkm:
input: 03_align/epcr1.bam
output: 05_ft/epcr1_gene.gtf, 05_ft/epcr1_transcript.gtf
jobid: 15
wildcards: nico=epcr1
stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/epcr1_gene.gtf -o 05_ft/epcr1_transcript.gtf 03_align/epcr1.bam
[Tue Apr 30 03:09:28 2019]
localrule all:
input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt2_1.paired.fq.gz, 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr2_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 03_align/wt1.sam, 03_align/wt2.sam, 03_align/epcr1.sam, 03_align/epcr2.sam, 04_exp/wt1_count.txt, 04_exp/wt2_count.txt, 04_exp/epcr1_count.txt, 04_exp/epcr2_count.txt, 05_ft/wt1_gene.gtf, 05_ft/wt2_gene.gtf, 05_ft/epcr1_gene.gtf, 05_ft/epcr2_gene.gtf, 05_ft/wt1_transcript.gtf, 05_ft/wt2_transcript.gtf, 05_ft/epcr1_transcript.gtf, 05_ft/epcr2_transcript.gtf
jobid: 0
Job counts:
count jobs
1 all
4 count
4 fpkm
4 map
4 sort2bam
4 trim
21
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
Но когда я действительно выполняю его, он выдает сообщение об ошибке при выполнении правила sort2bam:
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
4 count
4 fpkm
4 map
4 sort2bam
4 trim
21
[Tue Apr 30 03:11:57 2019]
rule trim:
input: 01_raw/epcr1_1.fastq, 01_raw/epcr1_2.fastq
output: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_1.unpaired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr1_2.unpaired.fq.gz
jobid: 3
wildcards: nico=epcr1
Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
-threads 16 01_raw/epcr1_1.fastq 01_raw/epcr1_2.fastq 02_clean/epcr1_1.paired.fq.gz 02_clean/epcr1_1.unpaired.fq.gz 02_clean/epcr1_2.paired.fq.gz 02_clean/epcr1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:11:59 2019]
Finished job 3.
1 of 21 steps (5%) done
[Tue Apr 30 03:11:59 2019]
rule trim:
input: 01_raw/wt1_1.fastq, 01_raw/wt1_2.fastq
output: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_1.unpaired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt1_2.unpaired.fq.gz
jobid: 1
wildcards: nico=wt1
Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
-threads 16 01_raw/wt1_1.fastq 01_raw/wt1_2.fastq 02_clean/wt1_1.paired.fq.gz 02_clean/wt1_1.unpaired.fq.gz 02_clean/wt1_2.paired.fq.gz 02_clean/wt1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:12:00 2019]
Finished job 1.
2 of 21 steps (10%) done
[Tue Apr 30 03:12:00 2019]
rule trim:
input: 01_raw/wt2_1.fastq, 01_raw/wt2_2.fastq
output: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_1.unpaired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/wt2_2.unpaired.fq.gz
jobid: 2
wildcards: nico=wt2
Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
-threads 16 01_raw/wt2_1.fastq 01_raw/wt2_2.fastq 02_clean/wt2_1.paired.fq.gz 02_clean/wt2_1.unpaired.fq.gz 02_clean/wt2_2.paired.fq.gz 02_clean/wt2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:12:03 2019]
Finished job 2.
3 of 21 steps (14%) done
[Tue Apr 30 03:12:03 2019]
rule trim:
input: 01_raw/epcr2_1.fastq, 01_raw/epcr2_2.fastq
output: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_1.unpaired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 02_clean/epcr2_2.unpaired.fq.gz
jobid: 4
wildcards: nico=epcr2
Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
-threads 16 01_raw/epcr2_1.fastq 01_raw/epcr2_2.fastq 02_clean/epcr2_1.paired.fq.gz 02_clean/epcr2_1.unpaired.fq.gz 02_clean/epcr2_2.paired.fq.gz 02_clean/epcr2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:12:04 2019]
Finished job 4.
4 of 21 steps (19%) done
[Tue Apr 30 03:12:04 2019]
rule map:
input: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz
output: 03_align/wt2.sam
log: logs/map/wt2.log
jobid: 6
wildcards: nico=wt2
Waiting at most 5 seconds for missing files.
[Tue Apr 30 03:12:06 2019]
Finished job 6.
5 of 21 steps (24%) done
[Tue Apr 30 03:12:06 2019]
rule map:
input: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz
output: 03_align/epcr1.sam
log: logs/map/epcr1.log
jobid: 7
wildcards: nico=epcr1
Waiting at most 5 seconds for missing files.
[Tue Apr 30 03:12:07 2019]
Finished job 7.
6 of 21 steps (29%) done
[Tue Apr 30 03:12:07 2019]
rule map:
input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz
output: 03_align/wt1.sam
log: logs/map/wt1.log
jobid: 5
wildcards: nico=wt1
Waiting at most 5 seconds for missing files.
[Tue Apr 30 03:12:09 2019]
Finished job 5.
7 of 21 steps (33%) done
[Tue Apr 30 03:12:09 2019]
rule sort2bam:
input: 03_align/wt1.sam
output: 03_align/wt1.bam
jobid: 17
wildcards: nico=wt1
[W::sam_read1] Parse error at line 64601
samtools sort: truncated file. Aborting
[Tue Apr 30 03:12:09 2019]
Error in rule sort2bam:
jobid: 17
output: 03_align/wt1.bam
RuleException:
CalledProcessError in line 45 of /root/s/r/snakemake/my_rnaseq_data/Snakefile:
Command 'set -euo pipefail; samtools sort -@ 20 -m 10G -o 03_align/wt1.bam 03_align/wt1.sam' returned non-zero exit status 1.
File "/root/s/r/snakemake/my_rnaseq_data/Snakefile", line 45, in __rule_sort2bam
File "/root/miniconda2/envs/py3/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /root/s/r/snakemake/my_rnaseq_data/.snakemake/log/2019-04-30T031153.994462.snakemake.log
После этого я прекращаю все запущенные задачи и проверяю папку, обнаружив, что она недавно сгенерировала несколько файлов, показывая следующее:
|-- 02_clean
| |-- epcr1_1.paired.fq.gz
| |-- epcr1_1.unpaired.fq.gz
| |-- epcr1_2.paired.fq.gz
| |-- epcr1_2.unpaired.fq.gz
| |-- epcr2_1.paired.fq.gz
| |-- epcr2_1.unpaired.fq.gz
| |-- epcr2_2.paired.fq.gz
| |-- epcr2_2.unpaired.fq.gz
| |-- wt1_1.paired.fq.gz
| |-- wt1_1.unpaired.fq.gz
| |-- wt1_2.paired.fq.gz
| |-- wt1_2.unpaired.fq.gz
| |-- wt2_1.paired.fq.gz
| |-- wt2_1.unpaired.fq.gz
| |-- wt2_2.paired.fq.gz
| `-- wt2_2.unpaired.fq.gz
|-- 03_align
| |-- epcr1.sam
| |-- wt1.sam
| `-- wt2.sam
но эти файлы неполные.
Так что меня смущает порядок выполнения. Работает ли параллельно пять правил для каждого образца? или просто запускать все примеры по правилу. Процесс запуска моего конвейера, кажется, поддерживает первый. И это также объясняет ошибку: «сортировка samtools: усеченный файл. Прерывание» на этапе sam2bam. Я не знаю, правильно ли мое предположение.
Но я добавил правило в свой Snakefile:
ruleorder: trim > map > sort2bam > fpkm > count
Но, похоже, это не работает! Есть ли другие опции или настройки, которые могут управлять порядком выполнения правил?
И вчера вечером я запустил запуск файла змеи с «карты правил», основанной на урезанном fastq.gz, который был урезан тем же самым файлом змеи. И это работает хорошо! весь запущенный процесс показывает следующее:
lding DAG of jobs...
viUsing shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
4 count
4 fpkm
4 map
4 sort2bam
17
[Mon Apr 29 14:37:39 2019]
rule map:
input: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz
output: 03_align/epcr1.sam
log: logs/map/epcr1.log
jobid: 19
wildcards: nico=epcr1
Waiting at most 5 seconds for missing files.
[Mon Apr 29 14:37:40 2019]
Finished job 19.
1 of 17 steps (6%) done
[Mon Apr 29 14:37:40 2019]
rule map:
input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz
output: 03_align/wt1.sam
log: logs/map/wt1.log
jobid: 17
wildcards: nico=wt1
Waiting at most 5 seconds for missing files.
[Mon Apr 29 14:37:48 2019]
Finished job 17.
2 of 17 steps (12%) done
[Mon Apr 29 14:37:51 2019]
rule map:
input: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz
output: 03_align/wt2.sam
log: logs/map/wt2.log
jobid: 18
wildcards: nico=wt2
[Mon Apr 29 14:37:55 2019]
Finished job 18.
3 of 17 steps (18%) done
[Mon Apr 29 14:37:57 2019]
rule map:
input: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz
output: 03_align/epcr2.sam
log: logs/map/epcr2.log
jobid: 20
wildcards: nico=epcr2
[Mon Apr 29 14:38:02 2019]
Finished job 20.
4 of 17 steps (24%) done
[Mon Apr 29 14:38:04 2019]
rule sort2bam:
input: 03_align/wt1.sam
output: 03_align/wt1.bam
jobid: 13
wildcards: nico=wt1
[bam_sort_core] merging from 0 files and 20 in-memory blocks...
[Mon Apr 29 14:39:45 2019]
Finished job 13.
5 of 17 steps (29%) done
[Mon Apr 29 14:39:46 2019]
rule fpkm:
input: 03_align/wt1.bam
output: 05_ft/wt1_gene.gtf, 05_ft/wt1_transcript.gtf
jobid: 9
wildcards: nico=wt1
[Mon Apr 29 14:40:42 2019]
Finished job 9.
6 of 17 steps (35%) done
[Mon Apr 29 14:40:42 2019]
rule count:
input: 03_align/wt1.bam
output: 04_exp/wt1_count.txt
jobid: 5
wildcards: nico=wt1
[Mon Apr 29 14:41:40 2019]
Finished job 5.
7 of 17 steps (41%) done
[Mon Apr 29 14:41:40 2019]
rule sort2bam:
input: 03_align/epcr2.sam
output: 03_align/epcr2.bam
jobid: 16
wildcards: nico=epcr2
[Mon Apr 29 14:56:41 2019]
Finished job 16.
8 of 17 steps (47%) done
[Mon Apr 29 14:56:41 2019]
rule count:
input: 03_align/epcr2.bam
output: 04_exp/epcr2_count.txt
jobid: 8
wildcards: nico=epcr2
[Mon Apr 29 14:57:45 2019]
Finished job 8.
9 of 17 steps (53%) done
[Mon Apr 29 14:57:45 2019]
rule fpkm:
input: 03_align/epcr2.bam
output: 05_ft/epcr2_gene.gtf, 05_ft/epcr2_transcript.gtf
jobid: 12
wildcards: nico=epcr2
h[Mon Apr 29 15:01:32 2019]
Finished job 12.
10 of 17 steps (59%) done
[Mon Apr 29 15:01:32 2019]
rule sort2bam:
input: 03_align/wt2.sam
output: 03_align/wt2.bam
jobid: 14
wildcards: nico=wt2
[bam_sort_core] merging from 0 files and 20 in-memory blocks...
[Mon Apr 29 15:08:05 2019]
Finished job 14.
11 of 17 steps (65%) done
[Mon Apr 29 15:08:05 2019]
rule fpkm:
input: 03_align/wt2.bam
output: 05_ft/wt2_gene.gtf, 05_ft/wt2_transcript.gtf
jobid: 10
wildcards: nico=wt2
[Mon Apr 29 15:12:28 2019]
Finished job 10.
12 of 17 steps (71%) done
[Mon Apr 29 15:12:28 2019]
rule count:
input: 03_align/wt2.bam
output: 04_exp/wt2_count.txt
jobid: 6
wildcards: nico=wt2
[Mon Apr 29 15:13:18 2019]
Finished job 6.
13 of 17 steps (76%) done
[Mon Apr 29 15:13:18 2019]
rule sort2bam:
input: 03_align/epcr1.sam
output: 03_align/epcr1.bam
jobid: 15
wildcards: nico=epcr1
[bam_sort_core] merging from 0 files and 20 in-memory blocks...
[Mon Apr 29 15:21:35 2019]
Finished job 15.
14 of 17 steps (82%) done
[Mon Apr 29 15:21:35 2019]
rule fpkm:
input: 03_align/epcr1.bam
output: 05_ft/epcr1_gene.gtf, 05_ft/epcr1_transcript.gtf
jobid: 11
wildcards: nico=epcr1
[Mon Apr 29 15:27:22 2019]
Finished job 11.
15 of 17 steps (88%) done
[Mon Apr 29 15:27:22 2019]
rule count:
input: 03_align/epcr1.bam
output: 04_exp/epcr1_count.txt
jobid: 7
wildcards: nico=epcr1
[Mon Apr 29 15:28:32 2019]
Finished job 7.
16 of 17 steps (94%) done
[Mon Apr 29 15:28:32 2019]
localrule all:
input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt2_1.paired.fq.gz, 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr2_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 04_exp/wt1_count.txt, 04_exp/wt2_count.txt, 04_exp/epcr1_count.txt, 04_exp/epcr2_count.txt, 05_ft/wt1_gene.gtf, 05_ft/wt2_gene.gtf, 05_ft/epcr1_gene.gtf, 05_ft/epcr2_gene.gtf, 05_ft/wt1_transcript.gtf, 05_ft/wt2_transcript.gtf, 05_ft/epcr1_transcript.gtf, 05_ft/epcr2_transcript.gtf
jobid: 0
[Mon Apr 29 15:28:32 2019]
Finished job 0.
17 of 17 steps (100%) done
И порядок выполнения правила такой:
Во-первых, карта правил выполняется полностью. Затем выполните левое правило для каждого sample.bam.
Почему порядок отличается от начала всего конвейера от raw_data?
Резюме:
два вопроса:
1. Является ли мой файл Snakefile или порядок его выполнения ошибкой?
2. Как отредактировать мой Snakefile, чтобы установить порядок каждого правила для выполнения правила задачи по правилу?
Я буду признателен, если кто-нибудь поможет!