Snakemake работает шаг за шагом или параллельно? - PullRequest
1 голос
/ 29 апреля 2019

Я пишу конвейер RNA-seq, используя Snakemake в течение недели. Я до сих пор не знаю рабочий порядок. Версия snakemake - 5.4.4

.

Мой конвейер RNA-seq состоит из пяти частей, поэтому я пишу пять правил (усечение правила, правило alignemnt, правило sort_to_bam, правило fpkm, количество правил). Когда я пишу правило, я проверю его, выполнив его。 И наконец я закончил. И он работает нормально, когда я проверяю каждое правило шаг за шагом. Вот мой Snakefile:

SBT=["wt1","wt2","epcr1","epcr2"]

ruleorder: trim > map > sort2bam > fpkm > count
rule all:
    input:
        expand("02_clean/{nico}_1.paired.fq.gz", nico=SBT),
        expand("02_clean/{nico}_2.paired.fq.gz", nico=SBT),
        expand("03_align/{nico}.sam", nico=SBT),
        expand("04_exp/{nico}_count.txt", nico=SBT),
        expand("05_ft/{nico}_gene.gtf", nico=SBT),
        expand("05_ft/{nico}_transcript.gtf", nico=SBT)

rule trim:
    input:
        "01_raw/{nico}_1.fastq",
        "01_raw/{nico}_2.fastq"
    output:
        "02_clean/{nico}_1.paired.fq.gz",
        "02_clean/{nico}_1.unpaired.fq.gz",
        "02_clean/{nico}_2.paired.fq.gz",
        "02_clean/{nico}_2.unpaired.fq.gz",
    threads: 20
    shell:
        "java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 {input[0]} {input[1]} {output[0]} {output[1]} {output[2]} {output[3]} ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &"

rule map:
    input:
        "02_clean/{nico}_1.paired.fq.gz",
        "02_clean/{nico}_2.paired.fq.gz"
    output:
        "03_align/{nico}.sam"
    log:
        "logs/map/{nico}.log"
    threads: 40
    shell:
        "hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 {input[0]} -2 {input[1]} -S {output} >{log} 2>&1 &"

rule sort2bam:
    input:
        "03_align/{nico}.sam"
    output:
        "03_align/{nico}.bam"
    threads:20
    shell:
        "samtools sort -@ 20 -m 10G -o {output} {input}"

rule count:
    input:
        "03_align/{nico}.bam"
    output:
        "04_exp/{nico}_count.txt"
    shell:
        "featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o {output} {input}"


rule fpkm:
    input:
        "03_align/{nico}.bam"
    output:
        "05_ft/{nico}_gene.gtf",
        "05_ft/{nico}_transcript.gtf"
    threads: 40
    shell:
        "stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A {output[0]} -o {output[1]} {input}"

raw_data отображается следующим образом:

(py3) root@SBT:~/s/r/snakemake/my_rnaseq_data/01_raw# tree
.
|-- epcr1_1.fastq
|-- epcr1_2.fastq
|-- epcr2_1.fastq
|-- epcr2_2.fastq
|-- wt1_1.fastq
|-- wt1_2.fastq
|-- wt2_1.fastq
`-- wt2_2.fastq

Затем я хочу протестировать конвейер из raw_data, шаг за шагом удаляя все существующие промежуточные файлы, которые я тестировал ранее. Вот мои результаты dry_run:

Building DAG of jobs...
Job counts:
        count   jobs
        1       all
        4       count
        4       fpkm
        4       map
        4       sort2bam
        4       trim
        21

[Tue Apr 30 03:09:28 2019]
rule trim:
    input: 01_raw/epcr1_1.fastq, 01_raw/epcr1_2.fastq
    output: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_1.unpaired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr1_2.unpaired.fq.gz
    jobid: 3
    wildcards: nico=epcr1

java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/epcr1_1.fastq 01_raw/epcr1_2.fastq 02_clean/epcr1_1.paired.fq.gz 02_clean/epcr1_1.unpaired.fq.gz 02_clean/epcr1_2.paired.fq.gz 02_clean/epcr1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &

[Tue Apr 30 03:09:28 2019]
rule trim:
    input: 01_raw/epcr2_1.fastq, 01_raw/epcr2_2.fastq
    output: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_1.unpaired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 02_clean/epcr2_2.unpaired.fq.gz
    jobid: 4
    wildcards: nico=epcr2

java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/epcr2_1.fastq 01_raw/epcr2_2.fastq 02_clean/epcr2_1.paired.fq.gz 02_clean/epcr2_1.unpaired.fq.gz 02_clean/epcr2_2.paired.fq.gz 02_clean/epcr2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &

[Tue Apr 30 03:09:28 2019]
rule trim:
    input: 01_raw/wt1_1.fastq, 01_raw/wt1_2.fastq
    output: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_1.unpaired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt1_2.unpaired.fq.gz
    jobid: 1
    wildcards: nico=wt1

java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/wt1_1.fastq 01_raw/wt1_2.fastq 02_clean/wt1_1.paired.fq.gz 02_clean/wt1_1.unpaired.fq.gz 02_clean/wt1_2.paired.fq.gz 02_clean/wt1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &

[Tue Apr 30 03:09:28 2019]
rule trim:
    input: 01_raw/wt2_1.fastq, 01_raw/wt2_2.fastq
    output: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_1.unpaired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/wt2_2.unpaired.fq.gz
    jobid: 2
    wildcards: nico=wt2

java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 01_raw/wt2_1.fastq 01_raw/wt2_2.fastq 02_clean/wt2_1.paired.fq.gz 02_clean/wt2_1.unpaired.fq.gz 02_clean/wt2_2.paired.fq.gz 02_clean/wt2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &

[Tue Apr 30 03:09:28 2019]
rule map:
    input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz
    output: 03_align/wt1.sam
    log: logs/map/wt1.log
    jobid: 5
    wildcards: nico=wt1

hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/wt1_1.paired.fq.gz -2 02_clean/wt1_2.paired.fq.gz -S 03_align/wt1.sam >logs/map/wt1.log 2>&1 &

[Tue Apr 30 03:09:28 2019]
rule map:
    input: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz
    output: 03_align/epcr2.sam
    log: logs/map/epcr2.log
    jobid: 8
    wildcards: nico=epcr2

hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/epcr2_1.paired.fq.gz -2 02_clean/epcr2_2.paired.fq.gz -S 03_align/epcr2.sam >logs/map/epcr2.log 2>&1 &

[Tue Apr 30 03:09:28 2019]
rule map:
    input: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz
    output: 03_align/wt2.sam
    log: logs/map/wt2.log
    jobid: 6
    wildcards: nico=wt2

hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/wt2_1.paired.fq.gz -2 02_clean/wt2_2.paired.fq.gz -S 03_align/wt2.sam >logs/map/wt2.log 2>&1 &

[Tue Apr 30 03:09:28 2019]
rule map:
    input: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz
    output: 03_align/epcr1.sam
    log: logs/map/epcr1.log
    jobid: 7
    wildcards: nico=epcr1

hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 02_clean/epcr1_1.paired.fq.gz -2 02_clean/epcr1_2.paired.fq.gz -S 03_align/epcr1.sam >logs/map/epcr1.log 2>&1 &

[Tue Apr 30 03:09:28 2019]
rule sort2bam:
    input: 03_align/epcr1.sam
    output: 03_align/epcr1.bam
    jobid: 19
    wildcards: nico=epcr1

samtools sort -@ 20 -m 10G -o 03_align/epcr1.bam 03_align/epcr1.sam

[Tue Apr 30 03:09:28 2019]
rule sort2bam:
    input: 03_align/epcr2.sam
    output: 03_align/epcr2.bam
    jobid: 20
    wildcards: nico=epcr2

samtools sort -@ 20 -m 10G -o 03_align/epcr2.bam 03_align/epcr2.sam

[Tue Apr 30 03:09:28 2019]
rule sort2bam:
    input: 03_align/wt1.sam
    output: 03_align/wt1.bam
    jobid: 17
    wildcards: nico=wt1

samtools sort -@ 20 -m 10G -o 03_align/wt1.bam 03_align/wt1.sam

[Tue Apr 30 03:09:28 2019]
rule sort2bam:
    input: 03_align/wt2.sam
    output: 03_align/wt2.bam
    jobid: 18
    wildcards: nico=wt2

samtools sort -@ 20 -m 10G -o 03_align/wt2.bam 03_align/wt2.sam

[Tue Apr 30 03:09:28 2019]
rule count:
    input: 03_align/wt2.bam
    output: 04_exp/wt2_count.txt
    jobid: 10
    wildcards: nico=wt2

featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/wt2_count.txt 03_align/wt2.bam

[Tue Apr 30 03:09:28 2019]
rule count:
    input: 03_align/wt1.bam
    output: 04_exp/wt1_count.txt
    jobid: 9
    wildcards: nico=wt1

featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/wt1_count.txt 03_align/wt1.bam

[Tue Apr 30 03:09:28 2019]
rule count:
    input: 03_align/epcr2.bam
    output: 04_exp/epcr2_count.txt
    jobid: 12
    wildcards: nico=epcr2

featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/epcr2_count.txt 03_align/epcr2.bam

[Tue Apr 30 03:09:28 2019]
rule fpkm:
    input: 03_align/wt1.bam
    output: 05_ft/wt1_gene.gtf, 05_ft/wt1_transcript.gtf
    jobid: 13
    wildcards: nico=wt1

stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/wt1_gene.gtf -o 05_ft/wt1_transcript.gtf 03_align/wt1.bam

[Tue Apr 30 03:09:28 2019]
rule fpkm:
    input: 03_align/epcr2.bam
    output: 05_ft/epcr2_gene.gtf, 05_ft/epcr2_transcript.gtf
    jobid: 16
    wildcards: nico=epcr2

stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/epcr2_gene.gtf -o 05_ft/epcr2_transcript.gtf 03_align/epcr2.bam

[Tue Apr 30 03:09:28 2019]
rule fpkm:
    input: 03_align/wt2.bam
    output: 05_ft/wt2_gene.gtf, 05_ft/wt2_transcript.gtf
    jobid: 14
    wildcards: nico=wt2

stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/wt2_gene.gtf -o 05_ft/wt2_transcript.gtf 03_align/wt2.bam

[Tue Apr 30 03:09:28 2019]
rule count:
    input: 03_align/epcr1.bam
    output: 04_exp/epcr1_count.txt
    jobid: 11
    wildcards: nico=epcr1

featureCounts -T 20 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o 04_exp/epcr1_count.txt 03_align/epcr1.bam

[Tue Apr 30 03:09:28 2019]
rule fpkm:
    input: 03_align/epcr1.bam
    output: 05_ft/epcr1_gene.gtf, 05_ft/epcr1_transcript.gtf
    jobid: 15
    wildcards: nico=epcr1

stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -A 05_ft/epcr1_gene.gtf -o 05_ft/epcr1_transcript.gtf 03_align/epcr1.bam

[Tue Apr 30 03:09:28 2019]
localrule all:
    input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt2_1.paired.fq.gz, 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr2_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 03_align/wt1.sam, 03_align/wt2.sam, 03_align/epcr1.sam, 03_align/epcr2.sam, 04_exp/wt1_count.txt, 04_exp/wt2_count.txt, 04_exp/epcr1_count.txt, 04_exp/epcr2_count.txt, 05_ft/wt1_gene.gtf, 05_ft/wt2_gene.gtf, 05_ft/epcr1_gene.gtf, 05_ft/epcr2_gene.gtf, 05_ft/wt1_transcript.gtf, 05_ft/wt2_transcript.gtf, 05_ft/epcr1_transcript.gtf, 05_ft/epcr2_transcript.gtf
    jobid: 0

Job counts:
        count   jobs
        1       all
        4       count
        4       fpkm
        4       map
        4       sort2bam
        4       trim
        21
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

Но когда я действительно выполняю его, он выдает сообщение об ошибке при выполнении правила sort2bam:

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        4       count
        4       fpkm
        4       map
        4       sort2bam
        4       trim
        21

[Tue Apr 30 03:11:57 2019]
rule trim:
    input: 01_raw/epcr1_1.fastq, 01_raw/epcr1_2.fastq
    output: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_1.unpaired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr1_2.unpaired.fq.gz
    jobid: 3
    wildcards: nico=epcr1

Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
 -threads 16 01_raw/epcr1_1.fastq 01_raw/epcr1_2.fastq 02_clean/epcr1_1.paired.fq.gz 02_clean/epcr1_1.unpaired.fq.gz 02_clean/epcr1_2.paired.fq.gz 02_clean/epcr1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:11:59 2019]
Finished job 3.
1 of 21 steps (5%) done

[Tue Apr 30 03:11:59 2019]
rule trim:
    input: 01_raw/wt1_1.fastq, 01_raw/wt1_2.fastq
    output: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_1.unpaired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt1_2.unpaired.fq.gz
    jobid: 1
    wildcards: nico=wt1

Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
 -threads 16 01_raw/wt1_1.fastq 01_raw/wt1_2.fastq 02_clean/wt1_1.paired.fq.gz 02_clean/wt1_1.unpaired.fq.gz 02_clean/wt1_2.paired.fq.gz 02_clean/wt1_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:12:00 2019]
Finished job 1.
2 of 21 steps (10%) done

[Tue Apr 30 03:12:00 2019]
rule trim:
    input: 01_raw/wt2_1.fastq, 01_raw/wt2_2.fastq
    output: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_1.unpaired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/wt2_2.unpaired.fq.gz
    jobid: 2
    wildcards: nico=wt2

Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
 -threads 16 01_raw/wt2_1.fastq 01_raw/wt2_2.fastq 02_clean/wt2_1.paired.fq.gz 02_clean/wt2_1.unpaired.fq.gz 02_clean/wt2_2.paired.fq.gz 02_clean/wt2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:12:03 2019]
Finished job 2.
3 of 21 steps (14%) done

[Tue Apr 30 03:12:03 2019]
rule trim:
    input: 01_raw/epcr2_1.fastq, 01_raw/epcr2_2.fastq
    output: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_1.unpaired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 02_clean/epcr2_2.unpaired.fq.gz
    jobid: 4
    wildcards: nico=epcr2

Waiting at most 5 seconds for missing files.
TrimmomaticPE: Started with arguments:
 -threads 16 01_raw/epcr2_1.fastq 01_raw/epcr2_2.fastq 02_clean/epcr2_1.paired.fq.gz 02_clean/epcr2_1.unpaired.fq.gz 02_clean/epcr2_2.paired.fq.gz 02_clean/epcr2_2.unpaired.fq.gz ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
Using Long Clipping Sequence: 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
Using Long Clipping Sequence: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
[Tue Apr 30 03:12:04 2019]
Finished job 4.
4 of 21 steps (19%) done

[Tue Apr 30 03:12:04 2019]
rule map:
    input: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz
    output: 03_align/wt2.sam
    log: logs/map/wt2.log
    jobid: 6
    wildcards: nico=wt2

Waiting at most 5 seconds for missing files.
[Tue Apr 30 03:12:06 2019]
Finished job 6.
5 of 21 steps (24%) done

[Tue Apr 30 03:12:06 2019]
rule map:
    input: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz
    output: 03_align/epcr1.sam
    log: logs/map/epcr1.log
    jobid: 7
    wildcards: nico=epcr1

Waiting at most 5 seconds for missing files.
[Tue Apr 30 03:12:07 2019]
Finished job 7.
6 of 21 steps (29%) done

[Tue Apr 30 03:12:07 2019]
rule map:
    input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz
    output: 03_align/wt1.sam
    log: logs/map/wt1.log
    jobid: 5
    wildcards: nico=wt1

Waiting at most 5 seconds for missing files.
[Tue Apr 30 03:12:09 2019]
Finished job 5.
7 of 21 steps (33%) done

[Tue Apr 30 03:12:09 2019]
rule sort2bam:
    input: 03_align/wt1.sam
    output: 03_align/wt1.bam
    jobid: 17
    wildcards: nico=wt1

[W::sam_read1] Parse error at line 64601
samtools sort: truncated file. Aborting
[Tue Apr 30 03:12:09 2019]
Error in rule sort2bam:
    jobid: 17
    output: 03_align/wt1.bam

RuleException:
CalledProcessError in line 45 of /root/s/r/snakemake/my_rnaseq_data/Snakefile:
Command 'set -euo pipefail;  samtools sort -@ 20 -m 10G -o 03_align/wt1.bam 03_align/wt1.sam' returned non-zero exit status 1.
  File "/root/s/r/snakemake/my_rnaseq_data/Snakefile", line 45, in __rule_sort2bam
  File "/root/miniconda2/envs/py3/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /root/s/r/snakemake/my_rnaseq_data/.snakemake/log/2019-04-30T031153.994462.snakemake.log

После этого я прекращаю все запущенные задачи и проверяю папку, обнаружив, что она недавно сгенерировала несколько файлов, показывая следующее:

|-- 02_clean
|   |-- epcr1_1.paired.fq.gz
|   |-- epcr1_1.unpaired.fq.gz
|   |-- epcr1_2.paired.fq.gz
|   |-- epcr1_2.unpaired.fq.gz
|   |-- epcr2_1.paired.fq.gz
|   |-- epcr2_1.unpaired.fq.gz
|   |-- epcr2_2.paired.fq.gz
|   |-- epcr2_2.unpaired.fq.gz
|   |-- wt1_1.paired.fq.gz
|   |-- wt1_1.unpaired.fq.gz
|   |-- wt1_2.paired.fq.gz
|   |-- wt1_2.unpaired.fq.gz
|   |-- wt2_1.paired.fq.gz
|   |-- wt2_1.unpaired.fq.gz
|   |-- wt2_2.paired.fq.gz
|   `-- wt2_2.unpaired.fq.gz
|-- 03_align
|   |-- epcr1.sam
|   |-- wt1.sam
|   `-- wt2.sam

но эти файлы неполные. Так что меня смущает порядок выполнения. Работает ли параллельно пять правил для каждого образца? или просто запускать все примеры по правилу. Процесс запуска моего конвейера, кажется, поддерживает первый. И это также объясняет ошибку: «сортировка samtools: усеченный файл. Прерывание» на этапе sam2bam. Я не знаю, правильно ли мое предположение.

Но я добавил правило в свой Snakefile:

ruleorder: trim > map > sort2bam > fpkm > count

Но, похоже, это не работает! Есть ли другие опции или настройки, которые могут управлять порядком выполнения правил?

И вчера вечером я запустил запуск файла змеи с «карты правил», основанной на урезанном fastq.gz, который был урезан тем же самым файлом змеи. И это работает хорошо! весь запущенный процесс показывает следующее:

lding DAG of jobs...
viUsing shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        4       count
        4       fpkm
        4       map
        4       sort2bam
        17

[Mon Apr 29 14:37:39 2019]
rule map:
    input: 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz
    output: 03_align/epcr1.sam
    log: logs/map/epcr1.log
    jobid: 19
    wildcards: nico=epcr1

Waiting at most 5 seconds for missing files.
[Mon Apr 29 14:37:40 2019]
Finished job 19.
1 of 17 steps (6%) done

[Mon Apr 29 14:37:40 2019]
rule map:
    input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz
    output: 03_align/wt1.sam
    log: logs/map/wt1.log
    jobid: 17
    wildcards: nico=wt1

Waiting at most 5 seconds for missing files.
[Mon Apr 29 14:37:48 2019]
Finished job 17.
2 of 17 steps (12%) done

[Mon Apr 29 14:37:51 2019]
rule map:
    input: 02_clean/wt2_1.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz
    output: 03_align/wt2.sam
    log: logs/map/wt2.log
    jobid: 18
    wildcards: nico=wt2

[Mon Apr 29 14:37:55 2019]
Finished job 18.
3 of 17 steps (18%) done

[Mon Apr 29 14:37:57 2019]
rule map:
    input: 02_clean/epcr2_1.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz
    output: 03_align/epcr2.sam
    log: logs/map/epcr2.log
    jobid: 20
    wildcards: nico=epcr2

[Mon Apr 29 14:38:02 2019]
Finished job 20.
4 of 17 steps (24%) done

[Mon Apr 29 14:38:04 2019]
rule sort2bam:
    input: 03_align/wt1.sam
    output: 03_align/wt1.bam
    jobid: 13
    wildcards: nico=wt1

[bam_sort_core] merging from 0 files and 20 in-memory blocks...
[Mon Apr 29 14:39:45 2019]
Finished job 13.
5 of 17 steps (29%) done

[Mon Apr 29 14:39:46 2019]
rule fpkm:
    input: 03_align/wt1.bam

output: 05_ft/wt1_gene.gtf, 05_ft/wt1_transcript.gtf
    jobid: 9
    wildcards: nico=wt1

[Mon Apr 29 14:40:42 2019]
Finished job 9.
6 of 17 steps (35%) done

[Mon Apr 29 14:40:42 2019]
rule count:
    input: 03_align/wt1.bam
    output: 04_exp/wt1_count.txt
    jobid: 5
    wildcards: nico=wt1
[Mon Apr 29 14:41:40 2019]
Finished job 5.
7 of 17 steps (41%) done

[Mon Apr 29 14:41:40 2019]
rule sort2bam:
    input: 03_align/epcr2.sam
    output: 03_align/epcr2.bam
    jobid: 16
    wildcards: nico=epcr2
[Mon Apr 29 14:56:41 2019]
Finished job 16.
8 of 17 steps (47%) done

[Mon Apr 29 14:56:41 2019]
rule count:
    input: 03_align/epcr2.bam
    output: 04_exp/epcr2_count.txt
    jobid: 8
    wildcards: nico=epcr2

[Mon Apr 29 14:57:45 2019]
Finished job 8.
9 of 17 steps (53%) done

[Mon Apr 29 14:57:45 2019]
rule fpkm:
    input: 03_align/epcr2.bam
    output: 05_ft/epcr2_gene.gtf, 05_ft/epcr2_transcript.gtf
    jobid: 12
    wildcards: nico=epcr2

h[Mon Apr 29 15:01:32 2019]
Finished job 12.
10 of 17 steps (59%) done

[Mon Apr 29 15:01:32 2019]
rule sort2bam:
    input: 03_align/wt2.sam
    output: 03_align/wt2.bam
    jobid: 14
    wildcards: nico=wt2

[bam_sort_core] merging from 0 files and 20 in-memory blocks...
[Mon Apr 29 15:08:05 2019]
Finished job 14.
11 of 17 steps (65%) done

[Mon Apr 29 15:08:05 2019]
rule fpkm:
    input: 03_align/wt2.bam
    output: 05_ft/wt2_gene.gtf, 05_ft/wt2_transcript.gtf
    jobid: 10
    wildcards: nico=wt2

[Mon Apr 29 15:12:28 2019]
Finished job 10.
12 of 17 steps (71%) done

[Mon Apr 29 15:12:28 2019]
rule count:
    input: 03_align/wt2.bam
    output: 04_exp/wt2_count.txt
    jobid: 6
    wildcards: nico=wt2
[Mon Apr 29 15:13:18 2019]
Finished job 6.
13 of 17 steps (76%) done

[Mon Apr 29 15:13:18 2019]
rule sort2bam:
    input: 03_align/epcr1.sam
    output: 03_align/epcr1.bam
    jobid: 15
    wildcards: nico=epcr1

[bam_sort_core] merging from 0 files and 20 in-memory blocks...
[Mon Apr 29 15:21:35 2019]
Finished job 15.
14 of 17 steps (82%) done

[Mon Apr 29 15:21:35 2019]
rule fpkm:
    input: 03_align/epcr1.bam
    output: 05_ft/epcr1_gene.gtf, 05_ft/epcr1_transcript.gtf
    jobid: 11
    wildcards: nico=epcr1

[Mon Apr 29 15:27:22 2019]
Finished job 11.
15 of 17 steps (88%) done

[Mon Apr 29 15:27:22 2019]
rule count:
    input: 03_align/epcr1.bam
    output: 04_exp/epcr1_count.txt
    jobid: 7
    wildcards: nico=epcr1

[Mon Apr 29 15:28:32 2019]
Finished job 7.
16 of 17 steps (94%) done

[Mon Apr 29 15:28:32 2019]
localrule all:
    input: 02_clean/wt1_1.paired.fq.gz, 02_clean/wt2_1.paired.fq.gz, 02_clean/epcr1_1.paired.fq.gz, 02_clean/epcr2_1.paired.fq.gz, 02_clean/wt1_2.paired.fq.gz, 02_clean/wt2_2.paired.fq.gz, 02_clean/epcr1_2.paired.fq.gz, 02_clean/epcr2_2.paired.fq.gz, 04_exp/wt1_count.txt, 04_exp/wt2_count.txt, 04_exp/epcr1_count.txt, 04_exp/epcr2_count.txt, 05_ft/wt1_gene.gtf, 05_ft/wt2_gene.gtf, 05_ft/epcr1_gene.gtf, 05_ft/epcr2_gene.gtf, 05_ft/wt1_transcript.gtf, 05_ft/wt2_transcript.gtf, 05_ft/epcr1_transcript.gtf, 05_ft/epcr2_transcript.gtf
    jobid: 0

[Mon Apr 29 15:28:32 2019]
Finished job 0.
17 of 17 steps (100%) done

И порядок выполнения правила такой: Во-первых, карта правил выполняется полностью. Затем выполните левое правило для каждого sample.bam.

Почему порядок отличается от начала всего конвейера от raw_data?

Резюме: два вопроса: 1. Является ли мой файл Snakefile или порядок его выполнения ошибкой? 2. Как отредактировать мой Snakefile, чтобы установить порядок каждого правила для выполнения правила задачи по правилу?

Я буду признателен, если кто-нибудь поможет!

1 Ответ

1 голос
/ 06 мая 2019

Проблема в том, что команды вашей оболочки в правилах trim и map оканчиваются на &, который отправляет рабочие процессы в фоновый режим, а основной процесс продолжается и завершается.Это сигнализирует Снейкмейку, что задание завершено.Затем Snakemake переходит к проверке выходных файлов, которые были объявлены результатом работы, но не может их найти (следовательно, ошибки, которые вы видите).

Похоже, что все будет работать, если вы удалите& с и держал процессы на переднем плане.

...