To clarify, I need to update this post. Originally the pipeline ran fine on my local machine but failed when submitted to the cluster. After posting the question I found that my Snakemake version was 3.13.3, so I upgraded to v5.7.3, and now it fails both locally and on the cluster. So I am now trying to figure out whether something is wrong with my Snakefile or with something else. The error message:
Waiting at most 5 seconds for missing files.
MissingOutputException in line 24 of /work/path/rna_seq_pipeline/Snakefile:
Missing files after 5 seconds:
bam/A2_Aligned.toTranscriptome.out.bam
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /work/path/rna_seq_pipeline/.snakemake/log/2019-11-07T153434.327966.snakemake.log
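One quick way to narrow this down is to compare what `star_align` promises (`bam/{file}_Aligned.toTranscriptome.out.bam`) with what actually landed in `bam/` after the failed run. The sketch below does that in plain Python; the helper names (`expected_bams`, `list_dir`, `diagnose`) are hypothetical, and it assumes it is run from the pipeline directory with the sample names from samples.txt. If STAR wrote nothing at all for a sample, the problem is that the STAR command never ran, rather than a wrong output name.

```python
from pathlib import Path


def expected_bams(samples, bam_dir="bam"):
    """Paths that star_align promises to create, one per sample."""
    return [f"{bam_dir}/{s}_Aligned.toTranscriptome.out.bam" for s in samples]


def list_dir(bam_dir="bam"):
    """Files currently present in bam_dir, as 'bam_dir/name' strings."""
    d = Path(bam_dir)
    if not d.is_dir():
        return []
    return sorted(f"{bam_dir}/{p.name}" for p in d.iterdir())


def diagnose(samples, present, bam_dir="bam"):
    """For each sample whose expected BAM is missing, list what was written with its prefix."""
    present = set(present)
    report = {}
    for sample, path in zip(samples, expected_bams(samples, bam_dir)):
        if path not in present:
            prefix = f"{bam_dir}/{sample}_"
            report[path] = sorted(p for p in present if p.startswith(prefix))
    return report


if __name__ == "__main__":
    # Sample names as in samples.txt (assumption: adjust to your own list).
    for missing, written in diagnose(["A1", "A2"], list_dir("bam")).items():
        print("missing:", missing)
        print("  files written with the same prefix:", written or "none")
```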
So perhaps something is wrong with my Snakefile. Here it is:
# config file
configfile: "config.yaml"
shell.prefix("source ~/.bash_profile")

# determine which genome reference you would like to use
# here we are using GRCm38
# depending on the freeze, the appropriate references and data files will be chosen from the config
freeze = config['freeze']

# read list of samples, one per line
with open(config['samples']) as f:
    SAMPLES = f.read().splitlines()

rule all:
    input:
        starindex = config['reference']['stargenomedir'][freeze] + "/" + "SAindex",
        rsemindex = config['reference']['rsemgenomedir'][freeze] + ".n2g.idx.fa",
        fastqs = expand("fastq/{file}_{rep}_paired.fq.gz", file = SAMPLES, rep = ['1','2']),
        bams = expand("bam/{file}_Aligned.toTranscriptome.out.bam", file = SAMPLES),
        quant = expand("quant/{file}.genes.results", file = SAMPLES)

# align using STAR
rule star_align:
    input:
        f1 = "fastq/" + "{file}_1_paired.fq.gz",
        f2 = "fastq/" + "{file}_2_paired.fq.gz"
    output:
        out = "bam/" + "{file}_Aligned.toTranscriptome.out.bam"
    params:
        star = config['tools']['star'],
        genomedir = config['reference']['stargenomedir'][freeze],
        prefix = "bam/" + "{file}_"
    threads: 12
    shell:
        """
        {params.star} \
            --runThreadN {threads} \
            --genomeDir {params.genomedir} \
            --readFilesIn {input.f1} {input.f2} \
            --readFilesCommand zcat \
            --outFileNamePrefix {params.prefix} \
            --outSAMtype BAM SortedByCoordinate \
            --outSAMunmapped Within \
            --quantMode TranscriptomeSAM \
            --outSAMattributes NH HI AS NM MD \
            --outFilterType BySJout \
            --outFilterMultimapNmax 20 \
            --outFilterMismatchNmax 999 \
            --outFilterMismatchNoverReadLmax 0.04 \
            --alignIntronMin 20 \
            --alignIntronMax 1000000 \
            --alignMatesGapMax 1000000 \
            --alignSJoverhangMin 8 \
            --alignSJDBoverhangMin 1 \
            --sjdbScore 1 \
            --limitBAMsortRAM 50000000000
        """

# quantify expression using RSEM
rule rsem_quant:
    input:
        bam = "bam/" + "{file}_Aligned.toTranscriptome.out.bam"
    output:
        quant = "quant/" + "{file}.genes.results"
    params:
        calcexp = config['tools']['rsem']['calcexp'],
        genomedir = config['reference']['rsemgenomedir'][freeze],
        prefix = "quant/" + "{file}"
    threads: 12
    shell:
        """
        {params.calcexp} \
            --paired-end \
            --no-bam-output \
            --quiet \
            --no-qualities \
            -p {threads} \
            --forward-prob 0.5 \
            --seed-length 21 \
            --fragment-length-mean -1.0 \
            --bam {input.bam} {params.genomedir} {params.prefix}
        """
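For reference, STAR names its outputs by appending fixed suffixes to `--outFileNamePrefix`: with `--outSAMtype BAM SortedByCoordinate` it writes `Aligned.sortedByCoord.out.bam`, and `--quantMode TranscriptomeSAM` adds `Aligned.toTranscriptome.out.bam`, next to the usual `Log.out`, `Log.progress.out`, `Log.final.out` and `SJ.out.tab`. The hypothetical helper below builds those names for the options used in `star_align` only (it is not an exhaustive list of STAR outputs), so they can be compared with the rule's `output:` and with what is on disk.

```python
def star_output_files(prefix, sam_type=("BAM", "SortedByCoordinate"), quant_mode=()):
    """File names STAR is expected to write for a given --outFileNamePrefix.

    Only covers the options used in star_align; not an exhaustive list.
    """
    # Log and splice-junction files are written on a normal run.
    files = [prefix + name for name in ("Log.out", "Log.progress.out", "Log.final.out", "SJ.out.tab")]
    kind, orders = sam_type[0], sam_type[1:]
    if kind == "SAM":
        files.append(prefix + "Aligned.out.sam")
    elif kind == "BAM":
        # --outSAMtype BAM accepts Unsorted, SortedByCoordinate, or both.
        if "Unsorted" in orders:
            files.append(prefix + "Aligned.out.bam")
        if "SortedByCoordinate" in orders:
            files.append(prefix + "Aligned.sortedByCoord.out.bam")
    # --quantMode TranscriptomeSAM adds alignments projected onto transcripts.
    if "TranscriptomeSAM" in quant_mode:
        files.append(prefix + "Aligned.toTranscriptome.out.bam")
    return files


if __name__ == "__main__":
    # Mirrors the star_align call for sample A2 (prefix "bam/A2_").
    for name in star_output_files("bam/A2_", ("BAM", "SortedByCoordinate"), ("TranscriptomeSAM",)):
        print(name)
```

With the prefix `bam/A2_` this yields exactly `bam/A2_Aligned.toTranscriptome.out.bam`, which matches the rule's declared output, so a naming mismatch alone would not explain the error.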
And my config.yaml:
freeze: grcm38
# samples file
samples:
  samples.txt
# software, binaries or tools
tools:
  fastqdump: fastq-dump
  star: STAR
  rsem:
    calcexp: rsem-calculate-expression
    prepref: rsem-prepare-reference
# reference files, genome indices and data
reference:
  stargenomedir:
    grch38: /work/path/reference/STAR/GRCh38
    grcm38: /work/path/reference/STAR/GRCm38
  rsemgenomedir:
    grch38: /work/path/reference/RSEM/GRCh38/GRCh38
    grcm38: /work/path/reference/RSEM/GRCm38/GRCm38
  fasta:
    grch38: /work/path/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa
    grcm38: /work/path/reference/GRCm38/Mus_musculus.GRCm38.dna.primary_assembly.fa
  gtf:
    grch38: /work/path/reference/GRCh38/Homo_sapiens.GRCh38.98.gtf
    grcm38: /work/path/reference/GRCm38/Mus_musculus.GRCm38.98.gtf
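To double-check which paths `rule all` resolves to with this config, its lookups can be replayed in plain Python. This is a sketch under two assumptions: the config is already loaded into a dict (the shape `yaml.safe_load` would return, trimmed to the keys `rule all` uses), and `expand_like` is a hypothetical stand-in that mimics the default behavior of Snakemake's `expand()` (every combination of the wildcard values); it is not Snakemake's own implementation.

```python
from itertools import product


def expand_like(pattern, **wildcards):
    """Format pattern with every combination of the wildcard values (Cartesian product)."""
    keys = list(wildcards)
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*(wildcards[k] for k in keys))
    ]


def all_targets(config, samples):
    """Replay the input lookups of `rule all` for a loaded config dict."""
    freeze = config["freeze"]
    ref = config["reference"]
    return {
        "starindex": ref["stargenomedir"][freeze] + "/" + "SAindex",
        "rsemindex": ref["rsemgenomedir"][freeze] + ".n2g.idx.fa",
        "fastqs": expand_like("fastq/{file}_{rep}_paired.fq.gz", file=samples, rep=["1", "2"]),
        "bams": expand_like("bam/{file}_Aligned.toTranscriptome.out.bam", file=samples),
        "quant": expand_like("quant/{file}.genes.results", file=samples),
    }


# Only the keys used by `rule all`, copied from config.yaml.
CONFIG = {
    "freeze": "grcm38",
    "reference": {
        "stargenomedir": {
            "grch38": "/work/path/reference/STAR/GRCh38",
            "grcm38": "/work/path/reference/STAR/GRCm38",
        },
        "rsemgenomedir": {
            "grch38": "/work/path/reference/RSEM/GRCh38/GRCh38",
            "grcm38": "/work/path/reference/RSEM/GRCm38/GRCm38",
        },
    },
}

if __name__ == "__main__":
    for name, value in all_targets(CONFIG, ["A1", "A2"]).items():
        print(name, value)
```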
And finally, samples.txt:
A1
A2
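A side note on how the Snakefile reads this file: `f.read().splitlines()` keeps a blank line as an empty string and keeps trailing spaces, so a stray empty line would make `expand()` request targets such as `bam/_Aligned.toTranscriptome.out.bam`. That is probably not what is happening here, since the error names `A2`, but a slightly stricter reader costs little. A minimal sketch (the function names are hypothetical):

```python
def read_samples(text):
    """Sample names, one per non-blank line, with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_samples_file(path):
    """Read a samples file such as samples.txt."""
    with open(path) as f:
        return read_samples(f.read())
```

In the Snakefile this would replace `SAMPLES = f.read().splitlines()` with `SAMPLES = read_samples(f.read())`.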
Any suggestions?
P.S.: adapted from the pipeline at https://github.com/komalsrathi/rnaseq-star-rsem-pipeline/blob/master/Snakefile