Nextflow (Groovy): how to emit files from nested folders generated by a tool

Posted by sz81bmfz 8 months ago in Other

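I am running a variant-calling step with the Manta tool, which is a prerequisite for running Strelka2. I specify the runDir. Manta works in two steps: 1) set up the configuration, and 2) run runWorkflow.py, the script produced by step 1.
Manta will not run if a runWorkflow.py file already exists in the run directory, so I added some bash code that checks for the file and removes it.
Inside the specified run directory Manta creates a results folder, in which it generates three subfolders: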

${run_dir}/results/evidence
${run_dir}/results/stats
${run_dir}/results/variants

The variants folder contains eight files (four VCFs and their .tbi indexes):

${run_dir}/results/variants/candidateSmallIndels.vcf.gz
${run_dir}/results/variants/candidateSmallIndels.vcf.gz.tbi
${run_dir}/results/variants/candidateSV.vcf.gz
${run_dir}/results/variants/candidateSV.vcf.gz.tbi
${run_dir}/results/variants/diploidSV.vcf.gz
${run_dir}/results/variants/diploidSV.vcf.gz.tbi
${run_dir}/results/variants/somaticSV.vcf.gz
${run_dir}/results/variants/somaticSV.vcf.gz.tbi

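Similarly, the other two folders contain their own files. I want to emit the outputs from the ${run_dir}/results/variants folder, because they are an input to the Strelka2 tool.
Please see the code below: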

process manta {
    errorStrategy 'retry'
    maxRetries 3

    publishDir path: "${params.outdir}/${sample_name}/manta/", mode: 'copy'

    input:
    tuple val(sample_id_tumor), path(bqsrbam_tumor, stageAs: 'manta_tumorbqsrbam/*')
    tuple val(sample_id_tumor), path(bqsrbam_bai_tumor, stageAs: 'manta_tumorbqsrbam/*')
    tuple val(sample_id_normal), path(bqsrbam_normal, stageAs: 'manta_normalbqsrbam/*')
    tuple val(sample_id_normal), path(bqsrbam_bai_normal, stageAs: 'manta_normalbqsrbam/*')

    output:
    tuple val(sample_name), path("candidateSmallIndels.vcf.gz"), emit: manta_small_indels_vcf
    tuple val(sample_name), path("candidateSmallIndels.vcf.gz.tbi"), emit: manta_small_indels_vcf_tbi
    tuple val(sample_name), path("candidateSV.vcf.gz"), emit: manta_candidateSV_vcf
    tuple val(sample_name), path("candidateSV.vcf.gz.tbi"), emit: manta_candidateSV_vcf_tbi
    tuple val(sample_name), path("diploidSV.vcf.gz"), emit: manta_diploidSV_vcf
    tuple val(sample_name), path("diploidSV.vcf.gz.tbi"), emit: manta_diploidSV_vcf_tbi
    tuple val(sample_name), path("somaticSV.vcf.gz"), emit: manta_somaticSV_vcf
    tuple val(sample_name), path("somaticSV.vcf.gz.tbi"), emit: manta_somaticSV_vcf_tbi

    script:
    sample_name = sample_id_normal.split('_N')[0]

    """
    mkdir -p ${params.outdir}/${sample_name}/manta/

    if [ -f "${params.outdir}/${sample_name}/manta/runWorkflow.py" ] ; then
        rm "${params.outdir}/${sample_name}/manta/runWorkflow.py"
    fi

    /hpc/packages/configManta.py \
        --normalBam ${bqsrbam_normal} \
        --tumorBam ${bqsrbam_tumor} \
        --referenceFasta $params.hg38genome \
        --runDir ${params.outdir}/${sample_name}/manta/

    ${params.outdir}/${sample_name}/manta/./runWorkflow.py -m local -j 12 --quiet
    """
}

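I get the error: Missing output file(s) candidateSmallIndels.vcf.gz expected by process manta (1)
The output of the variants folder is then fed into the Strelka2 script, which follows the same steps: 1) generate the run-workflow script, and 2) run the workflow script.
The Strelka2 output follows the same folder structure.
Please let me know if you need any other details.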

Answer #1 (v7pvogib)

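You should not use absolute paths in a pipeline script, because that breaks portability. Instead, configManta.py should live in the bin folder of the pipeline directory, which Nextflow adds to the PATH of every task. In addition, the output declaration must give Nextflow the full path of each file, relative to the task work directory. Not just

output:
  tuple path("candidateSmallIndels.vcf.gz") ...

but

output:
  tuple path("results/variants/candidateSmallIndels.vcf.gz") ...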
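
Applied to the process in the question, this could look roughly like the sketch below. It is a sketch under assumptions, not a tested pipeline: the single combined input tuple, the process name MANTA, the cpus directive, and the emit names are all hypothetical choices made for illustration, and configManta.py is assumed to be reachable on the PATH (for example through the pipeline's bin folder). The key change is that --runDir points to a directory inside the task work directory (here called manta), so that the declared output paths, which are relative to the work directory, actually exist when the task finishes.

```nextflow
process MANTA {
    // Assumption: sample_name arrives through the input tuple instead of
    // being derived from the normal sample id inside the script block.
    publishDir "${params.outdir}/${sample_name}/manta", mode: 'copy'
    cpus 12

    input:
    tuple val(sample_name), path(tumor_bam), path(tumor_bai), path(normal_bam), path(normal_bai)

    output:
    // Paths are relative to the task work directory, which is why they
    // include the run directory (manta) and Manta's results/variants folders.
    tuple val(sample_name), path("manta/results/variants/candidateSmallIndels.vcf.gz"), path("manta/results/variants/candidateSmallIndels.vcf.gz.tbi"), emit: small_indels
    tuple val(sample_name), path("manta/results/variants/candidateSV.vcf.gz"), path("manta/results/variants/candidateSV.vcf.gz.tbi"), emit: candidate_sv
    tuple val(sample_name), path("manta/results/variants/diploidSV.vcf.gz"), path("manta/results/variants/diploidSV.vcf.gz.tbi"), emit: diploid_sv
    tuple val(sample_name), path("manta/results/variants/somaticSV.vcf.gz"), path("manta/results/variants/somaticSV.vcf.gz.tbi"), emit: somatic_sv

    script:
    """
    # Step 1: write runWorkflow.py into a run directory inside the task folder
    configManta.py \\
        --normalBam ${normal_bam} \\
        --tumorBam ${tumor_bam} \\
        --referenceFasta ${params.hg38genome} \\
        --runDir manta

    # Step 2: run the generated workflow script
    manta/runWorkflow.py -m local -j ${task.cpus} --quiet
    """
}
```

Because every task attempt runs in its own work directory, the check that removes a leftover runWorkflow.py should no longer be needed in this layout, and publishDir takes care of copying the declared outputs to params.outdir. A downstream Strelka2 process could then consume, for example, MANTA.out.small_indels.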

**Update:** The snippet below shows how Nextflow resolves paths even when you create folders inside the task directory to organize the outputs.

process FOO {
  input:
    tuple val(filename), val(content)

  output:
    path "app_results/${filename}.txt"

  script:
    """
    mkdir -p app_results
    echo ${content} > app_results/${filename}.txt
    """
}

process BAR {
  debug true

  input:
    path content_file

  output:
    stdout

  script:
    """
    cat ${content_file}
    """
}

workflow {
  Channel
    .of(["m1.txt", "File of Marcel"],
        ["p1.txt", "File of Peter"])
    | FOO
    | BAR
}

Note that in the BAR process I only need to declare path content_file. The output of this Nextflow pipeline is:

N E X T F L O W  ~  version 23.08.0-edge
Launching `metal.nf` [elated_aryabhata] DSL2 - revision: 46da670013
executor >  local (4)
[88/1ea6e3] process > FOO (2) [100%] 2 of 2 ✔
[09/fbf350] process > BAR (1) [100%] 2 of 2 ✔
File of Peter

File of Marcel
