Nextflow reads the bash script from the bin folder but not the R script


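I am developing a Nextflow pipeline with the two processes shown below: the first downloads the input files and the second processes them with an R script.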

process templateExample {
  publishDir "data_analysis_files", mode:'copy'

  output:
  path "*_gex.csv" , emit: count_files

  script:
  '''
  "download_files.sh"
  '''
}


process read_count_p {
  publishDir "results",mode:'copy'

  input:
  path count_files

  output:
  path "result.txt"

  """
  Rscript read_count.R ${count_files}
  """
}


workflow {
  templateExample()
  read_count_p(templateExample.out.count_files)
}

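Both scripts, download_files.sh and read_count.R, are in the bin folder. The problem is that when I run Nextflow, it finds and executes the bash script download_files.sh from the bin folder, but not the R script read_count.R. Both scripts are given below, followed by the error: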

#!/bin/bash

# Define the URLs of the files to download
urls=(
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832735/suppl/GSM3832735_wt_naive_gex.csv.gz"
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832736/suppl/GSM3832736_wt_naive_adt.csv.gz"
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832737/suppl/GSM3832737_wt_tumor_gex.csv.gz"
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832738/suppl/GSM3832738_wt_tumor_adt.csv.gz" 
    "https://zenodo.org/records/5511975/files/negative_cDC1_relative_signatures.csv?download=1"
    "https://zenodo.org/records/5511975/files/positive_cDC1_relative_signatures.csv?download=1"
    "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP-internal.R"
    "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP_score.R"
    )


# Download each file using wget
for url in "${urls[@]}"; do
    wget "$url"
done

# Unzip each downloaded file using gunzip
for file in *.gz;do
    gunzip "$file"
done

The R script is:

#!/usr/bin/env Rscript
args <- commandArgs(trailingOnly = TRUE)
print(args[1])  # R vectors are 1-indexed; args[0] is an empty vector
my_vec <- c(args[1], class(args), args[2])
write.table(my_vec, "result.txt")  # must match the output declared by read_count_p

The error is as follows:

acheema@acri-AS-1124US-TNRP:~$ nextflow run single_cell.nf
N E X T F L O W  ~  version 23.10.1
Launching `single_cell.nf` [soggy_sanger] DSL2 - revision: f55ed68615
executor >  local (2)
[8d/2e0586] process > templateExample [100%] 1 of 1 ✔
[6d/17dc6a] process > read_count_p    [100%] 1 of 1, failed: 1 ✘
ERROR ~ Error executing process > 'read_count_p'

Caused by:
  Process `read_count_p` terminated with an error exit status (2)

Command executed:

  Rscript read_count.R GSM3832735_wt_naive_gex.csv GSM3832737_wt_tumor_gex.csv

Command exit status:
  2

Command output:
  Fatal error: cannot open file 'read_count.R': No such file or directory

Command error:
  Fatal error: cannot open file 'read_count.R': No such file or directory

Work dir:
  /home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

-- Check '.nextflow.log' file for details

The .nextflow.log file is shown below:

acheema@acri-AS-1124US-TNRP:~$ cat .nextflow.log
May-08 15:18:54.580 [main] DEBUG nextflow.cli.Launcher - $> nextflow run single_cell.nf
May-08 15:18:54.712 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 23.10.1
May-08 15:18:54.734 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/home/acheema/.nextflow/plugins
May-08 15:18:54.743 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
May-08 15:18:54.744 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
May-08 15:18:54.747 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
May-08 15:18:54.757 [main] INFO  org.pf4j.AbstractPluginManager - No plugins
May-08 15:18:54.817 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 from script declararion
May-08 15:18:54.832 [main] INFO  nextflow.cli.CmdRun - Launching `single_cell.nf` [soggy_sanger] DSL2 - revision: f55ed68615
May-08 15:18:54.833 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
May-08 15:18:54.833 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[]
May-08 15:18:54.840 [main] DEBUG n.secret.LocalSecretsProvider - Secrets store: /home/acheema/.nextflow/secrets/store.json
May-08 15:18:54.846 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@783ec989] - activable => nextflow.secret.LocalSecretsProvider@783ec989
May-08 15:18:54.899 [main] DEBUG nextflow.Session - Session UUID: 34564b50-df93-4baa-8861-cba8231186f4
May-08 15:18:54.900 [main] DEBUG nextflow.Session - Run name: soggy_sanger
May-08 15:18:54.901 [main] DEBUG nextflow.Session - Executor pool size: 128
May-08 15:18:54.908 [main] DEBUG nextflow.file.FilePorter - File porter settings maxRetries=3; maxTransfers=50; pollTimeout=null
May-08 15:18:54.911 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=384; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
May-08 15:18:54.938 [main] DEBUG nextflow.cli.CmdRun -
  Version: 23.10.1 build 5891
  Created: 12-01-2024 22:01 UTC (18:01 ADT)
  System: Linux 5.4.0-150-generic
  Runtime: Groovy 3.0.19 on OpenJDK 64-Bit Server VM 11.0.19+7-post-Ubuntu-0ubuntu118.04.1
  Encoding: UTF-8 (ANSI_X3.4-1968)
  Process: 18550@acri-AS-1124US-TNRP [127.0.1.1]
  CPUs: 128 - Mem: 1007.8 GB (709.6 GB) - Swap: 2 GB (2 GB)
May-08 15:18:54.958 [main] DEBUG nextflow.Session - Work-dir: /home/acheema/work [ext2/ext3]
May-08 15:18:55.011 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
May-08 15:18:55.023 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
May-08 15:18:55.057 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
May-08 15:18:55.066 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 129; maxThreads: 1000
May-08 15:18:55.114 [main] DEBUG nextflow.Session - Session start
May-08 15:18:55.644 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
May-08 15:18:55.705 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
May-08 15:18:55.705 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
May-08 15:18:55.710 [main] DEBUG nextflow.executor.Executor - [warm up] executor > local
May-08 15:18:55.714 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=128; memory=1007.8 GB; capacity=128; pollInterval=100ms; dumpInterval=5m
May-08 15:18:55.716 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: local)
May-08 15:18:55.821 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
May-08 15:18:55.821 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
May-08 15:18:55.828 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: templateExample, read_count_p
May-08 15:18:55.828 [main] DEBUG nextflow.Session - Igniting dataflow network (2)
May-08 15:18:55.829 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > templateExample
May-08 15:18:55.830 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > read_count_p
May-08 15:18:55.831 [main] DEBUG nextflow.script.ScriptRunner - Parsed script files:
  Script_1e152ad49ae18340: /home/acheema/single_cell.nf
May-08 15:18:55.831 [main] DEBUG nextflow.script.ScriptRunner - > Awaiting termination
May-08 15:18:55.831 [main] DEBUG nextflow.Session - Session await
May-08 15:18:55.991 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
May-08 15:18:55.995 [Task submitter] INFO  nextflow.Session - [8d/2e0586] Submitted process > templateExample
May-08 15:19:07.473 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: templateExample; status: COMPLETED; exit: 0; error: -; workDir: /home/acheema/work/8d/2e0586013131bee894e6322a38edf7]
May-08 15:19:07.504 [Task monitor] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'PublishDir' minSize=10; maxSize=384; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
May-08 15:19:07.537 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
May-08 15:19:07.538 [Task submitter] INFO  nextflow.Session - [6d/17dc6a] Submitted process > read_count_p
May-08 15:19:07.610 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 2; name: read_count_p; status: COMPLETED; exit: 2; error: -; workDir: /home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428]
May-08 15:19:07.618 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=read_count_p; work-dir=/home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428
  error [nextflow.exception.ProcessFailedException]: Process `read_count_p` terminated with an error exit status (2)
May-08 15:19:07.632 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'read_count_p'

Caused by:
  Process `read_count_p` terminated with an error exit status (2)

Command executed:

  Rscript read_count.R GSM3832735_wt_naive_gex.csv GSM3832737_wt_tumor_gex.csv

Command exit status:
  2

Command output:
  Fatal error: cannot open file 'read_count.R': No such file or directory

Command error:
  Fatal error: cannot open file 'read_count.R': No such file or directory

Work dir:
  /home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
May-08 15:19:07.635 [main] DEBUG nextflow.Session - Session await > all processes finished
May-08 15:19:07.638 [Task monitor] DEBUG nextflow.Session - Session aborted -- Cause: Process `read_count_p` terminated with an error exit status (2)
May-08 15:19:07.654 [main] DEBUG nextflow.Session - Session await > all barriers passed
May-08 15:19:07.655 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: local) - terminating tasks monitor poll loop
May-08 15:19:07.667 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=1; failedCount=1; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=11.4s; failedDuration=41ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=1; peakCpus=1; peakMemory=0; ]
May-08 15:19:07.856 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
May-08 15:19:07.879 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye

However, when I give the absolute path to the R script, it works:

script:
"""
Rscript /home/acheema/bin/read_count.R ${count_files}
"""

With that change it runs correctly:

acheema@acri-AS-1124US-TNRP:~$ nextflow run single_cell.nf
N E X T F L O W  ~  version 23.10.1
Launching `single_cell.nf` [astonishing_lorenz] DSL2 - revision: f279637f1a
executor >  local (2)
[26/99ed30] process > templateExample [100%] 1 of 1 ✔
[35/db989a] process > read_count_p    [100%] 1 of 1 ✔
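The difference between the two processes can be reproduced outside Nextflow. Nextflow adds the pipeline's bin directory to PATH inside each task, so a bare command name such as download_files.sh is found by the shell's PATH search. In Rscript read_count.R, however, read_count.R is only a file argument, which Rscript opens relative to the task's work directory, where no such file exists. The sketch below assumes that simplified model of a task (its own work directory, bin on PATH); hello.sh is a hypothetical stand-in for a bin script, and cat stands in for Rscript as a program that opens its argument as a plain path.

```shell
#!/bin/sh
# Sketch: command lookup via PATH vs. opening a file argument by name.
# Assumption: this mimics a Nextflow task, which runs in its own work
# directory with the pipeline's bin/ on PATH. hello.sh is a hypothetical
# stand-in for a bin/ script; cat stands in for Rscript.

demo_root=$(mktemp -d)
mkdir -p "$demo_root/bin" "$demo_root/work"

# Hypothetical bin/ script, executable and with a shebang line.
cat > "$demo_root/bin/hello.sh" <<'EOF'
#!/bin/sh
echo "hello from bin: $*"
EOF
chmod +x "$demo_root/bin/hello.sh"

# Simulate the task environment: bin/ on PATH, cwd = work directory.
PATH="$demo_root/bin:$PATH"
export PATH
cd "$demo_root/work" || exit 1

# 1) Run as a command: the shell searches PATH, so bin/hello.sh is found.
as_command=$(hello.sh a b)

# 2) Bare name as a file argument: cat (like Rscript) resolves it relative
#    to the current directory only, so the file is not found.
if cat hello.sh >/dev/null 2>&1; then
  as_argument=found
else
  as_argument=missing
fi

# 3) An absolute path works regardless of PATH or the current directory.
if cat "$demo_root/bin/hello.sh" >/dev/null 2>&1; then
  as_absolute=found
else
  as_absolute=missing
fi

echo "as command:       $as_command"
echo "as file argument: $as_argument"
echo "as absolute path: $as_absolute"
```

This is also why the absolute path works. To avoid hard-coding /home/acheema, the same call can be written as Rscript ${projectDir}/bin/read_count.R ${count_files}, where projectDir is Nextflow's built-in variable for the directory containing the main script.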

Is there a way to have the R script found and read from the bin folder? I already tried the solution suggested here, but it did not work. Is there a fix?

r bash nextflow
1 Answer

This may be related to file permissions, but it is hard to say, since the first process works.
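Permissions are easy to rule in or out. Nextflow can only run a bin script by name if the file is executable and its first line is a #! line naming the right interpreter; neither condition affects Rscript read_count.R, which never looks in bin at all. The helper below, check_bin_scripts (a hypothetical name, not a Nextflow command), is a minimal sketch that lists files in a directory failing either condition.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nextflow): report files in a bin/
# directory that could not be run by name from a Nextflow task, either
# because they lack the executable bit or because line 1 is not a shebang.
check_bin_scripts() {
  bin_dir=$1
  for f in "$bin_dir"/*; do
    [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
    if [ ! -x "$f" ]; then
      echo "not executable: $f"
    fi
    if ! head -n 1 "$f" | grep -q '^#!'; then
      echo "no shebang: $f"
    fi
  done
}

# Usage, from the directory that contains the pipeline script:
#   check_bin_scripts bin
```

If read_count.R is reported, making it executable (chmod +x bin/read_count.R) and giving it #!/usr/bin/env Rscript as its first line should let the process call it by name, read_count.R ${count_files}, the same way templateExample calls download_files.sh.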

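What I do is read the R script into a value channel and pass it to the process as an input, like any other file. A side benefit is that you can add a check that the file exists, which throws an error before the pipeline starts instead of halfway through when the R script turns out to be missing.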

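Also, I would simply paste the contents of download_files.sh into the process's script block; that is what Nextflow is designed for. The same would work for the R script, but it is more tedious to maintain that way, so I left it as a separate file.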

process templateExample {
  publishDir "data_analysis_files", mode:'copy'     

  output:
  path "*_gex.csv" , emit: count_files        
  
  script:
  """
  #!/bin/bash
  
  # Define the URLs of the files to download
  urls=(
      "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832735/suppl/GSM3832735_wt_naive_gex.csv.gz"
      "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832736/suppl/GSM3832736_wt_naive_adt.csv.gz"
      "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832737/suppl/GSM3832737_wt_tumor_gex.csv.gz"
      "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832738/suppl/GSM3832738_wt_tumor_adt.csv.gz" 
      "https://zenodo.org/records/5511975/files/negative_cDC1_relative_signatures.csv?download=1"
      "https://zenodo.org/records/5511975/files/positive_cDC1_relative_signatures.csv?download=1"
      "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP-internal.R"
      "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP_score.R"
      )
  
  
  # Download each file using wget
  for url in "${urls[@]}"; do
      wget "$url"
  done
  
  # Unzip each downloaded file using gunzip
  for file in *.gz;do
      gunzip "$file"
  done
  """
}


process read_count_p {
  publishDir "results",mode:'copy'

  input:
  path count_files
  path read_counts_rscript

  output:
  path "result.txt"

  """
  Rscript ${read_counts_rscript} ${count_files}
  """
}


workflow {
  templateExample()
  // read_counts_rscript is the channel built from params.read_counts_rscript
  read_count_p(templateExample.out.count_files, read_counts_rscript)
}

Then add the following channel to the part of your script where you create channels:

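// params.read_counts_rscript must be set, e.g. --read_counts_rscript bin/read_count.R
Channel
   .fromPath(params.read_counts_rscript, checkIfExists: true)   // fail early if the path does not exist
   .ifEmpty { error "No merging Rscript supplied: ${params.read_counts_rscript}" }
   .first()                                                      // value channel, reusable by every task
   .set { read_counts_rscript }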
© www.soinside.com 2019 - 2024. All rights reserved.