I'm developing a Nextflow pipeline with two processes, shown below: one downloads the input files and the other runs an R script on them.
process templateExample {
    publishDir "data_analysis_files", mode: 'copy'
    output:
    path "*_gex.csv", emit: count_files
    script:
    '''
    "download_files.sh"
    '''
}
process read_count_p {
    publishDir "results", mode: 'copy'
    input:
    path count_files
    output:
    path "result.txt"
    """
    Rscript read_count.R ${count_files}
    """
}
workflow {
    templateExample()
    read_count_p(templateExample.out.count_files)
}
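The scripts download_files.sh and read_count.R are both in the bin folder. The problem is that when I run Nextflow, it finds and runs the bash script download_files.sh from the bin folder, but not the R script read_count.R. The bash script and the R script are shown below, followed by the error: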
#!/bin/bash
# Define the URLs of the files to download
urls=(
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832735/suppl/GSM3832735_wt_naive_gex.csv.gz"
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832736/suppl/GSM3832736_wt_naive_adt.csv.gz"
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832737/suppl/GSM3832737_wt_tumor_gex.csv.gz"
    "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832738/suppl/GSM3832738_wt_tumor_adt.csv.gz"
    "https://zenodo.org/records/5511975/files/negative_cDC1_relative_signatures.csv?download=1"
    "https://zenodo.org/records/5511975/files/positive_cDC1_relative_signatures.csv?download=1"
    "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP-internal.R"
    "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP_score.R"
)
# Download each file using wget
for url in "${urls[@]}"; do
    wget "$url"
done
# Unzip each downloaded file using gunzip
for file in *.gz; do
    gunzip "$file"
done
The R script is:
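#!/usr/bin/env Rscript
# Arguments after the script name; R vectors are 1-indexed, so args[0] is always empty
args <- commandArgs(trailingOnly = TRUE)
print(args[1])
my_vec <- c(args[1], args[2], args[1], class(args), args[3])
# Write to the file name that read_count_p declares as its output
write.table(my_vec, "result.txt")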
The error is as follows:
acheema@acri-AS-1124US-TNRP:~$ nextflow run single_cell.nf
N E X T F L O W ~ version 23.10.1
Launching `single_cell.nf` [soggy_sanger] DSL2 - revision: f55ed68615
executor > local (2)
[8d/2e0586] process > templateExample [100%] 1 of 1 ✔
[6d/17dc6a] process > read_count_p [100%] 1 of 1, failed: 1 ✘
ERROR ~ Error executing process > 'read_count_p'
Caused by:
Process `read_count_p` terminated with an error exit status (2)
Command executed:
Rscript read_count.R GSM3832735_wt_naive_gex.csv GSM3832737_wt_tumor_gex.csv
Command exit status:
2
Command output:
Fatal error: cannot open file 'read_count.R': No such file or directory
Command error:
Fatal error: cannot open file 'read_count.R': No such file or directory
Work dir:
/home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
-- Check '.nextflow.log' file for details
The .nextflow.log file is shown below:
acheema@acri-AS-1124US-TNRP:~$ cat .nextflow.log
May-08 15:18:54.580 [main] DEBUG nextflow.cli.Launcher - $> nextflow run single_cell.nf
May-08 15:18:54.712 [main] INFO nextflow.cli.CmdRun - N E X T F L O W ~ version 23.10.1
May-08 15:18:54.734 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/home/acheema/.nextflow/plugins; core-plugins: [...]
May-08 15:18:54.743 [main] INFO o.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
May-08 15:18:54.744 [main] INFO o.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
May-08 15:18:54.747 [main] INFO org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
May-08 15:18:54.757 [main] INFO org.pf4j.AbstractPluginManager - No plugins
May-08 15:18:54.817 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 from script declararion
May-08 15:18:54.832 [main] INFO nextflow.cli.CmdRun - Launching `single_cell.nf` [soggy_sanger] DSL2 - revision: f55ed68615
May-08 15:18:54.833 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
May-08 15:18:54.833 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[]
May-08 15:18:54.840 [main] DEBUG n.secret.LocalSecretsProvider - Secrets store: /home/acheema/.nextflow/secrets/store.json
May-08 15:18:54.846 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@783ec989] - activable => nextflow.secret.LocalSecretsProvider@783ec989
May-08 15:18:54.899 [main] DEBUG nextflow.Session - Session UUID: 34564b50-df93-4baa-8861-cba8231186f4
May-08 15:18:54.900 [main] DEBUG nextflow.Session - Run name: soggy_sanger
May-08 15:18:54.901 [main] DEBUG nextflow.Session - Executor pool size: 128
May-08 15:18:54.908 [main] DEBUG nextflow.file.FilePorter - File porter settings maxRetries=3; maxTransfers=50; pollTimeout=null
May-08 15:18:54.911 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=384; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
May-08 15:18:54.938 [main] DEBUG nextflow.cli.CmdRun -
Version: 23.10.1 build 5891
Created: 12-01-2024 22:01 UTC (18:01 ADT)
System: Linux 5.4.0-150-generic
Runtime: Groovy 3.0.19 on OpenJDK 64-Bit Server VM 11.0.19+7-post-Ubuntu-0ubuntu118.04.1
Encoding: UTF-8 (ANSI_X3.4-1968)
Process: 18550@acri-AS-1124US-TNRP [127.0.1.1]
CPUs: 128 - Mem: 1007.8 GB (709.6 GB) - Swap: 2 GB (2 GB)
May-08 15:18:54.958 [main] DEBUG nextflow.Session - Work-dir: /home/acheema/work [ext2/ext3]
May-08 15:18:55.011 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
May-08 15:18:55.023 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
May-08 15:18:55.057 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
May-08 15:18:55.066 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 129; maxThreads: 1000
May-08 15:18:55.114 [main] DEBUG nextflow.Session - Session start
May-08 15:18:55.644 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
May-08 15:18:55.705 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
May-08 15:18:55.705 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
May-08 15:18:55.710 [main] DEBUG nextflow.executor.Executor - [warm up] executor > local
May-08 15:18:55.714 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=128; memory=1007.8 GB; capacity=128; pollInterval=100ms; dumpInterval=5m
May-08 15:18:55.716 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: local)
May-08 15:18:55.821 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
May-08 15:18:55.821 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
May-08 15:18:55.828 [main] DEBUG nextflow.Session - Workflow process names [dsl2]: templateExample, read_count_p
May-08 15:18:55.828 [main] DEBUG nextflow.Session - Igniting dataflow network (2)
May-08 15:18:55.829 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > templateExample
May-08 15:18:55.830 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > read_count_p
May-08 15:18:55.831 [main] DEBUG nextflow.script.ScriptRunner - Parsed script files:
Script_1e152ad49ae18340: /home/acheema/single_cell.nf
May-08 15:18:55.831 [main] DEBUG nextflow.script.ScriptRunner - > Awaiting termination
May-08 15:18:55.831 [main] DEBUG nextflow.Session - Session await
May-08 15:18:55.991 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
May-08 15:18:55.995 [Task submitter] INFO nextflow.Session - [8d/2e0586] Submitted process > templateExample
May-08 15:19:07.473 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 1; name: templateExample; status: COMPLETED; exit: 0; error: -; workDir: /home/acheema/work/8d/2e0586013131bee894e6322a38edf7]
May-08 15:19:07.504 [Task monitor] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'PublishDir' minSize=10; maxSize=384; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
May-08 15:19:07.537 [Task submitter] DEBUG n.executor.local.LocalTaskHandler - Launch cmd line: /bin/bash -ue .command.run
May-08 15:19:07.538 [Task submitter] INFO nextflow.Session - [6d/17dc6a] Submitted process > read_count_p
May-08 15:19:07.610 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 2; name: read_count_p; status: COMPLETED; exit: 2; error: -; workDir: /home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428]
May-08 15:19:07.618 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
task: name=read_count_p; work-dir=/home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428
error [nextflow.exception.ProcessFailedException]: Process `read_count_p` terminated with an error exit status (2)
May-08 15:19:07.632 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'read_count_p'
Caused by:
Process `read_count_p` terminated with an error exit status (2)
Command executed:
Rscript read_count.R GSM3832735_wt_naive_gex.csv GSM3832737_wt_tumor_gex.csv
Command exit status:
2
Command output:
Fatal error: cannot open file 'read_count.R': No such file or directory
Command error:
Fatal error: cannot open file 'read_count.R': No such file or directory
Work dir:
/home/acheema/work/6d/17dc6ad0908c96df730a0f7c28c428
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
May-08 15:19:07.635 [main] DEBUG nextflow.Session - Session await > all processes finished
May-08 15:19:07.638 [Task monitor] DEBUG nextflow.Session - Session aborted -- Cause: Process `read_count_p` terminated with an error exit status (2)
May-08 15:19:07.654 [main] DEBUG nextflow.Session - Session await > all barriers passed
May-08 15:19:07.655 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: local) - terminating tasks monitor poll loop
May-08 15:19:07.667 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=1; failedCount=1; ignoredCount=0; cachedCount=0; pendingCount=0; submittedCount=0; runningCount=0; retriesCount=0; abortedCount=0; succeedDuration=11.4s; failedDuration=41ms; cachedDuration=0ms;loadCpus=0; loadMemory=0; peakRunning=1; peakCpus=1; peakMemory=0; ]
May-08 15:19:07.856 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
May-08 15:19:07.879 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
However, when I give the absolute path to the R script, it works:
script:
"""
Rscript /home/acheema/bin/read_count.R ${count_files}
"""
Now it runs successfully:
acheema@acri-AS-1124US-TNRP:~$ nextflow run single_cell.nf
N E X T F L O W ~ version 23.10.1
Launching `single_cell.nf` [astonishing_lorenz] DSL2 - revision: f279637f1a
executor > local (2)
[26/99ed30] process > templateExample [100%] 1 of 1 ✔
[35/db989a] process > read_count_p [100%] 1 of 1 ✔
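Is there a way to have the R script found and run from the bin folder? I have already tried the solution suggested here, but it did not work. Is there any fix?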
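A likely explanation, based on Nextflow's documented behaviour: it adds the bin/ directory next to the pipeline script to the PATH of every task, and each task runs in its own work directory. download_files.sh is invoked as a command, so the shell finds it through PATH. In "Rscript read_count.R", however, read_count.R is only a file argument, and Rscript opens it relative to the task's work directory, where no such file exists. The bash sketch below reproduces this outside Nextflow with a hypothetical hello.sh in a temporary project; cat stands in for Rscript as a program that takes a file path, and command -v shows one way to turn a PATH lookup into a full path (which is also why the absolute path works).

```shell
#!/usr/bin/env bash
# Sketch: a hypothetical project with bin/hello.sh, mimicking how Nextflow
# puts <project>/bin on PATH and runs each task in a separate work directory.
proj=$(mktemp -d)
mkdir -p "$proj/bin" "$proj/work/task1"
printf '#!/usr/bin/env bash\necho "hello from bin"\n' > "$proj/bin/hello.sh"
chmod +x "$proj/bin/hello.sh"

export PATH="$proj/bin:$PATH"   # what Nextflow does for the project's bin/
cd "$proj/work/task1"           # the task's own working directory

# 1. Called as a command: resolved through PATH, so it runs.
hello.sh

# 2. Passed as a file argument: resolved relative to the current directory
#    only (cat plays the role of Rscript here), so it cannot be opened.
if cat hello.sh > /dev/null 2>&1; then
    echo "found hello.sh in $(pwd)"
else
    echo "cannot open hello.sh in $(pwd)"
fi

# 3. Resolving the name through PATH first yields a full path that works.
script_path=$(command -v hello.sh)
echo "resolved to: $script_path"
head -n 1 "$script_path"
```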
This might be related to file permissions, but it is hard to say, since the first process works.
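To rule permissions in or out, and to be able to call the R script by name the same way download_files.sh is called, the file needs the execute bit and a shebang that points at Rscript, for example #!/usr/bin/env Rscript. The bash sketch below checks and fixes both on a throwaway copy seeded with a deliberately broken shebang; the name read_count.R comes from the question, while the temporary directory is purely illustrative. Assuming bin/read_count.R is fixed this way, the process script could then be just read_count.R followed by the input files, with no Rscript prefix.

```shell
#!/usr/bin/env bash
# Sketch: check/fix the shebang and execute bit of an R script so it can be
# run by name from a directory on PATH. Operates on a throwaway copy.
set -eu
dir=$(mktemp -d)
script="$dir/read_count.R"
printf '#!/user/bin/R\nargs <- commandArgs(trailingOnly = TRUE)\n' > "$script"

want='#!/usr/bin/env Rscript'
if [ "$(head -n 1 "$script")" != "$want" ]; then
    # Replace only the first line with the correct shebang, keep the rest
    { printf '%s\n' "$want"; tail -n +2 "$script"; } > "$script.tmp"
    mv "$script.tmp" "$script"
fi

# Without the execute bit the shell refuses to run the file by name
[ -x "$script" ] || chmod +x "$script"

ls -l "$script"
head -n 1 "$script"
```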
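What I do instead is read the R script into a value channel and pass it to the process like any other input file. A bonus is that you can also check that the file exists, so the pipeline fails before it starts instead of halfway through when the R script turns out to be missing.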
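I would also just paste the contents of download_files.sh into the process's script block; that is what Nextflow is designed for. The same would work for the R script, but editing it there gets annoying, so I left it as a separate file.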
process templateExample {
    publishDir "data_analysis_files", mode: 'copy'
    output:
    path "*_gex.csv", emit: count_files
    script:
    // Single quotes (''') stop Groovy from interpolating the bash $variables below
    '''
    #!/bin/bash
    # Define the URLs of the files to download
    urls=(
        "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832735/suppl/GSM3832735_wt_naive_gex.csv.gz"
        "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832736/suppl/GSM3832736_wt_naive_adt.csv.gz"
        "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832737/suppl/GSM3832737_wt_tumor_gex.csv.gz"
        "https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM3832nnn/GSM3832738/suppl/GSM3832738_wt_tumor_adt.csv.gz"
        "https://zenodo.org/records/5511975/files/negative_cDC1_relative_signatures.csv?download=1"
        "https://zenodo.org/records/5511975/files/positive_cDC1_relative_signatures.csv?download=1"
        "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP-internal.R"
        "https://github.com/SIgN-Bioinformatics/sgCMAP_R_Scripts/blob/main/sgCMAP_R_Scripts/sgCMAP_score.R"
    )
    # Download each file using wget
    for url in "${urls[@]}"; do
        wget "$url"
    done
    # Unzip each downloaded file using gunzip
    for file in *.gz; do
        gunzip "$file"
    done
    '''
}
process read_count_p {
    publishDir "results", mode: 'copy'
    input:
    path count_files
    path read_counts_rscript
    output:
    path "result.txt"
    script:
    """
    Rscript ${read_counts_rscript} ${count_files}
    """
}
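workflow {
    // The R script is passed as an ordinary input file.
    // checkIfExists: true aborts the run before any task starts if the file is missing;
    // .first() turns the one-file channel into a value channel that every task can reuse.
    Channel
        .fromPath(params.read_counts_rscript, checkIfExists: true)
        .ifEmpty { error "No read-count Rscript supplied: ${params.read_counts_rscript}" }
        .first()
        .set { read_counts_rscript }
    templateExample()
    read_count_p(templateExample.out.count_files, read_counts_rscript)
}
Then give the parameter a default near the top of the script (it can still be overridden on the command line with --read_counts_rscript):
params.read_counts_rscript = "${projectDir}/bin/read_count.R"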
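The same fail-early idea can also be applied before Nextflow is even launched. The bash sketch below defines a hypothetical preflight function (it is not a Nextflow feature, just an assumption for illustration) that reports any helper script in a directory that is missing or unreadable and returns non-zero, so a wrapper can refuse to start the run.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail before launching the pipeline if a
# helper script is missing or cannot be read.
preflight() {
    local bindir=$1
    shift
    local status=0 name
    for name in "$@"; do
        if [ ! -f "$bindir/$name" ]; then
            echo "missing: $bindir/$name" >&2
            status=1
        elif [ ! -r "$bindir/$name" ]; then
            echo "not readable: $bindir/$name" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage (assumes bin/ sits next to single_cell.nf):
# preflight bin download_files.sh read_count.R && nextflow run single_cell.nf
```

Checking readability rather than the execute bit matches the value-channel approach, where Rscript only needs to read the file; a script called by name, like download_files.sh, would additionally need [ -x ].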