How to repeat code, changing the name of each block? (with R)

Question

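Thanks to anyone who can help me.

I'm working with some output from QIIME: text files that I want to manipulate to obtain box plots. Every input has the same format, so the manipulation is always the same; only the source name changes. For each input, I want to extract the last 5 rows, take the mean of each column/sample, associate the values with the sample's experimental label (Group) taken from the mapping file, and arrange the 6 resulting datasets in the order I use to make a single box plot of all of them.

In bash, I do something like [for i in GG97 GG100 SILVA97 SILVA100 NCBI RDP; do cp ${i}/alpha/collated_alpha/chao1.txt alpha_tot/${i}_chao1.txt; done] to run a command several times, with ${i} changing the names in the code automatically. I'm struggling to find a way to do the same in R. I thought of creating a vector containing the names and then using a for loop that moves i through [1], [2], and so on, but that doesn't work: it stops at the read.delim line because it can't find the file in the working directory.

This is the manipulation code I wrote. Everything after the comment is repeated 6 times, once for each of the 6 databases I'm using (GG97 GG100 SILVA97 SILVA100 NCBI RDP).

Also, since I want to use 4 metrics, I repeat this whole process 4 times (here I show shannon, but I also have copies of the code for chao1, observed_species and PD_whole_tree).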

library(tidyverse)
library(labelled)

mapfile <- read.delim(file="mapfile_HC+BV.txt", check.names=FALSE);
mapfile <- mapfile[,c(1,4)]
colnames(mapfile) <- c("SampleID","Pathology_group")

#GG97
 collated <- read.delim(file="alpha_diversity/GG97_shannon.txt", check.names=FALSE);
  collated <- tail(collated,5); collated <- collated[,-c(1:3)]
  collated_reorder <- collated[,match(mapfile[,1], colnames(collated))]

  labels <- t(mapfile)
  colnames(collated_reorder) <- labels[2,]

  mean <- colMeans(collated_reorder, na.rm = FALSE, dims = 1)
  mean = as.matrix(mean); mean <- t(mean)

  GG97_shannon <- as.data.frame(rbind(labels[2,],mean))
  GG97_shannon <- t(GG97_shannon); 

  DB_type <- list(DB = "GG97"); DB_type <- rep(DB_type, 41)
  GG97_shannon <- as.data.frame(cbind(DB_type,GG97_shannon))
  colnames(GG97_shannon) <- c("DB","Group","value")
  rm(collated,collated_reorder,DB_type,labels,mean)

Here I bind all the outputs together, freeze the order, and make the box plot.

alpha_shannon <- as.data.frame(rbind(GG97_shannon,GG100_shannon,SILVA97_shannon,SILVA100_shannon,NCBI_shannon,RDP_shannon))
rownames(alpha_shannon) <- NULL
  rm(GG97_shannon,GG100_shannon,SILVA97_shannon,SILVA100_shannon,NCBI_shannon,RDP_shannon)

    alpha_shannon$Group = factor(alpha_shannon$Group, unique(alpha_shannon$Group))
    alpha_shannon$DB = factor(alpha_shannon$DB, unique(alpha_shannon$DB))

library(ggplot2)
ggplot(data = alpha_shannon) +
  aes(x = DB, y = value, colour = Group) +
  geom_boxplot()+
  labs(title = 'Shannon',
       x = 'Database',
       y = 'Diversity') +
  theme(legend.position = 'bottom')+ 
  theme_grey(base_size = 16)

How can I keep this code "DRY" so that I don't need 146 lines of code repeating the same thing over and over? Thanks!

Answer 1

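You did not provide a minimal reproducible example, so this answer is untested and may need adjusting.

The important thing to notice is that you use rm(...), which means some variables are only relevant within a specific scope. So wrap that scope in a function. This makes your code reusable and saves you from removing variables by hand: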

process <- function(file, DB){
  # -> Use the function parameter `file` instead of a hardcoded filename
  collated <- read.delim(file=file, check.names=FALSE);  
  collated <- tail(collated,5); collated <- collated[,-c(1:3)]
  collated_reorder <- collated[,match(mapfile[,1], colnames(collated))]

  labels <- t(mapfile)
  colnames(collated_reorder) <- labels[2,]

  mean <- colMeans(collated_reorder, na.rm = FALSE, dims = 1)
  mean = as.matrix(mean); mean <- t(mean)

  # -> rename this variable to a more general name, e.g. `result`
  result <- as.data.frame(rbind(labels[2,],mean))
  result <- t(result); 

  # -> Use the function parameter `DB` instead of a hardcoded string
  DB_type <- list(DB = DB); DB_type <- rep(DB_type, 41)
  result <- as.data.frame(cbind(DB_type,result))
  colnames(result) <- c("DB","Group","value")

  # -> After the end of this function, the variables defined in this function
  #    vanish automatically, you just need to specify the result
  return(result)
}

Now you can reuse this block:

GG97_shannon      <- process(file = "alpha_diversity/GG97_shannon.txt", DB = "GG97")
GG100_shannon     <- process(file =...., DB = ....)
SILVA97_shannon   <- ...
SILVA100_shannon  <- ...
NCBI_shannon      <- ...
RDP_shannon       <- ...

Alternatively, you can use a loop construct:

A generic for:

datasets <-  c("GG97_shannon", "GG100_shannon", "SILVA97_shannon", 
               "SILVA100_shannon", "NCBI_shannon", "RDP_shannon")
files    <-  c("alpha_diversity/GG97_shannon.txt", .....)
DBs      <-  c("GG97", ....)
result   <-  list()

for(i in seq_along(datasets)){
   result[[datasets[i]]] <- process(files[i], DBs[i])
}
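
The elided vectors above can also be derived from the database names, which is the closest R counterpart of ${i} in the bash loop from the question. The following is only a minimal sketch: it assumes the files follow the alpha_diversity/<DB>_shannon.txt pattern shown in the question, and process_stub is a hypothetical stand-in for process() so that the snippet runs without the real data.

```r
# Database names, as listed in the question
DBs <- c("GG97", "GG100", "SILVA97", "SILVA100", "NCBI", "RDP")

# Build one path per database; paste0() is vectorised over DBs
files <- file.path("alpha_diversity", paste0(DBs, "_shannon.txt"))

# Hypothetical stand-in for process(): it only records its arguments
process_stub <- function(file, DB) {
  data.frame(DB = DB, file = file, stringsAsFactors = FALSE)
}

# Same loop shape as above, with the list elements named after the database
result <- list()
for (i in seq_along(DBs)) {
  result[[DBs[i]]] <- process_stub(files[i], DBs[i])
}

names(result)
```

Swapping process_stub for process should give one data frame per database, provided the paths exist relative to the working directory (getwd() shows where R is looking).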

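mapply, a "special for" that loops over several vectors in parallel:

# the first argument is the function from above, the other ones are given as arguments
# to our process(.) function
# SIMPLIFY = FALSE keeps the result as a list with one data frame per file
results <- mapply(process, files, DBs, SIMPLIFY = FALSE)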
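
Whichever loop is used, the per-database data frames still have to be stacked before plotting, as the question does with rbind. Here is a sketch of that step, assuming mapply is called with SIMPLIFY = FALSE so that it returns a list of data frames. process_stub, the group labels and the values are made up for illustration and do not come from the real data.

```r
# Hypothetical stand-in for process(): returns a small data frame per database
process_stub <- function(file, DB) {
  data.frame(DB = DB, Group = c("HC", "BV"), value = c(1, 2),
             stringsAsFactors = FALSE)
}

DBs   <- c("GG97", "GG100")
files <- paste0("alpha_diversity/", DBs, "_shannon.txt")

# SIMPLIFY = FALSE keeps one data frame per database in a list
results <- mapply(process_stub, files, DBs, SIMPLIFY = FALSE)

# Stack the list into one long data frame, replacing the manual rbind(...)
alpha_shannon <- do.call(rbind, results)
rownames(alpha_shannon) <- NULL

# Freeze the order of appearance, as in the question
alpha_shannon$DB    <- factor(alpha_shannon$DB, unique(alpha_shannon$DB))
alpha_shannon$Group <- factor(alpha_shannon$Group, unique(alpha_shannon$Group))
```

The resulting alpha_shannon has the same three columns (DB, Group, value) that the ggplot call in the question expects.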
