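I am trying to write some code that takes a .csv file containing sample names as input and outputs a data.frame containing the sample names and their positions in 96-well or 384-well plate format (A1, B1, C1, ...). For those who don't know, a 96-well plate has 8 rows labeled alphabetically (A, B, C, D, E, F, G, H) and 12 columns labeled numerically (1:12); a 384-well plate has 16 rows labeled alphabetically (A:P) and 24 columns labeled numerically (1:24). I want code that generates either format (two separate functions would be fine) and lets the samples be laid out either down (A1, B1, C1, D1, E1, F1, G1, H1, A2, ...) or across (A1, A2, A3, A4, A5, ...).
So far I have worked out how to get the row names fairly easily: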
rowLetter <- rep(LETTERS[1:8], length.out = variable)
#variable will be based on how many samples I have
I just can't figure out how to apply the numeric column names correctly. I have tried:
colNumber <- rep(1:12, times = variable)
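but it isn't that simple. Going "down", all 8 rows have to be filled before the column number increases by 1; going "across", all 12 columns have to be filled before the row letter increases by 1.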
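The fill rule described above maps fairly directly onto the `each` and `times` arguments of `rep()`. The following is only a minimal sketch for a single plate; the helper name `wellNames` and its arguments are made up for illustration, and it assumes `n` does not exceed the number of wells on one plate.

```r
# Hypothetical helper: return the first n well names of a single plate.
wellNames <- function(n, nRow = 8, nCol = 12, direction = "DOWN") {
  rows <- LETTERS[seq_len(nRow)]
  cols <- seq_len(nCol)
  if (direction == "DOWN") {
    # Row letter cycles fastest; each column number is repeated nRow times.
    wells <- paste0(rep(rows, times = nCol), rep(cols, each = nRow))
  } else {
    # Column number cycles fastest; each row letter is repeated nCol times.
    wells <- paste0(rep(rows, each = nCol), rep(cols, times = nRow))
  }
  head(wells, n)
}

wellNames(10, direction = "DOWN")
wellNames(14, direction = "ACROSS")
```

The only difference between the two directions is which of the two vectors uses `each` and which uses `times`.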
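Edit:
Here is a clunky version. It takes the number of samples you have, a "plate format" argument that doesn't work yet, and a direction, and returns a data frame containing the well and plate number. Next I will a) fix the plate format so that it works, and b) make the function take a list of sample names or IDs or whatever and return the sample name, well position and plate number!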
plateLayout <- function(numOfSamples, plateFormat = 96, direction = "DOWN"){
#This assumes that each well will be filled in order. I may need to change this, but let's get it working first.
#Calculate the number of plates required
platesRequired <- ceiling(numOfSamples/plateFormat)
rowLetter <- character(0)
colNumber <- numeric(0)
plateNumber <- numeric(0)
#The following will work if the samples are going DOWN
if(direction == "DOWN"){
for(k in 1:platesRequired){
rowLetter <- c(rowLetter, rep(LETTERS[1:8], length.out = 96))
for(i in 1:12){
colNumber <- c(colNumber, rep(i, times = 8))
}
plateNumber <- c(plateNumber, rep(k, times = 96))
}
plateLayout <- paste0(rowLetter, colNumber)
plateLayout <- data.frame(plateLayout, plateNumber)
plateLayout <- plateLayout[1:numOfSamples,]
return(plateLayout)
}
#The following will work if the samples are going ACROSS
if(direction == "ACROSS"){
for(k in 1:platesRequired){
colNumber <- c(colNumber, rep(1:12, times = 8))
for(i in 1:8){
rowLetter <- c(rowLetter, rep(LETTERS[i], times = 12))
}
plateNumber <- c(plateNumber, rep(k, times = 96))
}
plateLayout <- paste0(rowLetter, colNumber)
plateLayout <- data.frame(plateLayout, plateNumber)
plateLayout <- plateLayout[1:numOfSamples,]
return(plateLayout)
}
}
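Does anyone have ideas for what else would make this cool? I will be using this function to generate .csv or .txt files used as sample-name imports for different instruments, so I'm somewhat limited in terms of "cool features", but I think it would be cool to use ggplot to make a figure showing the plate and the sample names.
You don't need a for loop. Here is a start: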
#some sample ids
ids <- c(LETTERS, letters)
#plate size:
n <- 96
nrow <- 8
samples <- character(n)
samples[seq_along(ids)] <- ids
samples <- matrix(samples, nrow=nrow)
colnames(samples) <- seq_len(n/nrow)
rownames(samples) <- LETTERS[seq_len(nrow)]
# 1 2 3 4 5 6 7 8 9 10 11 12
# A "A" "I" "Q" "Y" "g" "o" "w" "" "" "" "" ""
# B "B" "J" "R" "Z" "h" "p" "x" "" "" "" "" ""
# C "C" "K" "S" "a" "i" "q" "y" "" "" "" "" ""
# D "D" "L" "T" "b" "j" "r" "z" "" "" "" "" ""
# E "E" "M" "U" "c" "k" "s" "" "" "" "" "" ""
# F "F" "N" "V" "d" "l" "t" "" "" "" "" "" ""
# G "G" "O" "W" "e" "m" "u" "" "" "" "" "" ""
# H "H" "P" "X" "f" "n" "v" "" "" "" "" "" ""
library(reshape2)
samples <- melt(samples)
samples$position <- paste0(samples$Var1, samples$Var2)
# Var1 Var2 value position
# 1 A 1 A A1
# 2 B 1 B B1
# 3 C 1 C C1
# 4 D 1 D D1
# 5 E 1 E E1
# 6 F 1 F F1
# 7 G 1 G G1
# 8 H 1 H H1
# 9 A 2 I A2
# 10 B 2 J B2
# 11 C 2 K C2
# 12 D 2 L D2
# 13 E 2 M E2
# 14 F 2 N F2
# 15 G 2 O G2
# 16 H 2 P H2
# 17 A 3 Q A3
# 18 B 3 R B3
# 19 C 3 S C3
# 20 D 3 T D3
# 21 E 3 U E3
# 22 F 3 V F3
# 23 G 3 W G3
# 24 H 3 X H3
# 25 A 4 Y A4
# 26 B 4 Z B4
# 27 C 4 a C4
# 28 D 4 b D4
# 29 E 4 c E4
# 30 F 4 d F4
# 31 G 4 e G4
# 32 H 4 f H4
# 33 A 5 g A5
# 34 B 5 h B5
# 35 C 5 i C5
# 36 D 5 j D5
# 37 E 5 k E5
# 38 F 5 l F5
# 39 G 5 m G5
# 40 H 5 n H5
# 41 A 6 o A6
# 42 B 6 p B6
# 43 C 6 q C6
# 44 D 6 r D6
# 45 E 6 s E6
# 46 F 6 t F6
# 47 G 6 u G6
# 48 H 6 v H6
# 49 A 7 w A7
# 50 B 7 x B7
# 51 C 7 y C7
# 52 D 7 z D7
# 53 E 7 E7
# 54 F 7 F7
# 55 G 7 G7
# 56 H 7 H7
# 57 A 8 A8
# 58 B 8 B8
# 59 C 8 C8
# 60 D 8 D8
# 61 E 8 E8
# 62 F 8 F8
# 63 G 8 G8
# 64 H 8 H8
# 65 A 9 A9
# 66 B 9 B9
# 67 C 9 C9
# 68 D 9 D9
# 69 E 9 E9
# 70 F 9 F9
# 71 G 9 G9
# 72 H 9 H9
# 73 A 10 A10
# 74 B 10 B10
# 75 C 10 C10
# 76 D 10 D10
# 77 E 10 E10
# 78 F 10 F10
# 79 G 10 G10
# 80 H 10 H10
# 81 A 11 A11
# 82 B 11 B11
# 83 C 11 C11
# 84 D 11 D11
# 85 E 11 E11
# 86 F 11 F11
# 87 G 11 G11
# 88 H 11 H11
# 89 A 12 A12
# 90 B 12 B12
# 91 C 12 C12
# 92 D 12 D12
# 93 E 12 E12
# 94 F 12 F12
# 95 G 12 G12
# 96 H 12 H12
Use the byrow argument in the matrix() call to fill the matrix in the other direction:
samples <- matrix(samples, nrow=nrow, byrow=TRUE)
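To fill multiple plates, you can use essentially the same idea, but with an array instead of a matrix.
I have never written this in R before, but it should be the same as in Perl, Python or Java.
For row-major order (traversing across), the pseudocode algorithm is simple: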
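As a rough sketch of the array idea, assuming 96-well plates filled down the columns: the sample ids and the name `plateMap` below are invented for illustration, and `arrayInd()` is used here instead of `melt()` only to stay within base R.

```r
# Sketch: fill several 96-well plates column by column using a 3-d array.
ids <- paste0("sample", 1:100)   # made-up sample ids
nRow <- 8
nCol <- 12
nPlates <- ceiling(length(ids) / (nRow * nCol))

plates <- array("", dim = c(nRow, nCol, nPlates),
                dimnames = list(LETTERS[seq_len(nRow)],
                                as.character(seq_len(nCol)),
                                as.character(seq_len(nPlates))))
# Arrays fill down the rows first, then across columns, then onto the next plate.
plates[seq_along(ids)] <- ids

# Long format: arrayInd() recovers (row, column, plate) for each linear index.
idx <- arrayInd(seq_along(ids), dim(plates))
plateMap <- data.frame(sample = ids,
                       plate = idx[, 3],
                       position = paste0(LETTERS[idx[, 1]], idx[, 2]))
head(plateMap)
```

With 100 ids, the first 96 land on plate 1 and the remaining four start again at A1 on plate 2.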
for each( i : 0..totalNumWells - 1){
column = (i % numColumns)
row = ((i % totalNumWells) / numColumns)   // integer division
}
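where numColumns is 12 for a 96-well plate or 24 for a 384-well plate, and totalNumWells is 96 or 384 respectively. This gives you the column and row indices in 0-based coordinates, which is convenient for indexing into arrays.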
wellName = ABCs[row], column + 1
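where ABCs is an array of all the valid row letters on the plate (or simply A-Z). The +1 converts from 0-based to 1-based; otherwise the first well would be A0 instead of A1.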
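For readers who want the pseudocode above in R, here is one possible translation. The function name `rowMajorWells` is made up for this sketch. R uses `%%` for the remainder and `%/%` for integer division, and because R vectors are 1-based, the row index also needs `+ 1` when it is used to look up the letter.

```r
# Sketch of the row-major pseudocode in R (vectorised, no explicit loop).
rowMajorWells <- function(totalNumWells = 96) {
  numColumns <- if (totalNumWells == 96) 12 else 24
  i <- 0:(totalNumWells - 1)                    # 0-based well index
  column <- i %% numColumns                     # 0-based column index
  row <- (i %% totalNumWells) %/% numColumns    # 0-based row index
  # R indexes from 1, so both the letter lookup and the column label need + 1.
  paste0(LETTERS[row + 1], column + 1)
}

head(rowMajorWells(96), 14)
```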
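I'd also point out that 384-well plates are often not laid out in row-major order. I often see sequencing centers prefer a "checkerboard" pattern, A01, A03, A05, ... then A02, A04, A06, ..., B01, B03, ... and so on, so that four 96-well plates can be combined into a single 384-well plate without changing the layout, which also simplifies the work of the picking robot. Computing the i-th well for that pattern is a harder algorithm.
The following code does what I set out to do. You can use it to lay out as many plates as you need, provided your import list is in order. It adds a "plateNumber" column indicating which plate each sample is on. It only handles 96- or 384-well plates, but that is all I work with, so that's fine.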
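A sketch of the ordering exactly as it is written out above (odd columns of a row first, then the even columns, then the next row) might look like this. The name `interleavedWells` is made up, and this only reproduces the literal sequence A01, A03, ... A02, A04, ... B01, ...; a given sequencing center may well use a different interleaving, so treat it as an assumption rather than a standard.

```r
# Sketch: within each row, visit the odd-numbered columns first, then the even ones.
interleavedWells <- function(nRow = 16, nCol = 24) {
  colOrder <- as.integer(c(seq(1, nCol, by = 2), seq(2, nCol, by = 2)))
  rows <- LETTERS[seq_len(nRow)]
  # Each row letter is repeated nCol times; the column order repeats for every row.
  sprintf("%s%02d", rep(rows, each = nCol), rep(colOrder, times = nRow))
}

head(interleavedWells(), 26)
```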
plateLayout <- function(numOfSamples, plateFormat = 96, direction = "DOWN"){
#This assumes that each well will be filled in order.
#Calculate the number of plates required
platesRequired <- ceiling(numOfSamples/plateFormat)
rowLetter <- character(0)
colNumber <- numeric(0)
plateNumber <- numeric(0)
#define the number of columns and number of rows based on plate format (96 or 384 well plate)
switch(as.character(plateFormat),
"96" = {numberOfColumns = 12; numberOfRows = 8},
"384" = {numberOfColumns = 24; numberOfRows = 16})
#The following will work if the samples are going DOWN
if(direction == "DOWN"){
for(k in 1:platesRequired){
rowLetter <- c(rowLetter, rep(LETTERS[1:numberOfRows], length.out = plateFormat))
for(i in 1:numberOfColumns){
colNumber <- c(colNumber, rep(i, times = numberOfRows))
}
plateNumber <- c(plateNumber, rep(k, times = plateFormat))
}
plateLayout <- paste0(rowLetter, colNumber)
plateLayout <- data.frame(plateNumber,plateLayout)
plateLayout <- plateLayout[1:numOfSamples,]
return(plateLayout)
}
#The following will work if the samples are going ACROSS
if(direction == "ACROSS"){
for(k in 1:platesRequired){
colNumber <- c(colNumber, rep(1:numberOfColumns, times = numberOfRows))
for(i in 1:numberOfRows){
rowLetter <- c(rowLetter, rep(LETTERS[i], times = numberOfColumns))
}
plateNumber <- c(plateNumber, rep(k, times = plateFormat))
}
plateLayout <- paste0(rowLetter, colNumber)
plateLayout <- data.frame(plateNumber, plateLayout)
plateLayout <- plateLayout[1:numOfSamples,]
return(plateLayout)
}
}
An example of how to use it:
#load whatever data you're going to use to get a plate layout on (sample ID's or names or whatever)
thisData <- read.csv("data.csv")
#make a data.frame containing your sample names and the function's output
#alternatively you can use length() if you have a list
plateLayoutDataFrame <- data.frame(thisData$sampleNames, plateLayout(nrow(thisData), plateFormat = 96, direction = "DOWN"))
#It will return something similar to the following, depending on your selections
#data plateNumber plateLayout
#sample1 1 A1
#sample2 1 B1
#sample3 1 C1
#sample4 1 D1
#sample5 1 E1
#sample6 1 F1
#sample7 1 G1
#sample8 1 H1
#sample9 1 A2
#sample10 1 B2
#sample11 1 C2
#sample12 1 D2
#sample13 1 E2
#sample14 1 F2
#sample15 1 G2
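Now to wrap up this function. Roland provided a nice approach that is much less verbose, but I wanted to avoid external packages where possible. I am now developing a shiny app that actually uses this! I would like it to subset automatically by "plateNumber" and write each plate to its own file... For more on that, see: Automatic multi-file download in R-Shiny
A bit late to the thread, but expand.grid can be very helpful here. Also, I have found that downstream processing sometimes benefits from being able to sort the well names. The leading zero in this example helps ensure that "A2" comes before "A10" in "A1", "A2", ... "A10", "A11".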
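Outside of Shiny, the "one file per plate" step could be sketched with `split()` and `write.csv()`. The data frame, the `plate_<n>.csv` file-name pattern and the use of `tempdir()` below are all placeholders for illustration; the Shiny download handling itself is not covered here.

```r
# Made-up layout: 96 samples on plate 1 and 4 samples on plate 2.
plateMap <- data.frame(
  sample = paste0("sample", 1:100),
  plateNumber = rep(1:2, times = c(96, 4)),
  plateLayout = c(paste0(rep(LETTERS[1:8], times = 12), rep(1:12, each = 8)),
                  c("A1", "B1", "C1", "D1"))
)

outDir <- tempdir()                                # placeholder output folder
byPlate <- split(plateMap, plateMap$plateNumber)   # one data frame per plate

# Write each plate to its own CSV and collect the file paths.
files <- vapply(names(byPlate), function(p) {
  path <- file.path(outDir, sprintf("plate_%s.csv", p))
  write.csv(byPlate[[p]], path, row.names = FALSE)
  path
}, character(1))
files
```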
plateLayout <- function(nSamples, nPlates, plateFormat = c("96", "384"),
direction = c("down", "across")) {
# process arguments
nSamples <- as.integer(nSamples)
plateFormat <- match.arg(plateFormat)
plateFormat <- as.integer(plateFormat)
direction <- match.arg(direction)
nCol <- ifelse(plateFormat == 96, 12, 24)
nRow <- ifelse(plateFormat == 96, 8, 16)
if (missing(nPlates))
nPlates <- ceiling(nSamples/plateFormat)
# use expand.grid and organize as 'plate', 'row' and 'column'
if (direction == "across") {
v <- expand.grid(column = seq_len(nCol), row = LETTERS[1:nRow],
plate = seq_len(nPlates), stringsAsFactors = FALSE)
v <- v[c(3, 2, 1)]
}
else {
v <- expand.grid(row = LETTERS[1:nRow], column = seq_len(nCol),
plate = seq_len(nPlates), stringsAsFactors = FALSE)
v <- v[c(3, 1, 2)]
}
# assemble data.frame
# note that the format string for sprintf provides a leading '0'
# change to "%s%d" to NOT use a leading zero
well <- apply(v, 1, function(x) sprintf("%s%02d", x[2], as.integer(x[3])))
plate <- data.frame(plate = v[[1]], well)[seq_len(nSamples),]
return(plate)
}
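The sorting benefit of the leading zero can be seen in a small sketch. `method = "radix"` is used here only so that the comparison happens in the C locale and does not depend on the machine's collation settings.

```r
plain  <- paste0("A", 1:12)        # "A1", "A2", ..., "A12"
padded <- sprintf("A%02d", 1:12)   # "A01", "A02", ..., "A12"

# Without padding, "A10" to "A12" sort ahead of "A2".
sort(plain, method = "radix")

# With padding, string order matches well order.
sort(padded, method = "radix")
```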
This is how I do it.
put_samples_in_plates = function(sample_list, nwells=96, direction="across")
{
if(!nwells %in% c(96, 384)){
stop("Invalid plate size")
}
nsamples = nrow(sample_list)
nplates = ceiling(nsamples/nwells);
if(nwells==96){
rows = LETTERS[1:8]
cols = 1:12
}else if(nwells==384){
rows = LETTERS[1:16]
cols = 1:24
}else{
stop("Unrecognized nwells")
}
nrows = length(rows)
ncols = length(cols)
if(tolower(direction)=="down"){
single_plate_df = data.frame(row = rep(rows, times=ncols),
col = rep(cols, each=nrows))
}else if(tolower(direction)=="across"){
single_plate_df = data.frame(row = rep(rows, each=ncols),
col = rep(cols, times=nrows))
}else{
stop("Unrecognized direction")
}
single_plate_df = transform(single_plate_df,
well = sprintf("%s%02d", row, col))
toobig_plate_df = cbind(data.frame(plate=rep(1:nplates, each=nwells)),
do.call("rbind", replicate(nplates,
single_plate_df,
simplify=FALSE)))
res = cbind(sample_list, toobig_plate_df[1:nsamples,])
return(res)}
# Quick test
a_sample_list = data.frame(x=1:386, y=rnorm(386))
r.096.across = put_samples_in_plates(sample_list = a_sample_list,
nwells= 96,
direction="across")
r.096.down = put_samples_in_plates(sample_list = a_sample_list,
nwells= 96,
direction="down")
r.384.across = put_samples_in_plates(sample_list = a_sample_list,
nwells=384,
direction="across")
r.384.down = put_samples_in_plates(sample_list = a_sample_list,
nwells=384,
direction="down")
Two points about the function above are worth noting: