Subsetting variable names using multiple conditions in R

Question

I have the following code:

df <- data.frame(
  check.names = FALSE,
  `Att-Bissen P [mm]` = c(57937.8),
  `Att-Bissen PET [mm]` = c(39472.9),
  `Att-Bissen Q [mm]` = c(26501.2),
  `Rau. Merl P [mm]` = c(53443.6),
  `Rau. Merl PET [mm]` = c(40535.45),
  `Rau. Merl Q [mm]` = c(15489.5),
  `Syre Felsmuhle/Mertert P [mm]` = c(46020.3),
  `Syre Felsmuhle/Mertert PET [mm]` = c(42196.4),
  `Syre Felsmuhle/Mertert Q [mm]` = c(16210.69079),
  `Wiltz-Winseler P [mm]` = c(63389.7),
  `Wiltz-Winseler PET [mm]` = c(42703.3),
  `Wiltz-Winseler Q [mm]` = c(33576.8),
  `Our-Gemund/Vianden P [mm]` = c(63389.7),
  `Our-Gemund/Vianden PET [mm]` = c(42834.5),
  `Our-Gemund/Vianden Q [mm]` = c(12588.9))

# Define the formula as a function
calc_formula <- function(P, PET, Q, n) {
  1 - (1 + (P / PET)^n) ^ -((n + 1) / (n + Q))
}

# Define the n value
n <- 2.5

# Extract the site names from the column names
site_names <- sub(" .*", "", names(df)[-1])

# Loop through each site and calculate the formula
results <- list()
for (site in site_names) {
  site_data <- df[, grepl(site, names(df))]
  results[[site]] <- calc_formula(site_data[[paste0(site, " P [mm]")]], 
                                  site_data[[paste0(site, " PET [mm]")]], 
                                  site_data[[paste0(site, " Q [mm]")]], n)
}

# Combine the results into a data frame
results_df <- data.frame(Site = names(results), Result = unlist(results))

and I get the following error:

Error in data.frame(Site = names(results), Result = unlist(results)) : 
  arguments imply differing number of rows: 5, 3

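I think this happens because I was not able to tell "site_names" that it should also account for "." and " " (space). I cannot work out how to make R recognize the space inside some of the names but not the one before P, PET or Q. As a result, it returns NA for the sites "Rau. Merl" and "Syre Felsmuhle/Mertert".

I could simply change the names in the .csv file, but that is quite tedious with a large dataset.

How can I fix this part of the code?

Any help is appreciated. Thank you!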

Answer 1

You have two problems. First, let's look at your site names:

site_names
#  [1] "Att-Bissen"         "Att-Bissen"         "Rau."               "Rau."              
#  [5] "Rau."               "Syre"               "Syre"               "Syre"              
#  [9] "Wiltz-Winseler"     "Wiltz-Winseler"     "Wiltz-Winseler"     "Our-Gemund/Vianden"
# [13] "Our-Gemund/Vianden" "Our-Gemund/Vianden"

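There are two problems here: 1) There are duplicates, which is a problem because you only want one result per site. Appending ... |> unique() to the end of the expression removes them. 2) You extracted the site names by deleting everything after the first space, but some of your site names contain a space themselves, e.g. "Rau. Merl". The problem is not the "." but the space.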
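To make the second problem concrete, here is a minimal sketch on a hypothetical three-name sample taken from the question's column names (the variables cols, truncated and rebuilt are illustrative only). It shows that the original pattern truncates "Rau. Merl" at its internal space, and that a column name rebuilt from the truncated site name matches no real column. In the question's loop, such a lookup with [[ returns NULL, so the formula produces an empty result for that site and unlist() drops it, which is consistent with the "5, 3" row mismatch in the error.

```r
# Hypothetical sample of column names copied from the question
cols <- c("Att-Bissen P [mm]", "Rau. Merl P [mm]", "Rau. Merl PET [mm]")

# The original pattern deletes everything from the first space onward,
# so "Rau. Merl" is cut down to "Rau." and the name is repeated
truncated <- sub(" .*", "", cols)
print(truncated)

# A column name rebuilt from the truncated site name does not exist
rebuilt <- paste0(truncated[2], " P [mm]")
print(rebuilt %in% cols)
```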

Let's fix this by removing only the specific parts of the strings that you actually want to remove:

site_names <- sub("( P \\[mm\\])|( PET \\[mm\\])|( Q \\[mm\\])", "", names(df)[-1]) |>
  unique()
site_names
# [1] "Att-Bissen"             "Rau. Merl"              "Syre Felsmuhle/Mertert" "Wiltz-Winseler"        
# [5] "Our-Gemund/Vianden"
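As a possible variation, not part of the original answer, the three alternatives can be folded into a single group and anchored to the end of the string with "$", so that only a trailing suffix is stripped. The sketch below uses a hypothetical subset of the question's column names (cols and sites are illustrative names) and assumes every column ends in " P [mm]", " PET [mm]" or " Q [mm]".

```r
# Hypothetical subset of the question's column names
cols <- c("Att-Bissen P [mm]", "Att-Bissen PET [mm]", "Att-Bissen Q [mm]",
          "Rau. Merl P [mm]", "Rau. Merl PET [mm]", "Rau. Merl Q [mm]")

# Strip a trailing " P [mm]", " PET [mm]" or " Q [mm]";
# "$" anchors the match at the end of each name
sites <- unique(sub(" (P|PET|Q) \\[mm\\]$", "", cols))
print(sites)
```

Because the pattern is anchored, a site name that happens to contain " P " or " Q " in the middle would be left alone.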

Run the rest of the code, and it now works:

# ...
results_df
#                                          Site       Result
# Att-Bissen                         Att-Bissen 0.0001695120
# Rau. Merl                           Rau. Merl 0.0002478667
# Syre Felsmuhle/Mertert Syre Felsmuhle/Mertert 0.0001742918
# Wiltz-Winseler                 Wiltz-Winseler 0.0001359271
# Our-Gemund/Vianden         Our-Gemund/Vianden 0.0003609042
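For completeness, the whole calculation could also be written without the explicit loop. This is only a sketch under stated assumptions: it uses a two-site subset of the question's data, the anchored pattern is a variation on the one above, and it assumes the data frame has exactly one row, so each site yields a single number. With vapply(), a site whose columns cannot be found raises an error right away instead of being silently dropped.

```r
# Two-site subset of the question's data (one row per column is assumed)
df <- data.frame(
  check.names = FALSE,
  `Att-Bissen P [mm]` = 57937.8,
  `Att-Bissen PET [mm]` = 39472.9,
  `Att-Bissen Q [mm]` = 26501.2,
  `Rau. Merl P [mm]` = 53443.6,
  `Rau. Merl PET [mm]` = 40535.45,
  `Rau. Merl Q [mm]` = 15489.5)

# Formula and n value as defined in the question
calc_formula <- function(P, PET, Q, n) {
  1 - (1 + (P / PET)^n) ^ -((n + 1) / (n + Q))
}
n <- 2.5

# One site name per group of P / PET / Q columns
site_names <- unique(sub(" (P|PET|Q) \\[mm\\]$", "", names(df)))

# Look up each site's three columns by exact name;
# numeric(1) makes vapply() fail if a lookup returns nothing
result <- vapply(site_names, function(site) {
  calc_formula(df[[paste0(site, " P [mm]")]],
               df[[paste0(site, " PET [mm]")]],
               df[[paste0(site, " Q [mm]")]], n)
}, numeric(1))

results_df <- data.frame(Site = site_names, Result = result, row.names = NULL)
print(results_df)
```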
