Keep rows in R that contain specific string values

Question (votes: 0, answers: 2)

First, I have a list of strings:

/index.php/abc/def
/link/view/id/123
/subject/view/id/456

Then I have a dataset like this:

Date and Time          Request
2016-01-17 05:46:26    aladdine.com/view/id/786
2016-01-17 05:46:30    aladdine.com/subject/view/id/456
2016-01-17 05:46:31    aladdine.com/pub/link/view/id/123
2016-01-17 05:46:44    aladdine.com/index.php/abc/def/ghi
2016-01-17 05:46:58    aladdine.com/brs/view/id.266

How can I keep only the rows whose Request value contains one of the strings from the list above?

Expected output:

Date and Time          Request
2016-01-17 05:46:30    aladdine.com/subject/view/id/456
2016-01-17 05:46:31    aladdine.com/pub/link/view/id/123
2016-01-17 05:46:44    aladdine.com/index.php/abc/def/ghi
r regex string-comparison
2 Answers
0
votes

Using the same dataset as @Cinnamon Star, you can do the following:

dataSet <- CO2
iList <- list("Qn1", "Mn1", "Mc1")

Join all the strings into a single regular expression pattern of the form (str1|str2|str3):

pat <- paste(unlist(iList), collapse = "|")
pat <- paste0("(", pat, ")")
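One caveat when this pattern is built from the strings in the question: they contain a dot ("index.php"), and in a regular expression a dot matches any character. A minimal sketch of escaping the strings before joining them is shown below. The names urlList, urlPat and escapeRegex are assumptions made for illustration; escapeRegex is a hypothetical helper, not a base R function.

```r
# Search strings copied from the question
urlList <- list("/index.php/abc/def", "/link/view/id/123", "/subject/view/id/456")

# escapeRegex is a hypothetical helper, not part of base R:
# it puts a backslash in front of every regex metacharacter
escapeRegex <- function(x) {
  gsub("([.|()\\\\^{}+$*?\\[\\]])", "\\\\\\1", x, perl = TRUE)
}

# Same (str1|str2|str3) construction, applied to the escaped strings
urlPat <- paste0("(", paste(escapeRegex(unlist(urlList)), collapse = "|"), ")")

# With the dot escaped, only a literal "." after "index" is accepted
grepl(urlPat, "aladdine.com/index.php/abc/def/ghi")
grepl(urlPat, "aladdine.com/indexZphp/abc/def")
```

Without the escaping step, the second call would also report a match, because the unescaped dot would accept the "Z".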

Then run grepl to find which rows contain that text in the Plant column.

dataSet[grepl(pattern = pat, x = dataSet$Plant), ]

Result:

   Plant        Type  Treatment conc uptake
1    Qn1      Quebec nonchilled   95   16.0
2    Qn1      Quebec nonchilled  175   30.4
3    Qn1      Quebec nonchilled  250   34.8
4    Qn1      Quebec nonchilled  350   37.2
5    Qn1      Quebec nonchilled  500   35.3
6    Qn1      Quebec nonchilled  675   39.2
7    Qn1      Quebec nonchilled 1000   39.7
43   Mn1 Mississippi nonchilled   95   10.6
44   Mn1 Mississippi nonchilled  175   19.2
45   Mn1 Mississippi nonchilled  250   26.2
46   Mn1 Mississippi nonchilled  350   30.0
47   Mn1 Mississippi nonchilled  500   30.9
48   Mn1 Mississippi nonchilled  675   32.4
49   Mn1 Mississippi nonchilled 1000   35.5
64   Mc1 Mississippi    chilled   95   10.5
65   Mc1 Mississippi    chilled  175   14.9
66   Mc1 Mississippi    chilled  250   18.1
67   Mc1 Mississippi    chilled  350   18.9
68   Mc1 Mississippi    chilled  500   19.5
69   Mc1 Mississippi    chilled  675   22.2
70   Mc1 Mississippi    chilled 1000   21.9
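The answer demonstrates the idea on CO2 rather than on the data from the question. As a rough sketch, the same filtering applied to the question's own rows could look like the code below. The data frame name requests and the column name DateTime are assumptions, and fixed = TRUE is swapped in for the regex pattern so that each string is matched literally.

```r
# Search strings copied from the question
searchStrings <- c("/index.php/abc/def", "/link/view/id/123", "/subject/view/id/456")

# The question's rows; the names requests and DateTime are assumed
requests <- data.frame(
  DateTime = c("2016-01-17 05:46:26", "2016-01-17 05:46:30",
               "2016-01-17 05:46:31", "2016-01-17 05:46:44",
               "2016-01-17 05:46:58"),
  Request = c("aladdine.com/view/id/786",
              "aladdine.com/subject/view/id/456",
              "aladdine.com/pub/link/view/id/123",
              "aladdine.com/index.php/abc/def/ghi",
              "aladdine.com/brs/view/id.266"),
  stringsAsFactors = FALSE
)

# One logical column per search string; fixed = TRUE disables regex syntax
hits <- sapply(searchStrings, function(s) grepl(s, requests$Request, fixed = TRUE))

# Keep the rows where at least one search string was found
result <- requests[rowSums(hits) > 0, ]
print(result)
```

This keeps the second, third and fourth rows, which is the output the question asks for.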

0
votes

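I used the CO2 example from R's built-in datasets. Assign your own dataset to dataSet and your own list to iList, and change every occurrence of dataSet$Plant to the column you are interested in (probably dataSet$Request).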

The resulting dataset is stored in results.

# Clear the workspace so that exists("results") below starts out FALSE
rm(list = ls())

dataSet <- CO2

varsToCheck <- dataSet$Plant

iList <- list("Qn1", "Mn1", "Mc1")

# Iterate over all rows
for (i in seq_along(varsToCheck)) {
  # Extract the string to check
  validateString <- varsToCheck[i]
  # Iterate over all match criteria
  for (j in seq_along(iList)) {
    # Extract the match criterion
    matchString <- iList[[j]]
    # Check whether part of the string matches the criterion
    if (grepl(matchString, validateString)) {
      # Create the results object when the first row is added
      if (exists("results")) {
        results <- rbind(results, dataSet[i, ])
      } else {
        results <- dataSet[i, ]
      }
      # Stop at the first match so a row is never added twice
      break
    }
  }
}
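The nested loop calls rbind once per matching row, which gets slow on large data. A minimal sketch of the same selection without an explicit loop is given below; matchesPerCriterion and keep are names introduced here for illustration, not part of the answer.

```r
dataSet <- CO2
iList <- list("Qn1", "Mn1", "Mc1")

# One logical vector per criterion, each as long as the column
matchesPerCriterion <- lapply(iList, grepl, x = as.character(dataSet$Plant))

# Combine them with element-wise OR: TRUE where any criterion matched
keep <- Reduce(`|`, matchesPerCriterion)

# Select all matching rows in a single subsetting step
results <- dataSet[keep, ]
nrow(results)
```

Because grepl is already vectorised over the column, each criterion is checked against all rows at once and the rows keep their original order.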