Find matching pairs in two strings

Question

I have a dataset that looks like the example I created below:

ex_data <- data.frame(
  House = c("House 1", "House 2", "House 3"),
  features = c("Roof, Walls, Windows, Oven", "Oven, Roof, Walls, TV", "Size, Oven, Bedrooms"),
  attributes = c("Large, White, 5, Whirlpool", "Samsung, Large, White, Sony", "4000 sq ft, KitchenAid, 5")
)

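Basically, my dataset has a main grouping variable (House), plus features and attributes describing that house. Both are stored as comma-separated values, and the attribute at a given position belongs to the feature at the same position.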

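I am trying to figure out how to find the Oven in each house and the brand that goes with it. I know each house has an Oven in its features string, but I don't know how to find the corresponding brand.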

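My first thought was to use the function separate to create a new column for each feature and attribute, but my real dataset has up to 100 comma-separated features/attributes per row.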

I would like the final result to look like this:

ex_data_result <- data.frame(
  House = c("House 1", "House 2", "House 3"),
  features = c("Oven", "Oven", "Oven"),
  attributes = c("Whirlpool", "Samsung", "KitchenAid")
)

Thank you.

Answer 1

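We can use separate_rows from tidyr to split the 'features' and 'attributes' columns into long format, specifying sep as a comma followed by zero or more whitespace characters, and then filter the rows where 'features' is 'Oven':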

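library(dplyr)
library(tidyr)

ex_data %>%
  # split both columns in parallel: one row per feature/attribute pair
  separate_rows(features, attributes, sep = ",\\s*") %>%
  # keep only the rows that describe the oven
  filter(features == 'Oven')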
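To check the pipeline above against the result requested in the question, here is a self-contained sketch. It rebuilds the example data (with `stringsAsFactors = FALSE` so it also behaves on R versions before 4.0) and stores the output in `result`, a name introduced here only for illustration. It assumes that every row has the same number of features and attributes, which holds for the example data.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question
ex_data <- data.frame(
  House = c("House 1", "House 2", "House 3"),
  features = c("Roof, Walls, Windows, Oven", "Oven, Roof, Walls, TV", "Size, Oven, Bedrooms"),
  attributes = c("Large, White, 5, Whirlpool", "Samsung, Large, White, Sony", "4000 sq ft, KitchenAid, 5"),
  stringsAsFactors = FALSE
)

result <- ex_data %>%
  # one row per feature/attribute pair; ",\\s*" also absorbs the space after each comma
  separate_rows(features, attributes, sep = ",\\s*") %>%
  # keep only the oven rows
  filter(features == "Oven")

result
```

Because both columns are split in the same call, each feature stays on the same row as its attribute, so the filter on `features` also selects the matching brand.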

Answer 2

You can split the features and attributes strings on ", " and then select the element of attributes at the position where features is "Oven".

In base R, you can do this with strsplit and mapply:

ex_data$Brand <- mapply(function(x, y) y[x == 'Oven'], 
             strsplit(ex_data$features,", "), strsplit(ex_data$attributes,", "))

ex_data
#    House                   features                  attributes      Brand
#1 House 1 Roof, Walls, Windows, Oven  Large, White, 5, Whirlpool  Whirlpool
#2 House 2      Oven, Roof, Walls, TV Samsung, Large, White, Sony    Samsung
#3 House 3       Size, Oven, Bedrooms   4000 sq ft, KitchenAid, 5 KitchenAid
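The mapply call above returns a plain character vector only when every house has exactly one "Oven"; if a feature is missing in some rows, the result becomes a list. The following is a minimal sketch of a more defensive variant. The helper name `lookup_attribute` and the extra `TV` column are hypothetical, added here for illustration, and the helper returns only the first match per row.

```r
# Example data copied from the question
ex_data <- data.frame(
  House = c("House 1", "House 2", "House 3"),
  features = c("Roof, Walls, Windows, Oven", "Oven, Roof, Walls, TV", "Size, Oven, Bedrooms"),
  attributes = c("Large, White, 5, Whirlpool", "Samsung, Large, White, Sony", "4000 sq ft, KitchenAid, 5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, return the attribute paired with `target`,
# or NA when that row does not list the feature
lookup_attribute <- function(features, attributes, target) {
  f <- strsplit(features, ",\\s*")
  a <- strsplit(attributes, ",\\s*")
  mapply(function(x, y) {
    hit <- match(target, x)  # position of the first match, NA if there is none
    if (is.na(hit)) NA_character_ else y[hit]
  }, f, a, USE.NAMES = FALSE)
}

ex_data$Brand <- lookup_attribute(ex_data$features, ex_data$attributes, "Oven")
ex_data$TV <- lookup_attribute(ex_data$features, ex_data$attributes, "TV")

ex_data[, c("House", "Brand", "TV")]
```

Only House 2 lists a TV in the example data, so the other two rows get NA in the `TV` column instead of breaking the column's type.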
