R: How do I match a set of keywords to predefined categories?


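I am working in R. I have a (large) dataset (a data.frame) with several keywords per observation, and for each observation I want to match (translate) these keywords to predefined categories (which are stored in another data.frame). I am stuck on this problem, so I would appreciate some guidance on how to achieve it. Thanks, G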

Input:

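A) A data.frame (input.df) with multiple observations (S1, S2, S3, ...) and their associated keywords. The keywords can appear in any order and their number can vary. For example, observations S1, S2 and S4 each have 3 keywords (possibly in random order), but S3 has only 2 keywords.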

B) Another data.frame (season.df) in which each predefined category (one column per category) contains a set of keywords.

input.df <- data.frame("observation" = c("S1", "S1", "S1", "S2", "S2", "S2", "S3", "S3", "S4", "S4", "S4"),
                       "keyword" = c("freezing", "snow", "slippery", "slippery", "snow", "freezing", "snow", "freezing", "sun", "hot", "airco") )

season.df <- data.frame("Winter" = c("slippery", "freezing", "snow"),
                        "Summer" = c("sun", "airco", "hot") )

> input.df
   observation  keyword
1           S1 freezing
2           S1     snow
3           S1 slippery
4           S2 slippery
5           S2     snow
6           S2 freezing
7           S3     snow
8           S3 freezing
9           S4      sun
10          S4      hot
11          S4    airco
> season.df
    Winter Summer
1 slippery    sun
2 freezing  airco
3     snow    hot
>

Expected output:

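Based on its keywords, I want to "translate" each observation into a predefined category. In the example above, I want to classify S1 (3 keywords) as Winter, S2 (3 keywords) as Winter, S3 (2 keywords) as Winter, and S4 as Summer.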

As mentioned, any pointers on how to get started would be greatly appreciated!

r pattern-matching
1 Answer

Try left_join together with pivot_longer:

library(tidyverse)

new.df <- left_join(
  input.df, 
  pivot_longer(season.df, everything(), names_to = "category", values_to = "keyword"),
  by = "keyword")
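
To make it clearer what the join matches against, the reshaped category table can be inspected on its own. This is a minimal sketch, assuming the tidyr package is installed; the name lookup.df is introduced here only for illustration and does not appear in the original answer.

```r
library(tidyr)

season.df <- data.frame("Winter" = c("slippery", "freezing", "snow"),
                        "Summer" = c("sun", "airco", "hot"))

# Reshape the wide table (one column per category) into a long lookup
# table with one row per (category, keyword) pair.
# lookup.df is a hypothetical name used only in this sketch.
lookup.df <- pivot_longer(season.df, everything(),
                          names_to = "category", values_to = "keyword")

print(lookup.df)
```

The left_join in the answer then matches the keyword column of input.df against the keyword column of this long table. If the categories had different numbers of keywords, season.df would presumably need NA padding, and passing values_drop_na = TRUE to pivot_longer would drop those padding rows.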

Output:

> new.df
   observation  keyword category
1           S1 freezing   Winter
2           S1     snow   Winter
3           S1 slippery   Winter
4           S2 slippery   Winter
5           S2     snow   Winter
6           S2 freezing   Winter
7           S3     snow   Winter
8           S3 freezing   Winter
9           S4      sun   Summer
10          S4      hot   Summer
11          S4    airco   Summer
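
The result above still has one row per keyword, whereas the question asks for one category per observation. One possible way to collapse it is sketched below; it assumes dplyr 1.0.0 or later (for slice_max) and picks the category with the most matching keywords per observation. The names result.df and n_keywords are introduced here for illustration, and the majority-vote rule is an assumption, not something stated in the question.

```r
library(dplyr)
library(tidyr)

input.df <- data.frame(
  "observation" = c("S1", "S1", "S1", "S2", "S2", "S2", "S3", "S3", "S4", "S4", "S4"),
  "keyword" = c("freezing", "snow", "slippery", "slippery", "snow", "freezing",
                "snow", "freezing", "sun", "hot", "airco"))

season.df <- data.frame("Winter" = c("slippery", "freezing", "snow"),
                        "Summer" = c("sun", "airco", "hot"))

# Same join as in the answer: attach a category to every keyword.
new.df <- left_join(
  input.df,
  pivot_longer(season.df, everything(), names_to = "category", values_to = "keyword"),
  by = "keyword")

# Count matched keywords per observation and category, then keep the
# category with the highest count for each observation (assumed rule).
result.df <- new.df %>%
  filter(!is.na(category)) %>%                      # ignore keywords with no category
  count(observation, category, name = "n_keywords") %>%
  group_by(observation) %>%
  slice_max(n_keywords, n = 1, with_ties = FALSE) %>%
  ungroup()

print(result.df)
```

With with_ties = FALSE, an observation whose keywords are split evenly between two categories gets whichever category comes first, so ties may need an explicit rule in real data.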