How can I detect patterns and their frequencies in a character column using R?


I have a df showing people's "activity chains", which looks like this (snippet at the bottom of the question):

head(agents)
   id                                                                                                                                                                leg_activity
1   9                                                                                      home, adpt, shop, car_passenger, home, adpt, work, adpt, home, work, outside, pt, home
2  10 home, pt, outside, pt, home, car, leisure, car, other, car, leisure, car, leisure, car, other, car, leisure, car, other, car, leisure, car, home, adpt, leisure, adpt, home
3  11                                                                                                                                                      home, work, adpt, home
4  96                                                                                                                                home, car, work, car, home, work, adpt, home
5  97                              home, adpt, work, car_passenger, leisure, car_passenger, work, adpt, home, car_passenger, outside, car_passenger, outside, car_passenger, home
6 101                                       home, bike, outside, car_passenger, outside, car_passenger, outside, bike, home, adpt, leisure, adpt, home, bike, leisure, bike, home

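I am interested in detecting the patterns in which adpt occurs. The simplest approach is to use the count() function, which gives me a frequency table as output. Unfortunately, that result is misleading.

It looks like this: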

x                                 freq
home, adpt, work, adpt, home      2071
home, adpt, shop, adpt, home      653
home, adpt, education, adpt, home 545
home, pt, work, adpt, home        492
home, adpt, work, pt, home        468
home, adpt, work, home            283

The problem with this approach is that I cannot detect patterns inside longer activity chains; for example:

 home, adpt, education, adpt, education, adpt, home, car, work, car, home, shop, adpt, home

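This case starts with an activity chain that is very frequent, but because further activities follow it, the count function does not count it.

Is there a way to use the count function that also takes into account what happens inside the cell? It would be interesting to have a table showing all possible combinations and their frequencies, for example: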

x                                freq
home, adpt, home                 10
home, adpt, home, pt, work, home 4
home, pt, work, home             2

Thanks a lot for your help!

Data:

structure(list(id = c(9L, 10L, 11L, 96L, 97L, 101L, 103L, 248L, 
499L, 1044L, 1215L, 1238L, 1458L, 1569L, 1615L, 1626L, 1734L, 
1735L, 1790L, 1912L, 9040L, 14858L, 14859L, 14967L, 15011L, 15012L, 
15015L, 15045L, 15050L, 15058L, 15060L, 15086L, 15088L, 15094L, 
15109L, 15113L, 15152L, 15157L, 15192L, 15193L, 15222L, 15230L, 
15231L, 15234L, 15235L, 15237L, 15256L, 15257L, 15258L, 15269L, 
15270L, 15318L, 15319L, 15338L, 15369L, 15371L, 15396L, 15397L, 
15399L, 15404L, 15505L, 15506L, 15515L, 15516L, 15525L, 15542L, 
15593L, 15602L, 15608L, 15643L, 15667L, 15727L, 15728L, 15729L, 
15752L, 15775L, 15808L, 15851L, 15869L, 15881L, 15882L, 15960L, 
15962L, 15966L, 16058L, 16107L, 16174L, 16229L, 16237L, 16238L, 
16291L, 16333L, 16416L, 16418L, 16449L, 16450L, 16451L, 16491L, 
16506L, 16508L), leg_activity = c("home, adpt, shop, car_passenger, home, adpt, work, adpt, home, work, outside, pt, home", 
"home, pt, outside, pt, home, car, leisure, car, other, car, leisure, car, leisure, car, other, car, leisure, car, other, car, leisure, car, home, adpt, leisure, adpt, home", 
"home, work, adpt, home", "home, car, work, car, home, work, adpt, home", 
"home, adpt, work, car_passenger, leisure, car_passenger, work, adpt, home, car_passenger, outside, car_passenger, outside, car_passenger, home", 
"home, bike, outside, car_passenger, outside, car_passenger, outside, bike, home, adpt, leisure, adpt, home, bike, leisure, bike, home", 
"home, adpt, work, adpt, home, walk, other, pt, home", "home, adpt, work, walk, home, adpt, work, walk, home", 
"home, adpt, leisure, adpt, home, bike, outside, bike, home", 
"home, pt, work, adpt, home, adpt, work, adpt, home", "home, adpt, work, adpt, home, car, outside, car, work, car, work, car, home", 
"home, work, leisure, adpt, home", "home, outside, pt, home, adpt, leisure, adpt, home", 
"home, car_passenger, leisure, walk, work, walk, leisure, walk, work, adpt, home, walk, home", 
"home, adpt, work, walk, work, walk, work, pt, home", "home, car, work, pt, leisure, adpt, work, car, home, car, home", 
"home, adpt, other, adpt, home, car, home", "home, adpt, other, adpt, home", 
"home, education, walk, shop, walk, education, pt, outside, home, adpt, leisure, adpt, home", 
"home, adpt, work, adpt, home, walk, home", "home, adpt, work, pt, leisure, adpt, work, adpt, work, adpt, home, adpt, other, walk, home", 
"home, adpt, work, adpt, home, adpt, work, adpt, home, walk, leisure, walk, home", 
"home, adpt, work, adpt, home, work, adpt, home, walk, leisure, walk, home", 
"home, adpt, work, adpt, home, car_passenger, outside, car_passenger, leisure, car_passenger, home, car_passenger, home", 
"home, adpt, other, adpt, home, car, work, car, home", "home, adpt, education, adpt, leisure, adpt, home, walk, leisure, walk, home", 
"home, car_passenger, other, pt, home, walk, other, walk, home, car_passenger, other, walk, home, adpt, other, adpt, home", 
"home, work, pt, work, adpt, work, adpt, home", "home, adpt, leisure, adpt, home, car, shop, car, other, car, home", 
"home, adpt, work, adpt, home, walk, other, adpt, home", "home, adpt, work, adpt, home, car_passenger, leisure, car_passenger, home", 
"home, car, other, car, home, adpt, shop, adpt, home", "home, pt, work, adpt, home", 
"home, adpt, work, adpt, home", "home, adpt, work, adpt, home", 
"home, walk, education, adpt, home, walk, education, walk, home, bike, leisure, bike, home", 
"home, adpt, shop, adpt, home, car, home", "home, adpt, leisure, walk, leisure, walk, leisure, adpt, home", 
"home, adpt, shop, pt, home, adpt, other, adpt, home", "home, adpt, other, adpt, home, car_passenger, leisure, walk, home", 
"home, adpt, work, adpt, home, car_passenger, shop, car_passenger, home", 
"home, adpt, other, adpt, work, adpt, home", "home, adpt, work, adpt, home, adpt, other, walk, shop, walk, home, car, outside, car, outside, car, outside, car, home", 
"home, adpt, other, adpt, home", "home, adpt, education, adpt, home, adpt, education, adpt, home", 
"home, pt, work, adpt, work, adpt, work, adpt, work, adpt, home, adpt, work, adpt, home", 
"home, walk, other, car_passenger, education, walk, home, car_passenger, education, adpt, home", 
"home, walk, shop, walk, home, walk, leisure, adpt, leisure, adpt, home", 
"home, adpt, work, adpt, home, walk, shop, walk, home, walk, leisure, walk, home, walk, home", 
"home, adpt, leisure, adpt, home", "home, walk, leisure, walk, home, adpt, other, adpt, shop, walk, leisure, walk, home", 
"home, pt, leisure, adpt, home, pt, outside, pt, home, bike, leisure, bike, home", 
"home, pt, outside, pt, home, walk, home, walk, other, adpt, shop, pt, home, car_passenger, leisure, adpt, home", 
"home, adpt, work, adpt, home, adpt, shop, adpt, work, adpt, home", 
"home, adpt, shop, adpt, other, walk, home", "home, walk, other, walk, home, walk, home, adpt, other, adpt, home, adpt, shop, adpt, home, car, other, car, home, adpt, other, adpt, home", 
"home, adpt, leisure, pt, home", "home, leisure, adpt, home", 
"home, adpt, leisure, pt, shop, walk, home, walk, shop, walk, home", 
"home, car, outside, car, outside, leisure, car, outside, car, outside, car, home, adpt, other, adpt, home", 
"home, adpt, work, adpt, shop, walk, home", "home, adpt, other, walk, work, adpt, home, adpt, other, adpt, work, adpt, home, adpt, leisure, adpt, home", 
"home, adpt, leisure, adpt, home, car, shop, car, home", "home, walk, shop, adpt, home, car, other, car, home, adpt, other, adpt, home", 
"home, walk, leisure, walk, home, adpt, work, adpt, home", "home, adpt, work, adpt, home", 
"home, adpt, leisure, pt, shop, adpt, home, adpt, leisure, walk, home", 
"home, walk, other, walk, leisure, walk, home, car, leisure, car, home, walk, leisure, adpt, home", 
"home, adpt, work, adpt, home", "home, walk, leisure, walk, home, adpt, leisure, adpt, home, adpt, leisure, walk, home", 
"home, walk, home, walk, shop, walk, home, walk, leisure, walk, home, adpt, other, adpt, home", 
"home, car_passenger, outside, car_passenger, outside, car_passenger, home, adpt, other, adpt, home", 
"home, walk, education, adpt, home", "home, adpt, education, walk, home, bike, education, bike, home", 
"home, adpt, other, adpt, home, adpt, shop, pt, home", "home, adpt, other, adpt, shop, walk, home, adpt, leisure, car_passenger, home", 
"home, adpt, work, adpt, other, adpt, home", "home, adpt, work, adpt, home", 
"home, adpt, work, adpt, home, walk, home", "home, car, work, adpt, leisure, adpt, work, car, home", 
"home, adpt, shop, adpt, home, car, other, car, home, car_passenger, outside, car_passenger, home", 
"home, adpt, work, pt, home, car, shop, car, home", "home, walk, other, adpt, work, adpt, shop, adpt, shop, adpt, home", 
"home, adpt, leisure, adpt, shop, adpt, leisure, pt, home", "home, adpt, leisure, adpt, shop, adpt, home", 
"home, car, outside, car, outside, car, outside, car, outside, car, home, adpt, education, pt, home", 
"home, adpt, work, adpt, home", "home, adpt, shop, adpt, home", 
"home, adpt, education, adpt, home, adpt, education, adpt, home", 
"home, adpt, other, adpt, other, walk, leisure, adpt, other, adpt, home", 
"home, adpt, work, adpt, home", "home, adpt, work, adpt, home, car, other, car, home", 
"home, car, work, car, shop, car, home, adpt, work, adpt, home, car, home", 
"home, walk, other, walk, education, adpt, home, adpt, education, walk, home, walk, home", 
"home, adpt, shop, walk, leisure, adpt, home", "home, adpt, shop, walk, home, adpt, work, adpt, home", 
"home, adpt, leisure, adpt, shop, walk, home", "home, walk, other, adpt, shop, walk, home, walk, other, walk, home, walk, other, walk, other, adpt, home", 
"home, adpt, education, walk, home, walk, education, walk, home, walk, home", 
"home, bike, education, bike, home, adpt, education, adpt, home, walk, home"
)), row.names = c(NA, 100L), class = "data.frame")
r dataframe count frequency
1 Answer

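I am not entirely sure what exactly you want to do, but I understand that you are interested in detecting how the activity adpt occurs. This is commonly done in NLP, and a solution can be built with the tidytext package. I split the leg_activity column into n-grams, that is, into sequences of consecutive words. A sequence of two consecutive words is called a bi-gram, a sequence of three consecutive words a tri-gram, and so on. By counting these n-grams, we learn which activities most often precede adpt and which most often follow it.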
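The n-gram counting described above can be sketched in base R. This is a dependency-free stand-in, not the tidytext code the answer refers to (in tidytext the usual route is `unnest_tokens()` with `token = "ngrams"`). The helper names `chain_ngrams` and `count_ngrams` are hypothetical, and the toy `agents` data frame holds only three chains (ids 11, 96 and 103) copied from the question's data. Splitting on the comma keeps labels such as car_passenger as single tokens.

```r
# Toy data: three chains taken from the question's `agents` data frame.
agents <- data.frame(
  id = c(11L, 96L, 103L),
  leg_activity = c(
    "home, work, adpt, home",
    "home, car, work, car, home, work, adpt, home",
    "home, adpt, work, adpt, home, walk, other, pt, home"
  ),
  stringsAsFactors = FALSE
)

# Return every run of n consecutive activities in one chain.
chain_ngrams <- function(chain, n) {
  steps <- strsplit(chain, ",\\s*")[[1]]
  if (length(steps) < n) return(character(0))
  starts <- seq_len(length(steps) - n + 1)
  vapply(
    starts,
    function(i) paste(steps[i:(i + n - 1)], collapse = ", "),
    character(1)
  )
}

# Count n-grams over all chains, keeping only those that contain `keyword`
# as a whole activity (so "pt" would not match "adpt").
count_ngrams <- function(chains, n, keyword = "adpt") {
  grams <- unlist(lapply(chains, chain_ngrams, n = n))
  pattern <- paste0("(^|, )", keyword, "(, |$)")
  grams <- grams[grepl(pattern, grams)]
  counts <- sort(table(grams), decreasing = TRUE)
  data.frame(
    x = as.character(names(counts)),
    freq = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

trigrams <- count_ngrams(agents$leg_activity, n = 3)
print(trigrams)
```

On these three chains the most frequent tri-gram is "work, adpt, home", which occurs three times. Changing `n` gives bi-grams or longer sequences, and running the same call on the full leg_activity column produces a table in the `x` / `freq` layout shown in the question.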
