I have a df that shows people's "activity chains". It looks like this (a snippet is at the bottom of the question):
head(agents)
id leg_activity
1 9 home, adpt, shop, car_passenger, home, adpt, work, adpt, home, work, outside, pt, home
2 10 home, pt, outside, pt, home, car, leisure, car, other, car, leisure, car, leisure, car, other, car, leisure, car, other, car, leisure, car, home, adpt, leisure, adpt, home
3 11 home, work, adpt, home
4 96 home, car, work, car, home, work, adpt, home
5 97 home, adpt, work, car_passenger, leisure, car_passenger, work, adpt, home, car_passenger, outside, car_passenger, outside, car_passenger, home
6 101 home, bike, outside, car_passenger, outside, car_passenger, outside, bike, home, adpt, leisure, adpt, home, bike, leisure, bike, home
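I am interested in detecting the patterns in which adpt occurs. The simplest approach is the count() function, which gives me a frequency table as output. Unfortunately, that result is misleading.
It looks like this: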
x freq
home, adpt, work, adpt, home 2071
home, adpt, shop, adpt, home 653
home, adpt, education, adpt, home 545
home, pt, work, adpt, home 492
home, adpt, work, pt, home 468
home, adpt, work, home 283
The problem with this approach is that I cannot detect patterns inside longer activity chains; for example:
home, adpt, education, adpt, education, adpt, home, car, work, car, home, shop, adpt, home
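This case starts with an activity chain that is very frequent, but because further activities follow it, the count function does not count it toward that pattern.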
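A minimal sketch of the problem, using base R's table() as a stand-in for count() and a three-row toy vector (chains and pattern are hypothetical names for illustration, not part of the data below):

```r
# Toy chains; the second one starts with the frequent pattern but continues.
chains <- c(
  "home, adpt, work, adpt, home",
  "home, adpt, work, adpt, home, car, shop, car, home",
  "home, adpt, work, adpt, home"
)
pattern <- "home, adpt, work, adpt, home"

# Counting whole strings treats the longer chain as its own category,
# so the pattern is only credited to the cells that match it exactly.
whole <- table(chains)
exact_matches <- as.integer(whole[pattern])

# Searching inside each cell also finds the pattern in the longer chain.
inside_matches <- sum(grepl(pattern, chains, fixed = TRUE))

exact_matches
inside_matches
```

The gap between the two numbers is the part of the pattern's frequency that whole-cell counting misses.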
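Is there a way to use the count function that also takes into account what happens inside each cell? It would be interesting to have a table that shows all possible combinations and their frequencies, for example: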
x freq
home, adpt, home 10
home, adpt, home, pt, work, home 4
home, pt, work, home 2
Thank you very much for your help!
Data:
structure(list(id = c(9L, 10L, 11L, 96L, 97L, 101L, 103L, 248L,
499L, 1044L, 1215L, 1238L, 1458L, 1569L, 1615L, 1626L, 1734L,
1735L, 1790L, 1912L, 9040L, 14858L, 14859L, 14967L, 15011L, 15012L,
15015L, 15045L, 15050L, 15058L, 15060L, 15086L, 15088L, 15094L,
15109L, 15113L, 15152L, 15157L, 15192L, 15193L, 15222L, 15230L,
15231L, 15234L, 15235L, 15237L, 15256L, 15257L, 15258L, 15269L,
15270L, 15318L, 15319L, 15338L, 15369L, 15371L, 15396L, 15397L,
15399L, 15404L, 15505L, 15506L, 15515L, 15516L, 15525L, 15542L,
15593L, 15602L, 15608L, 15643L, 15667L, 15727L, 15728L, 15729L,
15752L, 15775L, 15808L, 15851L, 15869L, 15881L, 15882L, 15960L,
15962L, 15966L, 16058L, 16107L, 16174L, 16229L, 16237L, 16238L,
16291L, 16333L, 16416L, 16418L, 16449L, 16450L, 16451L, 16491L,
16506L, 16508L), leg_activity = c("home, adpt, shop, car_passenger, home, adpt, work, adpt, home, work, outside, pt, home",
"home, pt, outside, pt, home, car, leisure, car, other, car, leisure, car, leisure, car, other, car, leisure, car, other, car, leisure, car, home, adpt, leisure, adpt, home",
"home, work, adpt, home", "home, car, work, car, home, work, adpt, home",
"home, adpt, work, car_passenger, leisure, car_passenger, work, adpt, home, car_passenger, outside, car_passenger, outside, car_passenger, home",
"home, bike, outside, car_passenger, outside, car_passenger, outside, bike, home, adpt, leisure, adpt, home, bike, leisure, bike, home",
"home, adpt, work, adpt, home, walk, other, pt, home", "home, adpt, work, walk, home, adpt, work, walk, home",
"home, adpt, leisure, adpt, home, bike, outside, bike, home",
"home, pt, work, adpt, home, adpt, work, adpt, home", "home, adpt, work, adpt, home, car, outside, car, work, car, work, car, home",
"home, work, leisure, adpt, home", "home, outside, pt, home, adpt, leisure, adpt, home",
"home, car_passenger, leisure, walk, work, walk, leisure, walk, work, adpt, home, walk, home",
"home, adpt, work, walk, work, walk, work, pt, home", "home, car, work, pt, leisure, adpt, work, car, home, car, home",
"home, adpt, other, adpt, home, car, home", "home, adpt, other, adpt, home",
"home, education, walk, shop, walk, education, pt, outside, home, adpt, leisure, adpt, home",
"home, adpt, work, adpt, home, walk, home", "home, adpt, work, pt, leisure, adpt, work, adpt, work, adpt, home, adpt, other, walk, home",
"home, adpt, work, adpt, home, adpt, work, adpt, home, walk, leisure, walk, home",
"home, adpt, work, adpt, home, work, adpt, home, walk, leisure, walk, home",
"home, adpt, work, adpt, home, car_passenger, outside, car_passenger, leisure, car_passenger, home, car_passenger, home",
"home, adpt, other, adpt, home, car, work, car, home", "home, adpt, education, adpt, leisure, adpt, home, walk, leisure, walk, home",
"home, car_passenger, other, pt, home, walk, other, walk, home, car_passenger, other, walk, home, adpt, other, adpt, home",
"home, work, pt, work, adpt, work, adpt, home", "home, adpt, leisure, adpt, home, car, shop, car, other, car, home",
"home, adpt, work, adpt, home, walk, other, adpt, home", "home, adpt, work, adpt, home, car_passenger, leisure, car_passenger, home",
"home, car, other, car, home, adpt, shop, adpt, home", "home, pt, work, adpt, home",
"home, adpt, work, adpt, home", "home, adpt, work, adpt, home",
"home, walk, education, adpt, home, walk, education, walk, home, bike, leisure, bike, home",
"home, adpt, shop, adpt, home, car, home", "home, adpt, leisure, walk, leisure, walk, leisure, adpt, home",
"home, adpt, shop, pt, home, adpt, other, adpt, home", "home, adpt, other, adpt, home, car_passenger, leisure, walk, home",
"home, adpt, work, adpt, home, car_passenger, shop, car_passenger, home",
"home, adpt, other, adpt, work, adpt, home", "home, adpt, work, adpt, home, adpt, other, walk, shop, walk, home, car, outside, car, outside, car, outside, car, home",
"home, adpt, other, adpt, home", "home, adpt, education, adpt, home, adpt, education, adpt, home",
"home, pt, work, adpt, work, adpt, work, adpt, work, adpt, home, adpt, work, adpt, home",
"home, walk, other, car_passenger, education, walk, home, car_passenger, education, adpt, home",
"home, walk, shop, walk, home, walk, leisure, adpt, leisure, adpt, home",
"home, adpt, work, adpt, home, walk, shop, walk, home, walk, leisure, walk, home, walk, home",
"home, adpt, leisure, adpt, home", "home, walk, leisure, walk, home, adpt, other, adpt, shop, walk, leisure, walk, home",
"home, pt, leisure, adpt, home, pt, outside, pt, home, bike, leisure, bike, home",
"home, pt, outside, pt, home, walk, home, walk, other, adpt, shop, pt, home, car_passenger, leisure, adpt, home",
"home, adpt, work, adpt, home, adpt, shop, adpt, work, adpt, home",
"home, adpt, shop, adpt, other, walk, home", "home, walk, other, walk, home, walk, home, adpt, other, adpt, home, adpt, shop, adpt, home, car, other, car, home, adpt, other, adpt, home",
"home, adpt, leisure, pt, home", "home, leisure, adpt, home",
"home, adpt, leisure, pt, shop, walk, home, walk, shop, walk, home",
"home, car, outside, car, outside, leisure, car, outside, car, outside, car, home, adpt, other, adpt, home",
"home, adpt, work, adpt, shop, walk, home", "home, adpt, other, walk, work, adpt, home, adpt, other, adpt, work, adpt, home, adpt, leisure, adpt, home",
"home, adpt, leisure, adpt, home, car, shop, car, home", "home, walk, shop, adpt, home, car, other, car, home, adpt, other, adpt, home",
"home, walk, leisure, walk, home, adpt, work, adpt, home", "home, adpt, work, adpt, home",
"home, adpt, leisure, pt, shop, adpt, home, adpt, leisure, walk, home",
"home, walk, other, walk, leisure, walk, home, car, leisure, car, home, walk, leisure, adpt, home",
"home, adpt, work, adpt, home", "home, walk, leisure, walk, home, adpt, leisure, adpt, home, adpt, leisure, walk, home",
"home, walk, home, walk, shop, walk, home, walk, leisure, walk, home, adpt, other, adpt, home",
"home, car_passenger, outside, car_passenger, outside, car_passenger, home, adpt, other, adpt, home",
"home, walk, education, adpt, home", "home, adpt, education, walk, home, bike, education, bike, home",
"home, adpt, other, adpt, home, adpt, shop, pt, home", "home, adpt, other, adpt, shop, walk, home, adpt, leisure, car_passenger, home",
"home, adpt, work, adpt, other, adpt, home", "home, adpt, work, adpt, home",
"home, adpt, work, adpt, home, walk, home", "home, car, work, adpt, leisure, adpt, work, car, home",
"home, adpt, shop, adpt, home, car, other, car, home, car_passenger, outside, car_passenger, home",
"home, adpt, work, pt, home, car, shop, car, home", "home, walk, other, adpt, work, adpt, shop, adpt, shop, adpt, home",
"home, adpt, leisure, adpt, shop, adpt, leisure, pt, home", "home, adpt, leisure, adpt, shop, adpt, home",
"home, car, outside, car, outside, car, outside, car, outside, car, home, adpt, education, pt, home",
"home, adpt, work, adpt, home", "home, adpt, shop, adpt, home",
"home, adpt, education, adpt, home, adpt, education, adpt, home",
"home, adpt, other, adpt, other, walk, leisure, adpt, other, adpt, home",
"home, adpt, work, adpt, home", "home, adpt, work, adpt, home, car, other, car, home",
"home, car, work, car, shop, car, home, adpt, work, adpt, home, car, home",
"home, walk, other, walk, education, adpt, home, adpt, education, walk, home, walk, home",
"home, adpt, shop, walk, leisure, adpt, home", "home, adpt, shop, walk, home, adpt, work, adpt, home",
"home, adpt, leisure, adpt, shop, walk, home", "home, walk, other, adpt, shop, walk, home, walk, other, walk, home, walk, other, walk, other, adpt, home",
"home, adpt, education, walk, home, walk, education, walk, home, walk, home",
"home, bike, education, bike, home, adpt, education, adpt, home, walk, home"
)), row.names = c(NA, 100L), class = "data.frame")
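I am not entirely sure what exactly you want to do, but I understand that you are interested in detecting how the activity adpt occurs. This is commonly done in NLP, and below is a solution using the tidytext package. I split the leg_activity column into n-grams, i.e. into sequences of consecutive words. A sequence of two consecutive words is called a bi-gram, a sequence of three consecutive words a tri-gram, and so on. When we count these n-grams, we learn which activities most often occur right next to adpt, that is, immediately before or after it.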
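To make the n-gram idea concrete, here is a minimal sketch in base R rather than tidytext, so that it runs without extra packages. The toy vector leg_activity stands in for agents$leg_activity, and the helper ngrams() is a hypothetical name introduced only for this illustration; it is not a tidytext function.

```r
# Assumption: a small toy vector stands in for agents$leg_activity.
leg_activity <- c(
  "home, adpt, work, adpt, home",
  "home, pt, work, adpt, home",
  "home, adpt, shop, adpt, home, car, work, car, home"
)

# Hypothetical helper: every run of n consecutive activities in one chain.
ngrams <- function(chain, n) {
  steps <- strsplit(chain, ", ", fixed = TRUE)[[1]]
  if (length(steps) < n) return(character(0))
  starts <- seq_len(length(steps) - n + 1)
  vapply(
    starts,
    function(i) paste(steps[i:(i + n - 1)], collapse = ", "),
    character(1)
  )
}

# Collect the tri-grams of all chains into one character vector.
trigrams <- unlist(lapply(leg_activity, ngrams, n = 3))

# Keep only tri-grams with adpt in the middle position.
around_adpt <- trigrams[grepl("^[^,]+, adpt, [^,]+$", trigrams)]

# Frequency table, most common pattern first.
freq <- sort(table(around_adpt), decreasing = TRUE)
freq
```

Because every chain contributes all of its tri-grams, a pattern such as work, adpt, home is counted even when it sits in the middle or at the end of a longer chain. Changing n gives bi-grams or longer windows in the same way.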