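I have a data frame with a column named charge, which is a character vector, and a column named n, which is a numeric vector. The data below is an example of what I have: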
charge<-c('unlawful possession of a firearm',
'unlawful possession of a firearm repealed: 12-31-2016',
'accessory unlawful possession of a firearm',
'unlawful possession of drug paraphernalia',
'unlawful possession of drug paraphernalia - prior drug offense',
'579.074579.074579.074unlawful possession of drug paraphernalia')
n<-c(3904,4,2,2500,4,11)
df<-data.frame(charge,n)
  charge                                                            n
1 unlawful possession of a firearm                               3904
2 unlawful possession of a firearm repealed: 12-31-2016             4
3 accessory unlawful possession of a firearm                        2
4 unlawful possession of drug paraphernalia                      2500
5 unlawful possession of drug paraphernalia - prior drug offense    4
6 579.074579.074579.074unlawful possession of drug paraphernalia   11
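As you can see, the character vector contains a number of charge descriptions that all include one of two common phrases: "unlawful possession of a firearm" and "unlawful possession of drug paraphernalia". I would like to collapse the rows onto those common phrases and sum n within each, so that the result looks like the following. How can I do this?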
  charge                                       n
1 unlawful possession of a firearm          3910
2 unlawful possession of drug paraphernalia 2515
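We can use str_extract to pull out the relevant part of each string, use that as the grouping variable, and take the sum of 'n' within each group. The pattern used here is the word "unlawful", followed by a space and any other characters, up to a match of either "firearm" or "paraphernalia".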
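A minimal sketch of that approach, assuming the dplyr and stringr packages are installed. The exact regular expression and the name result are illustrative choices based on the description above, and the data frame is rebuilt here so the snippet runs on its own:

library(dplyr)
library(stringr)

# Example data from the question
charge <- c('unlawful possession of a firearm',
            'unlawful possession of a firearm repealed: 12-31-2016',
            'accessory unlawful possession of a firearm',
            'unlawful possession of drug paraphernalia',
            'unlawful possession of drug paraphernalia - prior drug offense',
            '579.074579.074579.074unlawful possession of drug paraphernalia')
n <- c(3904, 4, 2, 2500, 4, 11)
df <- data.frame(charge, n, stringsAsFactors = FALSE)

# Assumed pattern: "unlawful", a space, any characters, then the
# closing keyword "firearm" or "paraphernalia". Text before "unlawful"
# and after the keyword is dropped from the extracted string.
pattern <- "unlawful .*(firearm|paraphernalia)"

result <- df %>%
  group_by(charge = str_extract(charge, pattern)) %>%  # common phrase as group key
  summarise(n = sum(n))                                # total n per phrase

result

On the sample data this should give 3910 for the firearm phrase and 2515 for the drug paraphernalia phrase. Note that str_extract returns NA for any charge that matches neither keyword, so on real data such rows would end up together in a single NA group and the pattern may need more alternatives.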