Delete rows based on a text-matching rule applied to one column


My experimental data has the following format:

df <- data.frame(
  product_path = c("https://mycommerece.com/product/book/miracle",
                   "https://mycommerece.com/product/book/miracle2",
                   "https://mycommerece.com/product/gadget/airplane",
                   "https://mycommerece.com/product/book/miracle3"),
  var1 = c(1, 1, 1, 0),
  commereceurl = c("https://mycommerece.com/product/",
                   "https://mycommerece.com/product/",
                   "https://mycommerece.com/product2/",
                   "https://www.test.com"),
  var2 = c(1, 0, 0, 1)
)
    > df
                                         product_path var1                      commereceurl var2
    1    https://mycommerece.com/product/book/miracle    1  https://mycommerece.com/product/    1
    2   https://mycommerece.com/product/book/miracle2    1  https://mycommerece.com/product/    0
    3 https://mycommerece.com/product/gadget/airplane    1 https://mycommerece.com/product2/    0
    4   https://mycommerece.com/product/book/miracle3    0              https://www.test.com    1

Using the data in the commereceurl column, I want to delete every row whose value does not start with "https://mycommerece.com".

Example output:

df <- data.frame(
  product_path = c("https://mycommerece.com/product/book/miracle",
                   "https://mycommerece.com/product/book/miracle2",
                   "https://mycommerece.com/product/gadget/airplane"),
  var1 = c(1, 1, 1),
  commereceurl = c("https://mycommerece.com/product/",
                   "https://mycommerece.com/product/",
                   "https://mycommerece.com/product2/"),
  var2 = c(1, 0, 0)
)

How can I implement this rule?

r
1 Answer

You can use grep to identify the rows you want to keep, then subset the data frame with those row indices:

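# Indices of rows whose commereceurl starts with the prefix (dots escaped so they match literally)
KEEP <- grep("^https://mycommerece\\.com", df$commereceurl)
df <- df[KEEP, ]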
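
As a possible alternative, here is a minimal sketch that treats the prefix as a literal string instead of a regular expression, using base R's startsWith() and a logical index. The names prefix, keep and result are only illustrative, and the as.character() call is an assumption that the column might be a factor (the data.frame() default before R 4.0).

```r
# Rebuild the example data from the question.
df <- data.frame(
  product_path = c("https://mycommerece.com/product/book/miracle",
                   "https://mycommerece.com/product/book/miracle2",
                   "https://mycommerece.com/product/gadget/airplane",
                   "https://mycommerece.com/product/book/miracle3"),
  var1 = c(1, 1, 1, 0),
  commereceurl = c("https://mycommerece.com/product/",
                   "https://mycommerece.com/product/",
                   "https://mycommerece.com/product2/",
                   "https://www.test.com"),
  var2 = c(1, 0, 0, 1)
)

# Literal prefix: no regex metacharacters to escape.
prefix <- "https://mycommerece.com"

# startsWith() needs a character vector, so convert in case the column is a factor.
keep <- startsWith(as.character(df$commereceurl), prefix)

# Logical subsetting keeps only the rows where keep is TRUE.
result <- df[keep, ]
```

With the sample data this should keep the first three rows and drop the https://www.test.com row. If the column can contain NA, startsWith() returns NA for those entries, so wrapping the index as which(keep) is probably safer.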