Remove repeated characters from a string

Question (votes: 4, answers: 3)

This may be related to an earlier question.

Unfortunately, the solutions given there do not work for my data.

I have the following example vector:

example<-c("ChildrenChildren", "Clothing and shoesClothing and shoes","Education, health and beautyEducation, health and beauty", "Leisure activities, travelingLeisure activities, traveling","LoansLoans","Loans and financial servicesLoans and financial services" ,"Personal transfersPersonal transfers" ,"Savings and investmentsSavings and investments","TransportationTransportation","Utility servicesUtility services")

What I want, of course, is the same strings without the repetition, i.e.:

  > result
 [1]   "Children" "Clothing and shoes" "Education, health and beauty"

Is that possible?

r regex
3 Answers
10 votes

You can use sub and capture the part you want directly in the pattern:

sub("(.+)\\1", "\\1", example)
 #[1] "Children"                      "Clothing and shoes"            "Education, health and beauty"  "Leisure activities, traveling" "Loans"                        
 #[6] "Loans and financial services"  "Personal transfers"            "Savings and investments"       "Transportation"                "Utility services"

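(.+) captures a run of one or more characters, and \\1 refers back to whatever was just captured. The pattern therefore matches "something, twice in a row", and the replacement puts that same "something" back just once.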
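One thing worth noting is that the pattern above is not anchored, so it also collapses a doubled run in the middle of a string that is not itself a full repetition. The sketch below contrasts it with an anchored variant. The input vector x is hypothetical, and both the anchors and perl = TRUE are assumptions added here, not part of the original answer.

```r
# Hypothetical inputs: one fully doubled string and one ordinary word
# that merely contains a double letter.
x <- c("LoansLoans", "Hello")

# Unanchored, as in the answer: the first doubled run anywhere is collapsed.
unanchored <- sub("(.+)\\1", "\\1", x, perl = TRUE)

# Anchored variant (assumption): only rewrite a string that consists of
# exactly one piece repeated twice, and leave everything else alone.
anchored <- sub("^(.+)\\1$", "\\1", x, perl = TRUE)

print(unanchored)  # "Loans" "Helo"
print(anchored)    # "Loans" "Hello"
```

For the example vector in the question both forms give the same result, since every element is a complete repetition.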


5 votes

If every string is repeated, each one is twice as long as it should be, so just take the first half of each string:

> substr(example, 1, nchar(example)/2)
 [1] "Children"                      "Clothing and shoes"           
 [3] "Education, health and beauty"  "Leisure activities, traveling"
 [5] "Loans"                         "Loans and financial services" 
 [7] "Personal transfers"            "Savings and investments"      
 [9] "Transportation"                "Utility services"             
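The halving approach relies on every element really being doubled. If that cannot be guaranteed, a guarded version might look like the sketch below, which only truncates when the two halves are identical. The function name halve_if_doubled and the test vector are hypothetical, chosen for illustration.

```r
# Keep the first half of a string only when the string is an exact
# repetition of that half; otherwise return the string unchanged.
halve_if_doubled <- function(x) {
  n <- nchar(x)
  half <- n %/% 2
  first <- substr(x, 1, half)
  second <- substr(x, half + 1, n)
  # Doubled means: non-empty, even length, and both halves equal.
  is_doubled <- n > 0 & n %% 2 == 0 & first == second
  ifelse(is_doubled, first, x)
}

# Hypothetical mixed input: two doubled strings and two that are not.
mixed <- c("LoansLoans", "Loans", "abab", "abcab")
print(halve_if_doubled(mixed))  # "Loans" "Loans" "ab" "abcab"
```

When every element is known to be doubled, as in the question, the plain substr call is simpler and gives the same answer.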

3 votes

We can try:

stringr::str_remove_all(example,"[a-z].*[A-Z]")

Result:

[1] "Children"                      "Clothing and shoes"            "Education, health and beauty" 
 [4] "Leisure activities, traveling" "Loans"                         "Loans and financial services" 
 [7] "Personal transfers"            "Savings and investments"       "Transportation"               
[10] "Utility services"  
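This pattern removes everything from the first lowercase letter through the last uppercase letter, so it appears to depend on each copy containing exactly one capital, at its start. The sketch below illustrates that on a hypothetical input with two capitals per copy. Base R gsub with perl = TRUE is used as a stand-in for stringr::str_remove_all so that the example needs no extra package; that substitution is an assumption made here.

```r
# Hypothetical input with more than one capital letter in each copy.
y <- "Home LoansHome Loans"

# Stand-in for stringr::str_remove_all(y, "[a-z].*[A-Z]"):
# deletes from the first lowercase letter through the last capital.
by_case <- gsub("[a-z].*[A-Z]", "", y, perl = TRUE)

# Backreference approach from the first answer, for comparison.
by_backref <- sub("(.+)\\1", "\\1", y, perl = TRUE)

print(by_case)     # "Hoans"
print(by_backref)  # "Home Loans"
```

For the question's data, where each label has a single leading capital, the case-based pattern works as shown in the result above.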