Remove duplicated text from a string

Question

This question may be related to this other question.

Unfortunately, the solution given there does not work for my data.

I have the following example vector:

example<-c("ChildrenChildren", "Clothing and shoesClothing and shoes","Education, health and beautyEducation, health and beauty", "Leisure activities, travelingLeisure activities, traveling","LoansLoans","Loans and financial servicesLoans and financial services" ,"Personal transfersPersonal transfers" ,"Savings and investmentsSavings and investments","TransportationTransportation","Utility servicesUtility services")

I would like to get the same strings without the repetition, i.e.:

  > result
 [1]   "Children" "Clothing and shoes" "Education, health and beauty"

Is that possible?

Answer 1

You can use sub and capture the part you want directly in the pattern:

sub("(.+)\\1", "\\1", example)
 #[1] "Children"                      "Clothing and shoes"            "Education, health and beauty"  "Leisure activities, traveling" "Loans"                        
 #[6] "Loans and financial services"  "Personal transfers"            "Savings and investments"       "Transportation"                "Utility services"

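(.+) captures one or more characters of any kind, and \\1 is a backreference to whatever was just captured. The pattern therefore matches "anything, twice in a row" and replaces it with that same "anything", only once.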
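The pattern above is not anchored, so sub() rewrites the first doubled substring it finds anywhere in the string, not necessarily the whole string. If every element is expected to be an exact repeat, anchoring with ^ and $ is a stricter variant. This is a sketch that is not part of the original answer; the variable names are illustrative, and perl = TRUE is passed so the pattern is handled by PCRE.

```r
x <- c("ChildrenChildren", "LoansLoans",
       "Loans and financial servicesLoans and financial services")

# Illustrative names. ^ and $ force "captured text + the same text again"
# to cover the entire string; perl = TRUE hands the pattern to PCRE.
dedup_anchored <- sub("^(.+)\\1$", "\\1", x, perl = TRUE)
print(dedup_anchored)
# values: "Children", "Loans", "Loans and financial services"

# Without anchors, a doubled letter inside an ordinary word is also collapsed;
# with anchors, a string that is not an exact repeat is left unchanged.
unanchored <- sub("(.+)\\1", "\\1", "Balloon", perl = TRUE)   # "Baloon"
anchored   <- sub("^(.+)\\1$", "\\1", "Balloon", perl = TRUE) # "Balloon"
```

The question's data contains no such words, so both versions give the same result there; the anchored one only matters for inputs that may not be duplicated.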

Answer 2

If every string is repeated, each one is twice as long as it needs to be, so just take the first half of each string:

> substr(example, 1, nchar(example)/2)
 [1] "Children"                      "Clothing and shoes"           
 [3] "Education, health and beauty"  "Leisure activities, traveling"
 [5] "Loans"                         "Loans and financial services" 
 [7] "Personal transfers"            "Savings and investments"      
 [9] "Transportation"                "Utility services"
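This approach assumes that every element really is duplicated; an element that is not would silently lose its second half. A minimal guarded sketch, using a hypothetical helper named halve_if_doubled, only halves a string when it has even length and its two halves are identical:

```r
# Hypothetical helper: halve a string only when it really is two identical halves.
halve_if_doubled <- function(x) {
  n <- nchar(x)
  half <- n %/% 2                    # integer division, so odd lengths are handled
  first <- substr(x, 1, half)
  second <- substr(x, half + 1, n)
  # Keep the first half only for even-length strings whose halves match;
  # anything else is returned unchanged.
  ifelse(n %% 2 == 0 & first == second, first, x)
}

out <- halve_if_doubled(c("LoansLoans", "Loans", "TransportationTransportation"))
print(out)
# values: "Loans", "Loans", "Transportation"
```

substr() is vectorised over its start and stop arguments, so the helper works on the whole vector at once, just like the one-liner above.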

Answer 3

We can try:

stringr::str_remove_all(example,"[a-z].*[A-Z]")

Result:

 [1] "Children"                      "Clothing and shoes"            "Education, health and beauty" 
 [4] "Leisure activities, traveling" "Loans"                         "Loans and financial services" 
 [7] "Personal transfers"            "Savings and investments"       "Transportation"               
[10] "Utility services"
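This pattern relies on a property of this particular data: each phrase contains exactly one capital letter, its first character. The greedy match runs from the first lowercase letter to the last capital letter, i.e. it deletes the tail of the first copy plus the leading capital of the second. A phrase with more than one capital is mangled, as the made-up input "New YorkNew York" shows below. The sketch uses base R sub() with perl = TRUE instead of stringr; because nothing after one greedy match contains a capital letter, a single replacement should be equivalent to str_remove_all() here.

```r
x <- c("ChildrenChildren",
       "Education, health and beautyEducation, health and beauty",
       "New YorkNew York")  # illustrative input, not from the question

# Greedy: from the first lowercase letter through the LAST uppercase letter.
# perl = TRUE makes PCRE interpret the [a-z] and [A-Z] ranges.
res <- sub("[a-z].*[A-Z]", "", x, perl = TRUE)
print(res)
# values: "Children", "Education, health and beauty", "Nork"
```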

Question votes: 4, answers: 3
