Replace values in multiple columns


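I have a data table "data" with 25 columns. Some columns (around 15) contain numeric values that were imported as character, and in those I want to replace certain characters, e.g. "," by ".", "<" by "", ">" by "", etc. (there may be 10 or more such pairs), because some values look like "<0,17" or "> 1,5".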

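Because the column names change (this applies to several different data tables), I would like to solve it along these lines (the code below is not correct, it only shows what I want to do).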

replace <- list ("," = ".", "<" = "", ">" = "")
affectedColumns = c("name1", "name2", "name3" ... "name 14", "name 15").

mydata %>%
  mutate(affectedColumns, replace)

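Another issue is that some of the columns are numeric and some are character. Does it make sense to first convert all values in "affectedColumns" to character (as.character), then do the replacements, and then convert them all back to numeric (as.numeric)?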

In the end I would like values with "." as the decimal mark and without any "<", ">" or spaces.

Is there a way to do this? Thanks!

r replace
5 Answers
0
votes

Here is a base R way.

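mydata[affectedColumns] <- lapply(mydata[affectedColumns], \(x){
  # apply each pair from the question's `replace` list in turn, as literal text, to every occurrence
  for(nm in names(replace)) x <- gsub(nm, replace[[nm]], x, fixed = TRUE)
  # convert the cleaned strings to numeric
  as.numeric(x)
})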
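On the question of converting everything to character first: a minimal self-contained sketch of the same base R idea is below. The toy data, the `replacements` vector and the helper `clean_column()` are made up for illustration and are not from the question. The sketch assumes that columns which are already numeric can simply be left alone, so no round trip through as.character is needed for them.

```r
# Toy data (hypothetical): two character columns and one numeric column.
mydata <- data.frame(
  name1 = c("<0,17", "> 1,5", "2,3"),
  name2 = c("1,5", "1", "0,5"),
  name3 = c(1.5, 1, 0.5),
  stringsAsFactors = FALSE
)

# Literal text to replace (names) and what to put in its place (values).
replacements <- c("," = ".", "<" = "", ">" = "", " " = "")
affectedColumns <- c("name1", "name2", "name3")

# Hypothetical helper: clean one column and return it as numeric.
clean_column <- function(x) {
  if (is.numeric(x)) return(x)  # already numeric, nothing to clean
  for (old in names(replacements)) {
    # fixed = TRUE treats `old` as literal text, not as a regular expression
    x <- gsub(old, replacements[[old]], x, fixed = TRUE)
  }
  as.numeric(x)
}

mydata[affectedColumns] <- lapply(mydata[affectedColumns], clean_column)
str(mydata)
```

Anything that still is not a number after the replacements becomes NA with a warning, which is a cheap way to spot patterns that are missing from the replacement list.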

0
votes

You can use parse_number from the readr package to convert to numeric while dropping the greater-than/less-than signs.

library(dplyr)  # for mutate() and across()
library(readr)

df <- data.frame("name1" = c("1,5", "> 1,5", "<1,6"), 
                 "name2" = c("1,5", "1,5", "1,5"), 
                 "name3" = c("1,0", "1", "1"),
                 "name4" = c(1.5, 1, 0.5)
                 )

affectedColumns <- c("name1", "name2", "name3")

# all_of() selects the columns named in the character vector
new_df <- mutate(df, across(all_of(affectedColumns), .fns = ~parse_number(.x, locale = locale(decimal_mark = ","))))
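If the list of affected columns can also contain columns that are already numeric, one option is to restrict the selection to character columns. The sketch below is only an illustration under that assumption; the data frame is a shortened copy of the one above and `comma_locale` is a name introduced here. The restriction matters because parse_number expects character input, and because with decimal_mark = "," the locale, as far as I understand the readr documentation, treats "." as the grouping mark by default, so a numeric column pushed through as.character first could be misread.

```r
library(dplyr)
library(readr)

df <- data.frame(
  name1 = c("1,5", "> 1,5", "<1,6"),
  name2 = c("1,5", "1,5", "1,5"),
  name4 = c(1.5, 1, 0.5),          # already numeric
  stringsAsFactors = FALSE
)

affectedColumns <- c("name1", "name2", "name4")
comma_locale <- locale(decimal_mark = ",")

new_df <- df %>%
  mutate(across(
    # only those affected columns that are still character
    all_of(affectedColumns) & where(is.character),
    ~ parse_number(.x, locale = comma_locale)
  ))

str(new_df)
```

Here name4 is passed through unchanged, while name1 and name2 come back as numeric.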

0
votes

Here is a dplyr solution:

library(dplyr)
mydata %>%
  # Step 1: remove < and >:
  mutate(across(c(everything()), 
                ~ sub("\\s?(>|<)", "", .))) %>%
  # Step 2: replace dot by comma:
  mutate(across(c(everything()), 
                ~ sub("\\.", ",", .))) 
#>   col1   col2
#> 1  1,2 12,701
#> 2    3  55,77
#> 3    5   5000

EDIT

Here is a solution using setNames and stringr:

First define the sets of new and old values (make sure to escape regex metacharacters such as .):

replacements <- setNames(c("", "", ","),     # new values
                         c("<", ">", "\\.")) # old values

Or, a bit more economically:

replacements <- setNames(c("", ","),      # new values
                         c("<|>", "\\.")) # old values

Now use str_replace_all to apply the changes in one go:

library(stringr)
mydata %>%
  mutate(across(c(col1:col2), 
                ~ str_replace_all(., replacements)))

Toy data:

mydata <- data.frame(
  col1 = c("1.2", "3", "<5"),
  col2 = c(">12.701", "55,77", "< 5000")
)
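The question asks for the opposite direction (comma to dot, then numeric), so here is a hedged variant of the same named-vector idea on the toy data. The vector name `to_numeric` and the patterns in it are my own choice for illustration. It assumes the comma is always the decimal mark and that no value contains thousands separators.

```r
library(dplyr)
library(stringr)

mydata <- data.frame(
  col1 = c("1.2", "3", "<5"),
  col2 = c(">12.701", "55,77", "< 5000"),
  stringsAsFactors = FALSE
)

# names are regex patterns (old values), elements are the new values
to_numeric <- c("[<>\\s]" = "",   # drop <, > and any whitespace
                ","       = ".")  # decimal comma to decimal point

cleaned <- mydata %>%
  mutate(across(col1:col2,
                ~ as.numeric(str_replace_all(.x, to_numeric))))

str(cleaned)
```

With a named vector, str_replace_all applies the pairs one after another, so further patterns can be added to `to_numeric` without touching the pipeline.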

0
votes
structure(list(D = c(12327, 12328, 12329, 12330, 12331, 12333, 
12334, 12335, 12336, 12337, 12338, 12339, 12340, 12343, 12345, 
12348, 12349, 12350, 12351, 12352), E = c(12310, 12310, 12326, 
12326, 12315, 12326, 0, 12324, 12324, 12334, 12334, 0, 12339, 
0, 0, 12345, 12345, 0, 12343, 12343), Basiswert = c("AUDCAD", 
"AUDCAD", "USDJPY", "USDJPY", "USDCAD", "USDJPY", "USDCHF", "USDCHF", 
"USDCHF", "USDCHF", "USDCHF", "USDCAD", NA, "USDCAD", "CADJPY", 
"CADJPY", "CADJPY", "USDCHF", "USDCAD", "USDCAD"), Einstieg = c(NA, 
0.89262, NA, 139.192, NA, NA, 0.9052, NA, 0.90834, NA, 0.90816, 
NA, NA, 1.362, 103.188, NA, 102.886, 0.9051, NA, 1.36504), Profit = c(33, 
NA, 34, NA, 68, 68, NA, 33, NA, 33, NA, NA, NA, NA, NA, 34, NA, 
NA, 33, NA), SL = c(NA, NA, NA, NA, NA, NA, 0.91134, NA, NA, 
NA, NA, NA, NA, 1.3684, 102.545, NA, NA, 0.91138, NA, NA), TP = c(NA, 
NA, NA, NA, NA, NA, 0.89325, NA, NA, NA, NA, NA, NA, 1.3504, 
104.35, NA, NA, 0.8933, NA, NA), Trader = c(NA, NA, NA, NA, NA, 
NA, "Trade by Jason\" ", NA, NA, NA, NA, NA, NA, "Trade by Jason\" ", 
"Trade by Jason\" ", NA, NA, "Trade by Jason\" ", NA, NA)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
    E = c(0, 12310, 12315, 12324, 12326, 12334, 12339, 12343, 
    12345), .rows = structure(list(c(7L, 12L, 14L, 15L, 18L), 
        1:2, 5L, 8:9, c(3L, 4L, 6L), 10:11, 13L, 19:20, 16:17), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -9L), .drop = TRUE))

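Thank you very much for your efforts and solutions. However, I could not get it to work on the whole dataset. Please see the example above.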


-1
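votes

Consider combining the mutate, across and case_when functions from the dplyr package. You can find them here: https://dplyr.tidyverse.org/reference/across.html and here: https://dplyr.tidyverse.org/reference/case_when.html. Otherwise, please provide a minimal reproducible example.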

Best, M.
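For what it is worth, a rough sketch of how mutate, across and case_when could be combined for this task is given below. It is only one possible reading of the suggestion: the toy data and the regular expressions are assumptions made up for illustration. case_when strips a leading "<" or ">" where one is present and leaves other values unchanged; a second step turns the decimal comma into a dot and converts to numeric.

```r
library(dplyr)

# Hypothetical toy data, not from the question.
mydata <- data.frame(
  col1 = c("1,2", "<5", "3"),
  col2 = c("> 1,5", "2", "0,17"),
  stringsAsFactors = FALSE
)

cleaned <- mydata %>%
  # Step 1: remove a leading < or > (and surrounding spaces) where present
  mutate(across(everything(), ~ case_when(
    grepl("^\\s*[<>]", .x) ~ sub("^\\s*[<>]\\s*", "", .x),
    TRUE                   ~ .x
  ))) %>%
  # Step 2: decimal comma to dot, then convert to numeric
  mutate(across(everything(),
                ~ as.numeric(sub(",", ".", .x, fixed = TRUE))))

str(cleaned)
```

For plain character replacement, case_when is not strictly needed; it becomes more useful when different kinds of values need different handling.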
