Mutate to create multiple new columns based on existing column names and the presence of a string


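I have three questions, but they are related.

My starting point is the table reproduced by the structure() call at the end of this question.

Question 1: I want to create five new columns: # Enterprises_Sig, Revenues_Sig, Costs_Sig, Net Revenues_Sig and Assets_Sig. Each new column should hold the same number of asterisks as its corresponding original column, and the original columns should then contain only numbers. For example, the code below does what I need, but only for one column.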

table_2 <- table_2 %>% mutate("Net Revenues_Sig" = ifelse(str_count(table_2$`Net Revenues`,"∗")==1, "∗",ifelse(str_count(table_2$`Net Revenues`,"∗")==2, "∗∗",ifelse(str_count(table_2$`Net Revenues`,"∗")==3, "∗∗∗",""))))

table_2$`Net Revenues` <- str_replace_all(table_2$`Net Revenues`, "[∗]", "")

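This produces the result I want for Net Revenues.

Of course I could repeat this process four more times, but there must be a more efficient way...?

Question 2: I want to do something similar for the square brackets. How can I create five new columns holding the corresponding standard errors without the brackets (for example, a new column Revenues_SE would be numeric and contain the values 24346.05, 16080.92, 34895.03), and then delete those three rows (so that Revenues has only three values: long term, short term and lump sum)?

Question 3: How do I convert all columns except the five Sig columns to numeric (they are currently character because of the brackets and asterisks)?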

structure(
    list(
        Treatment = c("Long Term Arm", "", "Short Term Arm", "", "Lumpsum Arm", ""),
        `# Enterprises` = c("9.93∗∗", "[3.96]", "3.39", "[3.57]", "14.67∗∗∗", "[3.92]"),
        Revenues = c("61379.40∗∗", "[24346.05]", "23177.47", "[16080.92]", "107746.75∗∗∗", "[34895.03]"),
        Costs = c("32055.29∗", "[16478.13]", "8497.42", "[10462.44]", "71903.23∗∗∗", "[24360.84]"),
        `Net Revenues` = c("28226.05∗∗", "[12334.27]", "14824.71∗", "[8143.69]", "35576.39∗∗∗", "[13382.81]"),
        Assets = c("36050.66∗∗∗", "[12589.11]", "16441.81", "[10029.27]", "29404.54∗∗∗", "[10977.68]")
    ),
    row.names = 3:8, class = "data.frame"
)

string rstudio numeric mutate
1 Answer

For what it's worth, this does what you need. You will need the data.table and stringr packages.

library(data.table)
library(stringr)

# structure
x <- structure(
    list(
        Treatment = c("Long Term Arm", "", "Short Term Arm", "", "Lumpsum Arm", ""),
        `# Enterprises` = c("9.93∗∗", "[3.96]", "3.39", "[3.57]", "14.67∗∗∗", "[3.92]"),
        Revenues = c("61379.40∗∗", "[24346.05]", "23177.47", "[16080.92]", "107746.75∗∗∗", "[34895.03]"),
        Costs = c("32055.29∗", "[16478.13]", "8497.42", "[10462.44]", "71903.23∗∗∗", "[24360.84]"),
        `Net Revenues` = c("28226.05∗∗", "[12334.27]", "14824.71∗", "[8143.69]", "35576.39∗∗∗", "[13382.81]"),
        Assets = c("36050.66∗∗∗", "[12589.11]", "16441.81", "[10029.27]", "29404.54∗∗∗", "[10977.68]")
    ),
    row.names = 3:8, class = "data.frame"
)

# turn to dt
x <- data.table::as.data.table(x)

# get columns to change
to_change <- colnames(x)[colnames(x) != "Treatment"]

# add on Sig to each column
to_change_sig <- paste0(to_change, " Sig")

# first function - extract all ∗ and collapse
fun <- function(y) {
    z <- stringr::str_extract_all(y, "∗")
    lapply(z, function(x) paste0(x, collapse = "")) |> unlist()
}

# extracts the stars
x[, (to_change_sig) := lapply(.SD, fun), .SDcols = to_change]

# second function, replace brackets and * with nothing, turn to numeric
fun <- function(y) {
    z <- stringr::str_remove_all(y, "\\[|\\]|∗")
    as.numeric(z)
}

# removes unwanted, turns to numeric
x[, (to_change) := lapply(.SD, fun), .SDcols = to_change]

Output

r$> head(x)
        Treatment # Enterprises  Revenues    Costs Net Revenues   Assets # Enterprises Sig
           <char>         <num>     <num>    <num>        <num>    <num>            <char>
1:  Long Term Arm          9.93  61379.40 32055.29     28226.05 36050.66                ∗∗
2:                         3.96  24346.05 16478.13     12334.27 12589.11                  
3: Short Term Arm          3.39  23177.47  8497.42     14824.71 16441.81                  
4:                         3.57  16080.92 10462.44      8143.69 10029.27                  
5:    Lumpsum Arm         14.67 107746.75 71903.23     35576.39 29404.54               ∗∗∗
6:                         3.92  34895.03 24360.84     13382.81 10977.68                  
   Revenues Sig Costs Sig Net Revenues Sig Assets Sig
         <char>    <char>           <char>     <char>
1:           ∗∗         ∗               ∗∗        ∗∗∗
2:                                                   
3:                                       ∗           
4:                                                   
5:          ∗∗∗       ∗∗∗              ∗∗∗        ∗∗∗
6:                                                   
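The code above leaves the standard errors on their own rows, so Question 2 is still open. A minimal sketch of one way to handle it, assuming every estimate row is immediately followed by its standard-error row and the brackets have already been stripped. The small table, the _SE suffix (taken from the question) and the names value_cols, se_cols and res are illustrative only:

```r
library(data.table)

# Stand-in for the cleaned table: estimates on odd rows,
# standard errors on the even row directly below each estimate.
x <- data.table(
    Treatment = c("Long Term Arm", "", "Short Term Arm", "", "Lumpsum Arm", ""),
    Revenues  = c(61379.40, 24346.05, 23177.47, 16080.92, 107746.75, 34895.03),
    Costs     = c(32055.29, 16478.13, 8497.42, 10462.44, 71903.23, 24360.84)
)

value_cols <- setdiff(colnames(x), "Treatment")
se_cols    <- paste0(value_cols, "_SE")  # naming follows the question (assumption)

est_rows <- seq(1, nrow(x), by = 2)  # rows holding the estimates
se_rows  <- est_rows + 1             # rows holding the standard errors

# keep only the estimate rows, then attach the standard errors as new columns
res <- x[est_rows]
res[, (se_cols) := x[se_rows, value_cols, with = FALSE]]

print(res)
```

If the rows are not strictly alternating, selecting the standard-error rows by Treatment == "" instead of by position may be the safer choice.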

If you want to do all of this in the tidyverse, you might look at dplyr::mutate_all() or dplyr::across() inside dplyr::mutate(), but for this sort of thing I personally like the data.table syntax with .SDcols.
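For readers who prefer dplyr, here is a minimal sketch of the same two steps, assuming dplyr 1.0 or later (where across() is available). The four-row table, the _Sig suffix and the names value_cols and res are illustrative, not part of the original answer:

```r
library(dplyr)
library(stringr)

# Small stand-in for the question's table (two value columns only)
x <- data.frame(
    Treatment = c("Long Term Arm", "", "Short Term Arm", ""),
    Revenues  = c("61379.40∗∗", "[24346.05]", "23177.47", "[16080.92]"),
    Costs     = c("32055.29∗", "[16478.13]", "8497.42", "[10462.44]"),
    stringsAsFactors = FALSE
)

# Name the columns explicitly so the second step does not pick up
# the _Sig columns created by the first step.
value_cols <- c("Revenues", "Costs")

res <- x %>%
    # drop everything that is not a star; the result goes into <column>_Sig
    mutate(across(all_of(value_cols),
                  ~ str_remove_all(.x, "[^∗]"),
                  .names = "{.col}_Sig")) %>%
    # strip brackets and stars from the originals, then convert to numeric
    mutate(across(all_of(value_cols),
                  ~ as.numeric(str_remove_all(.x, "[\\[\\]∗]"))))

print(res)
```

Selecting with all_of(value_cols) rather than a negative selection such as -Treatment keeps the second across() from touching the newly created character columns.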

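It looks like you are working with some kind of model output that has been converted to a data.frame. If that is the case, you may want to see whether you can extract the same information directly from the model rather than taking this approach.

Anyway, hope this helps! :)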
