How do I use one or more functions to generate multiple rows from each row of a data frame, then recombine/merge them?


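I start with a data frame in which each row holds a long string that represents a 2D environment flattened into one dimension (let's call it a landscape). In the real case, these landscapes are about 6 values high by 80 values long, so the flattened string is 480 characters long. I have simplified this in the example. Each row also has a unique name that serves as a shorthand identifier for its landscape.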

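I have a function that takes each row, cuts the string into 6 strips, and then analyzes each strip. In this example, the core of the function is to compress each strip and get the compressed length. The function returns a 6-row data frame, which I need to combine with the original data frame so that the final data frame has 6 rows for every 1 original row.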

library(dplyr)
library(tibble)
library(stringr)  # str_replace_all()
library(purrr)    # map()
library(tidyr)    # unnest_wider()

master_df <- tribble(~land_id, ~land_string,
                     "v1-few_bs", "aaaaaaaaaabaaaaabaaabababaabab",
                     "v2-more_bs", "aaaaaaaaaababbabbbabbababaabbb")

compress_it <- function(txt) {
  len.raw <- sum(nchar(txt))
  len.gz <- length(memCompress(txt, "g"))
  return(list("len_raw" = len.raw,
              "len_gz" = len.gz))
}

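get_strip_data <- function(land_id, land_string) {
    # Insert a space after every 5 characters, then turn each space into a
    # newline so that every strip sits on its own line.
    with_spaces <- gsub("(.{5})", "\\1 ", land_string)
    chars_on_lines <- str_replace_all(with_spaces, pattern = " ", "\n")
    # Read the strips back in as a one-column data frame (column V1).
    prob_matrix <- read.table(text = chars_on_lines, header=FALSE, sep = " ",
                              stringsAsFactors = FALSE)
    # Tag every strip with its landscape id and its position.
    prob_matrix <- mutate(prob_matrix, 
                          land_id = land_id,
                          substr_id = 1:nrow(prob_matrix) )
    prob_matrix <- rename(prob_matrix, land_substring = V1)

    # Compress each strip and spread the returned list into columns.
    mutate(prob_matrix, new = map(land_substring, compress_it)) %>%
    unnest_wider(c(new))
}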

get_strip_data(master_df$land_id[[2]], master_df$land_string[[2]]) # to test the above function

Here is where we get into pseudocode / kludge code.

First, I create an empty data frame.

subchunks_df <- 
  tribble(~land_id, ~land_string, ~land_substring, ~substr_id, ~len_raw, ~len_gz,
          "", "", "", NA, NA, NA)

Trying a for loop:

for ( i in 1:nrow(master_df) ) {
  subchunks_df[i, ] <- get_strip_data(master_df$land_id[[i]], master_df$land_string[[i]])
}

Trying mapply instead:

subchunks_df <- mapply(get_strip_data, 
                       land_id = master_df$land_id, 
                       land_string = master_df$land_string)

Nope. My attempts are, generously, "close but no cigar."

If I could get subchunks_df into the right shape, I would then do a right_join:

final_df <- right_join(master_df, subchunks_df, by = "land_id")

This is the desired output after passing the given master_df through the function:

final_df <- 
  tribble(~land_id, ~land_string, ~land_substring, ~substr_id, ~len_raw, ~len_gz, 
          "v1-few_bs", "aaaaaaaaaabaaaaabaaabababaabab", "aaaaa", 1, 5, 11,     
          "v1-few_bs", "aaaaaaaaaabaaaaabaaabababaabab", "aaaaa", 2, 5, 11,     
          "v1-few_bs", "aaaaaaaaaabaaaaabaaabababaabab", "baaaa", 3, 5, 11,     
          "v1-few_bs", "aaaaaaaaaabaaaaabaaabababaabab", "abaaa", 4, 5, 13,     
          "v1-few_bs", "aaaaaaaaaabaaaaabaaabababaabab", "babab", 5, 5, 13,     
          "v1-few_bs", "aaaaaaaaaabaaaaabaaabababaabab", "aabab", 6, 5, 13,
          "v2-more_bs", "aaaaaaaaaababbabbbabbababaabbb", "aaaaa", 1, 5, 11,        
          "v2-more_bs", "aaaaaaaaaababbabbbabbababaabbb", "aaaaa", 2, 5, 11,        
          "v2-more_bs", "aaaaaaaaaababbabbbabbababaabbb", "babba", 3, 5, 13,        
          "v2-more_bs", "aaaaaaaaaababbabbbabbababaabbb", "bbbab", 4, 5, 13,        
          "v2-more_bs", "aaaaaaaaaababbabbbabbababaabbb", "babab", 5, 5, 13,        
          "v2-more_bs", "aaaaaaaaaababbabbbabbababaabbb", "aabbb", 6, 5, 13)

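As always, I appreciate both dplyr and base R perspectives. I am also not convinced that my function is the best way to slice the strips and get the compressed lengths; I could not find a simpler way. But the last mile is the real trouble.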

r dataframe dplyr mapply
1 answer

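The map functions are the tidyverse version of the apply family. Here, map_dfr iterates over the indices of the vector master_df$land_id and row-binds the data frames that get_strip_data returns. Think of it like a for loop. It gives you the data frame you were looking for to use in the right_join call.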

library(tidyverse)
subchunks_df <- map_dfr(seq_along(master_df$land_id), function(i){
                     get_strip_data(master_df$land_id[[i]], 
                                    master_df$land_string[[i]])})

final_df <- right_join(master_df, subchunks_df, by = "land_id")
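
Since the question also asks for a base R perspective, here is a minimal sketch of the same idea without any packages. It is an illustration under assumptions, not the asker's function: get_strip_data_base and its strip_width argument are hypothetical names, and the slicing uses substring() instead of the gsub()/read.table() round trip. Map() plays the role of the loop, do.call(rbind, ...) stacks the per-landscape data frames, and merge() stands in for the join.

```r
# Toy input copied from the question.
master_df <- data.frame(
  land_id = c("v1-few_bs", "v2-more_bs"),
  land_string = c("aaaaaaaaaabaaaaabaaabababaabab",
                  "aaaaaaaaaababbabbbabbababaabbb"),
  stringsAsFactors = FALSE
)

# Hypothetical base R stand-in for get_strip_data(): slice one landscape
# string into fixed-width strips and measure each strip.
get_strip_data_base <- function(land_id, land_string, strip_width = 5) {
  starts <- seq(1, nchar(land_string), by = strip_width)
  strips <- substring(land_string, starts, starts + strip_width - 1)
  data.frame(
    land_id = land_id,
    land_substring = strips,
    substr_id = seq_along(strips),
    len_raw = nchar(strips),
    # Length in bytes of the gzip-compressed strip.
    len_gz = vapply(strips,
                    function(s) length(memCompress(charToRaw(s), "gzip")),
                    integer(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Map() returns a list holding one data frame per landscape ...
pieces <- Map(get_strip_data_base, master_df$land_id, master_df$land_string)

# ... which rbind() stacks into one long data frame.
subchunks_df <- do.call(rbind, pieces)
rownames(subchunks_df) <- NULL

# Attach the full landscape string to every strip, then restore strip order,
# because merge() does not promise to keep the original row order.
final_df <- merge(master_df, subchunks_df, by = "land_id")
final_df <- final_df[order(final_df$land_id, final_df$substr_id), ]
rownames(final_df) <- NULL
```

The key difference from the mapply attempt in the question is that Map() never simplifies its result, so each element stays a whole data frame that rbind() can stack.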