Collapse rows down to the lowest complete row in each set

Question (votes: 0, answers: 1)

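I am cleaning a huge dataset that came from running tabulizer() on a PDF.

The columns were delineated correctly, but I have many rows where a single cell in the original was very large and tabulizer() read it as several rows, with every cell blank except the large one. I need to collapse the data frame so that these rows fold "down" into the lowest complete row.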

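Here is an example of what the data looks like: [image]

At first glance, the column in which these "extra rows" appear varies from row to row (in one case it is species, in others it is area.of.operation). I want to collapse them into complete rows, so that row 1 stays as it is, row 2 is really rows 2:6 collapsed, row 7 stays intact, and so on. I don't even know whether R is the best tool for this, but if there is a dplyr solution I would be happy with that. A sample data frame is below.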

Thanks in advance.

    mydata <- structure(list(
      X = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 11L, 12L, 13L, 17L),
      target.species = structure(c(4L, 1L, 1L, 1L, 1L, 5L, 4L, 1L, 1L, 2L, 3L),
        .Label = c("", "hake", "hake, southern", "rosefish", "squid, cuttlefish,"),
        class = "factor"),
      gear = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 3L, 2L),
        .Label = c("", "trawl, bottom", "trawl, midwater"),
        class = "factor"),
      number.boats = structure(c(2L, 1L, 1L, 1L, 1L, 3L, 5L, 1L, 1L, 4L, 4L),
        .Label = c("", "18 vessels", "98 refrigerated high", "none provided",
                   "seas vessels"),
        class = "factor"),
      company = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L),
        .Label = c("", "not applicable"),
        class = "factor"),
      area.of.operation = structure(c(2L, 1L, 1L, 1L, 3L, 4L, 2L, 3L, 4L, 2L, 5L),
        .Label = c("", "above provinces", "annual fishery; EEZ",
                   "concentrated around", "deepwater coastal"),
        class = "factor"),
      species = structure(c(6L, 3L, 4L, 5L, 9L, 8L, 7L, 9L, 8L, 1L, 2L),
        .Label = c("Fur seal", "none provided", "otter", "otter, river",
                   "porpoise", "seal", "Seal", "South American Sea lion,",
                   "spectacled porpoise,"),
        class = "factor"),
      estimates = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L),
        .Label = c("", "none"),
        class = "factor")),
      class = "data.frame", row.names = c(NA, -11L))

r dplyr data-cleaning

1 Answer

0 votes

The old cumsum grouping stratagem: paste each column together with collapse = ", " and then strip out the extra commas. That gets you most of the way.
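The answer's own code is not available here, so what follows is only a minimal base R sketch of the idea it describes, not the answerer's method. The helper name `collapse_down`, its `id_col` argument, and the rule that a row counts as "complete" when none of its cells is blank are all assumptions made for illustration. It uses base R rather than dplyr, and it builds the group ids by reversing, taking a cumulative sum, and reversing again, so that each partial row is assigned to the next complete row beneath it.

```r
# Hypothetical helper, not taken from the original answer.
collapse_down <- function(df, id_col = "X") {
  cols <- setdiff(names(df), id_col)

  # Work on character copies so factors paste by label, not by integer code.
  txt <- lapply(df[cols], function(v) {
    v <- as.character(v)
    v[is.na(v)] <- ""
    v
  })

  # Assumption: a row is "complete" when none of its cells is blank.
  complete <- Reduce(`&`, lapply(txt, nzchar))

  # rev / cumsum / rev: each row shares an id with the next complete row at
  # or below it. Trailing partial rows with no complete row beneath them
  # end up together in a group of their own.
  grp <- rev(cumsum(rev(complete)))
  grp <- factor(grp, levels = unique(grp))  # keep top-to-bottom order

  # Paste the non-blank cells of a group, then drop the doubled commas that
  # appear when a fragment already ended in a comma.
  glue <- function(cell) {
    joined <- paste(cell[nzchar(cell)], collapse = ", ")
    gsub(",,", ",", joined, fixed = TRUE)
  }

  out <- lapply(txt, function(v) as.vector(tapply(v, grp, glue)))
  res <- as.data.frame(out, stringsAsFactors = FALSE)

  # Keep the id of the last row in each group (the complete one, if any).
  res[[id_col]] <- as.vector(tapply(df[[id_col]], grp, function(x) x[length(x)]))
  res[c(id_col, cols)]
}

# Small illustration using a few cells from the question's data.
toy <- data.frame(
  X = c(7L, 11L, 12L, 13L),
  gear = c("trawl, bottom", "", "", "trawl, midwater"),
  species = c("Seal", "spectacled porpoise,", "South American Sea lion,",
              "Fur seal"),
  stringsAsFactors = FALSE
)
toy_out <- collapse_down(toy)
```

Calling `collapse_down(mydata)` on the question's data frame should, under the completeness assumption above, leave one row per complete row (X = 1, 7, 13 and 17), with rows 2 to 7 folded into the row with X = 7. Cells that were split mid-phrase, such as "98 refrigerated high" and "seas vessels", still come back joined by a comma, which is why this only gets most of the way and some manual cleanup remains.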