My data looks like this:
A toberevised
8: <NA>
9: <NA>
10: Number of returns
11: Number of joint returns
12: Number with paid preparer's signature
13: Number of exemptions
14: Adjusted gross income (AGI) [3]
**15: Salaries and wages in AGI: [4] Number
16: Amount
17: Taxable interest: Number
18: Amount
19: Ordinary dividends: Number
20: Amount**
21: <NA>
22: <NA>
23: Number of returns
24: Number of joint returns
25: Number with paid preparer's signature
26: Number of exemptions
DF <- structure(list(toberevised = c("[Money amounts are in thousands of dollars]",
NA, NA, NA, "Item", NA, NA, NA, NA, "Number of returns", "Number of joint returns",
"Number with paid preparer's signature", "Number of exemptions",
"Adjusted gross income (AGI) [3]", "Salaries and wages in AGI: [4] Number",
"Amount", "Taxable interest: Number", "Amount", "Ordinary dividends: Number",
"Amount")), row.names = c(NA, -20L), class = c("data.table",
"data.frame"))
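I would like to write code that copies the part before the `:` in rows 15, 17 and 19 to the front of `Amount` in the row directly below each of them, so that the result is: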
A toberevised
8: <NA>
9: <NA>
10: Number of returns
11: Number of joint returns
12: Number with paid preparer's signature
13: Number of exemptions
14: Adjusted gross income (AGI) [3]
**15: Salaries and wages in AGI: [4] Number
16: Salaries and wages in AGI: Amount
17: Taxable interest: Number
18: Taxable interest: Amount
19: Ordinary dividends: Number
20: Ordinary dividends: Amount**
21: <NA>
22: <NA>
23: Number of returns
24: Number of joint returns
25: Number with paid preparer's signature
26: Number of exemptions
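I tried a very clumsy solution: copy the cells that contain `:` into a new column, fill that column downwards, then try to remove `Number` from it, after which I could paste the two columns together and would then have to clean up all the leftovers.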
library(data.table)
library(zoo)  # provides na.locf()
DF <- setDT(DF)[grepl(":", DF$toberevised), type:=toberevised]
DF$type <- na.locf(DF$type, na.rm=FALSE)
DF$type <- gsub("[[:punct:]]*Number[[:punct:]]*", "", DF$type)
DF$fullname <- paste(DF$type,DF$toberevised)
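Apart from the fact that it does not work, it is also cumbersome.
What would be a better way? I was thinking of checking whether a cell contains `: Number` and the cell below it contains `Amount`, and then pasting the substring before the `:` in front of the string below. But I do not know how to write that.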
You can do:
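A minimal sketch with `data.table`, offered as one way to do this rather than as the answerer's original code. It assumes every `Amount` row sits directly below the row that carries its label, and that everything from the first `:` onward (including footnote markers such as `[4]`) can be dropped from the copied prefix, as in the desired output. The shortened sample data and the temporary column `prev` are illustrative assumptions.

```r
library(data.table)

# Shortened stand-in for the question's column (illustrative data only)
DF <- data.table(toberevised = c(
  NA, "Number of exemptions",
  "Adjusted gross income (AGI) [3]",
  "Salaries and wages in AGI: [4] Number", "Amount",
  "Taxable interest: Number", "Amount",
  "Ordinary dividends: Number", "Amount"
))

# Value of the row above; shift() pads the first row with NA
DF[, prev := shift(toberevised)]

# Only change rows that are exactly "Amount" and whose predecessor has a colon;
# the prefix is everything before the first colon of the row above
DF[toberevised %in% "Amount" & grepl(":", prev, fixed = TRUE),
   toberevised := paste0(sub(":.*$", "", prev), ": ", toberevised)]

# Drop the temporary helper column
DF[, prev := NULL]

print(DF)
```

Because only the immediately preceding row is inspected, unrelated rows further down (for example a later `Number of returns`) are left untouched, which a fill-down approach does not guarantee.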
One possible solution:
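A base R sketch of the check described in the question: look at each cell, and if it is `Amount` while the cell above contains a `:`, paste the text before that `:` in front of it. The helper name `prefix_amount()` and the sample vector are hypothetical, introduced only for illustration; the sketch assumes the label to copy ends at the first `:`.

```r
# Hypothetical helper (not from the original post): prepend the label of the
# row above to every "Amount" entry that directly follows a "label: ..." row
prefix_amount <- function(x) {
  prev <- c(NA_character_, head(x, -1))   # value of the row above
  hit  <- !is.na(x) & x == "Amount" & grepl(":", prev, fixed = TRUE)
  # keep the text before the first colon of the previous row
  x[hit] <- paste0(sub(":.*$", "", prev[hit]), ": Amount")
  x
}

# Illustrative input, shortened from the question's data
labels <- c(NA, "Number of exemptions",
            "Salaries and wages in AGI: [4] Number", "Amount",
            "Taxable interest: Number", "Amount")

print(prefix_amount(labels))
```

Since the function takes and returns a plain character vector, it could be applied to the question's column with something like `DF$toberevised <- prefix_amount(DF$toberevised)`.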