This is the structure of the text blob I'm working with:
reprEx <- "] WITHDRAWALS\nDATE DESCRIPTION AMOUNT\n04/01 Quickpay With Zelle Payment To Mike T 819018100 $1,450.00\n04/01 Quickpay With Zelle Payment To Mandy Doid 809012906 2,665.00"
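I'd like to take the text on each new line and split the elements of that line into the corresponding data frame columns. For example, the date from each line should go into the DATE column, the transaction description into the DESCRIPTION column, and the number at the end of the line into the AMOUNT column. Here is an example of the output I want in the data frame.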
desiredResult <- data.frame(DATE = c("04/01", "04/01"),
                            DESCRIPTION = c("Quickpay With Zelle Payment To Mike T 819018100", "Quickpay With Zelle Payment To Mandy Doid 809012906"),
                            AMOUNT = c("$1,450.00", "2,665.00"))
How about this as a start? This solution uses str_extract_all from the stringr package:
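library(stringr)
# Each column is extracted from the whole string with its own regex.
desiredResult <- data.frame(
  # dates of the form MM/DD
  DATE = unlist(str_extract_all(reprEx, "[0-9]{2}/[0-9]{2}")),
  # text between a date and an amount (the character class also admits "$")
  DESCRIPTION = unlist(str_extract_all(reprEx, "(?<=[0-9]{2}/[0-9]{2}\\s)[\\s\\w$]+(?=\\d{1,3},\\d{3}\\.\\d{2})")),
  # amounts such as 1,450.00 (a leading "$" is not matched here)
  AMOUNT = unlist(str_extract_all(reprEx, "\\d{1,3},\\d{3}\\.\\d{2}"))
)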
Output:
desiredResult
   DATE                                          DESCRIPTION   AMOUNT
1 04/01    Quickpay With Zelle Payment To Mike T 819018100 $ 1,450.00
2 04/01 Quickpay With Zelle Payment To Mandy Doid 809012906  2,665.00
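Note that in this output the "$" ends up at the end of DESCRIPTION rather than in AMOUNT. One possible refinement is to split the blob into lines and match each line against a single pattern with three capture groups. This is only a sketch: it assumes every transaction sits on its own line, starts with an MM/DD date, and ends with the amount. The names `lines`, `pattern`, `m` and `refinedResult` are introduced here for illustration.

```r
library(stringr)

reprEx <- "] WITHDRAWALS\nDATE DESCRIPTION AMOUNT\n04/01 Quickpay With Zelle Payment To Mike T 819018100 $1,450.00\n04/01 Quickpay With Zelle Payment To Mandy Doid 809012906 2,665.00"

# Split the blob into lines (str_split returns a list; take its first element).
lines <- str_split(reprEx, "\n")[[1]]

# One pattern per line: date, description, amount with an optional leading "$".
pattern <- "^(\\d{2}/\\d{2})\\s+(.+?)\\s+(\\$?\\d{1,3}(?:,\\d{3})*\\.\\d{2})$"

# str_match returns a character matrix: column 1 is the full match, the
# following columns are the capture groups. Non-matching lines (the headers)
# come back as rows of NA and are dropped.
m <- str_match(lines, pattern)
m <- m[!is.na(m[, 1]), , drop = FALSE]

refinedResult <- data.frame(
  DATE = m[, 2],
  DESCRIPTION = m[, 3],
  AMOUNT = m[, 4],
  stringsAsFactors = FALSE
)
```

Because all three columns come from the same match, they cannot get out of step with each other, which can happen with three independent str_extract_all calls if one pattern matches more often than another.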