R: Moving/shifting values to other columns inside lapply


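Project overview:

As part of a document-extraction project, I am using the docxtractr library to extract Word tables from multiple files and convert them into data frames. The code below targets one specific table, the 6th.

Because there are multiple files, I use lapply to iterate over each file and carry out the data frame manipulation. The data frames have extra columns because the Word tables contain hidden values used for calculations. These tables used to be linked to a large, complex macro-enabled Excel file.

My code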

  library("docxtractr")
  sourcesSummary <- lapply(files, function(x){
   doc <- read_docx(x)
   kingsTbls <- docx_extract_all_tbls(doc)
   sources <- docx_extract_tbl(doc, 6, header = FALSE)
   sources <- data.frame(sources)
   # The below two lines are the issue #
   sources[9,3:4] <- sources[9,2:3]
   sources[24,3:4] <- sources[24,2:3]
   })


Default data frame

                          V1         V2            V3       V4
1           SOURCES OF FUNDS     AMOUNT      PER UNIT     <NA>
2            Proposed A Loan         87 $7,208,000.00  $82,851
3            Proposed B Loan                    $0.00    $   0
4            Investor Equity            $1,948,362.00  $22,395
5   Operating Partner Equity                    $0.00    $   0
6                      Other                    $0.00    $   0
7                      Other                    $0.00    $   0
8                      Other                    $0.00    $   0
9      TOTAL SOURCE OF FUNDS $9,156,362      $105,246     <NA>
10                                 <NA>          <NA>     <NA>
11             USES OF FUNDS     AMOUNT      PER UNIT     <NA>
12      Existing Mortgage(s)                       $0    $   0
13            Purchase Price               $9,011,000 $103,575
14          Origination Fees                  $54,060    $ 621
15        FM application Fee                   $7,208    $  83
16             Investor Fees                    $0.00    $   0
17             Closing Costs                    $0.00    $   0
18 Other (Yield Maintenance)                       $0    $   0
19      Capital Improvements               $39,650.00    $ 456
20            Processing Fee                $3,000.00    $  34
21               Third Party               $11,000.00    $ 126
22                     Legal               $12,500.00    $ 144
23    Repair Escrow (Funded)                    $0.00    $   0
24        TOTAL USE OF FUNDS $9,138,418      $105,039     <NA>
25                                 <NA>          <NA>     <NA>
26        CASH OUT/(CASH IN)        132       $17,944    $ 207

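Problem:

My problem is about moving/shifting values within the data frame. I have done it successfully in the console, but when I run the same code inside lapply it does not behave as expected.

I am trying to shift the values in rows 9 and 24 one column to the right. The code below works fine in the console on a single test variable, but not when I run it over all files with lapply.

Problem code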

   sources[9,3:4] <- sources[9,2:3]
   sources[24,3:4] <- sources[24,2:3]

I also tried using drop = FALSE inside the brackets, but that did not work either.

Current output with lapply

           V2       V3
24 $9,138,418 $105,039

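Desired output

Note that rows 9 and 24 have changed. This is the result I get when I run my code on one specific data frame in the console, so lapply seems to be the problem. Afterwards I will drop the second column and the NA rows.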

                          V1         V2            V3       V4
1           SOURCES OF FUNDS     AMOUNT      PER UNIT     <NA>
2            Proposed A Loan         87 $7,208,000.00  $82,851
3            Proposed B Loan                    $0.00    $   0
4            Investor Equity            $1,948,362.00  $22,395
5   Operating Partner Equity                    $0.00    $   0
6                      Other                    $0.00    $   0
7                      Other                    $0.00    $   0
8                      Other                    $0.00    $   0
9      TOTAL SOURCE OF FUNDS               $9,156,362 $105,246
10                                 <NA>          <NA>     <NA>
11             USES OF FUNDS     AMOUNT      PER UNIT     <NA>
12      Existing Mortgage(s)                       $0    $   0
13            Purchase Price               $9,011,000 $103,575
14          Origination Fees                  $54,060    $ 621
15        FM application Fee                   $7,208    $  83
16             Investor Fees                    $0.00    $   0
17             Closing Costs                    $0.00    $   0
18 Other (Yield Maintenance)                       $0    $   0
19      Capital Improvements               $39,650.00    $ 456
20            Processing Fee                $3,000.00    $  34
21               Third Party               $11,000.00    $ 126
22                     Legal               $12,500.00    $ 144
23    Repair Escrow (Funded)                    $0.00    $   0
24        TOTAL USE OF FUNDS               $9,138,418 $105,039
25                                 <NA>          <NA>     <NA>
26        CASH OUT/(CASH IN)        132       $17,944    $ 207

Thanks in advance for your input!

r dataframe data-manipulation
1 Answer

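A function returns the value of the last expression it evaluates. In your case that is sources[24,3:4] <- sources[24,2:3], and the value of an assignment is its right-hand side, sources[24,2:3]. That is why each element of the list you get back is just: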

           V2       V3
24 $9,138,418 $105,039
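
To see this behaviour in isolation, here is a minimal sketch with a made-up two-row data frame. The names `df` and `shift_no_return` and all values are illustrative assumptions, not taken from the question:

```r
# Toy data frame; the values are made up for illustration.
df <- data.frame(V1 = c("Item", "TOTAL"),
                 V2 = c("", "$10"),
                 V3 = c("$5", "$2"),
                 V4 = c("$1", NA),
                 stringsAsFactors = FALSE)

# The body ends with an assignment, so the function's value is the value
# of that assignment: its right-hand side, df[2, 2:3].
shift_no_return <- function(df) {
  df[2, 3:4] <- df[2, 2:3]
}

res <- shift_no_return(df)  # the value is returned invisibly, so capture it
print(res)                  # a one-row data frame with only V2 and V3
```

The modified copy of `df` exists only inside the function and is discarded when the function exits.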

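The solution is to return sources explicitly, either by adding return(sources) at the end of the function or by simply putting sources on the last line. So your code should look like this: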

library("docxtractr")
sourcesSummary <- lapply(files, function(x){
   doc <- read_docx(x)
   kingsTbls <- docx_extract_all_tbls(doc)
   sources <- docx_extract_tbl(doc, 6, header = FALSE)
   sources <- data.frame(sources)
   # The below two lines are the issue #
   sources[9,3:4] <- sources[9,2:3]
   sources[24,3:4] <- sources[24,2:3]
   sources #<- New code!
   })
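
Since the code above depends on the Word files, here is a self-contained sketch of the same pattern that should run without docxtractr. The helper `make_tbl`, the list `tbls` and the values are hypothetical stand-ins for the extracted tables:

```r
# Hypothetical stand-in for a table extracted from one file.
make_tbl <- function(total) {
  data.frame(V1 = c("Item", "TOTAL"),
             V2 = c("", total),
             V3 = c("$5", "$2"),
             V4 = c("$1", NA),
             stringsAsFactors = FALSE)
}
tbls <- list(make_tbl("$10"), make_tbl("$20"))

shifted <- lapply(tbls, function(tbl) {
  tbl[2, 3:4] <- tbl[2, 2:3]  # copy V2:V3 of row 2 into V3:V4
  tbl                         # last expression: the whole modified data frame
})

print(shifted[[1]])
```

Note that the assignment copies rather than moves: the original value is still present in V2 of the shifted row, which does not matter here because the second column is dropped afterwards.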