Importing specific cells from an Excel spreadsheet

Question (votes: 1, answers: 1)

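I have the following spreadsheet, uploaded at https://files.fm/u/6uhc3qwr

I am trying to import specific cells from a balance sheet. In the Assets section (around row 19) there is a row labelled Total Current Assets, and I want to import all of the Total Current Assets values in that row.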

31/12/2016  31/12/2015  31/12/2014  31/12/2013  31/12/2012  31/12/2011  31/12/2010  31/12/2009  31/12/2008  31/12/2007
th USD      th USD      th USD      th USD      th USD      th USD      th USD      th USD      th USD      th USD
21.855.481  23.407.658  30.740.856  35.002.444  34.819.795  36.161.838  24.317.544  20.191.164  51.185.242  22.041.144

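The hard part for me is importing these values together with the row of dates a few rows above them. I do not want to import row 19 of the data file as such; I want the values that correspond to the row name Total Current Assets. I have many of these balance sheets, and the Excel row number varies slightly between them.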

r web-scraping
1 answer
2 votes
 library(xlsx)
 library(tidyverse)  # used here only for the %>% pipe

 # Read the first sheet of the Excel file; keep text cells as character
 df <- read.xlsx('~/Downloads/balance_upload_stack.xlsx',
                 sheetIndex = 1, stringsAsFactors = FALSE)

 # Find the rows of interest by the label in the first column (not by row number).
 # Labels are matched exactly, so the leading space in " Total Current Assets" matters.
 rows_of_interest <- which(df[,1] %in% c("Annual report/Consolidated"," Total Current Assets"))
 new_df <- df[rows_of_interest,]
 # Keep only the first occurrence of each label
 new_df <- new_df[!(duplicated(new_df[,1])),]

 # Drop columns that are entirely NA, then flatten the remaining cells
 # (dates and values sit in different columns) into one long column
 new_df <- new_df[colSums(!is.na(new_df)) > 0]
 new_df <- cbind(Index = rev(new_df[,1]), 
                 new_col = na.omit(unlist(new_df[,-1]))) %>% as.data.frame()

Output data frame

                      Index          new_col
       Total Current Assets 21855481.4195938
 Annual report/Consolidated       31/12/2016
       Total Current Assets    23407658.3146
 Annual report/Consolidated       31/12/2015
       Total Current Assets 30740855.9858115
 Annual report/Consolidated       31/12/2014
       Total Current Assets 35002443.8754019
 Annual report/Consolidated       31/12/2013
       Total Current Assets 34819794.6592976
 Annual report/Consolidated       31/12/2012
       Total Current Assets 36161837.8907298
 Annual report/Consolidated       31/12/2011
       Total Current Assets 24317543.6967938
 Annual report/Consolidated       31/12/2010
       Total Current Assets 20191164.1378803
 Annual report/Consolidated       31/12/2009
       Total Current Assets 51185242.2723579
 Annual report/Consolidated       31/12/2008
       Total Current Assets 22041143.7373581
 Annual report/Consolidated       31/12/2007
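
The output above interleaves dates and values in a single column. If one row per date is more convenient, the same lookup by row label can be sketched in base R as below. This is only an illustration under assumptions: the small `df` is a made-up stand-in for what `read.xlsx` returns (labels in the first column, dates stored as text such as "31/12/2016", dates and values in different columns), the helper name `row_values` is hypothetical, and dates are paired with values purely by their left-to-right order. Labels are trimmed with `trimws()` so a stray leading space does not break the match.

```r
# Hypothetical stand-in for the sheet as read by read.xlsx(): labels in the
# first column, dates and values scattered across different columns.
df <- data.frame(
  label = c("Annual report/Consolidated", "Some other row", " Total Current Assets"),
  c2 = c(NA, "x", "21855481.4"),
  c3 = c("31/12/2016", NA, NA),
  c4 = c(NA, "y", "23407658.3"),
  c5 = c("31/12/2015", NA, NA),
  stringsAsFactors = FALSE
)

# Return the non-missing cells of the first row whose label matches,
# ignoring leading/trailing whitespace in the label column.
row_values <- function(data, label) {
  idx <- which(trimws(data[[1]]) == label)[1]
  if (is.na(idx)) stop("label not found: ", label)
  cells <- unlist(data[idx, -1], use.names = FALSE)
  cells[!is.na(cells)]
}

dates  <- row_values(df, "Annual report/Consolidated")
values <- row_values(df, "Total Current Assets")

# Pairing by position only makes sense if both rows have the same length
stopifnot(length(dates) == length(values))

tidy <- data.frame(
  date = as.Date(dates, format = "%d/%m/%Y"),
  total_current_assets = as.numeric(values)
)
print(tidy)
```

Because the row is located by its label rather than its position, the same helper should carry over to other balance sheets where the row number shifts, provided the labels themselves are spelled the same way.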