How can I read a zip file with read.csv2.sql without unzipping it?


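I am trying to read a file inside a zip archive without extracting it into my directory, while using read.csv2.sql to filter for specific rows.

The zip file can be downloaded from the URL in the code below.

I have tried passing a file connection to read.csv2.sql, but it does not seem to accept a connection as its "file" argument.

The sqldf package is already installed on my machine.

Here is the R code that reproduces the problem: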

### Name the download file
zipFile <- "Dataset.zip"

### Download it
download.file("https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2Fhousehold_power_consumption.zip",zipFile,mode="wb")

## Set up zip file directory
zip_dir <- paste0(workingDirectory,"/Dataset.zip")

### Establish link to "household_power_consumption.txt" inside zip file
data_file <- unz(zip_dir,"household_power_consumption.txt")

### Read file into loaded_df
loaded_df <- read.csv2.sql(data_file , sql="SELECT * FROM file WHERE Date='01/02/2007' OR Date='02/02/2007'",header=TRUE)

### Error Msg
### -Error in file(file) : invalid 'description' argument
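
The message names a call to file(file), which suggests that read.csv2.sql hands its first argument to file() and therefore expects a path string rather than an open connection. As a minimal sketch in base R, the same message can be reproduced with any connection object; the text connection here is a hypothetical stand-in for the unz() connection above:

```r
# A connection object standing in for unz(zip_dir, "...") (illustrative only).
con <- textConnection("Date;Time\n1/2/2007;00:00:00")

# file() requires a character string as its description, so passing a
# connection raises an error; capture its message instead of stopping.
msg <- tryCatch(
  {
    file(con)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

close(con)
print(msg)
```

If this reading is right, any fix has to give read.csv2.sql a file path, or avoid read.csv2.sql altogether.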
1 Answer

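This does not use read.csv2.sql, but since the file contains only about 2 million records, it should be feasible to download it, read it with read.csv2, and then subset it in R.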

# download file creating zipfile
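u <- "https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2Fhousehold_power_consumption.zip"
zipfile <- sub(".*%2F", "", u)
download.file(u, zipfile, mode = "wb")  # binary mode, as in the question

# extract fname from zipfile, read it into DF0 and subset it to DF
# fixed = TRUE matches ".zip" literally instead of treating "." as a regex wildcard
fname <- sub(".zip", ".txt", zipfile, fixed = TRUE)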
DF0 <- read.csv2(unz(zipfile, fname))
DF0$Date <- as.Date(DF0$Date, format = "%d/%m/%Y")
DF <- subset(DF0, Date == '2007-02-01' | Date == '2007-02-02')

# can optionally free up memory used by DF0
# rm(DF0)
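
If the SQL filter is still wanted, one possible workaround is to extract the file into a temporary directory rather than the working directory, pass that path to read.csv2.sql, and delete the copy afterwards. This does unpack the file, only not into your own directory. The sketch below is not part of the original answer: the helper name read_zip_member_sql is hypothetical, and the query lists both date spellings because it is an assumption here whether the file pads day and month with zeros.

```r
library(sqldf)  # assumed to be installed, as stated in the question

# Hypothetical helper: extract one member of a zip archive into a temporary
# directory, run read.csv2.sql on the extracted copy, then remove the copy.
read_zip_member_sql <- function(zipfile, member, sql) {
  tmp_dir <- tempfile("unz_")
  dir.create(tmp_dir)
  # Clean up the extracted copy when the function exits, even on error.
  on.exit(unlink(tmp_dir, recursive = TRUE), add = TRUE)

  # unzip() returns the path(s) of the extracted file(s).
  path <- unzip(zipfile, files = member, exdir = tmp_dir)

  # read.csv2.sql receives a path string, which is what it appears to expect.
  read.csv2.sql(path, sql = sql, header = TRUE)
}

zipfile <- "household_power_consumption.zip"
if (file.exists(zipfile)) {
  DF <- read_zip_member_sql(
    zipfile,
    "household_power_consumption.txt",
    paste(
      "SELECT * FROM file WHERE Date IN",
      "('1/2/2007', '2/2/2007', '01/02/2007', '02/02/2007')"
    )
  )
}
```

Compared with reading everything through read.csv2 and subsetting, this only loads the matching rows into R, at the cost of a temporary copy on disk.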