Unusual line ending detected by fread leads to an error

Question

I am trying to download a large dataset of New York City taxi trips, which is publicly available on the NYC TLC website.

library(data.table)
feb14 <- fread('https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2014-02.csv', header = T)

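Running the code above downloads the data successfully (it takes a few minutes), but parsing then fails with an internal error. I have also tried removing header = T.

Is there a workaround for handling the "unusual line ending" in fread?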

Error in fread("https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2014-02.csv",  : 
  Internal error. No eol2 immediately before line 3 after sep detection.
In addition: Warning message:
In fread("https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2014-02.csv",  :
  Detected eol as \n\r, a highly unusual line ending. According to Wikipedia the Acorn BBC used this. If it is intended that the first column on the next row is a character column where the first character of the field value is \r (why?) then the first column should start with a quote (i.e. 'protected'). Proceeding with attempt to read the file.

Answer 1

Other readers such as read.csv / read.table sometimes behave differently, so they are always worth a try. (The source code may explain why; I have not looked into it.)

Another option is to read the file with readLines(). As far as I know it does no parsing or formatting, which makes it the most basic way to read a file.
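As a rough illustration of the readLines() approach, the sketch below reads the first lines of a file as plain strings and reports which of them are empty. The temporary file and its contents are made up for the example and only stand in for the real download.

```r
# Hypothetical miniature of the problem file: a header, a blank line, then data.
tmp <- tempfile(fileext = ".csv")
writeLines(c("vendor_id,passenger_count", "", "CMT,1", "VTS,2"), tmp)

# readLines() returns each line as a plain string, without splitting it into fields.
first_lines <- readLines(tmp, n = 4)

# Positions of lines that are empty or contain only whitespace.
blank_idx <- which(!nzchar(trimws(first_lines)))
print(blank_idx)
```

Reading only the first few lines with n keeps this cheap even when the file itself is very large.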

Finally, a quick fix: use the 'skip = ...' option in fread to control where reading starts, or 'nrows = ...' to control where it ends.
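A minimal sketch of the skip / nrows idea, assuming the header sits on line 1 and the data starts on line 3. The file contents and column names here are invented for the example; check the real layout before choosing the skip value.

```r
library(data.table)

# Hypothetical miniature of the problem file: a header, a blank line, then data.
tmp <- tempfile(fileext = ".csv")
writeLines(c("vendor_id,passenger_count", "", "CMT,1", "VTS,2", "CMT,3"), tmp)

# Take the column names from the first line only.
col_names <- strsplit(readLines(tmp, n = 1), ",", fixed = TRUE)[[1]]

# Skip the header and the blank line, and cap the number of rows read.
trips <- fread(tmp, skip = 2, header = FALSE, nrows = 2, col.names = col_names)
print(trips)
```

Because the header line is skipped as well, the column names are supplied separately through col.names.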

Answer 2

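The problem seems to be caused by a blank line between the header and the data in the original .csv file. Deleting that line from the .csv with Notepad++ appeared to solve it for me.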
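The same clean-up can be sketched in R instead of a text editor, assuming the file fits in memory. The file below is a made-up stand-in with a blank line after the header; for the real download, the path would be wherever the .csv was saved locally.

```r
library(data.table)

# Hypothetical miniature of the problem file: a header, a blank line, then data.
tmp <- tempfile(fileext = ".csv")
writeLines(c("vendor_id,passenger_count", "", "CMT,1", "VTS,2"), tmp)

# Read every line as text and drop the ones that are empty.
lines <- readLines(tmp)
cleaned <- lines[nzchar(trimws(lines))]

# Write the cleaned copy to a new file and parse that copy with fread.
clean_path <- tempfile(fileext = ".csv")
writeLines(cleaned, clean_path)
trips <- fread(clean_path)
print(trips)
```

Writing to a separate file leaves the original download untouched, so the clean-up can be repeated or adjusted.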

Answer 3

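fread is behaving a bit oddly here. data.table is faster and more efficient for reading large files, but in this case its behavior is not ideal. You may want to raise this as an issue on GitHub.

I can reproduce the problem on the downloaded file even with nrows = 5 or nrows = 1, but only when I stick to the original file. If I copy and paste the first few lines into a new file and try again, the problem disappears. If I read directly from the web with a small nrows, the problem also goes away. This is not even an encoding problem, which is why I suggest raising an issue.

I tried reading 100,000 rows of the file with read.csv and had no problems; it finished in under 6 seconds.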
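One way to see why a copy-pasted excerpt can behave differently from the original is to look at the raw bytes, since copying text may normalize line endings. The sketch below writes a small made-up file with explicit \r\n endings and counts the carriage-return and line-feed bytes; the contents are an assumption for illustration and not the actual layout of the TLC file.

```r
# Hypothetical file written in binary mode so the line endings are kept exactly.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(charToRaw("a,b\r\n\r\n1,2\r\n"), con)
close(con)

# Read the first bytes back without any text-mode translation.
bytes <- readBin(tmp, what = "raw", n = 64)

# Count carriage returns (0x0d) and line feeds (0x0a).
n_cr <- sum(bytes == as.raw(0x0d))
n_lf <- sum(bytes == as.raw(0x0a))
print(c(cr = n_cr, lf = n_lf))
```

Running the same readBin() call on the first bytes of the downloaded file would show which end-of-line bytes it actually contains.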

feb14_2 <- read.csv("https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2014-02.csv", header = T, nrows = 100000)

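header = T is redundant for fread, which detects the header automatically, so it has no effect there. read.csv also defaults to header = TRUE; it is read.table that needs the argument set explicitly.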
