(R) Saving data (vector or data frame) as UTF-8 on Windows 10

Question

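I am trying to save some data downloaded from a website, and the data includes Chinese characters. I have tried many approaches without success. The default text encoding in RStudio is set to UTF-8, and the Windows 10 region option "Beta: Use Unicode UTF-8 for worldwide language support" is also enabled.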


```
## packages used
library(jiebaR)   ## provides file_coding()
library(htm2txt)  ## provides gettxt() to get the text
library(httr)     ## just in case
library(readtext)

## get the original text with Chinese characters
mytxtC <- gettxt("https://archive.li/wip/kRknx")

## print to check that the Chinese characters appear
mytxtC

## try to save as UTF-8
write.csv(mytxtC, "csv_mytxtC.csv", row.names = FALSE, fileEncoding = "UTF-8")

## check whether it is readable
read.csv("csv_mytxtC.csv", encoding = "UTF-8")

## doesn't work; check the file encoding
file_coding("csv_mytxtC.csv")
## answer: "windows-1252"

## try with a txt file
write(mytxtC, "txt_mytxtC.txt")
toto <- readtext("txt_mytxtC.txt")
toto[1,2]

## still not readable; try file_coding() again
file_coding("txt_mytxtC.txt")
## "windows-1252"
```

For information:
```
Sys.getlocale()
[1] "LC_COLLATE=French_Switzerland.1252;LC_CTYPE=French_Switzerland.1252;LC_MONETARY=French_Switzerland.1252;LC_NUMERIC=C;LC_TIME=French_Switzerland.1252"
```

r utf-8 windows-10 cjk saving-data
Answers

0 votes

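I changed the locale with `Sys.setlocale()` and it seems to work. I just added this line at the beginning of the code: `Sys.setlocale("LC_CTYPE", "chinese")`

Just remember to change it back at the end. Still, I find it strange that this line makes saving as UTF-8 possible when it was not possible before...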
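As a minimal sketch of the locale switch described in this answer, the current `LC_CTYPE` value can be stored before the change and restored afterwards. The locale name "chinese" is the Windows name used in the answer; the fallback message for systems where that name is unavailable is an assumption added for illustration.

```r
# Remember the current LC_CTYPE so that it can be restored later.
old_ctype <- Sys.getlocale("LC_CTYPE")

# "chinese" is a Windows locale name. When a locale cannot be set,
# Sys.setlocale() returns "" and warns; that case is treated as non-fatal here.
switched <- suppressWarnings(Sys.setlocale("LC_CTYPE", "chinese"))
if (identical(switched, "")) {
  message("Locale 'chinese' is not available; LC_CTYPE left unchanged.")
}

# ... download and save the text here ...

# Change the locale back to what it was at the start.
Sys.setlocale("LC_CTYPE", old_ctype)
```

Storing the old value first means the script does not need to hard-code the original locale string.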


0 votes

This works for me on Windows.

Download the file:

download.file("https://archive.li/wip/kRknx", destfile="external_file", method="libcurl")

Read in the text:

my_text <- readLines("external_file")  # readLines(url) works as well
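When reading, it may help to declare the encoding explicitly so that the strings are marked as UTF-8 whatever the session locale is. The sketch below uses a small temporary file as a hypothetical stand-in for `external_file`; the sample strings are made up for illustration.

```r
# Hypothetical stand-in for the downloaded file: two lines written as UTF-8 bytes.
sample_text <- c("plain ASCII line", "\u6c49\u5b57 (Chinese characters)")
path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb")
writeLines(enc2utf8(sample_text), con, useBytes = TRUE)
close(con)

# encoding = "UTF-8" does not re-encode the input; it declares how to mark the strings.
my_text <- readLines(path, encoding = "UTF-8")

# Inspect the declared encoding of each line (ASCII-only strings are never marked).
print(Encoding(my_text))

# Confirm that the round trip preserved the text.
print(all(my_text == sample_text))
```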

Check that it is valid UTF-8:

> sum(validUTF8(my_text)) == length(my_text)
[1] TRUE

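You can also check the file (note that `validUTF8()` takes a character vector, so this call validates the string "external_file" itself rather than the contents of the file):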

> validUTF8("external_file")
[1] TRUE
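Because `validUTF8()` operates on character vectors, one way to check the contents of a file as a whole is to read its raw bytes first. This is a sketch under assumptions: the helper name `file_is_valid_utf8` is hypothetical, and the two temporary files are made-up test inputs rather than the downloaded page.

```r
# Hypothetical helper: TRUE if the bytes of the file form valid UTF-8.
file_is_valid_utf8 <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  # rawToChar() fails on embedded NUL bytes, so this suits text files only.
  validUTF8(rawToChar(bytes))
}

# A UTF-8 file: the bytes of two Chinese characters plus a newline.
good <- tempfile()
writeBin(charToRaw(enc2utf8("\u6c49\u5b57\n")), good)

# A file that is not UTF-8: "caf" followed by a lone 0xE9 byte
# (the windows-1252 encoding of e-acute).
bad <- tempfile()
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9)), bad)

file_is_valid_utf8(good)  # TRUE
file_is_valid_utf8(bad)   # FALSE
```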

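Here is the only difference I noticed with the file downloaded on Windows: the line terminators (CRLF in one copy, not reported in the other).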

user@somewhere:~/Downloads$ file external_file 
external_file: HTML document, UTF-8 Unicode text, with very long lines, with CRLF line terminators

user@somewhere:~/Downloads$ file external_file 
external_file: HTML document, UTF-8 Unicode text, with very long lines
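The line-terminator difference reported by `file` can also be checked from R by looking at the raw bytes. The sketch below is illustrative only: `count_crlf` is a hypothetical helper, and the two temporary files stand in for the two copies of `external_file`.

```r
# Hypothetical helper: count CR LF byte pairs (0x0D 0x0A) in a file.
count_crlf <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  n <- length(bytes)
  if (n < 2) return(0L)
  # Compare each byte with its successor.
  sum(bytes[-n] == as.raw(0x0d) & bytes[-1] == as.raw(0x0a))
}

# Stand-in for the copy with CRLF line terminators.
crlf_file <- tempfile()
writeBin(charToRaw("line one\r\nline two\r\n"), crlf_file)

# Stand-in for the copy with plain LF line terminators.
lf_file <- tempfile()
writeBin(charToRaw("line one\nline two\n"), lf_file)

count_crlf(crlf_file)  # 2
count_crlf(lf_file)    # 0
```

If identical bytes are wanted on every platform, passing `mode = "wb"` to `download.file()` writes the download in binary mode, which may avoid the line-ending translation on Windows.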