I have a CSV file that contains many weird symbols. Here is an example:
df = data.frame(comments = c(
  'Korea¬Ãs Ministry of Food and Drug Safety is proposing an amendment seeking to amend the Standards and Specification',
  'it is important to highlight:\n• Many maximum limits for drug',
  'The European Parliament has published a decision, which aims to establish a special Committee to examine the EU¬Ãs authorization procedure'
))
write.csv(df, './example.csv', row.names = FALSE)
Does anyone know how I can clean out these weird symbols in R (or Python)? I don't know why this happens or how to clean them up. Thanks a lot.
Assuming "weird" means anything that is not a "normal" letter, digit, dot, comma, or space:
gsub("[^A-Za-z0-9., ]", "", df$comments)
[1] "Koreas Ministry of Food and Drug Safety is proposing an amendment seeking to amend the Standards and Specification"
[2] "it is important to highlight Many maximum limits for drug"
[3] "The European Parliament has published a decision, which aims to establish a special Committee to examine the EUs authorization procedure"
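Since the question also asks about Python, here is a minimal sketch of the same whitelist approach using the standard-library `re` module; `clean` should reproduce the R output shown. The second function, `clean_keep_apostrophe`, is an assumption-based variant: judging from "Korea¬Ãs" and "EU¬Ãs", the `¬Ã` sequence seems to stand in for an apostrophe, so the hypothetical `MOJIBAKE` mapping restores it before stripping. Adjust that mapping to whatever sequences your file actually contains.

```python
import re

# The same three sample comments as in the R data frame.
comments = [
    "Korea¬Ãs Ministry of Food and Drug Safety is proposing an amendment seeking to amend the Standards and Specification",
    "it is important to highlight:\n• Many maximum limits for drug",
    "The European Parliament has published a decision, which aims to establish a special Committee to examine the EU¬Ãs authorization procedure",
]

# Whitelist: keep ASCII letters, digits, dots, commas and spaces; drop everything else
# (including newlines, colons and the bullet character).
WEIRD = re.compile(r"[^A-Za-z0-9., ]")


def clean(text: str) -> str:
    """Remove every character outside the whitelist, like the R gsub() call."""
    return WEIRD.sub("", text)


# Assumption: "¬Ã" is debris from an apostrophe, as "Korea¬Ãs" suggests.
# This mapping is hypothetical; extend it with the sequences your file contains.
MOJIBAKE = {"¬Ã": "'"}


def clean_keep_apostrophe(text: str) -> str:
    """Replace known debris first, then strip with a whitelist that also keeps apostrophes."""
    for bad, good in MOJIBAKE.items():
        text = text.replace(bad, good)
    return re.sub(r"[^A-Za-z0-9.,' ]", "", text)


for c in comments:
    print(clean(c))
    print(clean_keep_apostrophe(c))
```

Note that a whitelist like this also removes legitimate non-ASCII characters (accented letters, the bullet `•`), so it only fits when plain ASCII text is what you want.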