I have some working code that adds a BOM to a new file.
# writing
File.open name, 'w', 0644 do |file|
  file.write "\uFEFF"
  file.write @data
end
# reading
File.open name, 'r:bom|utf-8' do |file|
  file.read
end
Is there a way to add the mark automatically, without writing the cryptic "\uFEFF" before the data? Maybe something like this:
File.open name, 'w:bom' # this mode has no effect
**** This answer led to a new gem: file_with_bom ****
I ran into a similar problem in the past and extended File.open with an additional encoding variant for the w mode:
class File
  # Byte order marks (raw bytes) per encoding.
  BOM_LIST_hex = {
    Encoding::UTF_8    => "\xEF\xBB\xBF",     # U+FEFF encoded as UTF-8
    Encoding::UTF_16BE => "\xFE\xFF",         # "\uFEFF"
    Encoding::UTF_16LE => "\xFF\xFE",
    Encoding::UTF_32BE => "\x00\x00\xFE\xFF",
    Encoding::UTF_32LE => "\xFF\xFE\x00\x00",
  }
  BOM_LIST_hex.freeze

  def utf_bom_hex(encoding = external_encoding)
    BOM_LIST_hex[encoding]
  end

  class << self
    alias :open_old :open
    def open(filename, mode_string = 'r', options = {}, &block)
      # check for bom-flag in mode_string; "-bom" and "|bom" are both accepted.
      # sub (not sub!) leaves the caller's string untouched.
      options = options.dup
      stripped = mode_string.sub(/[-|]bom/i, '')
      options[:bom] = true if stripped != mode_string
      mode_string = stripped
      # :bom is consumed here; the remaining options go to the original open
      bom_requested = options.delete(:bom)

      f = open_old(filename, mode_string, **options)
      bom = f.utf_bom_hex if bom_requested # nil if the encoding has no listed BOM
      if bom
        case mode_string
        # r|bom already standard since 1.9.2
        when /\Ar/ # read mode -> remove BOM
          head = f.read(bom.bytesize)
          # return to position 0 if the leading bytes were no BOM
          f.rewind unless head && head.bytes == bom.bytes
        when /\Aw/ # write mode -> attach BOM
          f << bom.dup.force_encoding(f.external_encoding)
        end # mode_string
      end

      # without a block, hand the open file back to the caller
      return f unless block_given?
      begin
        yield f
      ensure
        f.close
      end
    end
  end
end # File
Test code:
EXAMPLE_TEXT = 'some content öäü'

File.open("file_utf16le.txt", "w:utf-16le|bom"){|f| f << EXAMPLE_TEXT }
File.open("file_utf16le.txt", "r:utf-16le|bom:utf-8"){|f| p f.read }
File.open("file_utf16le.txt", "r:utf-16le:utf-8", :bom => true ){|f| p f.read }
File.open("file_utf16le.txt", "r:utf-16le:utf-8"){|f| p f.read }

File.open("file_utf8.txt", "w:utf-8", :bom => true ){|f| f << EXAMPLE_TEXT }
File.open("file_utf8.txt", "r:utf-8", :bom => true ){|f| p f.read }
File.open("file_utf8.txt", "r:utf-8|bom"){|f| p f.read }
File.open("file_utf8.txt", "r:utf-8"){|f| p f.read }
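Some remarks:
The code uses -bom as the BOM indicator (Ruby 1.9 uses |bom).
Some things to fix to make it better: use |bom instead of -bom, and use r|bom for reading.
Maybe tomorrow I will find some time to refactor my code and provide it as a gem.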
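To check the BOM bytes themselves rather than the decoded text, the file can be read in binary mode. This is only a minimal sketch: `leading_bytes_hex` is a name made up for this illustration, and the sample file is written with `File.binwrite` so the snippet does not depend on the patched `File.open`.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of Ruby): returns the first `count`
# bytes of the file as a lowercase hex string.
def leading_bytes_hex(path, count = 4)
  File.binread(path, count).to_s.unpack("H*").first
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "file_utf16le.txt")
  # Encode U+FEFF plus the text as UTF-16LE and write the raw bytes.
  File.binwrite(path, "\uFEFFsome content".encode("UTF-16LE"))
  puts leading_bytes_hex(path, 2) # prints "fffe", the UTF-16LE BOM
end
```

The same check works for any of the files produced by the test code: a UTF-8 file with a BOM should start with `efbbbf`.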
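Alas, I think your manual approach is the way to go; at least I don't know a better one:
http://blog.grayproducts.net/articles/miscellaneous_m17n_details
Quoting from JEG2's article:
Ruby 1.9 won't automatically add a BOM to your data, so you're going to need to take care of that if you want one. Luckily, it's not too tough. The basic idea is just to print the bytes needed at the beginning of a file.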
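The idea in the quote, printing the needed bytes at the start of the file, can be sketched without a hand-written byte table by letting Ruby encode U+FEFF into the target encoding. This is a sketch under assumptions, not an established API: `write_with_bom` is a hypothetical helper name, `data` is assumed to be a UTF-8 string, and `encoding` is assumed to be one of the endian-explicit encodings (UTF-8, UTF-16LE/BE, UTF-32LE/BE).

```ruby
require 'tmpdir'

# Hypothetical helper (not part of Ruby's File API): writes `data`
# to `path` in `encoding`, preceded by that encoding's BOM.
def write_with_bom(path, data, encoding = Encoding::UTF_8)
  # Encoding U+FEFF together with the data yields the right BOM bytes
  # for the target encoding. Assumption: an endian-explicit encoding;
  # plain "UTF-16" would prepend a BOM of its own during encode.
  payload = ("\uFEFF" + data).encode(encoding)
  File.binwrite(path, payload)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "file_utf8.txt")
  write_with_bom(path, "some content öäü")
  # Ruby's built-in "bom|" read mode strips the mark again.
  p File.open(path, "r:bom|utf-8", &:read)
end
```

Writing the already-encoded bytes with `File.binwrite` keeps the example independent of the file's text-mode settings; writing "\uFEFF" through a file opened with an external encoding, as in the question, is the other common route.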
A trimmed-down version of @knut's answer:
File.open("file_utf8.txt", "w:utf-8") do |f|
  f << "\xEF\xBB\xBF".force_encoding("UTF-8")
  f << EXAMPLE_TEXT
end
Try this:
# read the content from the old file
original_content = File.read(file_path)
# define the UTF-8 BOM
bom = "\xEF\xBB\xBF"
# new file: put the BOM at the head of the content
File.open(new_file_path, "w:UTF-8") do |file|
  file.write(bom + original_content)
end
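One thing to watch with this approach: if the old file already starts with a BOM, the new file ends up with two. A possible variation, sketched here with a hypothetical helper name `copy_with_utf8_bom`, reads the source with the `bom|utf-8` mode from the question so that an existing mark is dropped before a new one is written. It assumes the source file is UTF-8.

```ruby
# Hypothetical helper: copies src to dst as UTF-8 with exactly one BOM.
def copy_with_utf8_bom(src, dst)
  # "r:bom|utf-8" strips a leading BOM if the source already has one.
  content = File.open(src, "r:bom|utf-8", &:read)
  File.open(dst, "w:utf-8") do |file|
    file.write "\uFEFF" # written as EF BB BF in a UTF-8 file
    file.write content
  end
end
```

Calling `copy_with_utf8_bom(file_path, new_file_path)` then gives the same result whether or not the original file had a BOM.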