EBCDIC to ASCII in Ruby

Question (votes: 0, answers: 2)

I have an EBCDIC file generated on a mainframe, and I need to convert it to ASCII for data processing.
Any help would be appreciated.

ruby ebcdic
2 Answers
0
votes

As of Ruby 2.3, an EBCDIC encoding is available:

Encoding

new Encoding::IBM037 (alias ebcdic-cp-us; dummy)

So this should work:

src = 'out_26877296.tst'
content = File.read(src, encoding: 'IBM037:ASCII')
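
IBM037 contains characters that have no ASCII equivalent (the cent sign at byte 0x4A, for example), and a strict conversion to ASCII raises an error when it meets one. The following is a minimal sketch of a more forgiving conversion, assuming that substituting a placeholder is acceptable for your data. The helper name `ebcdic_to_ascii` and the `?` placeholder are illustrative choices, not part of the answer above.

```ruby
# Hypothetical helper (name and default placeholder are assumptions).
# Interprets the given bytes as IBM037 and transcodes them to US-ASCII,
# substituting `replacement` for anything that cannot be represented.
def ebcdic_to_ascii(bytes, replacement: '?')
  bytes.encode('US-ASCII', 'IBM037',
               invalid: :replace, undef: :replace, replace: replacement)
end

# "Hello" in IBM037, followed by 0x4A (the cent sign, which ASCII lacks).
raw = "\xC8\x85\x93\x93\x96\x4A".b
puts ebcdic_to_ascii(raw)
```

To apply it to a file, the raw bytes can be read first, for example `ebcdic_to_ascii(File.binread(src))`.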

0
votes

To keep this up to date: for Ruby 3.1.2p20, the full list of available encodings is (line breaks added for readability):

irb(main):015> Encoding.name_list.sort.join ", "
=> "646, ANSI_X3.4-1968, ASCII, ASCII-8BIT, BINARY, Big5, Big5-HKSCS,
Big5-HKSCS:2008, Big5-UAO, CESU-8, CP1250, CP1251, CP1252, CP1253,
CP1254, CP1255, CP1256, CP1257, CP1258, CP437, CP50220, CP50221, 
CP51932, CP65000, CP65001, CP720, CP737, CP775, CP850, CP852, 
CP855, CP857, CP860, CP861, CP862, CP863, CP864, CP865, CP866, CP869, CP874, CP878, CP932, 
CP936, CP949, CP950, CP951, EUC-CN, EUC-JIS-2004, EUC-JISX0213, EUC-JP, EUC-KR, EUC-TW, 
Emacs-Mule, GB12345, GB18030, GB1988, GB2312, GBK, IBM037, IBM437, IBM720, IBM737, IBM775, 
IBM850, IBM852, IBM855, IBM857, IBM860, IBM861, IBM862, IBM863, IBM864, IBM865, IBM866, 
IBM869, ISO-2022-JP, ISO-2022-JP-2, ISO-2022-JP-KDDI, ISO-8859-1, ISO-8859-10, ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO2022-JP, ISO2022-JP2, ISO8859-1, ISO8859-10, ISO8859-11, 
ISO8859-13, ISO8859-14, ISO8859-15, ISO8859-16, ISO8859-2, ISO8859-3, ISO8859-4, ISO8859-5, 
ISO8859-6, ISO8859-7, ISO8859-8, ISO8859-9, KOI8-R, KOI8-U, MacJapan, MacJapanese, PCK, SJIS, 
SJIS-DoCoMo, SJIS-KDDI, SJIS-SoftBank, Shift_JIS, TIS-620, UCS-2BE, UCS-4BE, UCS-4LE, 
US-ASCII, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, UTF-7, UTF-8, UTF-8-HFS, 
UTF-8-MAC, UTF8-DoCoMo, UTF8-KDDI, UTF8-MAC, UTF8-SoftBank, Windows-1250, Windows-1251, 
Windows-1252, Windows-1253, Windows-1254, Windows-1255, Windows-1256, Windows-1257, 
Windows-1258, Windows-31J, Windows-874, csWindows31J, ebcdic-cp-us, euc-jp-ms, eucCN, eucJP, 
eucJP-ms, eucKR, eucTW, external, filesystem, internal, locale, macCentEuro, macCroatian, 
macCyrillic, macGreek, macIceland, macRoman, macRomania, macThai, macTurkish, macUkraine, 
stateless-ISO-2022-JP, stateless-ISO-2022-JP-KDDI"
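
Rather than scanning the list by eye, a single name can be looked up directly. This is a small sketch using the standard `Encoding` API; it only shows how `ebcdic-cp-us` and `IBM037` relate on a Ruby that ships them (2.3 or later, per the first answer).

```ruby
# Look up the encoding by its alias and list every name it is known by.
enc = Encoding.find('ebcdic-cp-us')

puts enc.name            # canonical name of the encoding
puts enc.names.inspect   # canonical name plus aliases

# Check whether a given name appears in the list printed above.
puts Encoding.name_list.include?('IBM037')
```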

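EBCDIC comes in many flavors (code pages), but IBM037 (alias ebcdic-cp-us) is the only EBCDIC encoding in the list above. The other IBM-prefixed names, such as IBM737, IBM775, IBM850, IBM852, IBM855, IBM857, IBM860, IBM861, IBM862, IBM863, IBM864, IBM865, IBM866 and IBM869, are ASCII-based PC code pages, not EBCDIC variants.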

I don't know of any way to determine which flavor a file uses, other than noticing when the conversion produces wrong output.
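
One rough way to notice a wrong guess is to decode a sample under each candidate encoding and compare how much of the result is printable. The sketch below is only a heuristic under that assumption: the helper `printable_ratio`, the sample bytes, and the candidate list are all illustrative, and it cannot tell apart code pages that differ in just a few characters.

```ruby
# Hypothetical helper: decodes `bytes` as `source_encoding` and returns the
# fraction of resulting characters that are printable or common whitespace.
def printable_ratio(bytes, source_encoding)
  text = bytes.encode('UTF-8', source_encoding,
                      invalid: :replace, undef: :replace)
  return 0.0 if text.empty?

  printable = text.each_char.count { |ch| ch.match?(/[[:print:]\t\r\n]/) }
  printable.to_f / text.length
end

# "Hello World" encoded as IBM037 (illustrative sample data).
sample = "\xC8\x85\x93\x93\x96\x40\xE6\x96\x99\x93\x84".b

# Candidate encodings to try; extend with whatever your Ruby supports.
%w[IBM037 ISO-8859-1].each do |name|
  puts format('%-12s %.2f', name, printable_ratio(sample, name))
end
```

A clearly lower score for one candidate suggests it is the wrong encoding; similar scores mean the heuristic cannot decide and the output still needs to be checked by eye.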

To convert from IBM037 to UTF-8:

File.read('some_ibm_file', encoding: 'IBM037:UTF-8')