Python: How to import a CSV-like .dat file that uses a control character as the delimiter

Question  Votes: 0  Answers: 2

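I have a data file that uses the DC4 control character as its delimiter. Here is the code I have so far (I copied it from someone else; it is not my own code).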

import csv
with open('Test.dat') as csv_file:
    csv_reader = csv.reader(csv_file, quotechar='þ', delimiter='\x14')  # '\x14' is the DC4 control character
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            print(f'\t{row[0]} works in the {row[1]} department, and was born in {row[2]}.')
            line_count += 1
    print(f'Processed {line_count} lines.')

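As you can see, the character is displayed as a box, and so far only Notepad++ can read it. I found curses.ascii.isctrl(c), which seems to be able to read that character from Python and then represent it in caret notation? (https://docs.python.org/3.2/library/curses.ascii.html)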

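I am new to coding and not sure how to implement this, or whether it even applies to my case. Below is a sample of the .dat file I am trying to read, as text and as a screenshot.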

þIdentifierþþColumn 2þþColumn 3þ
þXX_0012345þþRandom Data 1þþRandom Data 1þ
þXX_0012346þþRandom Data 6þþRandom Data 2þ
þXX_0012347þþRandom Data 1þþRandom Data 3þ
þXX_0012348þþRandom Data 8þþRandom Data 4þ
þXX_0012349þþRandom Data 1þþRandom Data 5þ
þXX_0012345þþRandom Data 9þþRandom Data 1þ

Text File to see the DC4 control character

This is the output when running this code on Python 3.6.1. Everything looks fine, except that the þ characters are how the DC4 character is being read.

Column names are þIdentifierþ, þColumn 2þ, þColumn 3þ
    þXX_0012345þ works in the þRandom Data 1þ department, and was born in þRandom Data 1þ.
    þXX_0012346þ works in the þRandom Data 6þ department, and was born in þRandom Data 2þ.
    þXX_0012347þ works in the þRandom Data 1þ department, and was born in þRandom Data 3þ.
    þXX_0012348þ works in the þRandom Data 8þ department, and was born in þRandom Data 4þ.
    þXX_0012349þ works in the þRandom Data 1þ department, and was born in þRandom Data 5þ.
    þXX_0012345þ works in the þRandom Data 9þ department, and was born in þRandom Data 1þ.
Processed 7 lines.
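
One possible reading of the output above, offered as an assumption because the file's actual encoding is not stated: the `þ` pairs may not be the DC4 character at all, but the quote character `þ` stored as UTF-8 and then decoded with a Windows code page such as cp1252. A minimal check of that round trip:

```python
# Assumption: the file is UTF-8 but was decoded as cp1252 (a common Windows default).
thorn = 'þ'                      # U+00FE, the quote character used in the file
raw = thorn.encode('utf-8')      # UTF-8 stores it as two bytes: 0xC3 0xBE
misread = raw.decode('cp1252')   # cp1252 turns each byte into its own character

print(raw)       # b'\xc3\xbe'
print(misread)   # þ
```

If this is what is happening, the delimiter is already being handled correctly (the fields are split in the right places), and only the quote character is being misread.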

Any help with this would be greatly appreciated. Thanks!

python python-import delimited-text control-characters
2 Answers
0
votes

You can use an escape sequence. DC4 is ASCII 20 (0x14):

csv_reader = csv.reader(csv_file, quotechar='þ', delimiter='\x14')
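
For context, here is a self-contained sketch of that line in use. It assumes the file is UTF-8 encoded, which the question does not confirm; the helper names `read_dat` and `make_line` are hypothetical, and the sample file is generated on the fly only so the example can run on its own.

```python
import csv
import os
import tempfile

DC4 = '\x14'   # ASCII 20 (0x14), the field delimiter
THORN = 'þ'    # U+00FE, the quote character


def read_dat(path, encoding='utf-8'):
    """Return every row of a DC4-delimited, thorn-quoted file as a list of strings."""
    # newline='' is what the csv module documentation recommends for reader input.
    with open(path, newline='', encoding=encoding) as dat_file:
        return list(csv.reader(dat_file, delimiter=DC4, quotechar=THORN))


def make_line(fields):
    """Build one sample line in the same shape as the data in the question."""
    return DC4.join(f'{THORN}{value}{THORN}' for value in fields) + '\n'


# Write a tiny sample file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'Test.dat')
    with open(path, 'w', newline='', encoding='utf-8') as out:
        out.write(make_line(['Identifier', 'Column 2', 'Column 3']))
        out.write(make_line(['XX_0012345', 'Random Data 1', 'Random Data 1']))
    rows = read_dat(path)

print(rows[0])  # ['Identifier', 'Column 2', 'Column 3']
print(rows[1])
```

Passing the encoding explicitly matters here: if the quote character is decoded incorrectly, it no longer matches `quotechar` and stays attached to every field.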

0
votes

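It turns out this was a problem with my computer rather than with Python. Apparently I cannot view the character; it only shows up as a white box. Is there a way to configure Windows 10 to display this character?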
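
This does not change what Windows itself can display, but as a rough sketch, control characters can be made visible from within Python. `repr()` shows them as escape sequences, and the hypothetical helper `show_controls` below swaps them for the matching symbols from the Unicode Control Pictures block (U+2400 onward). Whether those symbols render properly still depends on the font in use.

```python
def show_controls(text):
    """Replace C0 control characters with their Unicode control-picture symbols."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x20 and ch not in '\n\t':
            # U+2400..U+241F mirror the control codes 0x00..0x1F
            out.append(chr(0x2400 + code))
        elif code == 0x7F:
            out.append('\u2421')  # SYMBOL FOR DELETE
        else:
            out.append(ch)
    return ''.join(out)


line = 'þIdentifierþ\x14þColumn 2þ'
print(repr(line))           # DC4 appears as the escape \x14
print(show_controls(line))  # DC4 appears as U+2414, SYMBOL FOR DEVICE CONTROL FOUR
```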
