Dataset (excerpt showing 2 separate items, 6 fields per item):
1
No. of A
600
No. of B
2
No. of C
6
No. of A
500
No. of B
4
No. of C
...
Using Python, what is the best way to convert the above and write it out as a .csv file like the one below?
1,No. of A,600,No. of B,2,No. of C
6,No. of A,500,No. of B,4,No. of C
...
Thanks for any suggestions!
Revised answer
body = """
1
No. of A
600
No. of B
2
No. of C
6
No. of A
500
No. of B
4
No. of C
7
No. of A
501
No. of B
5
No. of C
"""
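import pandas as pd

# Split the text into lines, then group every 6 consecutive lines into one row.
temp_body = body.strip().split("\n")
parsed_body = [temp_body[i - 6:i] for i in range(6, len(temp_body) + 1, 6)]

# Write the rows to a CSV file without a header row or an index column.
df = pd.DataFrame(parsed_body)
df.to_csv('output.csv', sep=',', header=False, index=False)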
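Result (contents of output.csv):
1,No. of A,600,No. of B,2,No. of C
6,No. of A,500,No. of B,4,No. of C
7,No. of A,501,No. of B,5,No. of C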
Assuming \n\n (a blank line) separates the two records, you can try this:
In [1]: body = """
...: 1
...: No. of A
...: 600
...: No. of B
...: 2
...: No. of C
...:
...: 6
...: No. of A
...: 500
...: No. of B
...: 4
...: No. of C
...: """
In [2]: parsed_body = [i.strip().split("\n") for i in body.split("\n\n")]
In [3]: parsed_body
Out[3]:
[['1', 'No. of A', '600', 'No. of B', '2', 'No. of C'],
['6', 'No. of A', '500', 'No. of B', '4', 'No. of C']]
Once you have the list, you can write it out as CSV with csv.writer from Python's csv module.
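A minimal sketch of that csv.writer step, assuming the parsed list looks like the output shown above. The file name output.csv is an assumption chosen for illustration.

```python
import csv

# Rows as produced by the parsing step (values copied from the example above).
parsed_body = [
    ["1", "No. of A", "600", "No. of B", "2", "No. of C"],
    ["6", "No. of A", "500", "No. of B", "4", "No. of C"],
]

# newline="" lets the csv module control line endings itself.
# "output.csv" is an assumed file name.
with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(parsed_body)
```

Unlike joining fields with commas by hand, csv.writer quotes any field that itself contains a comma or a quote character.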
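If there is no \n\n between records and the lines run on continuously, you could try something like this (a bit of a hack, but maybe you can come up with something better):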
In [43]: body = """
...: 1
...: No. of A
...: 600
...: No. of B
...: 2
...: No. of C
...: 6
...: No. of A
...: 500
...: No. of B
...: 4
...: No. of C
...: 22
...: No. of Q
...: 500
...: No. of R
...: 4
...: No. of S
...: """
In [44]: temp_body = body.strip().split("\n")
In [45]: parsed_body = [temp_body[(0 + i - 6):i] for i in range(6, len(temp_body) + 1, 6)]
In [46]: parsed_body
Out[46]:
[['1', 'No. of A', '600', 'No. of B', '2', 'No. of C'],
['6', 'No. of A', '500', 'No. of B', '4', 'No. of C'],
['22', 'No. of Q', '500', 'No. of R', '4', 'No. of S']]
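One possible tidier variant of the grouping above, offered only as a sketch: it slices forward from each start index and raises an error if the line count is not a multiple of the row size, rather than silently dropping leftover lines. The helper name chunk_lines and its size parameter are hypothetical, not part of the original answer.

```python
def chunk_lines(lines, size=6):
    """Group a flat list of lines into consecutive rows of `size` fields."""
    if len(lines) % size != 0:
        raise ValueError(f"expected a multiple of {size} lines, got {len(lines)}")
    return [lines[i:i + size] for i in range(0, len(lines), size)]


body = """
1
No. of A
600
No. of B
2
No. of C
6
No. of A
500
No. of B
4
No. of C
"""

rows = chunk_lines(body.strip().split("\n"))
print(rows)
```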
Assuming your data is consistent and clean, you can split it at the double newlines, then replace the newlines within each item with commas:
data = '''1
No. of A
600
No. of B
2
No. of C

6
No. of A
500
No. of B
4
No. of C'''
items = [item.replace('\n', ',') for item in data.split('\n\n')]
print('\n'.join(items))
# 1,No. of A,600,No. of B,2,No. of C
# 6,No. of A,500,No. of B,4,No. of C
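If the input is less clean than assumed (extra blank lines between items, or Windows line endings), a slightly more tolerant split might look like the sketch below. The function name split_records and the sample string are made up for illustration.

```python
import re


def split_records(text):
    """Split text into records on runs of one or more blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    # splitlines() handles both \n and \r\n; empty lines are skipped.
    return [
        [line.strip() for line in block.splitlines() if line.strip()]
        for block in blocks
    ]


# Hypothetical messy input: two blank lines between items, mixed line endings.
messy = "1\nNo. of A\n600\n\n\n6\r\nNo. of A\r\n500\n"
records = split_records(messy)
print("\n".join(",".join(record) for record in records))
```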