The data file is structured as follows:
2.0 0 3 9.15400
5.40189 0.77828 0.66432
0.44219 0.00000
2.0 0 1 9.15400
0.00000
2.0 0 6 9.15400
7.38451 3.99120 2.23459 1.49781 0.77828 0.00000
2.0 0 3 9.15400
2.09559 0.77828 0.00000
2.0 0 3 9.15400
2.09559 0.77828 0.65828
0.58990 0.00000
and so on.
I want to create a DataFrame that looks like this:
9.15400 5.40189 0.77828 0.44219 0.00000
9.15400 0.00000
9.15400 7.38451 3.99120 2.23459 1.49781 0.77828 0.000
9.15400 2.09559 0.77828 0.00000
9.15400 2.09559 0.77828 0.65828 0.58990 0.00000
Can someone help me get started?
Why doesn't row 0 of your desired DataFrame contain 0.66432?
The structure of the table is unclear. If it is as unstructured as it appears in your question, try the following:
Input:
2.0 0 3 9.15400
5.40189 0.77828 0.66432
0.44219 0.00000
2.0 0 1 9.15400
0.00000
2.0 0 6 9.15400
7.38451 3.99120 2.23459 1.49781 0.77828 0.00000
2.0 0 3 9.15400
2.09559 0.77828 0.00000
2.0 0 3 9.15400
2.09559 0.77828 0.65828
0.58990 0.00000
import pandas as pd

# Make the floats the same width as your desired output
pd.options.display.float_format = '{:.5f}'.format

filename = r"C:\Users\Bobson Dugnutt\Desktop\table.txt"
df = pd.read_fwf(filename, header=None).fillna("")

new_rows = []
values = []
for row in df.itertuples():
    # row[1] is the column which is either 2.0 or blank ("")
    if row[1] != "":
        # A new header row: store the previous group before starting over
        if values:
            new_rows.append(values)
            values = []
        # row[4] is the column with a value like 9.15400
        values.append(row[4])
    else:
        # Add all non-blank values starting from the same column as above
        values.extend(value for value in row[4:] if value != "")
# Store the last group, which has no header row after it
new_rows.append(values)

# fillna("") to make the NaN values blank
new_df = pd.DataFrame(new_rows).fillna("")
print(new_df)
Output:
         0        1        2        3        4        5        6
0  9.15400  5.40189  0.77828  0.66432  0.44219  0.00000
1  9.15400  0.00000
2  9.15400  7.38451  3.99120  2.23459  1.49781  0.77828  0.00000
3  9.15400  2.09559  0.77828  0.00000
4  9.15400  2.09559  0.77828  0.65828  0.58990  0.00000
Or, if every column has a fixed width that simply isn't displayed correctly in your question, try this:
Input:
2.0 0 3 9.15400
5.40189 0.77828 0.66432
0.44219 0.00000
2.0 0 1 9.15400
0.00000
2.0 0 6 9.15400
7.38451 3.99120 2.23459 1.49781 0.77828 0.00000
2.0 0 3 9.15400
2.09559 0.77828 0.00000
2.0 0 3 9.15400
2.09559 0.77828 0.65828
0.58990 0.00000
import pandas as pd

filename = r"C:\Users\Bobson Dugnutt\Desktop\table2.txt"

# This returns a dataframe with a single column
df = pd.read_table(filename, header=None)
# Split the single column at every double space
df = df[0].str.split("  ", expand=True)

new_rows = []
values = []
for row in df.itertuples():
    # row[2] is the column which is either 2.0 or blank ("")
    if row[2]:
        # A new header row: store the previous group before starting over
        if values:
            new_rows.append(values)
            values = []
        # row[7] is the column with a value like 9.15400
        values.append(row[7])
    else:
        # Add all non-blank values starting from the 4th column.
        # The 4th column is the first column where meaningful values
        # are found for these rows
        values.extend(value for value in row[4:] if value)
# Store the last group, which has no header row after it
new_rows.append(values)

# fillna("") to make the NaN values blank
new_df = pd.DataFrame(new_rows).fillna("")
print(new_df)
Output:
         0        1        2        3        4        5        6
0  9.15400  5.40189  0.77828  0.66432  0.44219  0.00000
1  9.15400  0.00000
2  9.15400  7.38451  3.99120  2.23459  1.49781  0.77828  0.00000
3  9.15400  2.09559  0.77828  0.00000
4  9.15400  2.09559  0.77828  0.65828  0.58990  0.00000
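In both cases, all you have to do is iterate over each row and check whether the first column is not blank (i.e. it is 2.0). If it is, collect all the other meaningful values from the following rows until you hit another such row. The exact indices vary depending on how the table was originally parsed.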
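The same grouping idea can also be sketched without relying on column positions at all. This is only a minimal illustration in plain Python, and it assumes that header rows start in the first character column while continuation rows are indented. The sample text, its indentation, and the function name `group_records` are hypothetical and not taken from the real file.

```python
def group_records(lines):
    """Group continuation values under the last field of each header row.

    Assumption: a header row starts at column 0 (e.g. "2.0 0 3 9.15400"),
    while continuation rows begin with whitespace.
    """
    records = []
    for line in lines:
        if not line.strip():
            continue  # skip empty lines
        fields = line.split()
        if not line[0].isspace():
            # Header row: start a new record with its last field
            records.append([float(fields[-1])])
        elif records:
            # Continuation row: extend the record that is currently open
            records[-1].extend(float(field) for field in fields)
    return records


# Hypothetical sample; the indentation is assumed, not copied from the file
sample = """\
2.0 0 3 9.15400
    5.40189 0.77828 0.66432
    0.44219 0.00000
2.0 0 1 9.15400
    0.00000
"""

rows = group_records(sample.splitlines())
print(rows)
```

Passing `rows` to `pd.DataFrame` should pad the shorter records with NaN, which can then be blanked with `fillna("")` as in the code above. If the real file does not indent its continuation rows, the header test would need a different rule.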