Opening a text file in Python

Question

The data file is structured like this:

  2.0  0    3    9.15400
      5.40189    0.77828    0.66432
      0.44219    0.00000
  2.0  0    1    9.15400
      0.00000
  2.0  0    6    9.15400
      7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.65828
      0.58990    0.00000

and so on.

I want to create a DataFrame that looks like this:

9.15400    5.40189    0.77828    0.44219    0.00000
9.15400    0.00000
9.15400    7.38451    3.99120    2.23459    1.49781    0.77828    0.000
9.15400    2.09559    0.77828    0.00000
9.15400    2.09559    0.77828    0.65828    0.58990    0.00000

Can someone help me get started?

Answer 1

Why doesn't row 0 of your desired DataFrame contain 0.66432?

It's not clear how the table is structured. If it is as unstructured as it appears in your question, try the following:

Input:

  2.0  0    3    9.15400
      5.40189    0.77828    0.66432
      0.44219    0.00000
  2.0  0    1    9.15400
      0.00000
  2.0  0    6    9.15400
      7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.65828
      0.58990    0.00000

import pandas as pd

# Make the floats the same width as in your desired output
pd.options.display.float_format = '{:.5f}'.format

filename = r"C:\Users\Bobson Dugnutt\Desktop\table.txt"

df = pd.read_fwf(filename, header=None).fillna("")

new_rows = []
values = []

for row in df.itertuples():
    # row[1] is the column which is either 2.0 or blank ("")
    if row[1] != "":
        if values:
            new_rows.append(values)
            values = []
        
        # row[4] is the column with a value like 9.15400
        values.append(row[4])
    else:
        # Add all non-blank values starting from the same column as above
        values.extend(value for value in row[4:] if value != "")

new_rows.append(values)

# fillna("") to make the NaN values blank
new_df = pd.DataFrame(new_rows).fillna("")

print(new_df)

Output:

        0       1       2       3       4       5       6
0 9.15400 5.40189 0.77828 0.66432 0.44219 0.00000        
1 9.15400 0.00000                                        
2 9.15400 7.38451 3.99120 2.23459 1.49781 0.77828 0.00000
3 9.15400 2.09559 0.77828 0.00000                        
4 9.15400 2.09559 0.77828 0.65828 0.58990 0.00000
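The comments in the loop refer to row[1] and row[4] rather than row[0] and row[3] because DataFrame.itertuples() puts the index at position 0 of each tuple. A small illustration, using a hypothetical two-row frame built by hand rather than the real file:

```python
import pandas as pd

# Hypothetical stand-in for the parsed table: column 0 holds "2.0" on
# header rows and "" on continuation rows, column 1 holds a value.
df = pd.DataFrame({0: ["2.0", ""], 1: ["9.15400", "5.40189"]})

positions = []
for row in df.itertuples():
    # Position 0 is the index, so frame column 0 is row[1],
    # frame column 1 is row[2], and so on.
    positions.append((row[0], row[1], row[2]))

for index, first_col, second_col in positions:
    print(index, repr(first_col), repr(second_col))
```

So whenever you adapt the column numbers to your own file, add one to the zero-based column position.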

Or, if every column actually has a fixed width that just didn't display correctly in your question, try this:

Input:

  2.0  0    3    9.15400
                 5.40189    0.77828    0.66432
                 0.44219    0.00000
  2.0  0    1    9.15400
                 0.00000
  2.0  0    6    9.15400
                 7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
                 2.09559    0.77828    0.00000
  2.0  0    3    9.15400
                 2.09559    0.77828    0.65828
                 0.58990    0.00000

filename = r"C:\Users\Bobson Dugnutt\Desktop\table2.txt"

# This returns a dataframe with a single column
df = pd.read_table(filename, header=None)

# Split the bad-boy at every double space
df = df[0].str.split("  ", expand=True)

new_rows = []
values = []

for row in df.itertuples():
    # row[2] is the column which is either 2.0 or blank ("")
    if row[2]:
        if values:
            new_rows.append(values)
            values = []
        
        # row[7] is the column with a value like 9.15400
        values.append(row[7])
    else:
        # Add all non-blank values starting from the 4th column,
        # which is the first column where meaningful values are
        # found for these rows
        values.extend(value for value in row[4:] if value)
    
new_rows.append(values)

# fillna("") to make the NaN values blank
new_df = pd.DataFrame(new_rows).fillna("")
print(new_df)

Output:

         0        1        2        3        4        5        6
0  9.15400  5.40189  0.77828  0.66432  0.44219  0.00000         
1  9.15400  0.00000                                             
2  9.15400  7.38451  3.99120  2.23459  1.49781  0.77828  0.00000
3  9.15400  2.09559  0.77828  0.00000                           
4  9.15400  2.09559  0.77828  0.65828  0.58990  0.00000
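Both versions call fillna(""), which replaces the padding with empty strings and leaves object-dtype columns; the second version also keeps every value as a string, possibly with stray leading spaces left over from split("  "). If you need numeric columns for later calculations, one option is to convert each value with float() and let pandas pad the short rows with NaN. This is a minimal sketch, assuming new_rows is a list of lists of numeric strings like the one built above; the sample rows here are hypothetical.

```python
import pandas as pd

# Hypothetical sample of new_rows after the loop in the second version:
# strings, some with a leading space left over from split("  ").
new_rows = [
    ["9.15400", " 5.40189", " 0.77828"],
    ["9.15400", " 0.00000"],
]

# float() ignores surrounding whitespace, so each value converts cleanly.
numeric_rows = [[float(value) for value in row] for row in new_rows]

# pandas pads the shorter rows with NaN, so every column stays float64.
new_df = pd.DataFrame(numeric_rows)

# Show the NaN padding as blanks without changing the data itself.
print(new_df.to_string(na_rep=""))
```

Using to_string(na_rep="") only affects how the frame is printed, so the columns stay numeric.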

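In both cases, all you have to do is iterate over the rows and check whether the first column is non-blank (i.e. it is 2.0). If it is, start a new record and collect all the other meaningful values from the following rows until you reach the next such row. The exact indices depend on how the table was parsed in the first place.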
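Because that grouping logic does not really depend on column positions, it can also be sketched without read_fwf or column indices: split each line on whitespace and start a new record whenever a header line appears. This is a minimal sketch, assuming (as in the sample) that header lines are exactly those whose first field is 2.0 and that the last field of a header line is the value to keep; read_records is a hypothetical helper name, and the sample is inlined with io.StringIO so the snippet runs on its own.

```python
import io

import pandas as pd

# Sample copied from the question; with a real file, pass open(filename).
SAMPLE = """\
  2.0  0    3    9.15400
      5.40189    0.77828    0.66432
      0.44219    0.00000
  2.0  0    1    9.15400
      0.00000
  2.0  0    6    9.15400
      7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.65828
      0.58990    0.00000
"""


def read_records(lines):
    """Group each header line with the continuation lines that follow it."""
    records = []
    for line in lines:
        tokens = line.split()  # split on any run of whitespace
        if not tokens:
            continue  # skip blank lines
        # Assumption: a header line is one whose first field is "2.0";
        # its last field (e.g. 9.15400) starts the new record.
        if tokens[0] == "2.0":
            records.append([float(tokens[-1])])
        elif records:
            # Continuation line: append every value to the current record.
            records[-1].extend(float(t) for t in tokens)
    return records


rows = read_records(io.StringIO(SAMPLE))
df = pd.DataFrame(rows)  # shorter records are padded with NaN
print(df.to_string(na_rep=""))
```

With a real file you would call it as `with open(filename) as f: rows = read_records(f)`. Note that, like the answers above, this keeps 0.66432 in the first record.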
