Opening a text file in Python


The data file is structured as follows:

  2.0  0    3    9.15400
      5.40189    0.77828    0.66432
      0.44219    0.00000
  2.0  0    1    9.15400
      0.00000
  2.0  0    6    9.15400
      7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.65828
      0.58990    0.00000

and so on.

I would like to create a DataFrame that looks like this:

9.15400    5.40189    0.77828    0.44219    0.00000
9.15400    0.00000
9.15400    7.38451    3.99120    2.23459    1.49781    0.77828    0.000
9.15400    2.09559    0.77828    0.00000
9.15400    2.09559    0.77828    0.65828    0.58990    0.00000

Can someone help me get started?

python python-3.x pandas dataframe text-files
1 Answer

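Why doesn't row 0 of your desired DataFrame contain 0.66432?

The structure of the table is unclear. If it is as unstructured as it appears in your question, try the following: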

Input:

  2.0  0    3    9.15400
      5.40189    0.77828    0.66432
      0.44219    0.00000
  2.0  0    1    9.15400
      0.00000
  2.0  0    6    9.15400
      7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.00000
  2.0  0    3    9.15400
      2.09559    0.77828    0.65828
      0.58990    0.00000

import pandas as pd

# Display floats with the same precision as your desired output
pd.options.display.float_format = '{:.5f}'.format

filename = r"C:\Users\Bobson Dugnutt\Desktop\table.txt"

df = pd.read_fwf(filename, header=None).fillna("")

new_rows = []
values = []

for row in df.itertuples():
    # row[1] is the column which is either 2.0 or blank ("")
    if row[1] != "":
        if values:
            new_rows.append(values)
            values = []
        
        # row[4] is the column with a value like 9.15400
        values.append(row[4])
    else:
        # Add all non-blank values starting from the same column as above
        values.extend(value for value in row[4:] if value != "")

new_rows.append(values)

# fillna("") to make the NaN values blank
new_df = pd.DataFrame(new_rows).fillna("")

print(new_df)

Output:

        0       1       2       3       4       5       6
0 9.15400 5.40189 0.77828 0.66432 0.44219 0.00000        
1 9.15400 0.00000                                        
2 9.15400 7.38451 3.99120 2.23459 1.49781 0.77828 0.00000
3 9.15400 2.09559 0.77828 0.00000                        
4 9.15400 2.09559 0.77828 0.65828 0.58990 0.00000        
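
Note that the float_format option only changes how pandas prints floats. If what you actually want is plain text lines like the ones in your question, you may not need the padding at all. Here is a minimal sketch, assuming the rows have already been collected as lists of floats; the `rows` sample and the `format_row` helper are hypothetical names used for illustration only:

```python
# Hypothetical sample rows, taken from the output above (ragged lengths).
rows = [
    [9.154, 5.40189, 0.77828, 0.66432, 0.44219, 0.0],
    [9.154, 0.0],
]


def format_row(values, sep="    "):
    """Join floats as fixed five-decimal text, e.g. 9.154 -> '9.15400'."""
    return sep.join(f"{value:.5f}" for value in values)


for row in rows:
    print(format_row(row))
```

Each row keeps its own length this way, so no blank filler cells are needed.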

Alternatively, if every column has a fixed width and the file was just not displayed correctly in your question, try this:

Input:

  2.0  0    3    9.15400
                 5.40189    0.77828    0.66432
                 0.44219    0.00000
  2.0  0    1    9.15400
                 0.00000
  2.0  0    6    9.15400
                 7.38451    3.99120    2.23459    1.49781    0.77828    0.00000
  2.0  0    3    9.15400
                 2.09559    0.77828    0.00000
  2.0  0    3    9.15400
                 2.09559    0.77828    0.65828
                 0.58990    0.00000

import pandas as pd

filename = r"C:\Users\Bobson Dugnutt\Desktop\table2.txt"

# This returns a dataframe with a single column
df = pd.read_table(filename, header=None)

# Split the bad-boy at every double space
df = df[0].str.split("  ", expand=True)

new_rows = []
values = []

for row in df.itertuples():
    # row[2] is the column which is either 2.0 or blank ("")
    if row[2]:
        if values:
            new_rows.append(values)
            values = []
        
        # row[7] is the column with a value like 9.15400
        values.append(row[7])
    else:
        # Add all non-blank values starting from the 4th column.
        # The 4th column is the first column in which meaningful
        # values are found for these rows
        values.extend(value for value in row[4:] if value)
    
new_rows.append(values)

# fillna("") to make the NaN values blank
new_df = pd.DataFrame(new_rows).fillna("")
print(new_df)

Output:

         0        1        2        3        4        5        6
0  9.15400  5.40189  0.77828  0.66432  0.44219  0.00000         
1  9.15400  0.00000                                             
2  9.15400  7.38451  3.99120  2.23459  1.49781  0.77828  0.00000
3  9.15400  2.09559  0.77828  0.00000                           
4  9.15400  2.09559  0.77828  0.65828  0.58990  0.00000         

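In both cases, all you have to do is iterate over every row and check whether the first column is non-blank (i.e. it is 2.0). If it is, collect all the other meaningful values from the following rows until you hit another such row. The exact indices vary depending on how the table was initially parsed.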
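
If you would rather not depend on how pandas guesses the columns, the same grouping idea can be sketched in plain Python by splitting each line on whitespace. This is only a sketch under one assumption, which I can't confirm from your sample alone: a header line has exactly four fields, with plain integers in the second and third. The names `SAMPLE`, `is_header` and `group_records` are made up for illustration:

```python
import io

# A shortened copy of the sample from the question.
SAMPLE = """\
  2.0  0    3    9.15400
      5.40189    0.77828    0.66432
      0.44219    0.00000
  2.0  0    1    9.15400
      0.00000
"""


def is_header(tokens):
    # Assumption: header lines look like "2.0  0  3  9.15400", i.e. four
    # fields with plain integers (no decimal point) in the 2nd and 3rd.
    return len(tokens) == 4 and tokens[1].isdigit() and tokens[2].isdigit()


def group_records(lines):
    """Return one list of floats per header line plus its continuation lines."""
    records = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if is_header(tokens):
            # Start a new record with the last header field (e.g. 9.15400)
            records.append([float(tokens[3])])
        elif records:
            # Continuation line: append its values to the current record
            records[-1].extend(float(token) for token in tokens)
    return records


records = group_records(io.StringIO(SAMPLE))
print(records)
```

To read a real file, pass an open file object instead of the `io.StringIO` wrapper. Passing `records` to `pd.DataFrame` pads the shorter rows with NaN, just like in the snippets above, and the values stay floats instead of strings.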
