Extracting blocks of data with numpy

Question (votes: 0, answers: 1)

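Here is the problem I am facing:

I want to plot some bands obtained with Quantum Espresso. The data is in a file with two columns. The rows are split into blocks by blank lines, and each block corresponds to one band.

Here is an example of the first two blocks: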

    0.0000  -44.2709
    0.0250  -44.2709
    0.0500  -44.2709
    0.0750  -44.2708
    0.1000  -44.2708
    0.1250  -44.2707
    0.1500  -44.2706
    0.1750  -44.2705
    0.2000  -44.2703
    0.2250  -44.2702
    0.2500  -44.2701
    0.2750  -44.2700
    0.3000  -44.2698
    0.3250  -44.2697
    0.3500  -44.2696
    0.3750  -44.2695
    0.4000  -44.2694
    0.4250  -44.2694
    0.4500  -44.2693
    0.4750  -44.2693
    0.5000  -44.2693
    0.5250  -44.2693
    0.5500  -44.2692
    0.5750  -44.2692
    0.6000  -44.2691
    0.6250  -44.2690
    0.6500  -44.2689
    0.6750  -44.2688
    0.7000  -44.2687
    0.7250  -44.2686
    0.7500  -44.2685
    0.7750  -44.2683
    0.8000  -44.2682
    0.8250  -44.2681
    0.8500  -44.2680
    0.8750  -44.2679
    0.9000  -44.2678
    0.9250  -44.2678
    0.9500  -44.2677
    0.9750  -44.2677
    1.0000  -44.2677
    1.0354  -44.2677
    1.0707  -44.2677
    1.1061  -44.2678
    1.1414  -44.2680
    1.1768  -44.2681
    1.2121  -44.2683
    1.2475  -44.2686
    1.2828  -44.2688
    1.3182  -44.2690
    1.3536  -44.2693
    1.3889  -44.2695
    1.4243  -44.2698
    1.4596  -44.2700
    1.4950  -44.2702
    1.5303  -44.2704
    1.5657  -44.2706
    1.6010  -44.2707
    1.6364  -44.2708
    1.6718  -44.2709
    1.7071  -44.2709
    1.7504  -44.2709
    1.7937  -44.2708
    1.8370  -44.2706
    1.8803  -44.2704
    1.9236  -44.2702
    1.9669  -44.2699
    2.0102  -44.2696
    2.0535  -44.2692
    2.0968  -44.2689
    2.1401  -44.2685
    2.1834  -44.2681
    2.2267  -44.2677
    2.2700  -44.2674
    2.3133  -44.2671
    2.3566  -44.2668
    2.3999  -44.2665
    2.4432  -44.2663
    2.4865  -44.2662
    2.5298  -44.2661
    2.5731  -44.2661
    2.6085  -44.2661
    2.6438  -44.2661
    2.6792  -44.2662
    2.7146  -44.2664
    2.7499  -44.2665
    2.7853  -44.2667
    2.8206  -44.2669
    2.8560  -44.2672
    2.8913  -44.2674
    2.9267  -44.2677
    2.9620  -44.2679
    2.9974  -44.2682
    3.0328  -44.2684
    3.0681  -44.2686
    3.1035  -44.2688
    3.1388  -44.2690
    3.1742  -44.2691
    3.2095  -44.2692
    3.2449  -44.2693
    3.2802  -44.2693
    3.2802  -44.2677
    3.3052  -44.2677
    3.3302  -44.2676
    3.3552  -44.2676
    3.3802  -44.2675
    3.4052  -44.2674
    3.4302  -44.2673
    3.4552  -44.2672
    3.4802  -44.2671
    3.5052  -44.2670
    3.5302  -44.2669
    3.5552  -44.2667
    3.5802  -44.2666
    3.6052  -44.2665
    3.6302  -44.2664
    3.6552  -44.2663
    3.6802  -44.2662
    3.7052  -44.2662
    3.7302  -44.2661
    3.7552  -44.2661
    3.7802  -44.2661
 
    0.0000  -20.8317
    0.0250  -20.8322
    0.0500  -20.8338
    0.0750  -20.8364
    0.1000  -20.8400
    0.1250  -20.8445
    0.1500  -20.8497
    0.1750  -20.8555
    0.2000  -20.8618
    0.2250  -20.8684
    0.2500  -20.8751
    0.2750  -20.8819
    0.3000  -20.8884
    0.3250  -20.8947
    0.3500  -20.9004
    0.3750  -20.9055
    0.4000  -20.9098
    0.4250  -20.9133
    0.4500  -20.9159
    0.4750  -20.9174
    0.5000  -20.9179
    0.5250  -20.9179
    0.5500  -20.9178
    0.5750  -20.9175
    0.6000  -20.9172
    0.6250  -20.9169
    0.6500  -20.9164
    0.6750  -20.9159
    0.7000  -20.9154
    0.7250  -20.9149
    0.7500  -20.9143
    0.7750  -20.9137
    0.8000  -20.9132
    0.8250  -20.9126
    0.8500  -20.9122
    0.8750  -20.9117
    0.9000  -20.9113
    0.9250  -20.9110
    0.9500  -20.9108
    0.9750  -20.9107
    1.0000  -20.9106
    1.0354  -20.9102
    1.0707  -20.9089
    1.1061  -20.9068
    1.1414  -20.9039
    1.1768  -20.9003
    1.2121  -20.8959
    1.2475  -20.8910
    1.2828  -20.8855
    1.3182  -20.8797
    1.3536  -20.8736
    1.3889  -20.8673
    1.4243  -20.8611
    1.4596  -20.8551
    1.4950  -20.8495
    1.5303  -20.8444
    1.5657  -20.8400
    1.6010  -20.8365
    1.6364  -20.8338
    1.6718  -20.8322
    1.7071  -20.8317
    1.7504  -20.8322
    1.7937  -20.8338
    1.8370  -20.8365
    1.8803  -20.8400
    1.9236  -20.8443
    1.9669  -20.8492
    2.0102  -20.8545
    2.0535  -20.8601
    2.0968  -20.8659
    2.1401  -20.8716
    2.1834  -20.8772
    2.2267  -20.8826
    2.2700  -20.8876
    2.3133  -20.8922
    2.3566  -20.8962
    2.3999  -20.8997
    2.4432  -20.9025
    2.4865  -20.9045
    2.5298  -20.9058
    2.5731  -20.9062
    2.6085  -20.9063
    2.6438  -20.9064
    2.6792  -20.9067
    2.7146  -20.9071
    2.7499  -20.9076
    2.7853  -20.9082
    2.8206  -20.9089
    2.8560  -20.9096
    2.8913  -20.9105
    2.9267  -20.9114
    2.9620  -20.9123
    2.9974  -20.9132
    3.0328  -20.9142
    3.0681  -20.9151
    3.1035  -20.9159
    3.1388  -20.9166
    3.1742  -20.9171
    3.2095  -20.9176
    3.2449  -20.9178
    3.2802  -20.9179
    3.2802  -20.9106
    3.3052  -20.9106
    3.3302  -20.9105
    3.3552  -20.9104
    3.3802  -20.9102
    3.4052  -20.9100
    3.4302  -20.9097
    3.4552  -20.9094
    3.4802  -20.9091
    3.5052  -20.9088
    3.5302  -20.9084
    3.5552  -20.9081
    3.5802  -20.9078
    3.6052  -20.9074
    3.6302  -20.9071
    3.6552  -20.9069
    3.6802  -20.9066
    3.7052  -20.9065
    3.7302  -20.9063
    3.7552  -20.9063
    3.7802  -20.9062

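You may notice that the first column repeats the same data in every block, and only the second column differs. What I want is to keep the first column from the first block only, and then turn the second column of each block into a separate column. Like this: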

    0.0000  -44.2709   -20.8317
    0.0250  -44.2709   -20.8322
    0.0500  -44.2709   -20.8338
    0.0750  -44.2708   -20.8364
    0.1000  -44.2708   -20.8400
    0.1250  -44.2707   -20.8445
    0.1500  -44.2706   -20.8497
    0.1750  -44.2705   -20.8555
    0.2000  -44.2703   -20.8618
    0.2250  -44.2702   -20.8684
    0.2500  -44.2701   -20.8751
    0.2750  -44.2700   -20.8819
    0.3000  -44.2698   -20.8884
    0.3250  -44.2697   -20.8947
    0.3500  -44.2696   -20.9004
    0.3750  -44.2695   -20.9055
    0.4000  -44.2694   -20.9098
    0.4250  -44.2694   -20.9133
    0.4500  -44.2693   -20.9159
    0.4750  -44.2693   -20.9174
    0.5000  -44.2693   -20.9179
    0.5250  -44.2693   -20.9179
    0.5500  -44.2692   -20.9178
    0.5750  -44.2692   -20.9175
    0.6000  -44.2691   -20.9172
    0.6250  -44.2690   -20.9169
    0.6500  -44.2689   -20.9164
    0.6750  -44.2688   -20.9159
    0.7000  -44.2687   -20.9154
    0.7250  -44.2686   -20.9149
    0.7500  -44.2685   -20.9143
    0.7750  -44.2683   -20.9137
    0.8000  -44.2682   -20.9132
    0.8250  -44.2681   -20.9126
    0.8500  -44.2680   -20.9122
    0.8750  -44.2679   -20.9117
    0.9000  -44.2678   -20.9113
    0.9250  -44.2678   -20.9110
    0.9500  -44.2677   -20.9108
    0.9750  -44.2677   -20.9107
    1.0000  -44.2677   -20.9106
    1.0354  -44.2677   -20.9102
    1.0707  -44.2677   -20.9089
    1.1061  -44.2678   -20.9068
    1.1414  -44.2680   -20.9039
    1.1768  -44.2681   -20.9003
    1.2121  -44.2683   -20.8959
    1.2475  -44.2686   -20.8910
    1.2828  -44.2688   -20.8855
    1.3182  -44.2690   -20.8797
    1.3536  -44.2693   -20.8736
    1.3889  -44.2695   -20.8673
    1.4243  -44.2698   -20.8611
    1.4596  -44.2700   -20.8551
    1.4950  -44.2702   -20.8495
    1.5303  -44.2704   -20.8444
    1.5657  -44.2706   -20.8400
    1.6010  -44.2707   -20.8365
    1.6364  -44.2708   -20.8338
    1.6718  -44.2709   -20.8322
    1.7071  -44.2709   -20.8317
    1.7504  -44.2709   -20.8322
    1.7937  -44.2708   -20.8338
    1.8370  -44.2706   -20.8365
    1.8803  -44.2704   -20.8400
    1.9236  -44.2702   -20.8443
    1.9669  -44.2699   -20.8492
    2.0102  -44.2696   -20.8545
    2.0535  -44.2692   -20.8601
    2.0968  -44.2689   -20.8659
    2.1401  -44.2685   -20.8716
    2.1834  -44.2681   -20.8772
    2.2267  -44.2677   -20.8826
    2.2700  -44.2674   -20.8876
    2.3133  -44.2671   -20.8922
    2.3566  -44.2668   -20.8962
    2.3999  -44.2665   -20.8997
    2.4432  -44.2663   -20.9025
    2.4865  -44.2662   -20.9045
    2.5298  -44.2661   -20.9058
    2.5731  -44.2661   -20.9062
    2.6085  -44.2661   -20.9063
    2.6438  -44.2661   -20.9064
    2.6792  -44.2662   -20.9067
    2.7146  -44.2664   -20.9071
    2.7499  -44.2665   -20.9076
    2.7853  -44.2667   -20.9082
    2.8206  -44.2669   -20.9089
    2.8560  -44.2672   -20.9096
    2.8913  -44.2674   -20.9105
    2.9267  -44.2677   -20.9114
    2.9620  -44.2679   -20.9123
    2.9974  -44.2682   -20.9132
    3.0328  -44.2684   -20.9142
    3.0681  -44.2686   -20.9151
    3.1035  -44.2688   -20.9159
    3.1388  -44.2690   -20.9166
    3.1742  -44.2691   -20.9171
    3.2095  -44.2692   -20.9176
    3.2449  -44.2693   -20.9178
    3.2802  -44.2693   -20.9179
    3.2802  -44.2677   -20.9106
    3.3052  -44.2677   -20.9106
    3.3302  -44.2676   -20.9105
    3.3552  -44.2676   -20.9104
    3.3802  -44.2675   -20.9102
    3.4052  -44.2674   -20.9100
    3.4302  -44.2673   -20.9097
    3.4552  -44.2672   -20.9094
    3.4802  -44.2671   -20.9091
    3.5052  -44.2670   -20.9088
    3.5302  -44.2669   -20.9084
    3.5552  -44.2667   -20.9081
    3.5802  -44.2666   -20.9078
    3.6052  -44.2665   -20.9074
    3.6302  -44.2664   -20.9071
    3.6552  -44.2663   -20.9069
    3.6802  -44.2662   -20.9066
    3.7052  -44.2662   -20.9065
    3.7302  -44.2661   -20.9063
    3.7552  -44.2661   -20.9063
    3.7802  -44.2661   -20.9062

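But there is a catch! I have managed to get close with `numpy.unique`, but I noticed that, for some reason, Quantum Espresso sometimes writes two or more equal values in the first column of a block while the corresponding values in the second column differ, so with `numpy.unique` I lose data.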
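To make the data loss concrete, here is a minimal sketch using four rows taken from the first sample block, where the value 3.2802 appears twice in the first column. The variable names are only illustrative, and the exact `numpy.unique` call used in the question is not shown, so this assumes it was applied to the first column.

```python
import numpy as np

# Four rows from the first sample block: 3.2802 appears twice
# with different energies in the second column.
block = np.array([
    [3.2449, -44.2693],
    [3.2802, -44.2693],
    [3.2802, -44.2677],
    [3.3052, -44.2677],
])

# np.unique keeps one index per distinct value of the first column,
# so one of the two 3.2802 rows is dropped.
_, first_idx = np.unique(block[:, 0], return_index=True)
deduplicated = block[first_idx]

print(block.shape[0])         # number of rows before
print(deduplicated.shape[0])  # number of rows after
```

The second 3.2802 row (index 2) never appears in `first_idx`, which is the kind of silent loss described above.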

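I have already tried this:

    kp_bands = np.take(bands[:,0], range(0,122), axis=0)

Here `bands` is the array I load the data into with `numpy.loadtxt`, and `122` is the number of values in each block. The problem is that this number is not always the same; it can vary depending on the system being studied.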
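One possible way to avoid hard-coding the block length is to look for the rows where the first column drops back to a smaller value. The sketch below rests on three assumptions: the blank lines are absent from the loaded array (as the `np.take` call above suggests), the first column never decreases inside a block, and all blocks have the same number of rows. The helper names `split_bands` and `to_table` are made up for this example, as is the tiny stand-in array.

```python
import numpy as np

def split_bands(bands):
    """Split a stacked (N, 2) array into one array per band.

    Assumes the first column never decreases inside a block and
    drops to a smaller value where the next block starts.
    """
    k = bands[:, 0]
    # A new block begins wherever k is smaller than the previous k.
    starts = np.flatnonzero(np.diff(k) < 0) + 1
    return np.split(bands, starts)

def to_table(blocks):
    """First column of block 0, then the second column of every block."""
    return np.column_stack([blocks[0][:, 0]] + [b[:, 1] for b in blocks])

# Made-up stand-in for the array returned by numpy.loadtxt;
# note the repeated value 0.25 inside each block.
bands = np.array([
    [0.00, -44.2709], [0.25, -44.2701], [0.25, -44.2700], [0.50, -44.2693],
    [0.00, -20.8317], [0.25, -20.8751], [0.25, -20.8750], [0.50, -20.9179],
])

table = to_table(split_bands(bands))
print(table.shape)
print(table)
```

Because only strict decreases count as block boundaries, repeated values in the first column stay in place and no rows are dropped.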

My question is:

How can I do this without losing data and without knowing how many rows each block has?

python numpy
1 answer
0 votes

As mentioned in the comments, pandas really is your best option:

    import re

    import numpy as np
    import pandas as pd


    def read_arrays(filename):
        """Return one (n_rows, 2) array per blank-line-separated block."""
        with open(filename, "r") as f:
            # Split on blank lines, tolerating stray whitespace on them.
            blocks = re.split(r"\n\s*\n", f.read().strip())
        return [np.array(block.split(), dtype=float).reshape(-1, 2) for block in blocks]


    def all_to_df(numpy_arrays):
        return [pd.DataFrame(data=item) for item in numpy_arrays]


    data = read_arrays("temp.txt")
    dataframes = all_to_df(data)

    # Quick check of the first two blocks
    print(dataframes[0])
    print(dataframes[1])

    # Join by row position rather than merging on column 0: the first column
    # can contain repeated values (e.g. 3.2802), and a merge on it would
    # duplicate those rows.
    df = pd.concat(
        [dataframes[0][0]] + [frame[1] for frame in dataframes],
        axis=1,
    )
    df.columns = ["timestamp"] + [f"value_{i}" for i in range(1, len(dataframes) + 1)]

    print(df)
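
A note on joining blocks by the value in the first column: since that column can contain repeated values (3.2802 appears twice in each sample block), a key-based merge pairs every matching row on one side with every matching row on the other. The sketch below uses two made-up three-row frames with hypothetical column names `k` and `e` to compare a key-based `pd.merge` with a positional `pd.concat`. It is an illustration only, not part of the original answer.

```python
import pandas as pd

# Two made-up blocks that both repeat the key 3.2802
left = pd.DataFrame({"k": [3.2449, 3.2802, 3.2802],
                     "e": [-44.2693, -44.2693, -44.2677]})
right = pd.DataFrame({"k": [3.2449, 3.2802, 3.2802],
                      "e": [-20.9178, -20.9179, -20.9106]})

# Key-based join: the two 3.2802 rows on each side give 2 x 2 = 4 rows.
merged = pd.merge(left, right, on="k")

# Positional join: row i of one block sits next to row i of the other.
by_position = pd.concat([left, right["e"]], axis=1)

print(len(merged))
print(len(by_position))
```

The positional join keeps the row count of the blocks, while the key-based join grows whenever a key is repeated.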