Reading an online .tbl data file in Python

Question

As the title says, I am trying to read an online data file in .tbl format. Here is the link to the data: https://irsa.ipac.caltech.edu/data/COSMOS/tables/morphology/cosmos_morph_cassata_1.1.tbl

I tried the following code

cosmos= pd.read_table('https://irsa.ipac.caltech.edu/data/COSMOS/tables/morphology/cosmos_morph_cassata_1.1.tbl')

Running this did not give me any error, but when I write

print(cosmos.columns)

instead of a list of separate columns, Python lumps everything together and gives me output that looks like this:

Index(['|            ID|            RA|           DEC|  MAG_AUTO_ACS|       R_PETRO|        R_HALF|    CONC_PETRO|     ASYMMETRY|          GINI|           M20|   Axial Ratio|     AUTOCLASS|   CLASSWEIGHT|'], dtype='object').

My main goal is to print the columns of this table individually and then print cosmos['RA']. Does anyone know how to do this?

Answer 1

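Your file has four header rows, and the header (|) and the data (whitespace) use different delimiters. You can use the skiprows parameter of read_table to read the data.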

import requests
import pandas as pd

filename = 'cosmos_morph_cassata_1.1.tbl'
url = 'https://irsa.ipac.caltech.edu/data/COSMOS/tables/morphology/' + filename
n_header = 4

## Download the large file to disk so it can be reused
table_file = requests.get(url)
table_file.raise_for_status()
with open(filename, 'wb') as f:
    f.write(table_file.content)

## Skip the first 4 header rows and use whitespace as the delimiter
## (sep=r'\s+' replaces delim_whitespace=True, which is deprecated in recent pandas)
cosmos = pd.read_table(filename, skiprows=n_header, header=None, sep=r'\s+')

## create header from first line of file
with open(filename) as f:
    header_line = f.readline()
    ## trim whitespaces and split by '|'
    header_columns = header_line.replace(' ', '').split('|')[1:-1]

cosmos.columns = header_columns
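
To try the same idea without downloading the full file, here is a minimal sketch on a small in-memory table. The sample rows and values are made up for illustration and only mimic the layout described above (four '|'-delimited header rows followed by whitespace-delimited data); the real file has many more columns.

```python
import io
import pandas as pd

# Made-up miniature table (an assumption, not real COSMOS data):
# four '|'-delimited header rows, then whitespace-delimited data rows.
sample = (
    "|   ID|        RA|      DEC|\n"
    "|  int|    double|   double|\n"
    "|     |       deg|      deg|\n"
    "| null|      null|     null|\n"
    "     1  150.50000   2.25000\n"
    "     2  150.75000   2.50000\n"
)

n_header = 4

# Build the column names from the first header row:
# remove spaces, split on '|', and drop the empty first and last pieces.
first_line = sample.splitlines()[0]
header_columns = first_line.replace(' ', '').split('|')[1:-1]

# Skip the header rows and split the data on runs of whitespace.
cosmos = pd.read_csv(
    io.StringIO(sample),
    skiprows=n_header,
    header=None,
    sep=r'\s+',
    names=header_columns,
)

print(cosmos.columns)
print(cosmos['RA'])
```

Once the names are assigned, each column can be selected on its own, for example cosmos['RA'] or cosmos['DEC'].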

Answer 2

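Thanks for the question and the answer.

I ran into the same problem, which is odd, because other .tbl files from the same database (such as cosmos_acs_iphot_200709.tbl) can be read easily with TopCat.

There is something wrong with the morphology table.

Cheers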
