How can I cross-match 2 data frames in Python by (Cartesian) coordinates?


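I have 2 astronomical catalogs containing galaxies with their respective sky coordinates (ra, dec). I handle the catalogs as data frames. The catalogs come from different observational surveys, and some galaxies appear in both catalogs. I want to cross-match these galaxies and put them into a new catalog. How can I do this with Python? I thought there would be some easy way with numpy, pandas, astropy or another package, but I couldn't find a solution. Thanks.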

python pandas database astropy cross-match
1 Answer

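After a lot of research, the easiest way I found to do this is with a package called astroml, for which a tutorial is available. The notebooks in which I used it are called cross_math_data_and_colour_cuts_.ipynb and PS_data_cleaning_and_processing.ipynb.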

import numpy as np
import pandas as pd
from astroML.crossmatch import crossmatch_angular
# if you are using Google Colab, first run the line "!pip install astroml"

df_1 = pd.read_csv('catalog_1.csv')
df_2 = pd.read_csv('catalog_2.csv')

# crossmatch catalogs
max_radius = 1. / 3600  # 1 arcsec
# note, that for the below to work the first 2 columns of the catalogs should be ra, dec
# also, df_1 should be the longer of the 2 catalogs, else there will be index errors
dist, ind = crossmatch_angular(df_1.values, df_2.values, max_radius)
match = ~np.isinf(dist)
# THE DESIRED SOLUTION IS THEN:
df_crossed = df_1[match]


# ALTERNATIVELY:
# ind holds, for each row of the first catalog, the index of its match in the second catalog;
# when there is no match, the ind value is the length of the second catalog
# so if you need to pull values from the second catalog through these indices, do:
# (ind has one entry per row of df_1, so the new column is added to df_1)
df_1['new_var'] = [df_2.old_var.iloc[i] if i < len(df_2) else -999 for i in ind]
# that way, wherever there is a match, 'new_var' holds the matching value from 'old_var'
# and wherever there is no match, it holds -999 as a flag
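
To make explicit what "a match within max_radius" means, here is a minimal brute-force sketch that uses only the standard library. It is not how astroML does the search and is only sensible for small catalogs. The function names angular_sep_deg and brute_force_crossmatch and the toy coordinates are made up for illustration. It mirrors the convention described above: no match is reported as an infinite distance and an index equal to the length of the second catalog.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees; all inputs are in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # haversine formula, which stays accurate for very small separations
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def brute_force_crossmatch(cat1, cat2, max_radius):
    """For each (ra, dec) in cat1, return (separation, index into cat2) of the
    nearest cat2 source within max_radius, or (inf, len(cat2)) if there is none."""
    results = []
    for ra1, dec1 in cat1:
        best_sep, best_idx = math.inf, len(cat2)
        for j, (ra2, dec2) in enumerate(cat2):
            sep = angular_sep_deg(ra1, dec1, ra2, dec2)
            if sep <= max_radius and sep < best_sep:
                best_sep, best_idx = sep, j
        results.append((best_sep, best_idx))
    return results


# invented coordinates: the first source has a counterpart 0.5 arcsec away,
# the second source has no counterpart at all
cat1 = [(10.0, 20.0), (150.0, -30.0)]
cat2 = [(200.0, 5.0), (10.0, 20.0 + 0.5 / 3600)]
matches = brute_force_crossmatch(cat1, cat2, max_radius=1.0 / 3600)  # 1 arcsec
```

Each entry of matches plays the role of one (dist, ind) pair, so the rows of the first catalog with a finite separation are the cross-matched ones.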

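If you are in the convenient position of having not only coordinates in both data frames but also IDs for the matched sources, you can cross-match easily with the pandas .merge() function. Suppose df_1 has the columns 'ID', 'ra', 'dec', 'object_class' and df_2 has the columns 'ID', 'ra', 'dec', 'r_mag'; then we can cross-match with
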
df_crossed = pd.merge(df_1, df_2, on='ID')

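By default this performs an inner join (see the pandas documentation for more details), so only IDs present in both data frames are kept. Because 'ra' and 'dec' appear in both inputs and are not merge keys, pandas adds suffixes to them, and the resulting df_crossed has the columns 'ID', 'ra_x', 'dec_x', 'object_class', 'ra_y', 'dec_y', 'r_mag'.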
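
As a small sketch of the ID-based merge, the example below builds two toy catalogs with invented values and merges them on 'ID'. Both catalogs carry 'ra' and 'dec', so the suffixes argument is used here to label which catalog each coordinate column came from. The second call uses how='outer' with indicator=True, which is one way to see which sources were found in only one catalog.

```python
import pandas as pd

# toy catalogs with invented values, only to illustrate the merge
df_1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'ra': [10.0, 150.0, 210.5],
    'dec': [20.0, -30.0, 5.25],
    'object_class': ['galaxy', 'galaxy', 'galaxy'],
})
df_2 = pd.DataFrame({
    'ID': [2, 3, 4],
    'ra': [150.0, 210.5, 300.0],
    'dec': [-30.0, 5.25, 45.0],
    'r_mag': [18.2, 19.7, 21.0],
})

# inner join on ID: keeps only the sources present in both catalogs
df_crossed = pd.merge(df_1, df_2, on='ID', suffixes=('_1', '_2'))

# outer join: keeps every source; the '_merge' column says where each row came from
# ('both', 'left_only' or 'right_only')
df_all = pd.merge(df_1, df_2, on='ID', how='outer', indicator=True)

print(df_crossed)
print(df_all[['ID', '_merge']])
```

If the duplicated coordinate columns are not wanted, one option is to drop 'ra' and 'dec' from one of the data frames before merging.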
