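I have 2 astronomical catalogs containing galaxies with their respective sky coordinates (ra, dec). I handle the catalogs as data frames. The catalogs come from different observational surveys, and some galaxies appear in both. I want to cross-match these galaxies and put them into a new catalog. How can I do this with Python? I thought there should be some easy way with numpy, pandas, astropy or another package, but I couldn't find a solution. Thanks.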
After a lot of research, the easiest way I found is to use a package called astroML, for which a tutorial is available. The notebooks I used are called cross_math_data_and_colour_cuts_.ipynb and PS_data_cleaning_and_processing.ipynb.
# if you are using Google Colab, first run the line "!pip install astroml"
import numpy as np
import pandas as pd
from astroML.crossmatch import crossmatch_angular
df_1 = pd.read_csv('catalog_1.csv')
df_2 = pd.read_csv('catalog_2.csv')
# crossmatch catalogs
max_radius = 1. / 3600 # 1 arcsec
# note that crossmatch_angular reads ra, dec (in degrees) from the first 2 columns,
# so select them explicitly; the column names 'ra' and 'dec' are assumed here
# in my experience df_1 should be the longer of the 2 catalogs, otherwise I ran into index errors
dist, ind = crossmatch_angular(df_1[['ra', 'dec']].values, df_2[['ra', 'dec']].values, max_radius)
match = ~np.isinf(dist)
# THE DESIRED SOLUTION IS THEN:
df_crossed = df_1[match]
# ALTERNATIVELY:
# ind has one entry per row of df_1 and holds the index of the matched galaxy in the second catalog;
# when there is no match, the ind value is the length of the second catalog
# so if you need to pull values from the second catalog into the first, do:
df_1['new_var'] = [df_2['old_var'].iloc[i] if i < len(df_2) else -999 for i in ind]
# that way whenever you have a match 'new_var' will contain the correct value from 'old_var'
# and whenever you have a mismatch it will contain -999 as a flag
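To make the meaning of dist, ind and match concrete, here is a minimal sketch that reproduces the same output convention with plain numpy on made-up data. It is not astroML's implementation: the helper name brute_force_crossmatch is hypothetical, it compares every pair of sources with the haversine formula (so it is only practical for small catalogs), and the toy coordinates, 'object_class' and 'r_mag' values are invented for illustration. The last lines show one possible way to build a new catalog that carries columns from both surveys.

```python
import numpy as np
import pandas as pd


def brute_force_crossmatch(ra1, dec1, ra2, dec2, max_radius_deg):
    """Nearest neighbour in catalog 2 for each source in catalog 1.

    All angles are in degrees. Unmatched sources get dist = inf and
    ind = number of rows in catalog 2 (hypothetical helper, O(N1 * N2)).
    """
    ra1, dec1, ra2, dec2 = (
        np.radians(np.asarray(a, dtype=float)) for a in (ra1, dec1, ra2, dec2)
    )
    # Haversine formula evaluated for every pair: an (N1, N2) matrix
    dra = ra1[:, None] - ra2[None, :]
    ddec = dec1[:, None] - dec2[None, :]
    h = (np.sin(ddec / 2) ** 2
         + np.cos(dec1)[:, None] * np.cos(dec2)[None, :] * np.sin(dra / 2) ** 2)
    sep = np.degrees(2 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0))))

    ind = sep.argmin(axis=1)                  # closest catalog-2 row per catalog-1 row
    dist = sep[np.arange(len(ra1)), ind]      # separation to that closest row
    unmatched = dist > max_radius_deg
    dist[unmatched] = np.inf
    ind[unmatched] = len(ra2)
    return dist, ind


# Toy catalogs (invented values)
df_1 = pd.DataFrame({"ra": [10.0, 20.0, 30.0],
                     "dec": [0.0, 5.0, -5.0],
                     "object_class": ["a", "b", "c"]})
df_2 = pd.DataFrame({"ra": [30.0001, 10.0001],
                     "dec": [-5.0, 0.0],
                     "r_mag": [18.2, 19.5]})

max_radius = 1.0 / 3600  # 1 arcsec in degrees
dist, ind = brute_force_crossmatch(df_1["ra"], df_1["dec"],
                                   df_2["ra"], df_2["dec"], max_radius)
match = ~np.isinf(dist)

# New catalog: matched rows of df_1 side by side with their partners from df_2
matched_1 = df_1[match].reset_index(drop=True)
matched_2 = df_2.iloc[ind[match]].reset_index(drop=True).add_suffix("_2")
df_crossed = pd.concat([matched_1, matched_2], axis=1)
```

Resetting the index on both halves before pd.concat matters here, because otherwise pandas would align the rows by their original labels instead of by match order.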
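If you are in the convenient position of having not only the coordinates but also IDs for the sources in both data frames (and both catalogs use the same ID scheme), you can cross-match easily with the pandas .merge() function. Suppose df_1 has the columns 'ID', 'ra', 'dec', 'object_class' and df_2 has the columns 'ID', 'ra', 'dec', 'r_mag'; then we can cross-match with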
df_crossed = pd.merge(df_1, df_2, on='ID')
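By default this performs an inner join, so only sources whose ID appears in both catalogs are kept (see the pandas merge documentation for more details). Note that 'ra' and 'dec' exist in both data frames but are not merge keys, so pandas keeps both copies and the resulting df_crossed has the columns 'ID', 'ra_x', 'dec_x', 'object_class', 'ra_y', 'dec_y', 'r_mag'. To end up with just 'ID', 'ra', 'dec', 'object_class', 'r_mag', drop 'ra' and 'dec' from one of the data frames before merging.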
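As a small illustration of the ID-based merge, the sketch below uses two made-up data frames with the column layout described above. The ID values, coordinates and magnitudes are invented; the how and indicator arguments are standard pd.merge parameters.

```python
import pandas as pd

# Toy catalogs (invented values) sharing an 'ID' column
df_1 = pd.DataFrame({"ID": [1, 2, 3],
                     "ra": [10.0, 20.0, 30.0],
                     "dec": [0.0, 5.0, -5.0],
                     "object_class": ["galaxy", "galaxy", "star"]})
df_2 = pd.DataFrame({"ID": [2, 3, 4],
                     "ra": [20.0, 30.0, 40.0],
                     "dec": [5.0, -5.0, 1.0],
                     "r_mag": [18.2, 19.5, 21.0]})

# Plain merge on 'ID': the overlapping ra/dec columns get _x / _y suffixes
df_suffixed = pd.merge(df_1, df_2, on="ID")

# Dropping the duplicated coordinates from one frame gives a single ra/dec pair
df_crossed = pd.merge(df_1, df_2.drop(columns=["ra", "dec"]), on="ID", how="inner")

# An outer merge keeps unmatched sources too; indicator=True adds a '_merge'
# column saying whether each row came from the left, the right, or both frames
df_all = pd.merge(df_1, df_2[["ID", "r_mag"]], on="ID", how="outer", indicator=True)
```

The inner merge is the cross-matched catalog; the outer merge with the indicator column can be handy for checking how many sources from each survey had no counterpart.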