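I need to convert a large ICRS catalog (about 1 billion entries) to Galactocentric coordinates. At first I converted each entry to coord.ICRS and then to coord.Galactocentric inside a loop, but that is far too slow. After some searching I found that coord.SkyCoord can transform whole arrays of data at once, so I tried the array approach in my code: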
data = pd.read_csv('/content/data (1).csv')
data_ra = data['ra']
data_dec = data['dec']
data_dist = data['r_est']
data_ra = data_ra * u.degree
data_dec = data_dec * u.degree
data_dist = data_dist * u.pc
c = coord.ICRS(data_ra, data_dec, data_dist)
c = c.transform_to(coord.Galactocentric)
x = c.x.value
y = c.y.value
z = c.z.value
This raises the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-26-e02bbc9ec5dd> in <module>()
6 data_dec = data_dec * u.degree
7 data_dist = data_dist * u.pc
----> 8 c = coord.ICRS(data_ra, data_dec, data_dist)
9 c = c.transform_to(coord.Galactocentric)
10 x = c.x.value
5 frames
/usr/local/lib/python3.6/dist-packages/astropy/units/quantity.py in __new__(cls, value, unit, dtype, copy, order, subok, ndmin)
340 # Convert all quantities to the same unit.
341 if unit is None:
--> 342 unit = value[0].unit
343 value = [q.to_value(unit) for q in value]
344 value_unit = unit # signal below that conversion has been done
AttributeError: 'numpy.float64' object has no attribute 'unit'
I can't work out what is wrong. Is coord.ICRS not compatible with arrays? If so, how can I speed up the transformation?
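This surprised me a bit too. The reason is that when you access a column of a Pandas DataFrame, you do not get a plain NumPy array back but a Pandas Series object (I tested this with some dummy data):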
>>> data_ra = data['ra']
>>> type(data_ra)
<class 'pandas.core.series.Series'>
It seems (and to me this looks a bit like a bug) that although you can multiply a Series by a unit, the result is not what you would expect:
>>> data_ra = data_ra * u.degree
>>> type(data_ra)
<class 'pandas.core.series.Series'>
So you do not get a Quantity as you had hoped; you just get a Series back. The Quantity is actually still in there, in the Series' .values attribute:
>>> data_ra.values
<Quantity [ 1., 2., 3.] deg>
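In this situation, though, other things are broken, including the coord.ICRS call from the question. A better way to create the Quantity in the first place is to use the .values attribute of each Series. This returns a plain NumPy array, which can then be turned into a Quantity: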
>>> data_ra = data['ra'].values * u.degree
>>> data_dec = data['dec'].values * u.degree
>>> data_dist = data['dist'].values * u.pc
>>> c = coord.ICRS(data_ra, data_dec, data_dist)
>>> c
<ICRS Coordinate: (ra, dec, distance) in (deg, deg, pc)
[( 1., 4., 7.), ( 2., 5., 8.), ( 3., 6., 9.)]>
>>> c.transform_to(coord.Galactocentric)
<Galactocentric Coordinate (galcen_coord=<ICRS Coordinate: (ra, dec) in deg
( 266.4051, -28.936175)>, galcen_distance=8.3 kpc, galcen_v_sun=( 11.1, 232.24, 7.25) km / s, z_sun=27.0 pc, roll=0.0 deg): (x, y, z) in pc
[(-8300.70096432, 3.76036129, 21.14296691),
(-8300.99504334, 4.33255373, 20.35548782),
(-8301.33502602, 4.91092559, 19.5850604 )]>
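Better still, unless you need Pandas for something else, you can read the CSV file with Astropy into a Table and then give the columns the units you want, so that they can be used like Quantity objects. For example: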
>>> from astropy.table import Table
>>> t = Table.read('foo.csv')
>>> for col, unit in [('ra', u.degree), ('dec', u.degree), ('dist', u.pc)]:
... t[col].unit = unit
...
>>> t
<Table length=3>
col0 dec dist ra
deg pc deg
int64 float64 float64 float64
----- ------- ------- -------
0 4.0 7.0 1.0
1 5.0 8.0 2.0
2 6.0 9.0 3.0
>>> coord.ICRS(t['ra'], t['dec'], t['dist'])
<ICRS Coordinate: (ra, dec, distance) in (deg, deg, pc)
[( 1., 4., 7.), ( 2., 5., 8.), ( 3., 6., 9.)]>
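It is a little unfortunate that I could not find a way to specify the column units directly in the Table.read() call. That might be a nice addition to the API.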
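However, if you write the Table back out as an ECSV file, it is saved in a CSV-like format that also carries the metadata needed to rebuild the table when it is read back in, including the units: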
>>> t.write('foo.ecsv')
>>> Table.read('foo.ecsv')
<Table length=3>
col0 dec dist ra
deg pc deg
int64 float64 float64 float64
----- ------- ------- -------
0 4.0 7.0 1.0
1 5.0 8.0 2.0
2 6.0 9.0 3.0
>>> print(open('foo.ecsv').read())
# %ECSV 0.9
# ---
# datatype:
# - {name: col0, datatype: int64}
# - {name: dec, unit: deg, datatype: float64}
# - {name: dist, unit: pc, datatype: float64}
# - {name: ra, unit: deg, datatype: float64}
# schema: astropy-2.0
col0 dec dist ra
0 4.0 7.0 1.0
1 5.0 8.0 2.0
2 6.0 9.0 3.0