我无法在 4D 的 xarray 数据集(用 dask 分段)上计算 Parcel_profile(1D 函数)。
你好,
我真的需要帮助,我正在研究有关压力水平的 ERA5 每小时数据。我提取了几个大气压水平下的相对湿度和温度。我使用 Metpy 的“dewpoint_from_relative_humidity”函数来计算露点。该表相当大,因此我使用 dask 创建多个块。我有下表:
import numpy as np
import metpy as mp
import metpy.calc as mpcalc
import xarray as xr
import metpy.units as mpunit
with dask.config.set(**{'array.slicing.split_large_chunks': True}):
ncin_1 = xr.open_dataset(ncfname_1, engine='netcdf4').chunk('auto')
ncin_1['t'] = ncin_1['t'] - 273.15
ncin_1['r'] = ncin_1['r'] / 100.0
ncin_1['r'] = ncin_1['r'].clip(min=0.01, max=1)
# variable reprocessing with metpy unit registry
ncin_1['level'].attrs['units'] = 'hPa'
ncin_1 = ncin_1.metpy.quantify()
ncin_1['time'] = ncin_1['time'].metpy.time
ncin_1['latitude'] = ncin_1['latitude'].metpy.latitude
ncin_1['longitude'] = ncin_1['longitude'].metpy.longitude
ncin_1['level'] = ncin_1['level'].metpy.vertical
ncin_1['t'] = ncin_1['t'] * mpunit.units.degC
ncin_1['dewpoint'] = ncin_1['dewpoint'] * mpunit.units.degC
ncin_1['r'] = ncin_1['r'] * mpunit.units.dimensionless
# pressures levels in descending order
ncin_1= ncin_1.where((ncin_1['level'] >= 100) & (test['level'] <= 1000), drop=True)
ncin_1= ncin_1.sortby('level', ascending=False)
# create a profile variable to be filled in
ncin_1['profil'] = (('time', 'level', 'latitude', 'longitude'),np.full_like(ncin_1['t'], fill_value=np.nan))
<xarray.Dataset>
Dimensions: (longitude: 60, latitude: 41, level: 21, time: 24836)
Coordinates:
* longitude (longitude) float32 -5.02 -4.77 -4.52 -4.27 ... 9.23 9.48 9.73
* latitude (latitude) float32 51.15 50.9 50.65 50.4 ... 41.65 41.4 41.15
* level (level) int32 20 50 100 150 200 250 ... 750 800 850 900 950 1000
* time (time) datetime64[ns] 2006-01-01 ... 2022-12-31T18:00:00
Data variables:
r (time, level, latitude, longitude) float32 0.03323 ... 0.7702
t (time, level, latitude, longitude) float32 -69.07 ... 14.25
dewpoint (time, level, latitude, longitude) float32 -90.23 ... 10.28
profil (time, level, latitude, longitude) float32 nan nan ... nan nan
Attributes:
Conventions: CF-1.6
history: 2023-07-19 22:32:41 GMT by grib_to_netcdf-2.25.1: /opt/ecmw...
我想计算提升索引(使用metpy的lifted_index函数),但首先我需要计算parcel_profile变量。根据文档,问题在于该函数是一维函数。我使用 xarray.apply_ufunc 或 xarray.map_blocks 制作了几个脚本。
def wrapper_parcel_profile(pressure, temperature, dewpoint):
return mpcalc.parcel_profile(pressure * units.hPa , temperature * units.degC , dewpoint * units.degC ).to('degC')
t_1000 = ncin_1['t'].metpy.sel(level=1000)
dewpoint_1000 = ncin_1['dewpoint'].metpy.sel(level=1000)
pressure = ncin_1['level']
ncin_1['profil'] = xr.apply_ufunc(
wrapper_parcel_profile,
pressure, t_1000, dewpoint_1000,
input_core_dims=[['level'], ['time', 'latitude', 'longitude'], ['time', 'latitude', 'longitude']],
output_core_dims=[['time', 'level' , 'latitude', 'longitude']],
vectorize=True,
dask='parallelized',
output_dtypes=[float],
dask_gufunc_kwargs={'allow_rechunk': True}
)
这个脚本运行是因为我使用的是dask,所以如果我理解正确的话,只要我不运行ncin_1.compute()命令,就不会直接计算任何东西。我收到此错误消息:
ValueError: operands could not be broadcast together with shapes (19,) (24836,41,60)
(一定和ncin_1['level']有关系?)
def wrapper_parcel_profile(pressure, temperature, dewpoint):
return mpcalc.parcel_profile( pressure * units.hPa , temperature * units.degC , dewpoint * units.degC).to('degC')
pressure = test['level']
t_1000 = test['t'].metpy.sel(level=1000) * units.degC
dewpoint_1000 = test['dewpoint'].metpy.sel(level=1000) * units.degC
test['profil'] = xr.map_blocks(wrapper_parcel_profile, test ,template= test['t'])
当我在数据库中使用“ncin_1.compute()”时收到此错误消息:
TypeError: Mixing chunked array types is not supported, but received multiple types: {, }
我的方法正确吗?是否可以简单地通过保留在 xarray 数据集中来做到这一点?我找到的解决方案合适吗?预先感谢您的帮助