我有一个基于卫星的太阳诱发荧光(SIF)数据集和一个模拟降水量的数据集。我想将研究区域中每个像素的降水量与SIF进行比较。我的两个数据集的面积相同,但空间分辨率略有不同。当我取整个区域的平均值时,我可以成功绘制这些值的时间并相互比较,但是我正在努力创建每个像素的散点图。
老实说,当不确定沉淀对SIF的影响时,我不确定这是否是比较这两个值的最佳方法,因此我愿意接受不同方法的想法。至于当前合并数据,我正在使用xr.combine_by_coords
,但这给了我下面描述的错误。我也可以通过将netcdfs转换为geotiffs,然后使用rasterio使其变形来完成此操作,但这似乎是进行这种比较的一种无效方法。这是我到目前为止的内容:
xr.combine_by_coords
目前,我在最后一行收到一个看似无济于事的import netCDF4
import numpy as np
import dask
import xarray as xr
rainy_bbox = np.array([
[-69.29519955115512,-13.861261028444734],
[-69.29519955115512,-12.384786628185896],
[-71.19583431678012,-12.384786628185896],
[-71.19583431678012,-13.861261028444734]])
max_lon_lat = np.max(rainy_bbox, axis=0)
min_lon_lat = np.min(rainy_bbox, axis=0)
# this dataset is available here: ftp://fluo.gps.caltech.edu/data/tropomi/gridded/
sif = xr.open_dataset('../data/TROPO_SIF_03-2018.nc')
# the dataset is global so subset to my study area in the Amazon
rainy_sif_xds = sif.sel(lon=slice(min_lon_lat[0], max_lon_lat[0]), lat=slice(min_lon_lat[1], max_lon_lat[1]))
# this data can all be downloaded from NASA Goddard here either manually or with wget but you'll need an account on https://disc.gsfc.nasa.gov/: https://pastebin.com/viZckVdn
imerg_xds = xr.open_mfdataset('../data/3B-DAY.MS.MRG.3IMERG.201803*.nc4')
# spatial subset
rainy_imerg_xds = imerg_xds.sel(lon=slice(min_lon_lat[0], max_lon_lat[0]), lat=slice(min_lon_lat[1], max_lon_lat[1]))
# I'm not sure the best way to combine these datasets but am trying this
combo_xds = xr.combine_by_coords([rainy_imerg_xds, rainy_xds])
。当我添加参数RecursionError: maximum recursion depth exceeded in comparison
时,来自join='left'
数据集的数据将位于rainy_imerg_xds
中,当我执行combo_xds
时,将出现join='right'
数据,如果执行rainy_xds
,则将不存在任何数据。我认为此功能有一些内部插值,但似乎没有。
此join='inner'
非常简单地概述了该问题的解决方案。 documentation from xarray允许您内插多个尺寸,并指定另一个数据集的x和y尺寸作为输出尺寸。因此,在这种情况下,可以通过
xarray
# interpolation based on http://xarray.pydata.org/en/stable/interpolation.html
# interpolation can't be done across the chunked dimension so we have to load it all into memory
rainy_sif_xds.load()
#interpolate into the higher resolution grid from IMERG
interp_rainy_sif_xds = rainy_sif_xds.interp(lat=rainy_imerg_xds["lat"], lon=rainy_imerg_xds["lon"])
# visualize the output
rainy_sif_xds.dcSIF.mean(dim='time').hvplot.quadmesh('lon', 'lat', cmap='jet', geo=True, rasterize=True, dynamic=False, width=450).relabel('Initial') +\
interp_rainy_sif_xds.dcSIF.mean(dim='time').hvplot.quadmesh('lon', 'lat', cmap='jet', geo=True, rasterize=True, dynamic=False, width=450).relabel('Interpolated')
# now that our coordinates match, in order to actually merge we need to convert the default CFTimeIndex to datetime to merge dataset with SIF data because the IMERG rainfall dataset was CFTime and the SIF was datetime
rainy_imerg_xds['time'] = rainy_imerg_xds.indexes['time'].to_datetimeindex()
# now the merge can easily be done with
merged_xds = xr.combine_by_coords([rainy_imerg_xds, interp_rainy_sif_xds], coords=['lat', 'lon', 'time'], join="inner")
# now visualize the two datasets together // multiply SIF by 30 because values are so ow
merged_xds.HQprecipitation.rolling(time=7, center=True).sum().mean(dim=('lat', 'lon')).hvplot().relabel('Precip') * \
(merged_xds.dcSIF.mean(dim=('lat', 'lon'))*30).hvplot().relabel('SIF')