Accessing the dataset "ncdcOisst21Agg_LonPM180" with erddapy fails with "Not Found: Currently unknown datasetID=ncdcOisst21Agg_LonPM180"


I am trying to retrieve data from an ERDDAP data server using the erddapy package. This is the code I am running in a Jupyter notebook:

from erddapy import ERDDAP
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
import numpy as np

def download_oisst_data(start_date, end_date, min_lat, max_lat, min_lon, max_lon):
    e = ERDDAP(
        server="https://coastwatch.pfeg.noaa.gov/erddap/",
        protocol="tabledap",
    )
    e.dataset_id = "ncdcOisst21Agg_LonPM180"  # Correctly set the dataset ID
    e.variables = ["time", "latitude", "longitude", "sst"]
    e.constraints = {
        "time>=": start_date,
        "time<=": end_date,
        "latitude>=": min_lat,
        "latitude<=": max_lat,
        "longitude>=": min_lon,
        "longitude<=": max_lon,
    }

    # Fetch the data and convert it to a pandas DataFrame
    df = e.to_pandas(
        index_col="time (UTC)",
        parse_dates=True,
        skiprows=(1,)  # Skip the units row
    ).dropna()

    # Convert the DataFrame to an xarray Dataset
    ds = df.to_xarray()  
    
    return ds

ds = download_oisst_data(start_date, end_date, min_lat, max_lat, min_lon, max_lon)

This code returns the following error:

---------------------------------------------------------------------------
HTTPStatusError                           Traceback (most recent call last)
File ~/miniconda3/lib/python3.10/site-packages/erddapy/core/url.py:24, in _urlopen(url, auth, **kwargs)
     23 try:
---> 24     response.raise_for_status()
     25 except httpx.HTTPError as err:

File ~/miniconda3/lib/python3.10/site-packages/httpx/_models.py:761, in Response.raise_for_status(self)
    760 message = message.format(self, error_type=error_type)
--> 761 raise HTTPStatusError(message, request=request, response=self)

HTTPStatusError: Client error '404 ' for url 'https://coastwatch.pfeg.noaa.gov/erddap/tabledap/ncdcOisst21Agg_LonPM180.csvp?time,latitude,longitude,sst&time%3E=368150400.0&time%3C=1704067199.0&latitude%3E=-40&latitude%3C=30&longitude%3E=30&longitude%3C=100'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404

The above exception was the direct cause of the following exception:

HTTPError                                 Traceback (most recent call last)
Cell In[24], line 1
----> 1 ds = download_oisst_data(start_date, end_date, min_lat, max_lat, min_lon, max_lon)

Cell In[22], line 18, in download_oisst_data(start_date, end_date, min_lat, max_lat, min_lon, max_lon)
      8 e.constraints = {
      9     "time>=": start_date,
     10     "time<=": end_date,
   (...)
     14     "longitude<=": max_lon,
     15 }
     17 # Fetch the data and convert it to a pandas DataFrame
---> 18 df = e.to_pandas(
     19     index_col="time (UTC)",
     20     parse_dates=True,
     21     skiprows=(1,)  # Skip the units row
     22 ).dropna()
     24 # Convert the DataFrame to an xarray Dataset, if needed
     25 # This step requires importing xarray and possibly additional processing depending on the data structure
     26 ds = df.to_xarray()  # Uncomment this line if you have the necessary setup for converting DataFrame to xarray Dataset

File ~/miniconda3/lib/python3.10/site-packages/erddapy/erddapy.py:361, in ERDDAP.to_pandas(self, requests_kwargs, **kw)
    359 distinct = kw.pop("distinct", False)
    360 url = self.get_download_url(response=response, distinct=distinct)
--> 361 return to_pandas(url, requests_kwargs=requests_kwargs, pandas_kwargs=dict(**kw))

File ~/miniconda3/lib/python3.10/site-packages/erddapy/core/interfaces.py:31, in to_pandas(url, requests_kwargs, pandas_kwargs)
     19 def to_pandas(
     20     url: str,
     21     requests_kwargs: Optional[Dict] = None,
     22     pandas_kwargs: Optional[Dict] = None,
     23 ) -> "pd.DataFrame":
     24     """
     25     Convert a URL to Pandas DataFrame.
     26 
   (...)
     29     **pandas_kwargs: kwargs to be passed to third-party library (pandas).
     30     """
---> 31     data = urlopen(url, requests_kwargs or {})
     32     try:
     33         return pd.read_csv(data, **(pandas_kwargs or {}))

File ~/miniconda3/lib/python3.10/site-packages/erddapy/core/url.py:42, in urlopen(url, requests_kwargs)
     40 if requests_kwargs is None:
     41     requests_kwargs = {}
---> 42 data = _urlopen(url, **requests_kwargs)  # type: ignore
     43 data.seek(0)
     44 return data

File ~/miniconda3/lib/python3.10/site-packages/erddapy/core/url.py:26, in _urlopen(url, auth, **kwargs)
     24     response.raise_for_status()
     25 except httpx.HTTPError as err:
---> 26     raise httpx.HTTPError(f"{response.content.decode()}") from err
     27 return io.BytesIO(response.content)

HTTPError: Error {
    code=404;
    message="Not Found: Currently unknown datasetID=ncdcOisst21Agg_LonPM180";
}

The server reports an unknown datasetID for "ncdcOisst21Agg_LonPM180". However, when I visit https://coastwatch.pfeg.noaa.gov/erddap/ and enter "sst" as the search term, the dataset ID does appear in the search results.

I am using a 2020 MacBook Air with an Intel i5. How can I resolve this error?

python jupyter-notebook noaa
1 Answer

It looks like the protocol setting you are using is not compatible with this dataset. In the line that begins "HTTPStatusError: Client error '404 ' for url", note that the base URL is:

https://coastwatch.pfeg.noaa.gov/erddap/tabledap/ncdcOisst21Agg_LonPM180.csvp

If you then go to https://coastwatch.pfeg.noaa.gov/erddap/tabledap/ (which resolves to https://coastwatch.pfeg.noaa.gov/erddap/tabledap/index.html?page=1&itemsPerPage=1000) and look through the list, you will see that the dataset you want is not among the 290 listed there.
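The same check can be done without scrolling through the listing. The sketch below assumes the server exposes ERDDAP's allDatasets table with datasetID and dataStructure columns, where a value of "grid" would point to griddap and "table" to tabledap. The function names are hypothetical and only the standard library is used.

```python
import csv
import io
from urllib.parse import quote
from urllib.request import urlopen

SERVER = "https://coastwatch.pfeg.noaa.gov/erddap"


def all_datasets_url(dataset_id, server=SERVER):
    """Build a tabledap query against allDatasets for a single dataset ID."""
    # String constraints must be wrapped in double quotes, then percent-encoded.
    constraint = quote(f'datasetID="{dataset_id}"', safe="=")
    return f"{server}/tabledap/allDatasets.csv?datasetID,dataStructure&{constraint}"


def data_structure(dataset_id, server=SERVER):
    """Return the dataStructure value for the dataset, or None if no row is returned.

    Needs network access. An ID the server does not know is expected to
    surface as an HTTP error raised by urlopen rather than as an empty table.
    """
    with urlopen(all_datasets_url(dataset_id, server)) as resp:
        rows = list(csv.reader(io.TextIOWrapper(resp, encoding="utf-8")))
    # ERDDAP .csv output: row 0 holds column names, row 1 units, data starts at row 2.
    return rows[2][1] if len(rows) > 2 else None
```

Calling data_structure("ncdcOisst21Agg_LonPM180") against the live server should indicate which protocol to pass to erddapy, provided the assumptions above hold for this server.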

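If you go to the main page, you will see that the protocol listed just above the one you tried is griddap. If you click through to those datasets, you can look for yours there.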

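I narrowed it down using "Advanced Search" and found your dataset listed there. You can check this yourself; note the protocol entry on that search results page.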

So try changing the protocol line to:

protocol="griddap",
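Switching the protocol may not be the only change needed, because gridded requests are expressed as ranges over every dimension rather than as row filters. The following is a minimal sketch of how the question's function might look with griddap. It assumes an erddapy version that provides griddap_initialize(), that the times are passed as ISO 8601 strings, and that any extra dimension of the dataset (such as a depth level) can stay at the default that griddap_initialize() fills in. The helper oisst_constraints is hypothetical.

```python
SERVER = "https://coastwatch.pfeg.noaa.gov/erddap/"
DATASET_ID = "ncdcOisst21Agg_LonPM180"


def oisst_constraints(start_date, end_date, min_lat, max_lat, min_lon, max_lon):
    """Return the bounding box using the same keys as the original question."""
    return {
        "time>=": start_date,
        "time<=": end_date,
        "latitude>=": min_lat,
        "latitude<=": max_lat,
        "longitude>=": min_lon,
        "longitude<=": max_lon,
    }


def download_oisst_data(start_date, end_date, min_lat, max_lat, min_lon, max_lon):
    """Fetch sst for the bounding box as an xarray Dataset (needs network access)."""
    # Imported here so the helper above can be used without erddapy installed.
    from erddapy import ERDDAP

    e = ERDDAP(server=SERVER, protocol="griddap")
    e.dataset_id = DATASET_ID
    # Assumption: griddap_initialize() fills e.constraints and e.variables
    # with the dataset's full extent and variable list.
    e.griddap_initialize()
    e.variables = ["sst"]
    # Overwrite only keys that already exist; other dimensions keep their defaults.
    e.constraints.update(
        oisst_constraints(start_date, end_date, min_lat, max_lat, min_lon, max_lon)
    )
    # Gridded data maps naturally onto xarray, so no pandas round trip is needed.
    return e.to_xarray()
```

A short window, for example download_oisst_data("2023-01-01T00:00:00Z", "2023-01-07T00:00:00Z", -40, 30, 30, 100), keeps the request small. The multi-decade range in the original URL may be too large for a single request and could need to be split into chunks.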