I am trying to retrieve data from an ERDDAP data server using the erddapy package. This is the code I am trying to run in a Jupyter notebook:
from erddapy import ERDDAP
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
import numpy as np

def download_oisst_data(start_date, end_date, min_lat, max_lat, min_lon, max_lon):
    e = ERDDAP(
        server="https://coastwatch.pfeg.noaa.gov/erddap/",
        protocol="tabledap",
    )
    e.dataset_id = "ncdcOisst21Agg_LonPM180"  # Correctly set the dataset ID
    e.variables = ["time", "latitude", "longitude", "sst"]
    e.constraints = {
        "time>=": start_date,
        "time<=": end_date,
        "latitude>=": min_lat,
        "latitude<=": max_lat,
        "longitude>=": min_lon,
        "longitude<=": max_lon,
    }
    # Fetch the data and convert it to a pandas DataFrame
    df = e.to_pandas(
        index_col="time (UTC)",
        parse_dates=True,
        skiprows=(1,),  # Skip the units row
    ).dropna()
    # Convert the DataFrame to an xarray Dataset
    ds = df.to_xarray()
    return ds

ds = download_oisst_data(start_date, end_date, min_lat, max_lat, min_lon, max_lon)
This code returns the following error:
---------------------------------------------------------------------------
HTTPStatusError Traceback (most recent call last)
File ~/miniconda3/lib/python3.10/site-packages/erddapy/core/url.py:24, in _urlopen(url, auth, **kwargs)
23 try:
---> 24 response.raise_for_status()
25 except httpx.HTTPError as err:
File ~/miniconda3/lib/python3.10/site-packages/httpx/_models.py:761, in Response.raise_for_status(self)
760 message = message.format(self, error_type=error_type)
--> 761 raise HTTPStatusError(message, request=request, response=self)
HTTPStatusError: Client error '404 ' for url 'https://coastwatch.pfeg.noaa.gov/erddap/tabledap/ncdcOisst21Agg_LonPM180.csvp?time,latitude,longitude,sst&time%3E=368150400.0&time%3C=1704067199.0&latitude%3E=-40&latitude%3C=30&longitude%3E=30&longitude%3C=100'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/404
The above exception was the direct cause of the following exception:
HTTPError Traceback (most recent call last)
Cell In[24], line 1
----> 1 ds = download_oisst_data(start_date, end_date, min_lat, max_lat, min_lon, max_lon)
Cell In[22], line 18, in download_oisst_data(start_date, end_date, min_lat, max_lat, min_lon, max_lon)
8 e.constraints = {
9 "time>=": start_date,
10 "time<=": end_date,
(...)
14 "longitude<=": max_lon,
15 }
17 # Fetch the data and convert it to a pandas DataFrame
---> 18 df = e.to_pandas(
19 index_col="time (UTC)",
20 parse_dates=True,
21 skiprows=(1,) # Skip the units row
22 ).dropna()
24 # Convert the DataFrame to an xarray Dataset, if needed
25 # This step requires importing xarray and possibly additional processing depending on the data structure
26 ds = df.to_xarray() # Uncomment this line if you have the necessary setup for converting DataFrame to xarray Dataset
File ~/miniconda3/lib/python3.10/site-packages/erddapy/erddapy.py:361, in ERDDAP.to_pandas(self, requests_kwargs, **kw)
359 distinct = kw.pop("distinct", False)
360 url = self.get_download_url(response=response, distinct=distinct)
--> 361 return to_pandas(url, requests_kwargs=requests_kwargs, pandas_kwargs=dict(**kw))
File ~/miniconda3/lib/python3.10/site-packages/erddapy/core/interfaces.py:31, in to_pandas(url, requests_kwargs, pandas_kwargs)
19 def to_pandas(
20 url: str,
21 requests_kwargs: Optional[Dict] = None,
22 pandas_kwargs: Optional[Dict] = None,
23 ) -> "pd.DataFrame":
24 """
25 Convert a URL to Pandas DataFrame.
26
(...)
29 **pandas_kwargs: kwargs to be passed to third-party library (pandas).
30 """
---> 31 data = urlopen(url, requests_kwargs or {})
32 try:
33 return pd.read_csv(data, **(pandas_kwargs or {}))
File ~/miniconda3/lib/python3.10/site-packages/erddapy/core/url.py:42, in urlopen(url, requests_kwargs)
40 if requests_kwargs is None:
41 requests_kwargs = {}
---> 42 data = _urlopen(url, **requests_kwargs) # type: ignore
43 data.seek(0)
44 return data
File ~/miniconda3/lib/python3.10/site-packages/erddapy/core/url.py:26, in _urlopen(url, auth, **kwargs)
24 response.raise_for_status()
25 except httpx.HTTPError as err:
---> 26 raise httpx.HTTPError(f"{response.content.decode()}") from err
27 return io.BytesIO(response.content)
HTTPError: Error {
code=404;
message="Not Found: Currently unknown datasetID=ncdcOisst21Agg_LonPM180";
}
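Here, the server reports an unknown datasetID error for "ncdcOisst21Agg_LonPM180". However, after visiting https://coastwatch.pfeg.noaa.gov/erddap/ and entering "sst" as the search term, I found that the dataset ID does exist.
I am using a 2020 MacBook Air with an Intel i5. Please tell me how I can resolve this error.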
You appear to be using a protocol setting that is not compatible with that dataset. Where it says
HTTPStatusError: Client error '404 ' for url
, note that the base URL you see is:
https://coastwatch.pfeg.noaa.gov/erddap/tabledap/ncdcOisst21Agg_LonPM180.csvp
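To make the protocol segment of that URL easier to spot, here is a minimal sketch that splits an ERDDAP data URL into its parts. The helper name `describe_erddap_url` is hypothetical, and it assumes the path has the shape `erddap/<protocol>/<datasetID>.<response>` as in the error message above.

```python
from urllib.parse import urlsplit


def describe_erddap_url(url):
    """Split an ERDDAP data URL into protocol, dataset ID and response type."""
    # Assumed path shape: /erddap/<protocol>/<datasetID>.<response>
    parts = urlsplit(url).path.strip("/").split("/")
    protocol = parts[-2]
    dataset_id, _, response = parts[-1].partition(".")
    return {"protocol": protocol, "dataset_id": dataset_id, "response": response}


# URL taken from the traceback, with the constraints shortened for readability.
failing_url = (
    "https://coastwatch.pfeg.noaa.gov/erddap/tabledap/"
    "ncdcOisst21Agg_LonPM180.csvp?time,latitude,longitude,sst"
)
info = describe_erddap_url(failing_url)
print(info)
```

The `protocol` entry comes back as `tabledap`, which is the part of the request worth checking against how the server actually serves the dataset.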
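If you then go to https://coastwatch.pfeg.noaa.gov/erddap/tabledap/ (which resolves to
https://coastwatch.pfeg.noaa.gov/erddap/tabledap/index.html?page=1&itemsPerPage=1000
) and look through the list, you will see that the dataset you are looking for is not among the 290 listed there.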
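If you go to the main page and look, you will see that the protocol listed above the one you tried is
griddap
. If you click through to those datasets, you can look for yours.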
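I narrowed it down using "Advanced Search" and saw your dataset listed there. You can see for yourself here. Note the protocol entry on that search page.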
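The same check can be done in code rather than by eye. The sketch below assumes the server's search results are available as CSV with "griddap", "tabledap" and "Dataset ID" columns, where a protocol column is empty when the dataset is not served that way; that layout is my assumption and worth confirming against a real download. The helper name `protocols_for` and the sample rows are made up for illustration.

```python
import csv
import io


def protocols_for(search_csv_text, dataset_id):
    """Return the protocols whose column holds a URL for the given dataset ID."""
    reader = csv.DictReader(io.StringIO(search_csv_text))
    for row in reader:
        if row.get("Dataset ID") == dataset_id:
            # A non-empty cell is taken to mean the protocol is available.
            return [p for p in ("griddap", "tabledap") if (row.get(p) or "").strip()]
    return []


# Made-up rows that mimic the assumed column layout.
sample = (
    "griddap,tabledap,Title,Dataset ID\n"
    "https://example.invalid/erddap/griddap/gridExample,,Gridded example,gridExample\n"
    ",https://example.invalid/erddap/tabledap/tableExample,Tabular example,tableExample\n"
)
print(protocols_for(sample, "gridExample"))  # ['griddap']
```

An empty list means the dataset ID did not appear in the text that was passed in.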
So try changing the
protocol
line to:
protocol="griddap",
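Changing the protocol also changes how the request is built, so the rest of the function probably needs adjusting too. Below is a rough sketch of one way to do it, assuming erddapy's `griddap_initialize()` and `to_xarray()` behave as in the package's griddap examples, that dates are passed as ISO 8601 strings, and that a netCDF backend is installed for xarray. The helper `build_griddap_constraints` is hypothetical and only keeps the ranges in one place.

```python
def build_griddap_constraints(start_date, end_date, min_lat, max_lat, min_lon, max_lon):
    """Collect the range constraints using erddapy's key style (e.g. "time>=")."""
    return {
        "time>=": start_date,
        "time<=": end_date,
        "latitude>=": min_lat,
        "latitude<=": max_lat,
        "longitude>=": min_lon,
        "longitude<=": max_lon,
    }


def download_oisst_data(start_date, end_date, min_lat, max_lat, min_lon, max_lon):
    # Imported here so the helper above can be used without erddapy installed.
    from erddapy import ERDDAP

    e = ERDDAP(
        server="https://coastwatch.pfeg.noaa.gov/erddap/",
        protocol="griddap",  # the dataset is gridded, not tabular
    )
    e.dataset_id = "ncdcOisst21Agg_LonPM180"
    # Ask the server for the dataset's dimensions and default constraints.
    e.griddap_initialize()
    e.variables = ["sst"]
    # Override only the ranges of interest; other dimensions keep their defaults.
    e.constraints.update(
        build_griddap_constraints(start_date, end_date, min_lat, max_lat, min_lon, max_lon)
    )
    # Gridded data maps directly onto an xarray Dataset, so no DataFrame step is needed.
    return e.to_xarray()
```

A call such as `download_oisst_data("2020-01-01T00:00:00Z", "2020-01-31T00:00:00Z", -40, 30, 30, 100)` would be a reasonable first test. A multi-decade daily request over the same region may be large, so it is worth starting with a short time range.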