BioPython: KEGG REST keeps reporting HTTP Error 403: Forbidden

Question

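I'm trying to use BioPython's REST module from Bio.KEGG to query the KEGG database for the names and molecular formulas of certain compounds, using their compound IDs (CIDs); for example, C00001 is water, C00123 is leucine, and so on: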

from Bio.KEGG import REST
from Bio.KEGG import Compound


def cpd_decoder(cid):  # gets the compound name and formula from KEGG
    if "C" in cid:
        cid = "cpd:" + cid
        kegg_entry = REST.kegg_get(cid)
        for record in Compound.parse(kegg_entry):
            cid_name = record.name[0]
            cid_formula = record.formula
            return cid_name, cid_formula

cid = "C00123"  # example CID; this one's for leucine
if cpd_decoder(cid) != None:
    compound, formula = cpd_decoder(cid)

However, even though BioPython uses KEGG's own API, I almost always get the following error:

    if cpd_decoder(cid) !=None:
  File "/media/tessa/Storage/Alien_Earths/Network_expansion/network expansion test 2.py", line 27, in cpd_decoder
    kegg_entry=REST.kegg_get(cid)
  File "/home/tessa/.local/lib/python3.10/site-packages/Bio/KEGG/REST.py", line 208, in kegg_get
    resp = _q("get", dbentries)
  File "/home/tessa/.local/lib/python3.10/site-packages/Bio/KEGG/REST.py", line 44, in _q
    resp = urlopen(URL % (args))
  File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/usr/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

I'm wondering whether, because I'm processing a large number of CIDs, KEGG now thinks I'm a bot and has blocked me. Is there a way around this?

Answer 1

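As of today I'm getting the same result with a script very similar to yours. When I ran the same script a month ago, this didn't happen. It essentially goes through a list of about 250 KO numbers to fetch the reaction IDs associated with them, then retrieves the stoichiometry of each reaction to build a reaction matrix automatically.

Everything was fine up to about the 240th KO, but then I started getting the "403 Forbidden" error. The error persisted when I entered the URL manually in a browser, and it disappeared when I switched to another network. I then retried and got the same result. So it looks like KEGG has recently started blocking users who do this kind of thing.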
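If the block is triggered by the number of requests, as the behaviour above suggests, one thing that might help is spacing the requests out and stopping cleanly at the first 403 so that the remaining queries can be resumed later. The following is only a minimal sketch under that assumption: `fetch_all`, its `pause` value, and the injectable `sleep` are hypothetical names chosen for illustration, and the half-second pause is a guess rather than a documented KEGG limit.

```python
import time
from urllib.error import HTTPError


def fetch_all(fetch, queries, pause=0.5, sleep=time.sleep):
    """Call fetch(query) for each query, pausing between requests.

    `fetch` is any callable that performs one request, for example
    Bio.KEGG.REST.kegg_get. The pause length is an assumption, not a
    documented KEGG limit.

    Returns (results, remaining): a dict of the answers collected so far
    and the list of queries that were not completed because of a 403.
    """
    queries = list(queries)
    results = {}
    for i, query in enumerate(queries):
        if i:
            # Wait before every request except the first one.
            sleep(pause)
        try:
            results[query] = fetch(query)
        except HTTPError as err:
            if err.code != 403:
                raise
            # Blocked: stop here and hand back what is still to do.
            return results, queries[i:]
    return results, []
```

With Biopython this could be called as `fetch_all(REST.kegg_get, ["cpd:C00123", "cpd:C00001"])`; each value in the returned dict is then the handle that `kegg_get` returned for that query.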

Answer 2

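I found a fix, and it also sped up my code considerably. I wasn't using Biopython before; I was using the requests package in Python. Maybe you can do the same thing in Biopython.

Instead of making a separate connection request for each KO number (or compound ID, in your case), you can put all the KO numbers into a single request. So instead of requesting: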

https://rest.kegg.jp/link/reaction/ko:K00012

https://rest.kegg.jp/link/reaction/ko:K12450

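and so on,

you can do this:

https://rest.kegg.jp/link/reaction/ko:K00012+K12450

This also runs faster, because you only have to wait for KEGG to respond once. Then you just need to parse the result (Biopython can probably do that already).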

Here is my code:

import requests

# Replace with your own query
KO_numbers = ["K00012", "K12450", "K21379"]

# Start of the URL; replace with the URL for your own use case
url = "https://rest.kegg.jp/link/reaction/ko:"

# Append all KO numbers to the URL, separated by "+"
url += "+".join(KO_numbers)

# Do the actual request; raise an error if something went wrong
response = requests.get(url)
if response.status_code != 200:
    raise ConnectionError(f"KEGG API request failed with HTTP status {response.status_code}")

# Here I just print the response; from here, parse it to do what you need with the data
print(response.text)
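
The same batching idea can probably be applied to the compound lookup from the question without leaving Biopython. The sketch below assumes that `REST.kegg_get` accepts a list of entries and joins them into one request, that it allows at most 10 entries per call, and that each parsed `Compound` record exposes `entry`, `name`, and `formula`; that is my understanding of the library, so check it against your installed version. `chunks` and `cpd_decoder_batch` are hypothetical helper names.

```python
def chunks(items, size=10):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def cpd_decoder_batch(cids):
    """Return {compound ID: (first name, formula)} for a list of KEGG CIDs."""
    # Imported here so that chunks() can be used without Biopython installed.
    from Bio.KEGG import REST, Compound

    decoded = {}
    for chunk in chunks(cids):
        # Assumption: kegg_get joins a list with "+" into a single request
        # and accepts no more than 10 entries at a time.
        handle = REST.kegg_get(["cpd:" + cid for cid in chunk])
        for record in Compound.parse(handle):
            decoded[record.entry] = (record.name[0], record.formula)
    return decoded


# Example call (performs a network request):
# cpd_decoder_batch(["C00001", "C00123"])
```

For 250 IDs this would send about 25 requests instead of 250, which may or may not be enough to stay under whatever threshold KEGG applies.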
