BioPython: KEGG REST keeps reporting HTTP Error 403: Forbidden


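I'm trying to use BioPython's REST module from Bio.KEGG to query the KEGG database for the names and molecular formulas of certain compounds, using their KEGG compound IDs (CIDs); for example, C00001 is water, C00123 is leucine, and so on: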

from Bio.KEGG import REST
from Bio.KEGG import Compound


def cpd_decoder(cid): #gets the compound name and formula from KEGG
    if "C" in cid:
        cid="cpd:"+cid
        kegg_entry=REST.kegg_get(cid)
        for record in Compound.parse(kegg_entry):
            cid_name=record.name[0]
            cid_formula=record.formula 
            return cid_name,cid_formula

cid="C00123" #example CID; this one's for leucine
if cpd_decoder(cid) !=None:
    compound,formula=cpd_decoder(cid)

However, even though BioPython uses KEGG's own API, I almost always get the following error:

    if cpd_decoder(cid) !=None:
  File "/media/tessa/Storage/Alien_Earths/Network_expansion/network expansion test 2.py", line 27, in cpd_decoder
    kegg_entry=REST.kegg_get(cid)
  File "/home/tessa/.local/lib/python3.10/site-packages/Bio/KEGG/REST.py", line 208, in kegg_get
    resp = _q("get", dbentries)
  File "/home/tessa/.local/lib/python3.10/site-packages/Bio/KEGG/REST.py", line 44, in _q
    resp = urlopen(URL % (args))
  File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.10/urllib/request.py", line 525, in open
    response = meth(req, response)
  File "/usr/lib/python3.10/urllib/request.py", line 634, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.10/urllib/request.py", line 563, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden 

I wonder whether, because I'm processing a large number of CIDs, KEGG now thinks I'm a bot and has blocked me. Is there a way around this?

python bioinformatics biopython
2 Answers
Votes: 0

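As of today, I'm getting the same result with a script very similar to yours. This didn't happen when I ran the same script a month ago. It essentially goes through a list of about 250 KO numbers to get the reaction IDs associated with each one, then retrieves the stoichiometry of each reaction to build a reaction matrix automatically.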

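I found that everything was fine up to the 240th KO, but then I started getting "403 Forbidden" errors. When I entered the same URL manually in my browser, the error persisted; when I switched to another network, it went away. I then retried and got the same result. So it looks like KEGG has recently started blocking users who do this kind of thing.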
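If KEGG is indeed throttling clients that send many requests in a row, spacing the requests out may help. Below is a minimal sketch using only the standard library (the same `urlopen` that appears in the traceback). The one-second delay, the retry count, and the choice to retry only on 403 are assumptions for illustration, not documented KEGG limits, and backing off will not help if the block persists across retries. `fetch_text` and `polite_fetch` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_text(url: str) -> str:
    """Download one URL with the standard library and return the body as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def polite_fetch(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_text,
    delay: float = 1.0,  # assumed pause between requests, not a documented KEGG limit
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch URLs one at a time, pausing between them and backing off on HTTP 403."""
    results: Dict[str, str] = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)  # space out consecutive requests
        for attempt in range(max_retries + 1):
            try:
                results[url] = fetch(url)
                break
            except urllib.error.HTTPError as err:
                # Re-raise other errors, and a 403 that persists after all retries
                if err.code != 403 or attempt == max_retries:
                    raise
                sleep(delay * 2 ** (attempt + 1))  # wait 2x, 4x, 8x... the base delay
    return results
```

For example, `polite_fetch(["https://rest.kegg.jp/get/cpd:C00001", "https://rest.kegg.jp/get/cpd:C00123"])` returns a dict from each URL to its response text. The `fetch` and `sleep` parameters exist only so the logic can be exercised without touching the network.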


Votes: 0

I found a fix, and it speeds the code up considerably. I didn't use Biopython; I used the requests package instead. You may be able to do the same thing in Biopython.

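Instead of making a separate request for each KO number (or, in your case, each compound ID), you can put all the KO numbers into a single request. So instead of requesting: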

https://rest.kegg.jp/link/reaction/ko:K00012

https://rest.kegg.jp/link/reaction/ko:K12450

and so on,

you can request:

https://rest.kegg.jp/link/reaction/ko:K00012+K12450+

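This also runs faster, because you only have to wait for KEGG to respond once. Then you just need to parse the result (Biopython may already be able to do that).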
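Parsing the combined response is straightforward if, as assumed here, KEGG's `/link` output has one tab-separated `source<TAB>target` pair per line. A minimal sketch that groups the linked IDs by source entry; `parse_link_response` is a hypothetical helper name:

```python
from collections import defaultdict
from typing import Dict, List


def parse_link_response(text: str) -> Dict[str, List[str]]:
    """Group a KEGG /link response ('source<TAB>target' per line) by source ID."""
    links: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        source, target = line.split("\t", 1)
        links[source].append(target.strip())
    return dict(links)
```

Passing `response.text` from a batched `/link/reaction/` request would give a dict mapping each `ko:` entry to the list of reaction IDs KEGG links to it, which is the shape needed to loop over reactions afterwards.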

Here is my code:

import requests

# Replace with your own query
KO_numbers = ["K00012", "K12450", "K21379"]

# Start of the URL; replace with the endpoint you need
url = "https://rest.kegg.jp/link/reaction/ko:"

# Join the KO numbers with "+" so they all go into a single request (no trailing "+")
url += "+".join(KO_numbers)

# Do the actual request, raise an error if something is wrong
response = requests.get(url, timeout=30)
if response.status_code != 200:
    raise ConnectionError(f"Cannot connect to KEGG API (HTTP {response.status_code})")

# Here I just print the response; from here you need to parse it to do what you want with the data
print(response.text)
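To stay within Biopython, as in the question, the same batching idea should apply to `get`. As far as I know, `REST.kegg_get` accepts a list of up to 10 entries (KEGG's per-request limit for `get`) and joins them with `+` itself, and `Compound.parse` yields one record per entry in the response. A sketch under those assumptions, with an assumed one-second pause between batches; `chunks` and `decode_compounds` are hypothetical helper names:

```python
import time
from typing import Dict, Iterator, List, Sequence, Tuple


def chunks(items: Sequence[str], size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items (10 = assumed KEGG get limit)."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def decode_compounds(cids: Sequence[str], delay: float = 1.0) -> Dict[str, Tuple[str, str]]:
    """Fetch name and formula for many KEGG compound IDs, up to 10 per request."""
    # Imported lazily so that `chunks` can be used without Biopython installed
    from Bio.KEGG import Compound, REST

    results: Dict[str, Tuple[str, str]] = {}
    entries = ["cpd:" + cid for cid in cids]
    for i, batch in enumerate(chunks(entries)):
        if i > 0:
            time.sleep(delay)  # assumed courtesy pause between batches
        handle = REST.kegg_get(batch)  # Biopython joins the list with "+"
        for record in Compound.parse(handle):
            # record.name lists synonyms; the first one is the primary name
            results[record.entry] = (record.name[0], record.formula)
    return results
```

For example, `decode_compounds(["C00001", "C00123"])` sends a single request and returns a dict keyed by each record's `entry` field with `(name, formula)` tuples; it needs Biopython installed and network access. For a few hundred CIDs this cuts the number of requests roughly tenfold compared with one request per compound.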
© www.soinside.com 2019 - 2024. All rights reserved.