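I am quite confused by the behaviour of the endpoint and by how my requests are handled. The basic RDFS namespace seems to collide with another definition at query time: declaring the prefix in the query causes an error, while omitting it from the query body produces normal output.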
    SELECT *
    WHERE {
      ?sub rdfs:label ?p .
    } LIMIT 5
Output 1:
    INFO:root:                                   sub                           p
    0  http://example.org/triples/17bbab96        Pont d Iéna-9423efbc
    1  http://example.org/triples/37d3fba1        Pont d Iéna-9423efbc
    2  http://example.org/triples/e8a8921a  Pont Transbordeur-fb62b01e
    3  http://example.org/triples/7907d1de  Pont Transbordeur-fb62b01e
    4  http://example.org/triples/5b529b5e        Pont d Iéna-98cdd2fc
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT *
    WHERE {
      ?sub rdfs:label ?p .
    } LIMIT 5
Output 2 (client side):
    (...)
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 400: Bad Request
    (...)
    ValueError: You did something wrong formulating either the URI or your SPARQL query
Output 2 (server side):
    [INFO ] 2023-07-13 08:50:13,797 [repositories/astra1 | c.o.f.s.GraphDBProtocolExceptionResolver] X-Request-Id: 712a09f4-626e-5f2a-b22b-5d436e2c4ae2 Client sent bad request (400)
    org.eclipse.rdf4j.http.server.ClientHTTPException: MALFORMED QUERY: Multiple prefix declarations for prefix 'rdfs'
    import logging
    import os

    from dotenv import load_dotenv
    from pandas import DataFrame
    from rdflib import Graph
    from rdflib.plugins.sparql.processor import SPARQLResult
    from rdflib.plugins.stores import sparqlstore
    from requests.auth import HTTPDigestAuth  # only needed if auth is enabled below

    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)

    # Endpoint URLs and the named graph URI are read from a .env file.
    load_dotenv()


    def sparql_results_to_df(results: SPARQLResult) -> DataFrame:
        """
        Export results from an rdflib SPARQL query into a `pandas.DataFrame`,
        using Python types. See https://github.com/RDFLib/rdflib/issues/1179.
        """
        return DataFrame(
            data=([None if x is None else x.toPython() for x in row] for row in results),
            columns=[str(x) for x in results.vars],
        )


    if __name__ == '__main__':
        store = sparqlstore.SPARQLUpdateStore(
            query_endpoint=os.environ['SPARQL_ENDPOINT_QUERY'],
            update_endpoint=os.environ['SPARQL_ENDPOINT_UPDATE'],
            # auth=HTTPDigestAuth(config.AUTH_USER, config.AUTH_PASS), context_aware=True,
        )
        g = Graph(store=store, identifier=os.environ['SPARQL_DEFAULT_NAMED_GRAPH_FULL_URI'])  # namespace_manager=None

        # Query without any prefix: works.
        q_sa = """
        select * where {
          ?s ?p ?o .
        } limit 20
        """

        # Query that declares rdfs explicitly: rejected with HTTP 400.
        q_sa2 = """
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT *
        WHERE {
          ?sub rdfs:label ?p .
        } LIMIT 20
        """

        qr = g.query(q_sa2)
        df = sparql_results_to_df(qr)
        logging.info(df)
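I expected exactly the opposite: query 1 should fail with an "undefined prefix" error, and query 2 should return my results. Is there a way to get this behaviour by changing something on the client or the server side? Is that a bad idea? (I prefer queries to contain everything they need, even the most basic namespaces.)
I would be glad to read your thoughts on this. Thanks in advance for your answers!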
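Thanks @UninformedUser, you put me on the right track! It was hard to work out where the error was being triggered (rdflib's Graph? sparqlstore? the endpoint configuration?).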
Alas, passing an empty `initNs` does not work, because in the rdflib source it is overridden by the graph's default namespaces:
    initNs = initNs or dict(self.namespaces())  # noqa: N806
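To illustrate why an empty `initNs` has no effect, here is a minimal pure-Python sketch. It is a simplified model, not rdflib's actual code: `DEFAULT_BINDINGS`, `effective_bindings` and `inject_prefixes` are hypothetical names, and the assumption is that the store prepends one PREFIX line per bound namespace before sending the query.

```python
from typing import Mapping, Optional

# Hypothetical stand-in for the prefixes a Graph binds by default.
DEFAULT_BINDINGS = {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"}


def effective_bindings(init_ns: Optional[Mapping[str, str]]) -> dict:
    # Same pattern as the quoted source line: an empty dict is falsy,
    # so `or` silently falls back to the default bindings.
    return dict(init_ns or DEFAULT_BINDINGS)


def inject_prefixes(query: str, bindings: Mapping[str, str]) -> str:
    # Simplified model (assumption): prepend one PREFIX line per binding.
    if not bindings:
        return query
    header = "\n".join(f"PREFIX {p}: <{ns}>" for p, ns in bindings.items())
    return f"{header}\n\n{query}"


query = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "SELECT * WHERE { ?sub rdfs:label ?p . } LIMIT 5"
)

# Passing {} does not disable the defaults, so rdfs ends up declared twice.
sent = inject_prefixes(query, effective_bindings({}))
rdfs_declarations = sent.count("PREFIX rdfs:")
print(rdfs_declarations)
```

Under this model the query that reaches the server carries two `rdfs` declarations, which matches the "Multiple prefix declarations" message in the server log.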
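According to the namespace bindings section of the RDFLib documentation, every graph comes with a set of default namespaces.
The solution, then, is to override the graph's default configuration: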
    g = Graph(store=store, identifier=os.environ['SPARQL_DEFAULT_NAMED_GRAPH_FULL_URI'], bind_namespaces="none")
Solved! (I will mark this as accepted in 2 days.)