Computing cosine similarity in SPARQL


I am looking for a way to compute cosine similarity using SPARQL.

The vectors are described in the RDF data as follows:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://example.org/london> rdfs:label "London" ;
    rdf:_1 0.011788688 ;
    rdf:_2 0.006153286 ;
    rdf:_3 -0.0034582422 ;
    ...
    rdf:_1536 -0.020006698 .

<http://example.org/united-kingdom> rdfs:label "United Kingdom" ;
    rdf:_1 0.007484864 ;
    rdf:_2 -0.022806747 ;
    rdf:_3 -0.010839927 ;
    ...
    rdf:_1536 0.001866414 .

<http://example.org/united-states> rdfs:label "United States of America" ;
    rdf:_1 0.0070878486 ;
    rdf:_2 -0.02133514 ;
    rdf:_3 -0.000050822895 ;
    ...
    rdf:_1536 -0.012027864 .
sparql embedding cosine-similarity
1 answer

My query looks like this:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX afn: <http://jena.apache.org/ARQ/function#>

# Cosine similarity = dot product / (norm of vector 1 * norm of vector 2)
SELECT ?embed1 ?embed2 ((SUM(?dot) / (afn:sqrt(SUM(?v1_squared)) * afn:sqrt(SUM(?v2_squared)))) AS ?similarity)
WHERE {
  # Join the two subjects on the same component property (rdf:_1, rdf:_2, ...)
  ?embed1 ?p ?v1 .
  ?embed2 ?p ?v2 .
  FILTER (STRSTARTS(STR(?p), STR(rdf:_)))

  # Per-component terms, summed per pair by the GROUP BY below
  BIND(?v1 * ?v1 AS ?v1_squared)
  BIND(?v2 * ?v2 AS ?v2_squared)
  BIND(?v1 * ?v2 AS ?dot)
}
GROUP BY ?embed1 ?embed2
ORDER BY DESC(?similarity)

It requires the afn:sqrt function from Jena's ARQ library, because standard SPARQL 1.1 does not provide a sqrt function.

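It seems to work, but it may perform poorly on large datasets, because it joins every pair of subjects on every shared rdf:_n property: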

----------------------------------------------------------------------------------------------------
| embed1                              | embed2                              | similarity           |
====================================================================================================
| <http://example.org/united-kingdom> | <http://example.org/united-kingdom> | 1.0000000000000002e0 |
| <http://example.org/london>         | <http://example.org/london>         | 1.0e0                |
| <http://example.org/united-states>  | <http://example.org/united-states>  | 1.0e0                |
| <http://example.org/united-states>  | <http://example.org/united-kingdom> | 0.8804311835944831e0 |
| <http://example.org/united-kingdom> | <http://example.org/united-states>  | 0.8804311835944831e0 |
| <http://example.org/london>         | <http://example.org/united-kingdom> | 0.8510995877458968e0 |
| <http://example.org/united-kingdom> | <http://example.org/london>         | 0.8510995877458968e0 |
| <http://example.org/london>         | <http://example.org/united-states>  | 0.7855264600385297e0 |
| <http://example.org/united-states>  | <http://example.org/london>         | 0.7855264600385297e0 |
----------------------------------------------------------------------------------------------------
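
To sanity-check the arithmetic outside the triple store, the same three sums can be reproduced in plain Python. This is only a minimal sketch: the function name cosine_similarity is made up for illustration, and the sample vectors use just the first three components shown in the question, so the printed values will not match the 1536-dimensional results in the table above.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine similarity of two equal-length vectors.

    Mirrors the three SUM aggregates used in the SPARQL query.
    """
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(v1, v2))   # SUM(?dot)
    v1_squared = sum(a * a for a in v1)        # SUM(?v1_squared)
    v2_squared = sum(b * b for b in v2)        # SUM(?v2_squared)
    return dot / (math.sqrt(v1_squared) * math.sqrt(v2_squared))


# Illustrative only: the first three components from the question's data,
# not the full 1536-dimensional embeddings.
london = [0.011788688, 0.006153286, -0.0034582422]
united_kingdom = [0.007484864, -0.022806747, -0.010839927]

print(cosine_similarity(london, london))
print(cosine_similarity(london, united_kingdom))
```

As in the query, a vector compared with itself should come out at 1 up to floating-point rounding, which is why the table shows 1.0000000000000002e0 for one of the self-pairs.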