SWI-Prolog中模式识别的滑动窗口方法

问题描述 投票:0回答:4

从算法上讲,我想找到最雄辩和最有效的方法来计算 SWI-Prolog 中某些模式的出现次数。

现在,我的解决方案使用 DCG 看起来像这样:

count_occurrences(Pattern, List, Count) :-
    phrase(Pattern, PatternList),
    length(PatternList, PatternLength),
    length(List, ListLength),
    MaxIndex is ListLength - PatternLength,
    findall(1, (
        between(0, MaxIndex, Index),
        sublist(List, Index, PatternLength, PatternList)
    ), Counts),
    sum_list(Counts, Count).

sublist(List, Index, Length, Sublist) :-
    length(Sublist, Length),
    append(Prefix, Rest, List),
    length(Prefix, Index),
    append(Sublist, _, Rest).

rep(S, L) --> [], {L is 0} | [S], {L is 1} | [S], { L_1 is L - 1 }, rep(S, L_1), !.
open_rep(S, L) --> [v], rep(S, L), [v], !.

结果是这样的:

1 ?- count_occurrences(open_rep(n, 1), [v,n,v,n,v,v,v,v,v,v,n,v,v], Count).
Count = 3.

2 ?- count_occurrences(open_rep(n, 3), [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).
Count = 1.

3 ?- count_occurrences(open_rep(n, 1), [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).
Count = 2.

我对这个结果很满意,但我认为这不是实现它的最有效方法,并且希望得到一些帮助以加快计算速度。我认为使用

append
findall
在计算上很昂贵,但不知道没有它们如何做......

这里显示的 DCG 只是一个例子,但我必须计算列表中模式的出现次数并给它们打分。这是在使用 Alpha-Beta PruningGomoku 实现 AI 的上下文中。由于董事会经常被评估,算法的复杂性很重要,以减少人工智能采取行动所需的时间。

我尝试了上面显示的代码的多种变体,但它们都使用

findall
谓词,我发现减少计算时间的最佳解决方案是实施早期失败。

prolog pattern-matching sliding-window dcg gomoku
4个回答
1
投票

我认为,即使使用谓词

append/3
,你仍然可以有一个有效的解决方案(至少这是可以用 swi-prolog 观察到的)。

count(Pattern, List, Count) :-
    phrase(Pattern, Sublist),
    count(List, Sublist, 0, Count).

count([], _, Count, Count).
count([X|Xs], Sublist, Count0, Count) :-
    (   append(Sublist, _, [X|Xs])
    ->   Count1 is Count0 + 1
    ;    Count1 is Count0 ),
    count(Xs, Sublist, Count1, Count).

% To compare the two versions with large lists

random_list(0, []) :- !.
random_list(N, [X|Xs]) :- 
    random_member(X, [n,v]), 
    M is N-1, 
    random_list(M, Xs).

比较

count/3
count_occurrences/3
,使用swi-prolog:

?- random_list(1000,L), time(count(open_rep(n,1),L,C1)), time(count_occurrences(open_rep(n,1),L,C2)).
% 4,648 inferences, 0.000 CPU in 0.000 seconds (0% CPU, Infinite Lips)
% 3,003,226 inferences, 1.578 CPU in 1.683 seconds (94% CPU, 1903034 Lips)
L = [v, n, v, v, n, n, n, n, v|...],
C1 = C2, C2 = 116.

?- random_list(2000,L), time(count(open_rep(n,1),L,C1)), time(count_occurrences(open_rep(n,1),L,C2)).
% 9,288 inferences, 0.000 CPU in 0.000 seconds (0% CPU, Infinite Lips)
% 12,006,431 inferences, 11.812 CPU in 12.421 seconds (95% CPU, 1016417 Lips)
L = [n, n, v, v, n, n, n, v, n|...],
C1 = C2, C2 = 229.

?- random_list(1000000,L), time(count(open_rep(n,3),L,C1)).
% 4,905,979 inferences, 0.109 CPU in 0.146 seconds (75% CPU, 44854665 Lips)
L = [n, v, v, v, n, v, n, n, n|...],
C1 = 31246.

?- L = [v,n,v,n,v,v,v,v,v,v,n,v,v], time(count(open_rep(n,1),L,C1)), time(count_occurrences(open_rep(n,1),L,C2)).
% 77 inferences, 0.000 CPU in 0.000 seconds (0% CPU, Infinite Lips)
% 571 inferences, 0.000 CPU in 0.000 seconds (0% CPU, Infinite Lips)
L = [v, n, v, n, v, v, v, v, v|...],
C1 = C2, C2 = 3.

?- L = [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], time(count(open_rep(n,3),L,C1)), time(count_occurrences(open_rep(n,3),L,C2)).
% 99 inferences, 0.000 CPU in 0.000 seconds (0% CPU, Infinite Lips)
% 641 inferences, 0.000 CPU in 0.000 seconds (0% CPU, Infinite Lips)
L = [v, n, v, n, v, v, v, v, v|...],
C1 = C2, C2 = 1.

?- L = [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], time(count(open_rep(n,1),L,C1)), time(count_occurrences(open_rep(n,1),L,C2)).
% 86 inferences, 0.000 CPU in 0.000 seconds (0% CPU, Infinite Lips)
% 739 inferences, 0.000 CPU in 0.000 seconds (0% CPU, Infinite Lips)
L = [v, n, v, n, v, v, v, v, v|...],
C1 = C2, C2 = 2.

1
投票

IMO 你的方法太具体了,从(重)可用性的角度来看,这将是次优的。 SWI-Prolog 提供了一个库谓词,它执行 RLE(运行长度编码),正如我从这个有趣的话题 中发现的那样,值得一试:在这里我将发布一个模块,我在其中复制了您的代码,以及一个使用 clumped/2 的替代方案:

/*  File:    x_so_sliding_window.pl
    Author:  Carlo
    Created: Mar  4 2023
    Purpose: https://stackoverflow.com/q/75630809/874024
*/

:- module(x_so_sliding_window,
          [count_occurrences/3
          ,open_rep//2
          ,count_by_clumped/3
          ]).


count_occurrences(Pattern, List, Count) :-
    phrase(Pattern, PatternList),
    length(PatternList, PatternLength),
    length(List, ListLength),
    MaxIndex is ListLength - PatternLength,
    findall(1, (
        between(0, MaxIndex, Index),
        sublist(List, Index, PatternLength, PatternList)
    ), Counts),
    sum_list(Counts, Count).

sublist(List, Index, Length, Sublist) :-
    length(Sublist, Length),
    append(Prefix, Rest, List),
    length(Prefix, Index),
    append(Sublist, _, Rest).

rep(S, L) --> [], {L is 0} | [S], {L is 1} | [S], { L_1 is L - 1 }, rep(S, L_1), !.
open_rep(S, L) --> [v], rep(S, L), [v], !.


count_by_clumped(Pattern,List,Count) :-
    clumped(List, R),
    aggregate_all(count, member(Pattern,R), Count).

然后我有这个代码

t(Count) :- count_occurrences(open_rep(n, 1), [v,n,v,n,v,v,v,v,v,v,n,v,v], Count).
t(Count) :- count_occurrences(open_rep(n, 3), [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).
t(Count) :- count_occurrences(open_rep(n, 1), [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).

c(Count) :- count_by_clumped(n-1, [v,n,v,n,v,v,v,v,v,v,n,v,v], Count).
c(Count) :- count_by_clumped(n-3, [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).
c(Count) :- count_by_clumped(n-1, [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).

和比较“司机”:

?- time((between(1,1000,_),(t(_),fail))).
% 1,953,001 inferences, 0.234 CPU in 0.234 seconds (100% CPU, 8332804 Lips)
false.
?- time((between(1,1000,_),(c(_),fail))).
% 123,001 inferences, 0.000 CPU in 0.014 seconds (0% CPU, Infinite Lips)
false.

1
投票

这是一个更好的 sublist 方法:

sublist1([H|T], Index, Length, Sublist) :-
    sublist1_start_(T, H, 1, Index, Length, Sublist).

% P = previous
sublist1_start_(L, P, Ind, Index, Length, Sublist) :-
    sublist1_loop_(L, P, Ind, Index, 1, Length, Sublist).
sublist1_start_([H|T], _, Ind, Index, Length, Sublist) :-
    Ind1 is Ind + 1,
    % Go to next element
    sublist1_start_(T, H, Ind1, Index, Length, Sublist).

sublist1_loop_(_, H, Ind, Ind, Len, Len, [H]).
sublist1_loop_([H|T], P, Ind, Index, Len, Length, [P|Sublist]) :-
    Len1 is Len + 1,
    sublist1_loop_(T, H, Ind, Index, Len1, Length, Sublist).

swi-prolog 中的结果:

?- sublist1([a,b,c], Ind, Len, Sub).
Ind = Len, Len = 1,
Sub = [a] ;
Ind = 1,
Len = 2,
Sub = [a, b] ;
Ind = 1,
Len = 3,
Sub = [a, b, c] ;
Ind = 2,
Len = 1,
Sub = [b] ;
Ind = Len, Len = 2,
Sub = [b, c] ;
Ind = 3,
Len = 1,
Sub = [c].

如果您希望索引从 0 而不是 1 开始,则将第 2 行中的 1 更改为 0。

我排除了空列表。


0
投票

您的问题中缺少某些内容,无法说出是什么。我也对答案感到惊讶。

不确定是否必须使用列表或者是否可以使用原子或字符串。原子和字符串分别有 sub_atom/5 和 sub_string/5,可用于搜索。像这样:

?- sub_atom(vnvnvvvvvvnvv, Before, Len, After, vnv).
Before = 0, Len = 3, After = 10 ;
Before = 2, Len = 3, After = 8 ;
Before = 9, Len = 3, After = 1 ;
false.

相同但字符串适用于 sub_string/5.

如果您已经在使用 SWI-Prolog,您可以使用库(聚合)来计算解决方案:

?- aggregate_all(count, sub_atom(vnvnvvvvvvnvv, _,_,_, vnv), N).
N = 3.

?- aggregate_all(count, sub_atom(vnvnvvvvvvnnnvv, _,_,_, vnv), N).
N = 2.

到目前为止,您不需要编写任何程序。如果你必须使用列表,两个附加就足够了:

?- aggregate_all(
      count,
      ( append(_, Back, [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v]),
        append([v,n,n,n,v], _, Back)
      ),
      N).
N = 1.

请记住,您还有 正则表达式库 library(pcre)。如果您使用文档中的示例代码

:- use_module(library(pcre)).

re_match_count(Regex, String, N) :-
    re_foldl(inc, Regex, String, 0, N, []).

inc(_, V0, V) :- V is V0 + 1.

您还可以使用原子或字符串进行计数:

?- re_match_count('vn{1}v', vnvnvvvvvvnvv, N).
N = 2.

?- re_match_count("vn{3}v", "vnvnvvvvvvnnnvv", N).
N = 1.

在这个例子中,所有关于匹配本身的信息都被丢弃,只计算匹配的数量。使用 re_foldl/6 的第一个参数,你可以做任何你想做的事。

注意:您需要衡量什么对您最有效。如果有一种方法可以读取字符串并将其保存为 Prolog 字符串,并使用库(pcre),那可能比其他方法更快。

© www.soinside.com 2019 - 2024. All rights reserved.