从算法上讲,我想找到最雄辩和最有效的方法来计算 SWI-Prolog 中某些模式的出现次数。
现在,我的解决方案使用 DCG 看起来像这样:
count_occurrences(Pattern, List, Count) :-
phrase(Pattern, PatternList),
length(PatternList, PatternLength),
length(List, ListLength),
MaxIndex is ListLength - PatternLength,
findall(1, (
between(0, MaxIndex, Index),
sublist(List, Index, PatternLength, PatternList)
), Counts),
sum_list(Counts, Count).
sublist(List, Index, Length, Sublist) :-
length(Sublist, Length),
append(Prefix, Rest, List),
length(Prefix, Index),
append(Sublist, _, Rest).
rep(S, L) --> [], {L is 0} | [S], {L is 1} | [S], { L_1 is L - 1 }, rep(S, L_1), !.
open_rep(S, L) --> [v], rep(S, L), [v], !.
结果是这样的:
1 ?- count_occurrences(open_rep(n, 1), [v,n,v,n,v,v,v,v,v,v,n,v,v], Count).
Count = 3.
2 ?- count_occurrences(open_rep(n, 3), [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).
Count = 1.
3 ?- count_occurrences(open_rep(n, 1), [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).
Count = 2.
我对这个结果很满意,但我认为这不是实现它的最有效方法,并且希望得到一些帮助以加快计算速度。我认为使用
append
和 findall
在计算上很昂贵,但不知道没有它们如何做......
这里显示的 DCG 只是一个例子,但我必须计算列表中模式的出现次数并给它们打分。这是在使用 Alpha-Beta Pruning 为 Gomoku 实现 AI 的上下文中。由于董事会经常被评估,算法的复杂性很重要,以减少人工智能采取行动所需的时间。
我尝试了上面显示的代码的多种变体,但它们都使用
findall
谓词,我发现减少计算时间的最佳解决方案是实施早期失败。
我认为,即使使用谓词
append/3
,你仍然可以有一个有效的解决方案(至少这是可以用 swi-prolog 观察到的)。
count(Pattern, List, Count) :-
phrase(Pattern, Sublist),
count(List, Sublist, 0, Count).
count([], _, Count, Count).
count([X|Xs], Sublist, Count0, Count) :-
( append(Sublist, _, [X|Xs])
-> Count1 is Count0 + 1
; Count1 is Count0 ),
count(Xs, Sublist, Count1, Count).
% To compare the two versions with large lists
random_list(0, []) :- !.
random_list(N, [X|Xs]) :-
random_member(X, [n,v]),
M is N-1,
random_list(M, Xs).
比较
count/3
和count_occurrences/3
,使用swi-prolog:
?- random_list(1000,L), time(count(open_rep(n,1),L,C1)), time(count_occurrences(open_rep(n,1),L,C2)).
% 4,648 inferences, 0.000 CPU in 0.000 seconds (0% CPU, Infinite Lips)
% 3,003,226 inferences, 1.578 CPU in 1.683 seconds (94% CPU, 1903034 Lips)
L = [v, n, v, v, n, n, n, n, v|...],
C1 = C2, C2 = 116.
?- random_list(2000,L), time(count(open_rep(n,1),L,C1)), time(count_occurrences(open_rep(n,1),L,C2)).
% 9,288 inferences, 0.000 CPU in 0.000 seconds (0% CPU, Infinite Lips)
% 12,006,431 inferences, 11.812 CPU in 12.421 seconds (95% CPU, 1016417 Lips)
L = [n, n, v, v, n, n, n, v, n|...],
C1 = C2, C2 = 229.
?- random_list(1000000,L), time(count(open_rep(n,3),L,C1)).
% 4,905,979 inferences, 0.109 CPU in 0.146 seconds (75% CPU, 44854665 Lips)
L = [n, v, v, v, n, v, n, n, n|...],
C1 = 31246.
?- L = [v,n,v,n,v,v,v,v,v,v,n,v,v], time(count(open_rep(n,1),L,C1)), time(count_occurrences(open_rep(n,1),L,C2)).
% 77 inferences, 0.000 CPU in 0.000 seconds (0% CPU, Infinite Lips)
% 571 inferences, 0.000 CPU in 0.000 seconds (0% CPU, Infinite Lips)
L = [v, n, v, n, v, v, v, v, v|...],
C1 = C2, C2 = 3.
?- L = [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], time(count(open_rep(n,3),L,C1)), time(count_occurrences(open_rep(n,3),L,C2)).
% 99 inferences, 0.000 CPU in 0.000 seconds (0% CPU, Infinite Lips)
% 641 inferences, 0.000 CPU in 0.000 seconds (0% CPU, Infinite Lips)
L = [v, n, v, n, v, v, v, v, v|...],
C1 = C2, C2 = 1.
?- L = [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], time(count(open_rep(n,1),L,C1)), time(count_occurrences(open_rep(n,1),L,C2)).
% 86 inferences, 0.000 CPU in 0.000 seconds (0% CPU, Infinite Lips)
% 739 inferences, 0.000 CPU in 0.000 seconds (0% CPU, Infinite Lips)
L = [v, n, v, n, v, v, v, v, v|...],
C1 = C2, C2 = 2.
IMO 你的方法太具体了,从(重)可用性的角度来看,这将是次优的。 SWI-Prolog 提供了一个库谓词,它执行 RLE(运行长度编码),正如我从这个有趣的话题 中发现的那样,值得一试:在这里我将发布一个模块,我在其中复制了您的代码,以及一个使用 clumped/2 的替代方案:
/* File: x_so_sliding_window.pl
Author: Carlo
Created: Mar 4 2023
Purpose: https://stackoverflow.com/q/75630809/874024
*/
:- module(x_so_sliding_window,
[count_occurrences/3
,open_rep//2
,count_by_clumped/3
]).
count_occurrences(Pattern, List, Count) :-
phrase(Pattern, PatternList),
length(PatternList, PatternLength),
length(List, ListLength),
MaxIndex is ListLength - PatternLength,
findall(1, (
between(0, MaxIndex, Index),
sublist(List, Index, PatternLength, PatternList)
), Counts),
sum_list(Counts, Count).
sublist(List, Index, Length, Sublist) :-
length(Sublist, Length),
append(Prefix, Rest, List),
length(Prefix, Index),
append(Sublist, _, Rest).
rep(S, L) --> [], {L is 0} | [S], {L is 1} | [S], { L_1 is L - 1 }, rep(S, L_1), !.
open_rep(S, L) --> [v], rep(S, L), [v], !.
count_by_clumped(Pattern,List,Count) :-
clumped(List, R),
aggregate_all(count, member(Pattern,R), Count).
然后我有这个代码
t(Count) :- count_occurrences(open_rep(n, 1), [v,n,v,n,v,v,v,v,v,v,n,v,v], Count).
t(Count) :- count_occurrences(open_rep(n, 3), [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).
t(Count) :- count_occurrences(open_rep(n, 1), [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).
c(Count) :- count_by_clumped(n-1, [v,n,v,n,v,v,v,v,v,v,n,v,v], Count).
c(Count) :- count_by_clumped(n-3, [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).
c(Count) :- count_by_clumped(n-1, [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v], Count).
和比较“司机”:
?- time((between(1,1000,_),(t(_),fail))).
% 1,953,001 inferences, 0.234 CPU in 0.234 seconds (100% CPU, 8332804 Lips)
false.
?- time((between(1,1000,_),(c(_),fail))).
% 123,001 inferences, 0.000 CPU in 0.014 seconds (0% CPU, Infinite Lips)
false.
这是一个更好的 sublist 方法:
sublist1([H|T], Index, Length, Sublist) :-
sublist1_start_(T, H, 1, Index, Length, Sublist).
% P = previous
sublist1_start_(L, P, Ind, Index, Length, Sublist) :-
sublist1_loop_(L, P, Ind, Index, 1, Length, Sublist).
sublist1_start_([H|T], _, Ind, Index, Length, Sublist) :-
Ind1 is Ind + 1,
% Go to next element
sublist1_start_(T, H, Ind1, Index, Length, Sublist).
sublist1_loop_(_, H, Ind, Ind, Len, Len, [H]).
sublist1_loop_([H|T], P, Ind, Index, Len, Length, [P|Sublist]) :-
Len1 is Len + 1,
sublist1_loop_(T, H, Ind, Index, Len1, Length, Sublist).
swi-prolog 中的结果:
?- sublist1([a,b,c], Ind, Len, Sub).
Ind = Len, Len = 1,
Sub = [a] ;
Ind = 1,
Len = 2,
Sub = [a, b] ;
Ind = 1,
Len = 3,
Sub = [a, b, c] ;
Ind = 2,
Len = 1,
Sub = [b] ;
Ind = Len, Len = 2,
Sub = [b, c] ;
Ind = 3,
Len = 1,
Sub = [c].
如果您希望索引从 0 而不是 1 开始,则将第 2 行中的 1 更改为 0。
我排除了空列表。
您的问题中缺少某些内容,无法说出是什么。我也对答案感到惊讶。
不确定是否必须使用列表或者是否可以使用原子或字符串。原子和字符串分别有 sub_atom/5 和 sub_string/5,可用于搜索。像这样:
?- sub_atom(vnvnvvvvvvnvv, Before, Len, After, vnv).
Before = 0, Len = 3, After = 10 ;
Before = 2, Len = 3, After = 8 ;
Before = 9, Len = 3, After = 1 ;
false.
相同但字符串适用于 sub_string/5.
如果您已经在使用 SWI-Prolog,您可以使用库(聚合)来计算解决方案:
?- aggregate_all(count, sub_atom(vnvnvvvvvvnvv, _,_,_, vnv), N).
N = 3.
?- aggregate_all(count, sub_atom(vnvnvvvvvvnnnvv, _,_,_, vnv), N).
N = 2.
到目前为止,您不需要编写任何程序。如果你必须使用列表,两个附加就足够了:
?- aggregate_all(
count,
( append(_, Back, [v,n,v,n,v,v,v,v,v,v,n,n,n,v,v]),
append([v,n,n,n,v], _, Back)
),
N).
N = 1.
请记住,您还有 正则表达式库 library(pcre)。如果您使用文档中的示例代码:
:- use_module(library(pcre)).
re_match_count(Regex, String, N) :-
re_foldl(inc, Regex, String, 0, N, []).
inc(_, V0, V) :- V is V0 + 1.
您还可以使用原子或字符串进行计数:
?- re_match_count('vn{1}v', vnvnvvvvvvnvv, N).
N = 2.
?- re_match_count("vn{3}v", "vnvnvvvvvvnnnvv", N).
N = 1.
在这个例子中,所有关于匹配本身的信息都被丢弃,只计算匹配的数量。使用 re_foldl/6 的第一个参数,你可以做任何你想做的事。
注意:您需要衡量什么对您最有效。如果有一种方法可以读取字符串并将其保存为 Prolog 字符串,并使用库(pcre),那可能比其他方法更快。