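I need to parse the first 10 characters of a file name to see whether they are all digits. The obvious way to do this is fileName =~ m/^\d{10}/, but I don't see anything regex-like in the AppleScript reference, so I'm curious what other options I have for doing this validation.
Don't despair: on OS X you can also access sed and grep through "do shell script". So: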
set thecommandstring to "echo \"" & filename & "\"|sed \"s/[0-9]\\{10\\}/*good*(&)/\"" as string
set sedResult to do shell script thecommandstring
set isgood to sedResult starts with "*good*"
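My sed skills aren't too hot, so there may be a more elegant way than prefixing *good* to any name that matches [0-9]{10} and then looking for *good* at the beginning of the result. But basically, if filename is "1234567890foo.mov", this will run the command: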
echo "1234567890foo.mov"|sed "s/[0-9]\{10\}/*good*(&)/"
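Note the escaped quotes \" and the escaped backslashes \\ in the AppleScript string. If you need to escape something for the shell, you have to escape the escapes. So to run a shell script that contains a backslash, you first escape it for the shell as \\, and then escape each of those backslashes in AppleScript, giving \\\\. This can get pretty hard to read.
So anything you can do on the command line you can also do by calling it from AppleScript (woohoo!). Anything written to stdout is returned to the script as the result.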
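To see what the shell receives once AppleScript has removed its own layer of escaping, the sed command can be run on its own. The following is a minimal sketch, assuming bash; the function name mark_prefix is hypothetical, and the pattern is anchored with ^ (an addition that is not in the command above) so that only a leading run of ten digits is marked:

```shell
#!/bin/bash
# mark_prefix is a hypothetical helper, not part of the answer above.
# It prefixes the name with *good* when the name begins with ten digits.
mark_prefix() {
  # Inside single quotes the shell passes \{10\} to sed unchanged,
  # so only one level of backslash escaping is needed here.
  printf '%s\n' "$1" | sed 's/^[0-9]\{10\}/*good*(&)/'
}

mark_prefix "1234567890foo.mov"   # prints *good*(1234567890)foo.mov
mark_prefix "foo1234567890.mov"   # prints foo1234567890.mov (unchanged)
```

Written as an AppleScript string literal, the same sed expression needs every backslash doubled, which is where the readability cost comes from.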
There is an easier way to use the shell (works on bash 3.2+) for regex matching:
set isMatch to "0" = (do shell script ¬
"[[ " & quoted form of fileName & " =~ ^[[:digit:]]{10} ]]; printf $?")
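Note:
- This uses a bash test expression, [[ ... ]], with the regex-matching operator =~; leaving the right operand (or at least its special regex characters) unquoted is a must on bash 3.2+, unless you prepend shopt -s compat31; to the command.
- The do shell script statement executes the test and returns its exit code via an additional command (thanks, @LauriRanta); "0" indicates success.
- The =~ operator does not support shortcut character classes (such as \d) or assertions (such as \b) (true as of OS X 10.9.4; this is unlikely to change soon).
- For case-insensitive matching, prepend shopt -s nocasematch; to the command string.
- For locale awareness, prepend export LANG='" & user locale of (system info) & ".UTF-8'; to the command string.
- If the regex contains capture groups, the captured strings can be accessed through the ${BASH_REMATCH[@]} array variable.
- Inside the AppleScript string literal, double quotes and backslashes must be \-escaped.
Here is an alternative using egrep: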
set isMatch to "0" = (do shell script ¬
"egrep -q '^\\d{10}' <<<" & quoted form of filename & "; printf $?")
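Although this presumably performs worse, it has two advantages:
- You can use shortcut character classes such as \d and assertions such as \b.
- You can more easily make matching case-insensitive by calling egrep with -i.
If you need the strings matched by capture groups, however, use the [[ ... =~ ... ]] approach.
Finally, here are utility functions that package both approaches (the syntax highlighting is off, but they do work):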
# SYNOPSIS
# doesMatch(text, regexString) -> Boolean
# DESCRIPTION
# Matches string s against regular expression (string) regex using bash's extended regular expression language *including*
# support for shortcut classes such as `\d`, and assertions such as `\b`, and *returns a Boolean* to indicate if
# there is a match or not.
# - AppleScript's case sensitivity setting is respected; i.e., matching is case-INsensitive by default, unless inside
# a 'considering case' block.
# - The current user's locale is respected.
# EXAMPLE
# my doesMatch("127.0.0.1", "^(\\d{1,3}\\.){3}\\d{1,3}$") # -> true
on doesMatch(s, regex)
local ignoreCase, extraGrepOption
set ignoreCase to "a" is "A"
if ignoreCase then
set extraGrepOption to "i"
else
set extraGrepOption to ""
end if
# Note: So that classes such as \w work with different locales, we need to set the shell's locale explicitly to the current user's.
# Rather than let the shell command fail we return the exit code and test for "0" to avoid having to deal with exception handling in AppleScript.
tell me to return "0" = (do shell script "export LANG='" & user locale of (system info) & ".UTF-8'; egrep -q" & extraGrepOption & " " & quoted form of regex & " <<< " & quoted form of s & "; printf $?")
end doesMatch
# SYNOPSIS
# getMatch(text, regexString) -> { overallMatch[, captureGroup1Match ...] } or {}
# DESCRIPTION
# Matches string s against regular expression (string) regex using bash's extended regular expression language and
# *returns the matching string and substrings matching capture groups, if any.*
#
# - AppleScript's case sensitivity setting is respected; i.e., matching is case-INsensitive by default, unless this subroutine is called inside
# a 'considering case' block.
# - The current user's locale is respected.
#
# IMPORTANT:
#
# Unlike doesMatch(), this subroutine does NOT support shortcut character classes such as \d.
# Instead, use one of the following POSIX classes (see `man re_format`):
# [[:alpha:]] [[:word:]] [[:lower:]] [[:upper:]] [[:ascii:]]
# [[:alnum:]] [[:digit:]] [[:xdigit:]]
# [[:blank:]] [[:space:]] [[:punct:]] [[:cntrl:]]
# [[:graph:]] [[:print:]]
#
# Also, `\b`, '\B', '\<', and '\>' are not supported; you can use `[[:<:]]` for '\<' and `[[:>:]]` for `\>`
#
# Always returns a *list*:
# - an empty list, if no match is found
# - otherwise, the first list element contains the matching string
# - if regex contains capture groups, additional elements return the strings captured by the capture groups; note that *named* capture groups are NOT supported.
# EXAMPLE
# my getMatch("127.0.0.1", "^([[:digit:]]{1,3})\\.([[:digit:]]{1,3})\\.([[:digit:]]{1,3})\\.([[:digit:]]{1,3})$") # -> { "127.0.0.1", "127", "0", "0", "1" }
on getMatch(s, regex)
local ignoreCase, extraCommand
set ignoreCase to "a" is "A"
if ignoreCase then
set extraCommand to "shopt -s nocasematch; "
else
set extraCommand to ""
end if
# Note:
# So that classes such as [[:alpha:]] work with different locales, we need to set the shell's locale explicitly to the current user's.
# Since `quoted form of` encloses its argument in single quotes, we must set compatibility option `shopt -s compat31` for the =~ operator to work.
# Rather than let the shell command fail we return '' in case of non-match to avoid having to deal with exception handling in AppleScript.
tell me to do shell script "export LANG='" & user locale of (system info) & ".UTF-8'; shopt -s compat31; " & extraCommand & "[[ " & quoted form of s & " =~ " & quoted form of regex & " ]] && printf '%s\\n' \"${BASH_REMATCH[@]}\" || printf ''"
return paragraphs of result
end getMatch
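To see what the two handlers hand to the shell, their command strings can be tried directly in a terminal. The following is a minimal sketch, assuming bash 3.2 or later and a grep that accepts -E. The function names does_match and get_match are hypothetical stand-ins, and the locale export is omitted. Unlike the handler, get_match keeps the regex unquoted in a variable rather than relying on shopt -s compat31, and the examples use [[:digit:]] because \d support differs between grep implementations:

```shell
#!/bin/bash
# does_match and get_match are hypothetical shell stand-ins for the
# AppleScript handlers above; they are not part of the original answer.

# Prints 0 if the text matches the regex, a non-zero status otherwise.
# Pass "i" as the third argument for case-insensitive matching.
does_match() {
  local text=$1 regex=$2 extra=${3:-} status=0
  # -q suppresses output, so only the exit status carries the answer.
  grep -E -q${extra} -- "$regex" <<<"$text" || status=$?
  printf '%s' "$status"
}

# Prints the overall match, then each capture group, one per line.
# Prints nothing if there is no match.
get_match() {
  local text=$1 regex=$2
  # An unquoted variable on the right of =~ is treated as a regex.
  if [[ $text =~ $regex ]]; then
    printf '%s\n' "${BASH_REMATCH[@]}"
  fi
}

does_match "1234567890foo.mov" '^[[:digit:]]{10}'; echo   # prints 0
get_match "127.0.0.1" '^([[:digit:]]{1,3})\.([[:digit:]]{1,3})\.([[:digit:]]{1,3})\.([[:digit:]]{1,3})$'
```

The second call prints five lines (127.0.0.1, 127, 0, 0, 1). That line-per-element output is what paragraphs of result turns into an AppleScript list in getMatch.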
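I recently needed regular expressions in a script and wanted to find a scripting addition to handle them, so that it would be easier to read what was going on. I found Satimage.osax, which lets you use syntax like this: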
find text "n(.*)" in "to be or not to be" with regexp
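The only downside is that (as of August 11, 2010) it is a 32-bit addition, so it throws errors when it is called from a 64-bit process. This bit me in a Mail rule for Snow Leopard, as I had to run Mail in 32-bit mode. Called from a standalone script, though, I have no reservations: it is really great, lets you pick whatever regex syntax you want, and supports back-references.
Update, May 28, 2011
Thanks to Mitchell Model's comment below for pointing out that they have updated it to 64-bit, so no more reservations: it does everything I need.
I'm sure there is an AppleScript addition or a shell script that can be called to bring regex into the fold, but I avoid dependencies for simple stuff. I use this style of pattern all the time...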
set filename to "1234567890abcdefghijkl"
return isPrefixGood(filename)

on isPrefixGood(filename) -- returns boolean
    set legalCharacters to {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0"}
    -- A name shorter than 10 characters cannot have a 10-digit prefix.
    if (count of characters in filename) < 10 then return false
    -- "text 1 thru 10" does not depend on the current text item delimiters.
    set thePrefix to text 1 thru 10 of filename
    repeat with thisChr from 1 to (count of characters in thePrefix)
        set theChr to character thisChr of thePrefix
        -- Stop at the first character that is not a digit.
        if theChr is not in legalCharacters then
            return false
        end if
    end repeat
    return true
end isPrefixGood
Here is another way to check whether the first ten characters of any string are digits.
on checkFilename(thisName)
set {n, isOk} to {length of thisName, true}
try
repeat with i from 1 to 10
set isOk to (isOk and ((character i of thisName) is in "0123456789"))
end repeat
return isOk
on error
return false
end try
end checkFilename
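I have an alternative to offer until I have implemented character classes for the Thompson NFA algorithm, which I already have working in AppleScript. If anyone is interested in parsing very basic regular expressions with AppleScript, the code is posted in CodeExchange at MacScripter, so please have a look!
Here is a solution for checking whether the first ten characters of a text/string are digits: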
set mstr to "1234567889Abcdefg"
set isnum to prefixIsOnlyDigits for mstr
to prefixIsOnlyDigits for aText
set aProbe to text 1 thru 10 of aText
set isnum to false
if not ((offset of "," in aProbe) > 0 or (offset of "." in aProbe) > 0 or (offset of "-" in aProbe) > 0) then
try
set aNumber to aProbe as number
set isnum to true
end try
end if
return isnum
end prefixIsOnlyDigits
I was able to call JavaScript directly from AppleScript (on High Sierra) as follows.
# Returns a list of strings from _subject that match _regex
# _regex in the format of /<value>/<flags>
on match(_subject, _regex)
set _js to "(new String(`" & _subject & "`)).match(" & _regex & ")"
set _result to run script _js in "JavaScript"
if _result is null or _result is missing value then
return {}
end if
return _result
end match
match("file-name.applescript", "/^\\d+/g") #=> {}
match("1234_file.js", "/^\\d+/g") #=> {"1234"}
match("5-for-fighting.mp4", "/^\\d+/g") #=> {"5"}
It seems that most JavaScript String methods work as expected. I have not found a reference stating which ECMAScript version JavaScript for macOS Automation is compatible with, so test before using.