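What is the AppleScript equivalent of regex, and if there is none, what are the alternatives?

Question  Votes: 26  Answers: 7

I need to parse the first 10 characters of a file name to see whether they are all digits. The obvious way to do this is fileName =~ m/^\d{10}/, but I don't see anything regex-like in the AppleScript reference, so I'm curious what other options I have for this validation.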

regex applescript
7 Answers
24
votes

Don't despair: since this is OS X, you also have access to sed and grep through "do shell script". So:

set thecommandstring to "echo \"" & filename & "\"|sed \"s/[0-9]\\{10\\}/*good*(&)/\"" as string
set sedResult to do shell script thecommandstring
set isgood to sedResult starts with "*good*"

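My sed skills aren't too hot, so there may be a more elegant way than tagging any name that matches [0-9]{10} with *good* and then looking for *good* at the start of the result. But basically, if filename is "1234567890foo.mov", this will run the command: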

echo "1234567890foo.mov"|sed "s/[0-9]\{10\}/*good*(&)/"

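Note the escaped quotes \" and the escaped backslashes \\ in the AppleScript. If you are escaping things for the shell, you have to escape the escapes. So to run a shell script that needs a literal backslash, you first escape it for the shell as \\, and then escape each of those backslashes in AppleScript, which gives \\\\. This can get pretty hard to read.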

So anything you can do on the command line, you can do by calling it from AppleScript (woohoo!). Anything written to stdout is returned to the script as the result.
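It can help to try the sed command on its own in Terminal before wrapping it in do shell script, since no AppleScript-level escaping is needed there. The following is only a rough sketch: has_digit_prefix is a made-up helper name, and the ^ anchor is an addition to the command above so that the pattern can only match at the start of the name.

```shell
# Hypothetical helper (not part of the answer above):
# prints "yes" when the first 10 characters of $1 are digits, "no" otherwise.
has_digit_prefix() {
  # printf is used instead of echo so backslashes in the name are passed through as-is.
  tagged=$(printf '%s\n' "$1" | sed 's/^[0-9]\{10\}/*good*(&)/')
  case "$tagged" in
    '*good*'*) echo yes ;;  # sed tagged the start of the name
    *)         echo no ;;   # name came back unchanged
  esac
}

has_digit_prefix "1234567890foo.mov"   # prints yes
has_digit_prefix "foo1234567890.mov"   # prints no
```

Single quotes around the sed script mean the backslashes are written once here; the doubled \\ only appears once the command is embedded in an AppleScript string.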


17
votes

There is a simpler way to do regex matching with the shell (works with bash 3.2+):

set isMatch to "0" = (do shell script ¬
  "[[ " & quoted form of fileName & " =~ ^[[:digit:]]{10} ]]; printf $?")

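Notes:

  • This uses the modern bash test expression [[ ... ]] with the regex-matching operator =~. NOT quoting the right operand (or at least its special regex characters) is a must on bash 3.2+, unless you prepend shopt -s compat31;
  • The do shell script statement runs the test and returns its exit code via an additional command (thanks, @LauriRanta); "0" indicates success.
  • Note that the =~ operator does not support shortcut character classes such as \d or assertions such as \b (true as of OS X 10.9.4, and unlikely to change anytime soon).
  • For case-INsensitive matching, prepend shopt -s nocasematch; to the command string.
  • For locale awareness, prepend export LANG='" & user locale of (system info) & ".UTF-8'; to the command string.
  • If the regex contains capture groups, you can access the captured strings through the built-in ${BASH_REMATCH[@]} array variable.
  • As in the accepted answer, you will have to \-escape double quotes and backslashes.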
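To illustrate the notes on capture groups and case-insensitive matching, here is a minimal sketch of the bash side only, without the AppleScript wrapper. The helper names regex_groups and regex_groups_nocase are invented for this example, and the test runs through bash -c so that the sketch does not depend on which shell is calling it.

```shell
# Hypothetical helpers (names are not from the answer above).
# Each prints the overall match, then every capture group, one per line;
# nothing is printed when the text does not match.
regex_groups() {
  # $1 = text, $2 = extended regex. The right operand of =~ is left unquoted on purpose.
  bash -c '[[ $1 =~ $2 ]] && printf "%s\n" "${BASH_REMATCH[@]}"; true' _ "$1" "$2"
}

regex_groups_nocase() {
  # Same test, with shopt -s nocasematch prepended for case-insensitive matching.
  bash -c 'shopt -s nocasematch; [[ $1 =~ $2 ]] && printf "%s\n" "${BASH_REMATCH[@]}"; true' _ "$1" "$2"
}

regex_groups "1234567890foo.mov" '^([[:digit:]]{10})(.*)$'
# prints:
#   1234567890foo.mov
#   1234567890
#   foo.mov

regex_groups_nocase "IMG_0042.JPG" '\.(jpg|png)$'
# prints:
#   .JPG
#   JPG
```

From AppleScript, the printed lines could be split with paragraphs of the do shell script result.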

Here is an alternative that uses egrep:

set isMatch to "0" = (do shell script ¬
  "egrep -q '^\\d{10}' <<<" & quoted form of filename & "; printf $?")

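While this may perform worse, it has two advantages:

  • You can use shortcut character classes such as \d and assertions such as \b.
  • You can make matching case-insensitive more easily, by calling egrep with -i.
  • However, you cannot access sub-matches via capture groups; use the [[ ... =~ ... ]] approach if you need that.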
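As a rough sketch of the grep-based variant outside AppleScript: grep_matches is a hypothetical helper name, grep -E is used as the equivalent spelling of egrep, and the pattern uses [0-9] instead of \d because \d is not supported by every grep implementation.

```shell
# Hypothetical helper (not from the answer above):
# prints 0 when $1 matches the extended regex $2, and 1 otherwise,
# mirroring the `printf $?` trick used in the AppleScript version.
grep_matches() {
  # $3 is an optional extra grep option such as -i; it is left unquoted
  # so that an empty value expands to nothing.
  printf '%s\n' "$1" | grep -E -q $3 -e "$2"
  printf '%s\n' "$?"
}

grep_matches "1234567890foo.mov" '^[0-9]{10}'   # prints 0
grep_matches "README.TXT" '\.txt$'              # prints 1
grep_matches "README.TXT" '\.txt$' -i           # prints 0
```

Because -q suppresses all output, only the exit status is available here, which is why this approach cannot return capture groups.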

Finally, here are utility functions that package both approaches (the syntax highlighting is off, but they do work):

# SYNOPSIS
#   doesMatch(text, regexString) -> Boolean
# DESCRIPTION
#   Matches string s against regular expression (string) regex using bash's extended regular expression language *including* 
#   support for shortcut classes such as `\d`, and assertions such as `\b`, and *returns a Boolean* to indicate if
#   there is a match or not.
#    - AppleScript's case sensitivity setting is respected; i.e., matching is case-INsensitive by default, unless inside
#      a 'considering case' block.
#    - The current user's locale is respected.
# EXAMPLE
#    my doesMatch("127.0.0.1", "^(\\d{1,3}\\.){3}\\d{1,3}$") # -> true
on doesMatch(s, regex)
    local ignoreCase, extraGrepOption
    set ignoreCase to "a" is "A"
    if ignoreCase then
        set extraGrepOption to "i"
    else
        set extraGrepOption to ""
    end if
    # Note: So that classes such as \w work with different locales, we need to set the shell's locale explicitly to the current user's.
    #       Rather than let the shell command fail we return the exit code and test for "0" to avoid having to deal with exception handling in AppleScript.
    tell me to return "0" = (do shell script "export LANG='" & user locale of (system info) & ".UTF-8'; egrep -q" & extraGrepOption & " " & quoted form of regex & " <<< " & quoted form of s & "; printf $?")
end doesMatch

# SYNOPSIS
#   getMatch(text, regexString) -> { overallMatch[, captureGroup1Match ...] } or {}
# DESCRIPTION
#   Matches string s against regular expression (string) regex using bash's extended regular expression language and
#   *returns the matching string and substrings matching capture groups, if any.*
#   
#   - AppleScript's case sensitivity setting is respected; i.e., matching is case-INsensitive by default, unless this subroutine is called inside
#     a 'considering case' block.
#   - The current user's locale is respected.
#   
#   IMPORTANT: 
#   
#   Unlike doesMatch(), this subroutine does NOT support shortcut character classes such as \d.
#   Instead, use one of the following POSIX classes (see `man re_format`):
#       [[:alpha:]] [[:word:]] [[:lower:]] [[:upper:]] [[:ascii:]]
#       [[:alnum:]] [[:digit:]] [[:xdigit:]]
#       [[:blank:]] [[:space:]] [[:punct:]] [[:cntrl:]] 
#       [[:graph:]]  [[:print:]] 
#   
#   Also, `\b`, '\B', '\<', and '\>' are not supported; you can use `[[:<:]]` for '\<' and `[[:>:]]` for `\>`
#   
#   Always returns a *list*:
#    - an empty list, if no match is found
#    - otherwise, the first list element contains the matching string
#       - if regex contains capture groups, additional elements return the strings captured by the capture groups; note that *named* capture groups are NOT supported.
#  EXAMPLE
#       my getMatch("127.0.0.1", "^([[:digit:]]{1,3})\\.([[:digit:]]{1,3})\\.([[:digit:]]{1,3})\\.([[:digit:]]{1,3})$") # -> { "127.0.0.1", "127", "0", "0", "1" }
on getMatch(s, regex)
    local ignoreCase, extraCommand
    set ignoreCase to "a" is "A"
    if ignoreCase then
        set extraCommand to "shopt -s nocasematch; "
    else
        set extraCommand to ""
    end if
    # Note: 
    #  So that classes such as [[:alpha:]] work with different locales, we need to set the shell's locale explicitly to the current user's.
    #  Since `quoted form of` encloses its argument in single quotes, we must set compatibility option `shopt -s compat31` for the =~ operator to work.
    #  Rather than let the shell command fail we return '' in case of non-match to avoid having to deal with exception handling in AppleScript.
    tell me to do shell script "export LANG='" & user locale of (system info) & ".UTF-8'; shopt -s compat31; " & extraCommand & "[[ " & quoted form of s & " =~ " & quoted form of regex & " ]] && printf '%s\\n' \"${BASH_REMATCH[@]}\" || printf ''"
    return paragraphs of result
end getMatch

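11
votes

I recently needed regular expressions in a script and wanted to find a scripting addition to handle them, so that it would be easier to read what was going on. I found Satimage.osax, which lets you use syntax like this: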

find text "n(.*)" in "to be or not to be" with regexp

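The only downside is that (as of August 11, 2010) it is a 32-bit addition, so it throws errors when it is called from a 64-bit process. This bit me in a Mail rule for Snow Leopard, because I had to run Mail in 32-bit mode. Called from a standalone script, though, I have no reservations: it is really great, and it lets you choose whichever regex syntax you want and use back-references.

Update, May 28, 2011

Thanks to Mitchell Model for pointing out in a comment below that it has been updated to 64-bit, so there are no more reservations: it does everything I need.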


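3
votes

I'm sure there is an AppleScript addition or a shell script that can be called to bring regex into the fold, but I avoid dependencies for simple things. I use this style of pattern all the time...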

set filename to "1234567890abcdefghijkl"

return isPrefixGood(filename)

on isPrefixGood(filename) -- returns "good prefix" or "bad prefix"
    set legalCharacters to {"1", "2", "3", "4", "5", "6", "7", "8", "9", "0"}

    -- "text 1 thru 10" yields a string directly and does not depend on text item delimiters
    set thePrefix to text 1 thru 10 of filename

    set badPrefix to false

    repeat with thisChr from 1 to (get count of characters in thePrefix)
        set theChr to character thisChr of thePrefix
        if theChr is not in legalCharacters then
            set badPrefix to true
        end if
    end repeat

    if badPrefix is true then
        return "bad prefix"
    end if

    return "good prefix"
end isPrefixGood

3
votes

Here is another way to check whether the first ten characters of any string are digits.

    on checkFilename(thisName)
        set {n, isOk} to {length of thisName, true}
        try
            repeat with i from 1 to 10
                set isOk to (isOk and ((character i of thisName) is in "0123456789"))
            end repeat
            return isOk
        on error
            return false
        end try
    end checkFilename

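1
votes

I have an alternative to offer until I have implemented character classes for the Thompson NFA algorithm, which I have gotten working in AppleScript. If anyone is interested in parsing very basic regular expressions with AppleScript, the code is posted in MacScripter's Code Exchange; please have a look!

Here is a solution for checking the first ten characters of a text/string: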

set mstr to "1234567889Abcdefg"
set isnum to prefixIsOnlyDigits for mstr
to prefixIsOnlyDigits for aText
    set aProbe to text 1 thru 10 of aText
    set isnum to false
    if not ((offset of "," in aProbe) > 0 or (offset of "." in aProbe) > 0 or (offset of "-" in aProbe) > 0) then
        try
            set aNumber to aProbe as number
            set isnum to true
        end try
    end if
    return isnum
end prefixIsOnlyDigits

1
votes

I was able to call JavaScript directly from AppleScript (on High Sierra), as follows.

# Returns a list of strings from _subject that match _regex
# _regex in the format of /<value>/<flags>
on match(_subject, _regex)
    set _js to "(new String(`" & _subject & "`)).match(" & _regex & ")"
    set _result to run script _js in "JavaScript"
    if _result is null or _result is missing value then
        return {}
    end if
    return _result
end match

match("file-name.applescript", "/^\\d+/g") #=> {}
match("1234_file.js", "/^\\d+/g") #=> {"1234"}
match("5-for-fighting.mp4", "/^\\d+/g") #=> {"5"}

Most JavaScript String methods seem to work as expected. I have not found a reference stating which ECMAScript version JavaScript for macOS Automation is compatible with, so test before you rely on it.
