How do I debug a (PCRE) regular expression passed to grep?


I'm trying to debug a regular expression passed to grep that, apparently, fails only on my system.

This is the full command, which should return the latest Terraform release version:

wget -qO - "https://api.github.com/repos/hashicorp/terraform/releases/latest" | grep -Po '"tag_name": "v\K.*?(?=")'

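It seems to work for other people, but not for me. Adding a * quantifier after "tag_name": to allow for a variable amount of whitespace makes it work for me: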

wget -qO - "https://api.github.com/repos/hashicorp/terraform/releases/latest" | grep -Po '"tag_name": *"v\K.*?(?=")'

This is the response that wget pipes to grep:

{
  "url": "https://api.github.com/repos/hashicorp/terraform/releases/20814583",
  "assets_url": "https://api.github.com/repos/hashicorp/terraform/releases/20814583/assets",
  "upload_url": "https://uploads.github.com/repos/hashicorp/terraform/releases/20814583/assets{?name,label}",
  "html_url": "https://github.com/hashicorp/terraform/releases/tag/v0.12.12",
  "id": 20814583,
  "node_id": "MDc6UmVsZWFzZTIwODE0NTgz",
  "tag_name": "v0.12.12",
  "target_commitish": "master",
  "name": "",
  "draft": false,
  "author": {
    "login": "apparentlymart",
    "id": 20180,
    "node_id": "MDQ6VXNlcjIwMTgw",
    "avatar_url": "https://avatars1.githubusercontent.com/u/20180?v=4",
    "gravatar_id": "",
    "url": "https://api.github.com/users/apparentlymart",
    "html_url": "https://github.com/apparentlymart",
    "followers_url": "https://api.github.com/users/apparentlymart/followers",
    "following_url": "https://api.github.com/users/apparentlymart/following{/other_user}",
    "gists_url": "https://api.github.com/users/apparentlymart/gists{/gist_id}",
    "starred_url": "https://api.github.com/users/apparentlymart/starred{/owner}{/repo}",
    "subscriptions_url": "https://api.github.com/users/apparentlymart/subscriptions",
    "organizations_url": "https://api.github.com/users/apparentlymart/orgs",
    "repos_url": "https://api.github.com/users/apparentlymart/repos",
    "events_url": "https://api.github.com/users/apparentlymart/events{/privacy}",
    "received_events_url": "https://api.github.com/users/apparentlymart/received_events",
    "type": "User",
    "site_admin": false
  },
  "prerelease": false,
  "created_at": "2019-10-18T18:39:16Z",
  "published_at": "2019-10-18T18:45:33Z",
  "assets": [],
  "tarball_url": "https://api.github.com/repos/hashicorp/terraform/tarball/v0.12.12",
  "zipball_url": "https://api.github.com/repos/hashicorp/terraform/zipball/v0.12.12",
  "body": "BUG FIXES:\r\n\r\n* backend/remote: Don't do local validation of whether variables are set prior to submitting, because only the remote system knows the full set of configured stored variables and environment variables that might contribute. This avoids erroneous error messages about unset required variables for remote runs when those variables will be set by stored variables in the remote workspace. ([#23122](https://github.com/hashicorp/terraform/issues/23122))"
}

And on https://regex101.com I can see that both "tag_name": "v\K.*?(?=") and "tag_name": *"v\K.*?(?=") match the version number correctly.

So something must be off on my system. I'm curious why the original pattern doesn't work for me, and how (if at all possible) to debug a case like this.

regex bash shell grep pcre
2 answers

0 votes

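Your regex engine most likely does not understand \K. Regular expressions come in many dialects.

Sticking to standard regex constructs usually gives good results across engines (the example below uses egrep, i.e. POSIX extended syntax rather than PCRE-only features).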

$ curl -s "https://api.github.com/repos/hashicorp/terraform/releases/latest" | egrep -oe '"tag_name": "v(.*)"'
"tag_name": "v0.12.12"

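Now, if you only want the version number, you need to extract it in a second step (because using ?! to skip part of the pattern may not always work).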

curl -s "https://api.github.com/repos/hashicorp/terraform/releases/latest" | egrep -oe '"tag_name": "v(.*)"' | egrep -oe '([0-9]+\.?)+'
0.12.12
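
As a rough single-step alternative to the two-stage egrep pipeline above, a capture group in sed can return only the version number. This is a sketch, assuming a sed that supports -E (GNU and BSD sed both do); `extract_version` is a hypothetical helper name, and the sample strings are trimmed stand-ins for the API response rather than real downloads.

```shell
#!/bin/sh
# Trimmed stand-ins for the API response (assumption: only the relevant
# fields are reproduced), once compact and once pretty-printed.
compact='{"id":20814583,"tag_name":"v0.12.12","target_commitish":"master"}'
pretty='  "tag_name": "v0.12.12",'

# extract_version is a hypothetical helper name. The capture group keeps
# only the text between "v and the next double quote; ' *' accepts zero
# or more spaces after the colon.
extract_version() {
  sed -nE 's/.*"tag_name": *"v([^"]*)".*/\1/p'
}

printf '%s\n' "$compact" | extract_version
printf '%s\n' "$pretty" | extract_version
```

Because the character class `[^"]*` stops at the first closing quote, this also behaves sensibly when the whole response arrives on a single line, where a greedy `(.*)` would run on to the last quote of the line.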

0 votes

I have narrowed it down to the following. If I run the wget command without piping it to grep and without formatting the JSON response:

wget -qO - "https://api.github.com/repos/hashicorp/terraform/releases/latest"

then I get JSON without any whitespace (I'm posting only part of the response):

"html_url":"https://github.com/hashicorp/terraform/releases/tag/v0.12.12","id":20814583,"node_id":"MDc6UmVsZWFzZTIwODE0NTgz","tag_name":"v0.12.12","target_commitish":"master","name":"","draft":false

Naturally, the original regex "tag_name": "v\K.*?(?=") fails, because there is no space after the colon.
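
The difference can be reproduced offline. The following sketch assumes a grep built with PCRE support (-P); `check` is a hypothetical helper, and the two sample strings are shortened stand-ins for the compact and the pretty-printed response.

```shell
#!/bin/sh
# Shortened stand-ins for the two response shapes seen in this thread.
compact='"id":20814583,"tag_name":"v0.12.12","target_commitish":"master"'
pretty='  "tag_name": "v0.12.12",'

strict='"tag_name": "v\K.*?(?=")'     # exactly one space after the colon
tolerant='"tag_name": *"v\K.*?(?=")'  # zero or more spaces after the colon

# check PATTERN INPUT: print the match, or "(no match)" if grep finds nothing.
check() {
  printf '%s\n' "$2" | grep -Po "$1" || echo '(no match)'
}

echo "strict,   compact: $(check "$strict" "$compact")"
echo "strict,   pretty:  $(check "$strict" "$pretty")"
echo "tolerant, compact: $(check "$tolerant" "$compact")"
echo "tolerant, pretty:  $(check "$tolerant" "$pretty")"
```

Only the strict pattern against the compact input reports no match; the other three combinations print 0.12.12.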

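This evidently has nothing to do with the regex passed to grep, or with grep itself. I haven't looked into why the response itself differs, so the original question can be considered solved (although if anyone knows what causes it, please answer in the comments).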
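
For similar cases, one way to debug is to look at the exact bytes grep receives before blaming the pattern. A minimal sketch, using a hard-coded sample in place of the real download (the `response` variable is an assumption for illustration; in practice the wget command would feed the pipeline instead):

```shell
#!/bin/sh
# Stand-in for the real download; in practice replace the printf calls with
#   wget -qO - "https://api.github.com/repos/hashicorp/terraform/releases/latest"
response='{"id":20814583,"tag_name":"v0.12.12","draft":false}'

# 1. Count lines: a count of 0 or 1 suggests a compact, single-line body.
printf '%s\n' "$response" | wc -l

# 2. Dump the bytes right after the key, so that spaces, tabs or \r
#    characters between the colon and the value become visible.
printf '%s\n' "$response" | grep -o '"tag_name".\{0,6\}' | od -c
```

If the dump shows the colon immediately followed by a quote, a pattern that requires a space there cannot match, whatever regex dialect is in use.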
