Regular expression to filter out unwanted subdomains [closed]

Question · Votes: 0 · Answers: 1

My code uses a regular expression to filter the subdomains in the output text.

        String outputText = "{\"host\":\"out-16.smtp.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
                "{\"host\":\"pkg.github.com\",\"input\":\"github.com\",\"source\":\"anubis\"}\n" +
                "{\"host\":\"lb-140-82-114-14-iad.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
                "{\"host\":\"out-2.smtp.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
                "{\"host\":\"stars.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
                "{\"host\":\"ns2.github.com\",\"input\":\"github.com\",\"source\":\"dnsdumpster\"}\n" +
                "{\"host\":\"embed.github.com\",\"input\":\"github.com\",\"source\":\"alienvault\"}\n" +
                "{\"host\":\"collector-cdn.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
                "{\"host\":\"lb-140-82-112-4-iad.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
                "{\"host\":\"lb-140-82-114-12-iad.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
                "{\"host\":\"lb-140-82-114-6-iad.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
                "{\"host\":\"lb-140-82-114-9-iad.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
                "{\"host\":\"m.communication.github.com\",\"input\":\"github.com\",\"source\":\"crtsh\"}\n" +
                "{\"host\":\"github.com\",\"input\":\"github.com\",\"source\":\"digitorus\"}\n" +
                "{\"host\":\"t.communication.github.com\",\"input\":\"github.com\",\"source\":\"crtsh\"}\n" +
                "{\"host\":\"slack.github.com\",\"input\":\"github.com\",\"source\":\"anubis\"}\n" +
                "{\"host\":\"out-3.smtp.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
                "{\"host\":\"community.github.com\",\"input\":\"github.com\",\"source\":\"anubis\"}\n";


        List<String> response = RegexUtil.findMatchedRegexFromStrText(outputText, "\"([a-zA-Z0-9-]+\\.github\\.com)\"");

        System.out.println(response);

It returns:


["lb-140-82-114-14-iad.github.com", "embed.github.com", "collector-cdn.github.com", "lb-140-82-114-6-iad.github.com", "community.github.com", "lb-140-82-114-12-iad.github.com", "slack.github.com", "pkg.github.com", "lb-140-82-112-4-iad.github.com", "lb-140-82-114-9-iad.github.com", "stars.github.com", "ns2.github.com"]

It should filter out badly named subdomains such as lb-140-82-114-14-iad.github.com, lb-140-82-114-12-iad.github.com, or out-2.smtp.github.com, and return only properly named subdomains such as slack.github.com and pkg.github.com.

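What regular expression can I use, or what other approach can I take?

I have tried different regular expressions, including several at once, but most of the time they return an empty response. I need a regular expression that filters out the bad subdomains and returns only the properly named ones.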

java regex subdomain
1 Answer
0 votes

Try this regular expression:

/^(lb|out)-\d+-\d+-\d+-\d+-iad\.github\.com$/
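Note that this pattern matches the names to exclude, not the names to keep, and that Java's `Pattern` takes the expression without the surrounding `/.../` delimiters. A minimal sketch of one way it might be applied is shown here. The class name `SubdomainFilter` and the method `filterHosts` are hypothetical, and the rule that a "good" name has exactly one label in front of github.com is an assumption inferred from the examples in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubdomainFilter {
    // Captures the value of every "host" field in the JSON-lines text.
    private static final Pattern HOST =
            Pattern.compile("\"host\":\"([^\"]+)\"");

    // The answer's pattern, written as a Java string (no /.../ delimiters).
    // It describes the load-balancer style names that should be dropped.
    private static final Pattern EXCLUDED =
            Pattern.compile("^(lb|out)-\\d+-\\d+-\\d+-\\d+-iad\\.github\\.com$");

    // Assumption: a wanted name is exactly one label followed by .github.com,
    // which rejects out-2.smtp.github.com and the bare github.com.
    private static final Pattern SINGLE_LABEL =
            Pattern.compile("^[a-zA-Z0-9-]+\\.github\\.com$");

    public static List<String> filterHosts(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = HOST.matcher(text);
        while (m.find()) {
            String host = m.group(1);
            boolean wanted = SINGLE_LABEL.matcher(host).matches()
                    && !EXCLUDED.matcher(host).matches();
            // Keep input order and skip duplicates.
            if (wanted && !result.contains(host)) {
                result.add(host);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample =
                "{\"host\":\"out-16.smtp.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
                "{\"host\":\"pkg.github.com\",\"input\":\"github.com\",\"source\":\"anubis\"}\n" +
                "{\"host\":\"lb-140-82-114-14-iad.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
                "{\"host\":\"slack.github.com\",\"input\":\"github.com\",\"source\":\"anubis\"}\n";
        // Prints [pkg.github.com, slack.github.com]
        System.out.println(filterHosts(sample));
    }
}
```

Filtering in two steps (extract every host, then test each one) keeps each expression short. It also avoids the problem of a single capturing regex matching only the tail of a longer host name. If other naming schemes also count as "bad", the `EXCLUDED` pattern would need to be extended accordingly.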