My code uses a regular expression to filter subdomains out of the output text.
String outputText = "{\"host\":\"out-16.smtp.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
"{\"host\":\"pkg.github.com\",\"input\":\"github.com\",\"source\":\"anubis\"}\n" +
"{\"host\":\"lb-140-82-114-14-iad.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
"{\"host\":\"out-2.smtp.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
"{\"host\":\"stars.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
"{\"host\":\"ns2.github.com\",\"input\":\"github.com\",\"source\":\"dnsdumpster\"}\n" +
"{\"host\":\"embed.github.com\",\"input\":\"github.com\",\"source\":\"alienvault\"}\n" +
"{\"host\":\"collector-cdn.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
"{\"host\":\"lb-140-82-112-4-iad.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
"{\"host\":\"lb-140-82-114-12-iad.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
"{\"host\":\"lb-140-82-114-6-iad.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
"{\"host\":\"lb-140-82-114-9-iad.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
"{\"host\":\"m.communication.github.com\",\"input\":\"github.com\",\"source\":\"crtsh\"}\n" +
"{\"host\":\"github.com\",\"input\":\"github.com\",\"source\":\"digitorus\"}\n" +
"{\"host\":\"t.communication.github.com\",\"input\":\"github.com\",\"source\":\"crtsh\"}\n" +
"{\"host\":\"slack.github.com\",\"input\":\"github.com\",\"source\":\"anubis\"}\n" +
"{\"host\":\"out-3.smtp.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n" +
"{\"host\":\"community.github.com\",\"input\":\"github.com\",\"source\":\"anubis\"}\n";
List<String> response = RegexUtil.findMatchedRegexFromStrText(outputText, "\"([a-zA-Z0-9-]+\\.github\\.com)\"");
System.out.println(response);
It returns:
["lb-140-82-114-14-iad.github.com", "embed.github.com", "collector-cdn.github.com", "lb-140-82-114-6-iad.github.com", "community.github.com", "lb-140-82-114-12-iad.github.com", "slack.github.com", "pkg.github.com", "lb-140-82-112-4-iad.github.com", "lb-140-82-114-9-iad.github.com", "stars.github.com", "ns2.github.com"]
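It should filter out the badly named subdomains, such as lb-140-82-114-14-iad.github.com, lb-140-82-114-12-iad.github.com, or out-2.smtp.github.com. It should return only the properly named subdomains, such as slack.github.com and pkg.github.com.
What regular expression can I use, or what other approach can I take?
I have tried several different regular expressions, and combinations of them, but in most cases I get an empty response. I need a regex that filters out the bad subdomains and returns only the properly named ones.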
Try this regex. It matches the names to exclude, such as lb-140-82-114-14-iad.github.com:
/^(lb|out)-\d+-\d+-\d+-\d+-iad\.github\.com$/
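One way to apply that pattern in Java is to extract the candidates first and then drop every host that matches the exclusion pattern. The following is only a minimal sketch: the class name SubdomainFilter and the method filterSubdomains are hypothetical, it uses java.util.regex directly instead of the RegexUtil helper from the question (whose implementation is not shown), and it assumes the only unwanted names are the lb-...-iad ones. The leading and trailing slashes are dropped because Java patterns are plain strings. Hosts with more than one label, such as out-2.smtp.github.com, are already skipped by the extraction regex from the question, since it requires the opening quote directly before a single label.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubdomainFilter {
    // Extraction regex from the question: a quoted, single-label subdomain of github.com.
    private static final Pattern HOST =
            Pattern.compile("\"([a-zA-Z0-9-]+\\.github\\.com)\"");

    // Exclusion regex from the answer, without the /.../ delimiters.
    // Assumption: names of this shape are the only ones considered "bad".
    private static final Pattern UNWANTED =
            Pattern.compile("^(lb|out)-\\d+-\\d+-\\d+-\\d+-iad\\.github\\.com$");

    public static List<String> filterSubdomains(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = HOST.matcher(text);
        while (m.find()) {
            String host = m.group(1);
            // Keep the host only if it is not an unwanted name and not already collected.
            if (!UNWANTED.matcher(host).find() && !result.contains(host)) {
                result.add(host);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample =
                "{\"host\":\"out-16.smtp.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n"
              + "{\"host\":\"pkg.github.com\",\"input\":\"github.com\",\"source\":\"anubis\"}\n"
              + "{\"host\":\"lb-140-82-114-14-iad.github.com\",\"input\":\"github.com\",\"source\":\"hackertarget\"}\n"
              + "{\"host\":\"slack.github.com\",\"input\":\"github.com\",\"source\":\"anubis\"}\n";
        System.out.println(filterSubdomains(sample)); // prints [pkg.github.com, slack.github.com]
    }
}
```

If more naming schemes turn out to be unwanted, they can be added to UNWANTED as further alternatives, which keeps the extraction regex unchanged.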