Regex to match a fully qualified hostname or a URL with optional https

Question

A log file contains two possible strings:

1) "some text then https://myhost.ab.us2.myDomain.com and then some more text"

OR:

2) "some text then myhost.ab.us2.myDomain.com and then some more text"

"myDomain.com" is a constant, so we can hard-code it in the regex.

In both cases the match is not at the start of the line but somewhere in the middle.

If it matches, I need to extract "myhost" from the line.

I've tried a positive lookahead with "https://" or "\\s{1}". "https://" on its own works:

Matcher m = Pattern.compile("https://(.+?)\\.(.+?)\\.(.+?)\\.myDomain\\.com\\s").matcher(input);

I'd like to add an "or" so that it matches either "https://" or "<space>" ("https://|//s{1}"), but then it always grabs the whole string up to the start of the first space.
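A likely cause of that behaviour: the regex engine starts the match at the leftmost position where the pattern can succeed, which is the first space in the line, and `(.+?)` then stretches across the following words because `.` also matches spaces. Making the quantifier lazy does not move the start of the match. Below is a minimal sketch that groups the alternation so it only applies to the prefix and uses a narrower character class for each label. It assumes host labels contain only word characters and hyphens; the class name `GroupedPrefix` and method `extract` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class GroupedPrefix {

    // (?:https://|\s) keeps the "|" limited to the prefix.
    // [\w-]+ cannot match spaces, dots, ':' or '/', so the captured host
    // cannot run back across earlier words the way (.+?) does.
    static final Pattern HOST = Pattern.compile(
            "(?:https://|\\s)([\\w-]+)\\.(?:[\\w-]+\\.)*myDomain\\.com");

    static String extract(String line) {
        Matcher m = HOST.matcher(line);
        // find() scans for the first match anywhere in the line.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("some text then https://myhost.ab.us2.myDomain.com and then some more text"));
        System.out.println(extract("some text then myhost.ab.us2.myDomain.com and then some more text"));
    }
}
```

Both sample lines from the question yield `myhost`. With the space alternative, the space before "https" is tried first, but `[\w-]+` stops at the `:`, so the engine moves on and matches the `https://` alternative instead.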

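For now, I've settled on splitting the string into a String[] and checking each element for "myDomain". I've spent a long time on this, so I'd like to learn what the best solution is.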
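The split-based workaround described above might look roughly like this. It is a sketch under the assumptions that tokens are separated by whitespace and that the host is the first dot-separated label of the matching token; `SplitExtractor` and `extract` are hypothetical names.

```java
// Hypothetical sketch of the split-based approach.
final class SplitExtractor {

    static String extract(String line, String domain) {
        // Split on runs of whitespace so each URL or hostname is its own token.
        for (String token : line.trim().split("\\s+")) {
            if (!token.contains(domain)) {
                continue;
            }
            // Drop an optional "https://" prefix before taking the first label.
            if (token.startsWith("https://")) {
                token = token.substring("https://".length());
            }
            int dot = token.indexOf('.');
            // Return the text before the first dot, or null if there is no label.
            return dot > 0 ? token.substring(0, dot) : null;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(extract("some text then https://myhost.ab.us2.myDomain.com and then some more text", "myDomain.com"));
    }
}
```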

Answer 1

I'd just take a non-regex approach:

public static String extractHost(String logEntry, String domain)
{
    // Matching is case sensitive; lower-case both logEntry and domain first if the case may vary.
    int protocolIndex = logEntry.indexOf("https://");

    if(protocolIndex != -1)
    {
        // contains protocol, must be variant one: the host runs up to the next dot after "https://"
        int start = protocolIndex + "https://".length();
        int end = logEntry.indexOf('.', start);
        return end == -1 ? null : logEntry.substring(start, end);
    }

    //  has to be variant two
    int domainIndex = logEntry.indexOf(domain);

    if(domainIndex == -1) return null;

    int previousDotIndex = -1;

    // walk backwards from the domain, remembering the leftmost dot seen, until a space is found
    for(int i = domainIndex; i >= 0; i--)
    {
        if(logEntry.charAt(i) == '.') previousDotIndex = i;
        if(logEntry.charAt(i) == ' ') return previousDotIndex == -1 ? null : logEntry.substring(i + 1, previousDotIndex);
    }

    return null;
}

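Variant #2 is actually the harder one. Here you simply iterate backwards from the domain's index to the first space, storing the position of the most recent dot found along the way. After that it's just a simple substring.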
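The same backwards scan can also be written with `String.lastIndexOf(int, int)` and `String.indexOf(int, int)` instead of a manual loop. This is a swapped-in variant, not the answer's own code: it finds the last space before the domain, skips an optional `https://` at the start of that token, and cuts at the first dot. Unlike the loop above, it also handles a host at the very start of the line. `BackwardScan` is a hypothetical name, and only the space character is treated as a separator, as in the original.

```java
// Hypothetical variant of the backwards scan using lastIndexOf/indexOf.
final class BackwardScan {

    static String extractHost(String logEntry, String domain) {
        int domainIndex = logEntry.indexOf(domain);
        if (domainIndex == -1) {
            return null;
        }
        // The token starts one past the last space before the domain (index 0 if there is none).
        int start = logEntry.lastIndexOf(' ', domainIndex) + 1;
        // Skip an optional "https://" at the start of the token.
        if (logEntry.startsWith("https://", start)) {
            start += "https://".length();
        }
        // The host is everything up to the first dot in the token.
        int end = logEntry.indexOf('.', start);
        if (end <= start || end >= domainIndex) {
            return null; // no host label in front of the domain
        }
        return logEntry.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(extractHost("some text then myhost.ab.us2.myDomain.com and then some more text", "myDomain.com"));
    }
}
```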

Answer 2

I'd use something like:

\b(?:https?:\/\/)?(\w+)\.(?:\w+\.)*myDomain\.com

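This matches an optional http:// or https:// prefix, followed by the captured host, then any number of further subdomains (if you know there are always two, as in ab.us2, you can specify the count with {2} or hard-code them), then myDomain.com.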

In Java:
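A minimal sketch of applying the pattern with `java.util.regex` (the class name `BoundaryRegex` and method `extract` are hypothetical). Backslashes are doubled for the Java string literal, and the `\/` escapes are dropped because `/` needs no escaping in a Java regex. A second pattern shows the `{2}` variant for exactly two intermediate labels.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the regex from this answer.
final class BoundaryRegex {

    // Any number of labels between the host and myDomain.com.
    static final Pattern ANY_DEPTH = Pattern.compile(
            "\\b(?:https?://)?(\\w+)\\.(?:\\w+\\.)*myDomain\\.com");

    // Exactly two labels between the host and myDomain.com (e.g. "ab.us2.").
    static final Pattern TWO_LABELS = Pattern.compile(
            "\\b(?:https?://)?(\\w+)\\.(?:\\w+\\.){2}myDomain\\.com");

    static String extract(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        // Group 1 holds the host label.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "some text then https://myhost.ab.us2.myDomain.com and then some more text";
        System.out.println(extract(ANY_DEPTH, line));
        System.out.println(extract(TWO_LABELS, line));
    }
}
```

Note that `\w` does not match `-`, so a host such as `my-host` would need a class like `[\w-]` instead.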

