Detecting URLs/links one by one

Question

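I have a system that scans people's messages for malicious links, and I ran into a problem while testing it for bugs.

When someone sends a malicious URL in markup, hidden, or in plain form, everything works fine. But when they send it doubled up like this: https://example.com/https://example.com

the message is not flagged, which is very concerning.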

Here is my current URL regex:

import re

def extract_links(text):
    # Define a regular expression pattern to match URLs
    url_pattern = r'(https?://\S+?)(?:\)|\s|$)'
    # Find all matches of URLs in the text
    matches = re.findall(url_pattern, text)
    return matches

I want it to return a list of all the URLs in the message.

Answer 1

A simple workaround might be to split the input text on https://, like this:

import re

def extract_links(text):
    # Define a regular expression pattern to match URLs
    url_pattern = r'(https?://\S+?)(?:\)|\s|$)'
    matches = []
    # Split just before every http:// or https:// so each chunk holds at most one URL
    # (zero-width split needs Python 3.7+); chunks without a scheme simply match nothing
    for chunk in re.split(r'(?=https?://)', text):
        matches.extend(re.findall(url_pattern, chunk))
    return matches
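
The reason the original pattern misses the doubled URL is that \S+? accepts any non-whitespace character, including the second https://, so both URLs come back as a single match. If you would rather avoid the split, one possible alternative is a single pattern with a negative lookahead that stops before the next scheme. This is only a sketch: the name extract_links_lookahead is made up for illustration, and it assumes a URL ends at whitespace or a closing parenthesis, as in the question's regex.

```python
import re

# Hypothetical helper, not part of the original question or answer.
# After the scheme, consume one character at a time, stopping at whitespace,
# a closing parenthesis, or the start of another http:// or https:// scheme.
URL_PATTERN = re.compile(r'https?://(?:(?!https?://)[^\s)])+')

def extract_links_lookahead(text):
    """Return every URL in text, splitting back-to-back URLs apart."""
    # No capturing groups in the pattern, so findall returns the full matches.
    return URL_PATTERN.findall(text)

print(extract_links_lookahead("https://example.com/https://example.com"))
# ['https://example.com/', 'https://example.com']
```

Note that the first URL keeps its trailing slash, since the slash comes before the second scheme. Strip it afterwards if your blocklist lookup needs a normalized form.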
