Using a regular expression to search for a specific text structure

Question

I want to find every occurrence of a certain structure in a string, preferably with a regular expression.

To find all URLs, I can use

re.findall(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', decode)

which returns

 'https://en.wikipedia.org'

I want a regular expression that finds:

href="/wiki/*anything*"

Answer 1

OP: it must start with href="/wiki/, the middle can be anything, and it must end with ".

st = "since-OP-did-not-provide-a-sample-string-34278234$'blahhh-okay-enough.href='/wiki/anything/everything/nothing'okay-bye"
# Slice from the first 'href' through the last single quote (handles one match only)
print(st[st.find('href'):st.rfind("'")+1])

OUTPUT:

href='/wiki/anything/everything/nothing'
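The slicing approach returns a single span, from the first href to the last single quote, so it does not generalise to several links. For the regular expression the question asks for, here is a minimal sketch. It assumes each attribute value is wrapped in matching single or double quotes and contains no quote of that same kind; the names WIKI_HREF and find_wiki_hrefs are illustrative and do not come from the original post.

```python
import re

# Matches href="/wiki/..." or href='/wiki/...'. The backreference \1 requires
# the closing quote to be the same character as the opening quote.
WIKI_HREF = re.compile(r'''href=(["'])(/wiki/.*?)\1''')

def find_wiki_hrefs(text):
    """Return every /wiki/ path found in an href attribute of text."""
    return [m.group(2) for m in WIKI_HREF.finditer(text)]

sample = '''<a href="/wiki/Python">x</a> <a href='/wiki/Regex'>y</a> <a href="/other">z</a>'''
print(find_wiki_hrefs(sample))
```

Use m.group(0) instead of m.group(2) if the whole href="..." text is wanted rather than just the path.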

Edit:

If we are parsing HTML, I would go with BeautifulSoup.

from bs4 import BeautifulSoup

text = '''<a href='/wiki/anything/everything/nothing'><img src="/hp_imgjhg/411/1/f_1hj11_100u.jpg" alt="dyufg" />well wait now <a href='/wiki/hello/how-about-now/nothing'>'''
soup = BeautifulSoup(text, features="lxml")

for line in soup.find_all('a'):
    print("href =",line.attrs['href'])

OUTPUT:

href = /wiki/anything/everything/nothing
href = /wiki/hello/how-about-now/nothing
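The loop above prints the href of every a tag, whatever it points to. If only /wiki/ links are wanted and installing bs4 and lxml is not an option, a similar result can be sketched with the standard library's html.parser module. This is an alternative under stated assumptions, not the answerer's method, and the class name WikiLinkCollector is hypothetical.

```python
from html.parser import HTMLParser

class WikiLinkCollector(HTMLParser):
    """Collect href values of <a> tags that start with /wiki/."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith("/wiki/"):
                self.links.append(value)

collector = WikiLinkCollector()
collector.feed(
    "<a href='/wiki/anything/everything/nothing'>one</a> "
    '<a href="/about">two</a> '
    "<a href='/wiki/hello/how-about-now/nothing'>three</a>"
)
print(collector.links)
```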
