I want to use Python 3 to extract all the HTML divs, inline JS, etc. between these two PHP tags.
<?php elseif ($article->category_id == 18): ?>
<div class="cta-alt cloud">
<div class="container js-inView">
<h1>INTERESTED IN <strong>JOINING US?</strong>
</h1>
<p>We’re always looking for the best talent.</p>
<a href="https://www.mikrosanimation.com/en/people-and-culture/careers/" class="btn btn-fill">View open
roles</a>
</div>
</div>
<?php elseif ($article->category_id == 14): ?>
#-------------------------------------- This is my regex so far
import re
x = open('C:\\file.phtml', encoding="utf-8").read()
r1 = r'<?php elseif ($article->category_id == 18): ?>(.*?)<?php elseif ($article->category_id == 14): ?>'
s2 = re.findall(r1,x)
print(s2)
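The pattern above returns an empty list for two reasons: `?`, `(`, `)` and `$` are regex metacharacters, so the PHP tags are not matched literally, and `.` does not match newlines unless `re.DOTALL` is set. Here is a minimal sketch of a fix, assuming both tags appear verbatim in the file. The helper name `extract_between` and the shortened `sample` string are illustrative and not part of the original post.

```python
import re

# The two PHP tags from the question, kept as plain strings.
START = "<?php elseif ($article->category_id == 18): ?>"
END = "<?php elseif ($article->category_id == 14): ?>"


def extract_between(text, start=START, end=END):
    """Return every chunk of text found between the start and end markers."""
    # re.escape makes the markers match literally; DOTALL lets "." span newlines.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return [chunk.strip() for chunk in re.findall(pattern, text, flags=re.DOTALL)]


# Shortened stand-in for the contents of the .phtml file (assumption).
sample = """<?php elseif ($article->category_id == 18): ?>
<div class="cta-alt cloud">
<p>Some content</p>
</div>
<?php elseif ($article->category_id == 14): ?>"""

blocks = extract_between(sample)
print(blocks[0])
```

With the real file, `sample` would be replaced by the string read from disk (`x` in the snippet above).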
Using bs4:
from bs4 import BeautifulSoup
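# Open read-only and name the parser explicitly to avoid bs4's "no parser specified" warning
with open("stack.phtml", "r", encoding="utf-8") as file:
    soup = BeautifulSoup(file, "html.parser")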
div = soup.find("div")
print(div)
This prints the div tag as HTML:
<div class="cta-alt cloud">
<div class="container js-inView">
<h1>INTERESTED IN <strong>JOINING US?</strong>
</h1>
<p>We’re always looking for the best talent.</p>
<a href="https://www.mikrosanimation.com/en/people-and-culture/careers/" class="btn btn-fill">View open
roles</a>
</div>
</div>
However, if you only need the string or text, use
print(div.text)
which prints (whitespace collapsed here for readability):
INTERESTED IN JOINING US? We’re always looking for the best talent. View open roles
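If bs4 is not installed, roughly the same text can be collected with the standard library's `html.parser`. This is a swapped-in technique, not what the answer above uses. The class name `TextCollector` is illustrative, and the whitespace collapsing is an assumption chosen to produce the single-line form of the text.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text nodes of an HTML fragment, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Collapse runs of whitespace (including newlines) inside each text node.
        text = " ".join(data.split())
        if text:
            self.parts.append(text)


html = """<div class="cta-alt cloud">
<div class="container js-inView">
<h1>INTERESTED IN <strong>JOINING US?</strong>
</h1>
<p>We’re always looking for the best talent.</p>
<a href="https://www.mikrosanimation.com/en/people-and-culture/careers/" class="btn btn-fill">View open
roles</a>
</div>
</div>"""

collector = TextCollector()
collector.feed(html)
collector.close()
text = " ".join(collector.parts)
print(text)
```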