Extracting a block of text from a .phtml file with Python and regular expressions

Question

I want to use Python 3 to extract everything (HTML divs, inline JS, etc.) between these two PHP tags:

<?php elseif ($article->category_id == 18): ?>
<div class="cta-alt  cloud">
    <div class="container js-inView">
        <h1>INTERESTED IN <strong>JOINING US?</strong>
        </h1>
        <p>We’re always looking for the best talent.</p>

        <a href="https://www.mikrosanimation.com/en/people-and-culture/careers/" class="btn btn-fill">View open
            roles</a>
    </div>
</div>
<?php elseif ($article->category_id == 14): ?>

This is my regex attempt so far:

import re

with open('C:\\file.phtml', encoding="utf-8") as f:
    x = f.read()
r1 = r'<?php elseif ($article->category_id == 18): ?>(.*?)<?php elseif ($article->category_id == 14): ?>'
s2 = re.findall(r1, x)
print(s2)
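For reference, the pattern above can never match: `?`, `(`, `)` and `$` are regex metacharacters, so `<?` only makes `<` optional, `(...)` becomes a capturing group, and `$` is an end-of-string anchor. In addition, `.` does not match newlines unless `re.DOTALL` is set, and the wanted block spans several lines. A minimal sketch, assuming the two marker lines appear verbatim in the file, escapes the markers with `re.escape()` and passes `re.DOTALL`. The helper name `extract_between` and the shortened sample string are illustrative, not from the question.

```python
import re


def extract_between(text, start, end):
    """Return every block of text found between the literal markers start and end."""
    # re.escape() turns ?, (, ), $ and other metacharacters into literal characters.
    # re.DOTALL lets "." match newlines, so the lazy (.*?) group can span several lines.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return re.findall(pattern, text, flags=re.DOTALL)


# Shortened copy of the question's input; in practice read the file instead, e.g.
# with open("C:\\file.phtml", encoding="utf-8") as f: x = f.read()
x = """<?php elseif ($article->category_id == 18): ?>
<div class="cta-alt  cloud">
    <p>We’re always looking for the best talent.</p>
</div>
<?php elseif ($article->category_id == 14): ?>"""

blocks = extract_between(
    x,
    "<?php elseif ($article->category_id == 18): ?>",
    "<?php elseif ($article->category_id == 14): ?>",
)
for block in blocks:
    print(block.strip())
```

Because it works on raw text, this captures anything between the markers (divs, inline `<script>` blocks, and so on), but it relies on the marker lines being written exactly as in the file, including spacing.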
Tags: python, python-3.x, regex
Answer

Use bs4 (BeautifulSoup):

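from bs4 import BeautifulSoup

# Open read-only; "with" closes the file once it has been parsed.
with open("stack.phtml", encoding="utf-8") as file:
    # Name a parser explicitly; otherwise bs4 guesses one and emits a warning.
    soup = BeautifulSoup(file, "html.parser")

div = soup.find("div")  # the first <div> in the document
print(div)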
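`soup.find("div")` returns the first `<div>` anywhere in the file, so this works only while the wanted block happens to be the first `<div>`. A minimal sketch, assuming bs4 is installed, matches on the `cta-alt` class from the question's markup instead. The extra `other` div and the `category_id == 12` branch in the sample are hypothetical, added only to show the difference; `"html.parser"` is the parser built into Python.

```python
from bs4 import BeautifulSoup

# Shortened sample; the first branch and its "other" div are hypothetical additions.
# In practice: with open("stack.phtml", encoding="utf-8") as f: soup = BeautifulSoup(f, "html.parser")
phtml = """<?php if ($article->category_id == 12): ?>
<div class="other">Not this one</div>
<?php elseif ($article->category_id == 18): ?>
<div class="cta-alt  cloud">
    <div class="container js-inView">
        <p>We’re always looking for the best talent.</p>
    </div>
</div>
<?php elseif ($article->category_id == 14): ?>"""

soup = BeautifulSoup(phtml, "html.parser")

# find("div") alone would return the "other" div here;
# class_ matches any one of the element's space-separated classes.
cta = soup.find("div", class_="cta-alt")
print(cta)
```

bs4 only sees the HTML and does not know which PHP branch a `<div>` belongs to, so for "everything between two PHP tags" a text-based approach is the closer fit.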

This prints the first div element as HTML:

<div class="cta-alt  cloud">
    <div class="container js-inView">
        <h1>INTERESTED IN <strong>JOINING US?</strong>
        </h1>
        <p>We’re always looking for the best talent.</p>

        <a href="https://www.mikrosanimation.com/en/people-and-culture/careers/" class="btn btn-fill">View open
            roles</a>
    </div>
</div>

However, if you only need the strings (the text content), use

print(div.text)
which prints the text (shown here with the extra whitespace and line breaks trimmed):

INTERESTED IN JOINING US?
We’re always looking for the best talent.
View open roles
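`div.text` keeps the newlines and indentation from the source file, so the raw output contains more blank space than the three lines above. A minimal sketch, assuming bs4 is installed, collapses that whitespace with `get_text(" ")` and `str.split()`. The helper name `clean_text` is hypothetical, and the tag list `["h1", "p", "a"]` simply matches the sample markup.

```python
from bs4 import BeautifulSoup

html = """<div class="cta-alt  cloud">
    <div class="container js-inView">
        <h1>INTERESTED IN <strong>JOINING US?</strong>
        </h1>
        <p>We’re always looking for the best talent.</p>

        <a href="https://www.mikrosanimation.com/en/people-and-culture/careers/" class="btn btn-fill">View open
            roles</a>
    </div>
</div>"""

div = BeautifulSoup(html, "html.parser").find("div")


def clean_text(tag):
    """Join a tag's text nodes with spaces, then collapse every run of whitespace."""
    return " ".join(tag.get_text(" ").split())


# Whole block on one line.
print(clean_text(div))

# One line per heading, paragraph and link.
for el in div.find_all(["h1", "p", "a"]):
    print(clean_text(el))
```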
© www.soinside.com 2019 - 2024. All rights reserved.