Using Beautiful Soup with regular expressions on a large HTML file in Python

Question

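I can't find an answer to this specific problem anywhere, and I haven't been able to work it out myself.

I have a large HTML file that serves as an email template. I have read it in as a text file and stored its contents in the variable html_string. Several lines in it contain statements such as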

<span style="color: #ff0000;"> {column_1}</span>

<span style="color: #ff0000;">{column_2}</span>

where the {column_*} part is to be replaced with some other value, such as a name. Another question suggested using something like

    import re
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html_string, features="html5lib")

    target = soup.find_all(text=re.compile('^{column_$'))

    print("Target:")
    print(target)

    for v in target:
        # do something (this never gets accessed due to empty list)
        pass

which returns

    >>Target:
    >> []

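whereas I expected it to return a list of the locations of {column_*}, or something else I could use to insert my own strings.

I have tried several different constructions for the re.compile(x) part, but none of them worked.

Any help would be greatly appreciated!

Edit: For some reason, even though I imported bs4, only the findAll function does what I need. Using it is generally discouraged, since find_all in bs4 is supposed to "do the same thing" ¬(..)¬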

    soup = BeautifulSoup(html_string, features="html5lib")
    target = soup.findAll(text=re.compile('{column_.}'))

    for v in target:
        v.replace_with(dictionary[str(v)])

    body = str(soup)
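
A likely explanation for the difference seen in the edit is the pattern rather than the method name: in bs4, `findAll` is an older alias of `find_all`, while the two snippets pass different regular expressions. The sketch below uses only the standard `re` module to compare the two patterns against the text inside the two spans. The `samples` list and the variable names are illustrative assumptions, not part of the original code.

```python
import re

# Text as it appears inside the two <span> elements in the question
samples = [" {column_1}", "{column_2}"]

# Pattern from the first attempt: anchored at both ends, so it only
# matches a string that is exactly "{column_"
original = re.compile('^{column_$')

# Pattern from the edit: unanchored, matches "{column_" + one character + "}"
revised = re.compile('{column_.}')

for text in samples:
    print(repr(text), bool(original.search(text)), bool(revised.search(text)))
# ' {column_1}' False True
# '{column_2}' False True
```

Because `^` and `$` pin the match to the whole string, the first pattern cannot match a text node that also contains the digit and the closing brace, which would account for the empty list.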

Answer 1

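You can use a regular expression to find the template placeholders and replace the text with the values you want:

    import re
    content = html_string  # the template text read in the question
    vals = {'column_1': 'Name', 'column_2': 'Age'}
    # Swap each {key} for vals[key]; the [1:-1] slice strips the braces
    result = re.sub(r'\{.*?\}', lambda x: vals[x.group()[1:-1]], content)
    print(result)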

Output:

<span style="color: #ff0000;"> Name</span>

<span style="color: #ff0000;">Age</span>
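
One caveat with the pattern above: it matches any pair of braces, so a key missing from `vals`, or a CSS rule inside the email template, would raise a `KeyError`. A minimal sketch of a more tolerant variant follows, assuming placeholders consist only of word characters. The sample `content`, the `{column_9}` placeholder, and the helper `fill` are hypothetical and exist only for illustration.

```python
import re

# Hypothetical template: a CSS rule, a known placeholder, an unknown one
content = (
    '<style>p { margin: 0; }</style>\n'
    '<span style="color: #ff0000;"> {column_1}</span>\n'
    '<span>{column_9}</span>'
)
vals = {"column_1": "Name", "column_2": "Age"}

def fill(match):
    """Return the value for a known key, else leave the placeholder as-is."""
    key = match.group(1)
    return vals.get(key, match.group(0))

# \w+ only matches braces that wrap a single word, so "{ margin: 0; }"
# is not treated as a placeholder
result = re.sub(r"\{(\w+)\}", fill, content)
print(result)
```

Using `dict.get` with the original match as the default means unknown placeholders stay visible in the output instead of stopping the substitution.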

Answer 2

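Could you also use a dict? (This example assumes each span contains the bare key, without braces.)

    from bs4 import BeautifulSoup as bs

    html = '''
    <span style="color: #ff0000;">column_1</span>
    <span style="color: #ff0000;">column_2</span>
    '''
    soup = bs(html, 'lxml')
    # Named values rather than dict, so the built-in type is not shadowed
    values = {'column_1': 'Name', 'column_2': 'Age'}

    # Select the spans by their exact style attribute
    for item in soup.select('[style="color: #ff0000;"]'):
        try:
            item.string = values[item.text]
        except KeyError:
            # Leave spans whose text is not a known key unchanged
            continue
    print(soup)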
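
If the braces and the leading space from the question's template need to stay as they are in the source file, the two approaches can be combined. This is only a sketch under a few assumptions: it uses the built-in `html.parser` instead of `lxml` or `html5lib`, it passes `string=` (the newer name for the `text=` argument in recent bs4 releases), and the names `placeholder` and `values` are made up for the example.

```python
import re
from bs4 import BeautifulSoup

html_string = """
<span style="color: #ff0000;"> {column_1}</span>
<span style="color: #ff0000;">{column_2}</span>
"""

# Hypothetical replacement values, keyed without the braces
values = {"column_1": "Name", "column_2": "Age"}

# Capture the key between the braces
placeholder = re.compile(r"\{(column_\d+)\}")

soup = BeautifulSoup(html_string, "html.parser")

# find_all returns every text node in which the pattern is found
for node in soup.find_all(string=placeholder):
    # Substitute inside the node text so surrounding whitespace survives;
    # unknown keys are left untouched
    new_text = placeholder.sub(
        lambda m: values.get(m.group(1), m.group(0)), str(node)
    )
    node.replace_with(new_text)

body = str(soup)
print(body)
```

Substituting within the node text, rather than looking up the whole string in the dictionary, avoids needing keys that include the leading space of `" {column_1}"`.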
