What is the best way to extract square meters from a string that also mentions the number of bedrooms?

Question

I am trying to extract:

<div class="xl-surface-ch"> 
                            &nbsp;84 m²  &nbsp;&nbsp;&nbsp;2 bed.  
                        </div>

from the linked page. The problem is that I only need the "84" from that string (sometimes the number has three or more digits instead of two).

An added difficulty is that sometimes the square meters are not mentioned at all, and the div looks like this:

<div class="xl-surface-ch"> 
                             &nbsp;&nbsp;&nbsp;2 bed.  
                        </div>

In that case, I need to return 0.
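A minimal sketch of one way to meet both requirements, assuming the div's text has already been extracted (for example with BeautifulSoup's `.text`): a regular expression captures the digits directly before "m²", and the function falls back to 0 when there is no match. The function name `extract_sqm` and the sample strings are illustrative, not taken from the original page.

```python
import re

# Digits, then optional whitespace, then "m²". In a Python 3 str pattern, \s also
# matches the non-breaking space (\xa0) that &nbsp; turns into.
SQM_PATTERN = re.compile(r'(\d+)\s*m²')


def extract_sqm(text: str) -> int:
    """Return the surface in m² found in text, or 0 when no surface is listed."""
    match = SQM_PATTERN.search(text)
    return int(match.group(1)) if match else 0


# Hypothetical div texts modeled on the two snippets above.
with_surface = '\n    \xa084 m²  \xa0\xa0\xa02 bed.  \n'
without_surface = '\n     \xa0\xa0\xa02 bed.  \n'

print(extract_sqm(with_surface))     # 84
print(extract_sqm(without_surface))  # 0
```

Unlike slicing a fixed number of characters, the pattern does not depend on where the number sits in the string or on how many digits it has; the "2" of "2 bed." is never picked up because it is not followed by "m²".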

My best attempt so far is:

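import re

sqm = []
# soup: the parsed page, a BeautifulSoup object (its creation is not shown here)
for item in soup.findAll('div', attrs={'class': 'xl-surface-ch'}):
    # first text node of the div, stripped, cut down to its first 4 characters
    item = item.contents[0].strip()[0:4]
    # all runs of 2 to 4 digits in that slice (a list, empty if none)
    item_clean = re.findall("[0-9]{2,4}", item)
    sqm.append(item_clean)

print(sqm)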

However, this doesn't seem to work, and it is not at all what I need for the final result described above. This is what my code returns:

[['84'], ['70'], ['80'], ['32'], ['149'], ['22'], ['75'], ['30'], ['23'], ['104'], [], ['95'], ['129'], ['26'], ['55'], ['26'], ['25'], ['28'], ['33'], ['210'], ['37'], ['69'], ['36'], ['19'], ['119'], ['20'], ['20'], ['129'], ['154'], ['25']]
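If the attempt above is kept as-is, its output can still be converted into the required form afterwards: take the first match as an integer, or 0 when the inner list is empty. A minimal sketch using a shortened excerpt of the list printed above (the variable name `sqm_clean` is illustrative):

```python
# Shortened excerpt of the output above: one list per div, empty when nothing matched.
sqm = [['84'], ['70'], ['104'], [], ['95']]

# First match as an int, or 0 for an empty list (no surface listed).
sqm_clean = [int(found[0]) if found else 0 for found in sqm]
print(sqm_clean)  # [84, 70, 104, 0, 95]
```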

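I'd really be interested in what kind of solution you would suggest, because honestly I don't have a real one, especially since some buildings have no sqm listed at all... maybe with an if statement? I'm going to try that in the meantime anyway.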

Thank you!

Answer 1

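import requests
from bs4 import BeautifulSoup

r = requests.get(
    'https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000')
soup = BeautifulSoup(r.text, 'html.parser')

sqm = []
for item in soup.findAll('div', attrs={'class': 'xl-surface-ch'}):
    # .text gives the div's text; strip() also removes the &nbsp; (\xa0) padding
    item = item.text.strip()
    if 'm²' in item:
        # keep what comes before the "m" of "m²" ("84 "), drop the space, convert
        sqm.append(int(item[0:item.find('m')].strip()))
    else:
        # no surface listed: use 0
        sqm.append(0)

print(sqm)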

Output:
