Iterating over a DataFrame column of URLs and parsing out an HTML tag

Question · Votes: 0 · Answers: 1

This shouldn't be too hard, but I can't figure it out; I bet I'm making a silly mistake.

Here is the code that works on a single link and returns the Zestimate (the req_headers variable prevents a captcha from being thrown):

import requests
from bs4 import BeautifulSoup

req_headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.8',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
}

link = 'https://www.zillow.com/homedetails/1404-Clearwing-Cir-Georgetown-TX-78626/121721750_zpid/'
test_soup = BeautifulSoup(requests.get(link, headers=req_headers).content, 'html.parser')
results = test_soup.select_one('h4:contains("Home value")').find_next('p').get_text(strip=True)
print(results)

And here is the code I'm trying to get working, which should return the Zestimate for each link and add it to a new DataFrame column, but I get AttributeError: 'NoneType' object has no attribute 'find_next'. (Assume I have a DataFrame column of different Zillow house links.)

req_headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.8',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
}

for link in df['links']:
    test_soup = BeautifulSoup(requests.get(link, headers=req_headers).content, 'html.parser')
    results = test_soup.select_one('h4:contains("Home value")').find_next('p').get_text(strip=True)
    df['zestimate'] = results
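As an aside, a defensive variant of the loop above would avoid the AttributeError by checking whether select_one found anything before calling find_next, and would collect one value per row instead of overwriting the whole 'zestimate' column on every iteration. A minimal, self-contained sketch (using stand-in HTML strings in place of the live Zillow pages, so no network calls; the column names are assumptions for illustration):

```python
from bs4 import BeautifulSoup
import pandas as pd

# Stand-in pages: the first contains the "Home value" section, the second does not.
pages = [
    '<h4>Home value</h4><p>$300,000</p>',
    '<h4>Something else</h4><p>n/a</p>',
]
df = pd.DataFrame({'html': pages})

zestimates = []
for html in df['html']:
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.select_one('h4:contains("Home value")')
    # select_one returns None when the selector matches nothing;
    # guard before calling find_next to avoid the AttributeError.
    zestimates.append(tag.find_next('p').get_text(strip=True) if tag else None)

df['zestimate'] = zestimates  # one value per row, not the same value everywhere
```

In the real script the loop body would fetch each page with requests.get(link, headers=req_headers) as in the question; the guard works the same way either way.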

Thanks in advance for any help.

python beautifulsoup html-parsing
1 Answer · 0 votes

It turned out I had a space before and after each link in my DataFrame column :/ . That was it; the code works fine. It was just an oversight on my part. Thanks, everyone.
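For anyone hitting the same thing: the stray whitespace can be removed in one vectorized step before looping, using pandas' string accessor (the 'links' column name is from the question; the URL here is a made-up placeholder):

```python
import pandas as pd

# Placeholder link with the same kind of stray whitespace described above.
df = pd.DataFrame({'links': ['  https://www.zillow.com/homedetails/example_zpid/ ']})

# Strip leading/trailing whitespace from every link in the column.
df['links'] = df['links'].str.strip()

print(df['links'].iloc[0])  # https://www.zillow.com/homedetails/example_zpid/
```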
