Social media link parser


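I am trying to download the links from the sites listed below, pick out the ones that point to social networks, sort them, and write them into a table (link to Twitter, link to Telegram, ...). I have written all the code, but it raises an error. What is the problem?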

import requests
import pandas as pd
from bs4 import BeautifulSoup
from IPython.display import display

urls = [
    "https://latium.org/services?category=featured",
    "http://www.businessinsider.com/",
    "https://unmarshal.io"
]

sm_sites = ["facebook.com", "vk.com", "twitter.com","t.me","instagram.com","discord.com","github.com","linkedin.com","reddit.com","www.tiktok.com"]
sm_sites_present = []
columns = ['url'] + sm_sites
df = pd.DataFrame(data={'url' : urls}, columns=columns)

def get_sm(row):
    r = requests.get(row['url'])
    output = pd.Series()

    soup = BeautifulSoup(r.content, 'html5lib')
    all_links = soup.find_all('a', href = True)
    for sm_site in sm_sites:
        for link in all_links:
            if sm_site in link.attrs['href']:
                output[sm_site] = link.attrs['href']
    return output

sm_columns = df.apply(get_sm, axis=1)
df.update(sm_columns)
df.fillna(value = 'no link')
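
A side note on the matching step in `get_sm`: a plain substring test such as `sm_site in href` can also fire on unrelated URLs (for example, `t.me` is a substring of `https://internet.me/`). A minimal sketch of a stricter check, which compares the hostname parsed by `urllib.parse` instead, might look like the following. The helper name `match_social_site` is hypothetical and not part of the original code.

```python
from urllib.parse import urlparse

# Same list of sites as in the question.
SM_SITES = ["facebook.com", "vk.com", "twitter.com", "t.me", "instagram.com",
            "discord.com", "github.com", "linkedin.com", "reddit.com",
            "www.tiktok.com"]


def match_social_site(href, sites=SM_SITES):
    """Return the entry of `sites` whose domain matches the link's host, else None."""
    # Relative links and schemes like mailto: have no hostname, so host becomes "".
    host = (urlparse(href).hostname or "").lower()
    for site in sites:
        # Compare against the bare domain so "www." and other subdomains match too.
        domain = site[4:] if site.startswith("www.") else site
        if host == domain or host.endswith("." + domain):
            return site
    return None


print(match_social_site("https://m.facebook.com/some-page"))
print(match_social_site("https://internet.me/"))
```

Under this sketch, the first call should report `facebook.com` and the second `None`, because the host `internet.me` is neither `t.me` nor a subdomain of it.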

But the result is:

FeatureNotFound                           Traceback (most recent call last)
/var/folders/5l/tkf85tcj7tb12y9v832j_prm0000gn/T/ipykernel_875/1543778654.py in <module>
     22     return output
     23 
---> 24 sm_columns = df.apply(get_sm, axis=1)
     25 df.update(sm_columns)
     26 df.fillna(value = 'no link')
FeatureNotFound: Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library?
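
The `FeatureNotFound` message says that Beautiful Soup could not find the `html5lib` parser, which is a separate package from `beautifulsoup4`. Installing it into the same environment the notebook kernel uses (for example with `pip install html5lib`) is one likely fix. Another is to pass `"html.parser"`, which ships with Python. A minimal sketch that tries `html5lib` first and falls back to the built-in parser is shown below. The helper name `make_soup` and the sample HTML are illustrative assumptions, not part of the original code.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse markup with html5lib if it is installed, else with the stdlib parser."""
    try:
        return BeautifulSoup(markup, "html5lib")
    except FeatureNotFound:
        # html5lib is not installed; "html.parser" is bundled with Python.
        return BeautifulSoup(markup, "html.parser")


# Hypothetical sample page used only to exercise the helper.
sample = '<p><a href="https://twitter.com/example">Twitter</a> <a>no href</a></p>'
soup = make_soup(sample)

# href=True skips anchors without an href attribute, as in the question's code.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]
print(hrefs)
```

In `get_sm`, a helper like this would stand in for the call `BeautifulSoup(r.content, 'html5lib')`. The two parsers can build slightly different trees for malformed HTML, so results may differ on messy pages.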
python sorting web-scraping python-social-auth spring-social-linkedin