How to scrape exhibitor names and descriptions from a web page

Question  Votes: 0  Answers: 1

I want to scrape all exhibitor names and information from this link: https://asiatechxsg.com/exhibitors/ and write them to a CSV file.

I wrote this:

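import requests
from bs4 import BeautifulSoup

html = requests.get('https://asiatechxsg.com/exhibitors/').text
bs = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
exhibitor_links = []
# collect the href of every <a> tag on the page
for link in bs.find_all('a'):
    if link.has_attr('href'):
        exhibitor_links.append(link.attrs['href'])
        print(link.attrs['href'])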

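But these links are not exhibitor links. Assuming the links were correct, I would iterate over each exhibitor link, extract the name and information from it, and store them in a dataframe. However, I am also not sure how to extract the name and information. I am new to web scraping. Thanks in advance for any help.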

python web-scraping beautifulsoup
1 Answer
0
votes

You can try requesting the exhibitor data directly from the GraphQL API:

import pandas as pd
import requests

url = "https://attend.informatechevents.virtual.informatech.com/api/graphql"


payload = {
    "extensions": {
        "persistedQuery": {
            "sha256Hash": "a717703fa8924575e04c9968ef2f441781e9cb8e2d5ca62d9ca9742bd04eac93",
            "version": 1,
        }
    },
    "operationName": "EventExhibitorListViewConnectionQuery",
    "variables": {
        "eventId": "RXZlbnRfMTc5MDkyMQ==",  # Event_1790921 in Base64
        "viewId": "RXZlbnRWaWV3Xzc2MDczMA==",  # EventView_760730 in Base64
        "withEvent": True,
    },
}
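# Note: the Base64 value below decodes to
# [0.0054893494,"maritime and port authority of singapore"] and looks like a sample pagination cursor:
# WzAuMDA1NDg5MzQ5NCwibWFyaXRpbWUgYW5kIHBvcnQgYXV0aG9yaXR5IG9mIHNpbmdhcG9yZSJd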

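# Fetch page after page until the API reports that there is no next page
page, all_data = 1, []
while True:
    print(page)  # progress indicator
    page += 1

    data = requests.post(url, json=payload).json()
    pi = data["data"]["view"]["exhibitors"]["pageInfo"]
    all_data.extend(data["data"]["view"]["exhibitors"]["nodes"])

    if not pi["hasNextPage"]:
        break

    # pass the cursor of the last item to request the next page
    payload["variables"]["endCursor"] = pi["endCursor"]

# one row per exhibitor
df = pd.DataFrame(all_data)
print(df.head(10))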

Prints:

                         id                               name                        type                                                                          logoUrl                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               htmlDescription                                                                                                       withEvent      __typename isBookmarked
0  RXhoaWJpdG9yXzE3MDk1NTI=                                AWS  Preferred Industry Partner                                                                             None                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          None                                  {'booth': None, '__typename': 'Core_ExhibitorWithEvent', 'isBookmarked': None}  Core_Exhibitor         None
1  RXhoaWJpdG9yXzE2NzY4MDQ=                             Google  Preferred Industry Partner  https://cdn-api.swapcard.com/public/images/3f2ebb6098bf43efb035c0350aed7457.png                                                                                            <p><strong>About Google</strong></p><p>Google's mission is to organize the world's information and make it universally accessible and useful. Through products and platforms like Search, Maps, Gmail, Android, Google Play, Google Cloud, Chrome and YouTube, Google plays a meaningful role in the daily lives of billions of people and has become one of the most widely-known companies in the world. Google is a subsidiary of Alphabet Inc.</p><br /><p><strong>About Alphabet Inc.</strong> Alphabet is a collection of companies, the largest of which is Google. Larry Page and Sergey Brin founded Google in September 1998 and the company is headquartered in Mountain View, Calif. Billions of people use its wide range of popular products and platforms each day, like Search, Ads, Chrome, Cloud, YouTube and Android.</p><br />                                  {'booth': None, '__typename': 'Core_ExhibitorWithEvent', 'isBookmarked': None}  Core_Exhibitor         None
2  RXhoaWJpdG9yXzE2NzY4NTY=     Blackmagic Design Asia Pte Ltd              Gold + Partner  https://cdn-api.swapcard.com/public/images/facf440096ad4704aa4c2657e0450570.png                                                                                <p>Blackmagic Design has grown rapidly to become one of the world's leading innovators and manufacturers of creative video technology. And that's because our philosophy is refreshing and simple - to help true creativity blossom.</p><p>Blackmagic Design's founders have had a long history in post-production editing and engineering. With extensive experiences in high-end telecine, film and post, harnessed with a real passion for perfection, Blackmagic set out to change the industry forever.</p><p>A company dedicated to quality and stability and focusing on where it's needed most; Blackmagic has created some of the most talked about products in the industry. World famous for their unbeatable codecs, Blackmagic envisioned truly affordable high-end quality editing workstations built upon Blackmagic software and hardware.</p>                               {'booth': '6K2-4', '__typename': 'Core_ExhibitorWithEvent', 'isBookmarked': None}  Core_Exhibitor         None
3  RXhoaWJpdG9yXzE2OTU1Njg=                                IBM              Gold + Partner  https://static.swapcard.com/public/images/738af3541aec43269d9cca179f0aee19.jpeg  <p>IBM is a leading provider of global hybrid cloud and AI, and consulting expertise. We help clients in more than 175 countries capitalize on insights from their data, streamline business processes, reduce costs and gain the competitive edge in their industries. More than 4,000 government and corporate entities in critical infrastructure areas such as financial services, telecommunications and healthcare rely on IBM's hybrid cloud platform and Red Hat OpenShift to affect their digital transformations quickly, efficiently and securely. IBM's breakthrough innovations in AI, quantum computing, industry-specific cloud solutions and consulting deliver open and flexible options to our clients. All of this is backed by IBM's long-standing commitment to trust, transparency, responsibility, inclusivity and service.</p><p>Visit<a href="https://www.ibm.com/us-en"> www.ibm.com</a> for more information.</p>                               {'booth': '5K2-7', '__typename': 'Core_ExhibitorWithEvent', 'isBookmarked': None}  Core_Exhibitor         None
4  RXhoaWJpdG9yXzE2NzY3Mzg=                           INTELSAT              Gold + Partner  https://cdn-api.swapcard.com/public/images/82053919394f43bb94786f100d8f0007.png                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          None                               {'booth': '5E2-1', '__typename': 'Core_ExhibitorWithEvent', 'isBookmarked': None}  Core_Exhibitor         None
5  RXhoaWJpdG9yXzE2NzY3ODQ=   Kacific Broadband Satellites Ltd              Gold + Partner  https://cdn-api.swapcard.com/public/images/267377b1ff994ad79fdad10838797d59.png                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   <p>Kacific is a next-generation broadband satellite operator. We are committed to providing universal, fast, high-quality broadband access at an affordable cost using robust technologies and an agile business model.</p>                               {'booth': '5H2-1', '__typename': 'Core_ExhibitorWithEvent', 'isBookmarked': None}  Core_Exhibitor         None
6  RXhoaWJpdG9yXzE2NzY3NTE=                       ManageEngine              Gold + Partner  https://cdn-api.swapcard.com/public/images/3bec928ffcde4cb5845f91e808ea5241.png                                                                                                  <p>ManageEngine is the enterprise IT management division of  Zoho Corporation. Established and emerging enterprises – including 9 of every 10 Fortune 100 organizations - rely on our real-time IT management tools to ensure optimal performance of their IT infrastructure, including networks, servers, applications, desktops and more. Our 90+ products and free tools cover everything your IT needs, at prices you can afford. From network and device management to security and service desk software, we're bringing IT together for an integrated, overarching approach to optimize your IT. We have offices worldwide, including the United States, the Netherlands, India, Singapore, Japan, China, and Australia as well as a network of 200+ global partners to help organizations tightly align their businesses and IT.</p>                               {'booth': '4J2-4', '__typename': 'Core_ExhibitorWithEvent', 'isBookmarked': None}  Core_Exhibitor         None
7  RXhoaWJpdG9yXzE2NzY4NDA=  SES World Skies Singapore Pte Ltd              Gold + Partner  https://static.swapcard.com/public/images/2fc999e98a634e2993dabdd3b74230e4.jpeg                        <p>As the leader in global content connectivity solutions, <strong>SES operates the world’s only multi-orbit constellation of satellites with the unique combination of global coverage and high performance, including the commercially-proven, low-latency Medium Earth Orbit O3b system. </strong></p><p>By leveraging a vast and intelligent, cloud-enabled network, SES is able to deliver high-quality connectivity solutions anywhere on land, at sea or in the air, and is a trusted partner to the world’s leading telecommunications companies, mobile network operators, governments, connectivity and cloud service providers, broadcasters, video platform operators and content owners. </p><p>SES’s video network carries over <strong>~8,200 channels and has an unparalleled reach of 369 million households</strong>, delivering managed media services for both linear and non-linear content. </p>                               {'booth': '5D1-7', '__typename': 'Core_ExhibitorWithEvent', 'isBookmarked': None}  Core_Exhibitor         None
8  RXhoaWJpdG9yXzE2NzY3Njc=             ST ENGINEERING IDIRECT              Gold + Partner  https://cdn-api.swapcard.com/public/images/504945779cfd429d93559577d64fd028.png   <p>ST Engineering iDirect is a global leader in satellite communications (satcom) providing technology and solutions that enable its customers to expand their business, differentiate their services and optimize their satcom networks. With over 40 years of delivering innovation focused on solving satellite’s most critical economic and technology challenges we are committed to shaping the future of how the world connects. The product portfolio, branded iDirect, represents the highest standards in performance, efficiency and reliability, making it possible for its customers to deliver the best satcom connectivity experience anywhere in the world. ST Engineering iDirect is a leader in key industries including mobility, broadcast and military/government. In 2007, iDirect Government was formed to better serve the U.S. government and defense communities. For more information visit www.idirect.net.</p>  {'booth': '5F3-1, Peridot 203, Tourmaline 208', '__typename': 'Core_ExhibitorWithEvent', 'isBookmarked': None}  Core_Exhibitor         None
9  RXhoaWJpdG9yXzE2NzY3NzM=      APT SATELLITE COMPANY LIMITED              Silver Partner                                                                             None                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          None                               {'booth': '5B2-4', '__typename': 'Core_ExhibitorWithEvent', 'isBookmarked': None}  Core_Exhibitor         None

...
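
Since the question asks for a CSV file, here is a minimal sketch of one way to turn the returned nodes into CSV rows. It uses only the standard library and assumes each node has the `name`, `type`, `htmlDescription` and `withEvent` keys seen in the output above. The helper names (`html_to_text`, `nodes_to_rows`, `write_csv`), the output column names, and the two abridged sample nodes are illustrative and not part of the original answer.

```python
import csv
import io
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Return the plain text of an HTML fragment ('' for None or empty)."""
    if not fragment:
        return ""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    # Join the text pieces and collapse the whitespace left where tags were.
    # Caveat: a tag in the middle of a word would introduce a space there.
    return " ".join(" ".join(parser.parts).split())


def nodes_to_rows(nodes):
    """Flatten exhibitor nodes into dicts with name, type, booth, description."""
    rows = []
    for node in nodes:
        with_event = node.get("withEvent") or {}
        rows.append({
            "name": node.get("name") or "",
            "type": node.get("type") or "",
            "booth": with_event.get("booth") or "",
            "description": html_to_text(node.get("htmlDescription")),
        })
    return rows


def write_csv(rows, fileobj):
    """Write the rows to an open text file object as CSV with a header line."""
    fieldnames = ["name", "type", "booth", "description"]
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical sample nodes, abridged from the printed output; in practice
# the all_data list collected by the pagination loop would be used instead.
sample_nodes = [
    {"name": "AWS", "type": "Preferred Industry Partner",
     "htmlDescription": None, "withEvent": {"booth": None}},
    {"name": "Kacific Broadband Satellites Ltd", "type": "Gold + Partner",
     "htmlDescription": "<p>Kacific is a next-generation broadband satellite operator.</p>",
     "withEvent": {"booth": "5H2-1"}},
]

rows = nodes_to_rows(sample_nodes)
buffer = io.StringIO()  # in-memory stand-in for a real file
write_csv(rows, buffer)
print(buffer.getvalue())
```

To write a real file, open it with `open("exhibitors.csv", "w", newline="", encoding="utf-8")` (the file name is just an example) and pass the file object to `write_csv` together with `nodes_to_rows(all_data)`. If you would rather stay in pandas, `df.to_csv` also works, but the nested `withEvent` column and the HTML in `htmlDescription` would then be written as they are.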