How do I extract all <li> elements separately and get the table-of-contents list using BeautifulSoup?

Question  Votes: -2  Answers: 1

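Analysis

  • What You Need to Know
  • The Hype Cycle
  • The Priority Matrix
  • Off the Hype Cycle
  • On the Rise
    • Algorithmic Business
    • Human-in-the-Loop Crowdsourcing
    • Trust-Based Information Governance Models
    • Infonomics
    • Self-Service Data & Analytics
    • Analytics Governance
    • Citizen Data Science
    • Data Hub Strategy
    • Digital Ethics
  • At the Peak
    • Graph Analytics
    • Data as a Service
    • Data Classification
    • Decision Management
    • File Analysis
    • Data Catalog
    • Enterprise Metadata Management
    • Machine Learning
    • Semantic Virtual Data Tier
    • Business Capability Modeling
    • Information Governance
    • Open Data
    • Data Lakes
    • Enterprise Information Management Programs
    • Predictive Analytics
    • Reference Data Management
  • Sliding Into the Trough
    • Information Capabilities Framework
    • Information Architecture
    • Application Data Management
    • Information Stewardship
    • Logical Data Warehouse
    • Master Data Management
    • Entity Resolution and Analysis
  • Climbing the Slope
    • SaaS Archiving of Messaging Data
    • Enterprise Information Archiving
    • Content Migration
  • Entering the Plateau
    • Data Quality Tools
  • Appendixes
    • Hype Cycle Phases, Benefit Ratings and Maturity Levels

Gartner Recommended Reading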

    # ---- I tried the code below
    from bs4 import BeautifulSoup as soup
    import csv
    import re
    from urllib.request import urlopen as uReq

    my_url = 'https://www.gartner.com/en/documents/3777063'
    page_html = uReq(my_url).read()  # download the page HTML

    page_soup = soup(page_html, "html.parser")
    containers = page_soup.find('div', class_='table-of-content')

    x = []
    for tag in containers.ul.findAll("li"):
        x.append(tag.text)
    print(x)

    # ---- The result was:
    # ['Algorithmic Business', 'Human-in-the-Loop Crowdsourcing',
    #  'Trust-Based Information Governance Models', 'Infonomics',
    #  'Self-Service Data & Analytics', 'Analytics Governance',
    #  'Citizen Data Science', 'Data Hub Strategy', 'Digital Ethics']
    #
    # But I want the other columns with similar data as well: "At the Peak",
    # "Sliding Into the Trough", and so on. How can I do that?

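    [What You Need to Know, The Hype Cycle, The Priority Matrix, Off the Hype Cycle, On the Rise, Algorithmic Business, Human-in-the-Loop Crowdsourcing, Trust-Based Information Governance Models, InfonomicsSelf-...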

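    If I understand correctly, you want to extract the text of every <li> into a list. You can use the .find_next(text=True) method, which returns only the first text node following each <li> tag, so a parent item yields its own title rather than the combined text of its nested list: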

    from bs4 import BeautifulSoup

    data = '''<div class="table-of-content"> <p>Analysis</p><ul><li>What You Need to Know</li><li>The Hype Cycle</li><li>The Priority Matrix</li><li>Off the Hype Cycle</li><li>On the Rise<ul><li>Algorithmic Business</li><li>Human-in-the-Loop Crowdsourcing</li><li>Trust-Based Information Governance Models</li><li>Infonomics</li><li>Self-Service Data &amp; Analytics</li><li>Analytics Governance</li><li>Citizen Data Science</li><li>Data Hub Strategy</li><li>Digital Ethics</li></ul></li><li>At the Peak<ul><li>Graph Analytics</li><li>Data as a Service</li><li>Data Classification</li><li>Decision Management</li><li>File Analysis</li><li>Data Catalog</li><li>Enterprise Metadata Management</li><li>Machine Learning</li><li>Semantic Virtual Data Tier</li><li>Business Capability Modeling</li><li>Information Governance</li><li>Open Data</li><li>Data Lakes</li><li>Enterprise Information Management Programs</li><li>Predictive Analytics</li><li>Reference Data Management</li></ul></li><li>Sliding Into the Trough<ul><li>Information Capabilities Framework</li><li>Information Architecture</li><li>Application Data Management</li><li>Information Stewardship</li><li>Logical Data Warehouse</li><li>Master Data Management</li><li>Entity Resolution and Analysis</li></ul></li><li>Climbing the Slope<ul><li>SaaS Archiving of Messaging Data</li><li>Enterprise Information Archiving</li><li>Content Migration</li></ul></li><li>Entering the Plateau<ul><li>Data Quality Tools</li></ul></li><li>Appendixes<ul><li>Hype Cycle Phases, Benefit Ratings and Maturity Levels</li></ul></li></ul><p>Gartner Recommended Reading</p> </div>'''

    soup = BeautifulSoup(data, 'html.parser')

    # For every <li> (top-level and nested), take the first text node after the tag
    lst = [li.find_next(text=True) for li in soup.select('li')]
    print(*lst, sep='\n')

    Prints:

    What You Need to Know
    The Hype Cycle
    The Priority Matrix
    Off the Hype Cycle
    On the Rise
    Algorithmic Business
    Human-in-the-Loop Crowdsourcing
    Trust-Based Information Governance Models
    Infonomics
    Self-Service Data & Analytics
    Analytics Governance
    Citizen Data Science
    Data Hub Strategy
    Digital Ethics
    At the Peak
    Graph Analytics
    Data as a Service
    Data Classification
    Decision Management
    File Analysis
    Data Catalog
    Enterprise Metadata Management
    Machine Learning
    Semantic Virtual Data Tier
    Business Capability Modeling
    Information Governance
    Open Data
    Data Lakes
    Enterprise Information Management Programs
    Predictive Analytics
    Reference Data Management
    Sliding Into the Trough
    Information Capabilities Framework
    Information Architecture
    Application Data Management
    Information Stewardship
    Logical Data Warehouse
    Master Data Management
    Entity Resolution and Analysis
    Climbing the Slope
    SaaS Archiving of Messaging Data
    Enterprise Information Archiving
    Content Migration
    Entering the Plateau
    Data Quality Tools
    Appendixes
    Hype Cycle Phases, Benefit Ratings and Maturity Levels
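
    The flat list above loses the grouping the question asks for ("On the Rise", "At the Peak", and so on as separate columns). One possible way to keep it, offered as a minimal sketch and not as part of the original answer, is to walk only the top-level <li> elements and collect each one's nested items into a dictionary. The helper name group_sections and the shortened html sample are illustrative assumptions; the div class comes from the question.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the table-of-contents markup (assumed sample data).
html = """
<div class="table-of-content">
<p>Analysis</p>
<ul>
  <li>What You Need to Know</li>
  <li>On the Rise<ul><li>Algorithmic Business</li><li>Infonomics</li></ul></li>
  <li>At the Peak<ul><li>Graph Analytics</li><li>Data as a Service</li></ul></li>
</ul>
</div>
"""


def group_sections(markup):
    """Map each top-level <li> title to the list of its nested <li> titles."""
    soup = BeautifulSoup(markup, "html.parser")
    top_ul = soup.find("div", class_="table-of-content").ul
    sections = {}
    # recursive=False restricts the search to direct children of the outer <ul>
    for li in top_ul.find_all("li", recursive=False):
        # The first text node inside the <li> is its own title
        title = li.find(string=True).strip()
        sub = li.find("ul")
        if sub is None:
            sections[title] = []
        else:
            sections[title] = [s.get_text(strip=True) for s in sub.find_all("li")]
    return sections


sections = group_sections(html)
for title, items in sections.items():
    print(title, "->", items)
```

    Each dictionary key can then serve as a column header, with its list as the column values. Items without a nested list, such as "What You Need to Know", map to an empty list.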

    python html list beautifulsoup findall
    1 Answer
    0
    votes