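Analysis
- What You Need to Know
- The Hype Cycle
- The Priority Matrix
- Off the Hype Cycle
- On the Rise
  - Algorithmic Business
  - Human-in-the-Loop Crowdsourcing
  - Trust-Based Information Governance Models
  - Infonomics
  - Self-Service Data & Analytics
  - Analytics Governance
  - Citizen Data Science
  - Data Hub Strategy
  - Digital Ethics
- At the Peak
  - Graph Analytics
  - Data as a Service
  - Data Classification
  - Decision Management
  - File Analysis
  - Data Catalog
  - Enterprise Metadata Management
  - Machine Learning
  - Semantic Virtual Data Tier
  - Business Capability Modeling
  - Information Governance
  - Open Data
  - Data Lakes
  - Enterprise Information Management Programs
  - Predictive Analytics
  - Reference Data Management
- Sliding Into the Trough
  - Information Capabilities Framework
  - Information Architecture
  - Application Data Management
  - Information Stewardship
  - Logical Data Warehouse
  - Master Data Management
  - Entity Resolution and Analysis
- Climbing the Slope
  - SaaS Archiving of Messaging Data
  - Enterprise Information Archiving
  - Content Migration
- Entering the Plateau
  - Data Quality Tools
- Appendixes
  - Hype Cycle Phases, Benefit Ratings and Maturity Levels
Gartner Recommended Reading
# ----I tried using this code below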
from bs4 import BeautifulSoup as soup
import csv
import re
from urllib.request import urlopen as uReq

my_url = 'https://www.gartner.com/en/documents/3777063'

# Download the page HTML
with uReq(my_url) as uClient:
    page_html = uClient.read()

page_soup = soup(page_html, "html.parser")
containers = page_soup.find('div', class_='table-of-content')

# Collect the text of every <li> under the first <ul>
x = []
for tag in containers.ul.findAll("li"):
    x.append(tag.text)
print(x)
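#---- my answer was this
#['Algorithmic Business', 'Human-in-the-Loop Crowdsourcing', 'Trust-Based Information Governance Models', 'Infonomics',
#'Self-Service Data & Analytics', 'Analytics Governance',
#'Citizen Data Science', 'Data Hub Strategy', 'Digital Ethics']
## But I want similar data for the other columns - "At the Peak",
# "Sliding Into the Trough", and so on. How can I do that?
[What You Need to Know, The Hype Cycle, The Priority Matrix, Off the Hype Cycle, On the Rise, Algorithmic Business, Human-in-the-Loop Crowdsourcing, Trust-Based Information Governance Models, InfonomicsSelf-...
If I understand correctly, you want to extract the text of every <li> into a list. You can use the .find_next(text=True) method, which returns the first text node after each <li> tag and therefore skips the text of any nested list: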
data = '''<div class="table-of-content">
<p>Analysis</p><ul><li>What You Need to Know</li><li>The Hype Cycle</li><li>The Priority Matrix</li><li>Off the Hype Cycle</li><li>On the Rise<ul><li>Algorithmic Business</li><li>Human-in-the-Loop Crowdsourcing</li><li>Trust-Based Information Governance Models</li><li>Infonomics</li><li>Self-Service Data & Analytics</li><li>Analytics Governance</li><li>Citizen Data Science</li><li>Data Hub Strategy</li><li>Digital Ethics</li></ul></li><li>At the Peak<ul><li>Graph Analytics</li><li>Data as a Service</li><li>Data Classification</li><li>Decision Management</li><li>File Analysis</li><li>Data Catalog</li><li>Enterprise Metadata Management</li><li>Machine Learning</li><li>Semantic Virtual Data Tier</li><li>Business Capability Modeling</li><li>Information Governance</li><li>Open Data</li><li>Data Lakes</li><li>Enterprise Information Management Programs</li><li>Predictive Analytics</li><li>Reference Data Management</li></ul></li><li>Sliding Into the Trough<ul><li>Information Capabilities Framework</li><li>Information Architecture</li><li>Application Data Management</li><li>Information Stewardship</li><li>Logical Data Warehouse</li><li>Master Data Management</li><li>Entity Resolution and Analysis</li></ul></li><li>Climbing the Slope<ul><li>SaaS Archiving of Messaging Data</li><li>Enterprise Information Archiving</li><li>Content Migration</li></ul></li><li>Entering the Plateau<ul><li>Data Quality Tools</li></ul></li><li>Appendixes<ul><li>Hype Cycle Phases, Benefit Ratings and Maturity Levels</li></ul></li></ul><p>Gartner Recommended Reading</p>
</div>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'html.parser')
lst = [li.find_next(text=True) for li in soup.select('li')]
print(*lst, sep='\n')
Prints:
What You Need to Know
The Hype Cycle
The Priority Matrix
Off the Hype Cycle
On the Rise
Algorithmic Business
Human-in-the-Loop Crowdsourcing
Trust-Based Information Governance Models
Infonomics
Self-Service Data & Analytics
Analytics Governance
Citizen Data Science
Data Hub Strategy
Digital Ethics
At the Peak
Graph Analytics
Data as a Service
Data Classification
Decision Management
File Analysis
Data Catalog
Enterprise Metadata Management
Machine Learning
Semantic Virtual Data Tier
Business Capability Modeling
Information Governance
Open Data
Data Lakes
Enterprise Information Management Programs
Predictive Analytics
Reference Data Management
Sliding Into the Trough
Information Capabilities Framework
Information Architecture
Application Data Management
Information Stewardship
Logical Data Warehouse
Master Data Management
Entity Resolution and Analysis
Climbing the Slope
SaaS Archiving of Messaging Data
Enterprise Information Archiving
Content Migration
Entering the Plateau
Data Quality Tools
Appendixes
Hype Cycle Phases, Benefit Ratings and Maturity Levels
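The flat list above loses the grouping the question asks about ("At the Peak", "Sliding Into the Trough", and so on). One possible way to keep each phase together with its entries is sketched below. It assumes the same nesting as the table of contents shown earlier and uses a shortened sample for brevity; the names html, top_ul and sections are illustrative and not part of the original answer. string=True is the current name for the text=True argument used above.

```python
from bs4 import BeautifulSoup

# Shortened sample (an assumption) with the same nesting as the real page.
html = '''<div class="table-of-content">
<p>Analysis</p><ul><li>What You Need to Know</li><li>On the Rise<ul><li>Algorithmic Business</li><li>Infonomics</li></ul></li><li>At the Peak<ul><li>Graph Analytics</li><li>Data as a Service</li></ul></li></ul>
</div>'''

soup = BeautifulSoup(html, 'html.parser')
top_ul = soup.find('div', class_='table-of-content').find('ul')

sections = {}
# recursive=False limits the search to direct children, i.e. top-level entries.
for li in top_ul.find_all('li', recursive=False):
    # First text node after the tag: the entry's own title.
    title = li.find_next(string=True).strip()
    nested = li.find('ul')
    # Entries without a nested list map to an empty list.
    sections[title] = (
        [sub.get_text(strip=True) for sub in nested.find_all('li')]
        if nested else []
    )

for title, items in sections.items():
    print(title, items)
```

Each key of sections is a top-level heading and each value is the list of entries under it, which could then be written out as one column per heading.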