BeautifulSoup: merge tables and export to .csv


I have been trying to download data from several URLs and save it to a csv file.

The idea is to extract the annual and quarterly data from: https://www.marketwatch.com/investing/stock/MMM/financials/

Annual:

https://www.marketwatch.com/investing/stock/MMM/financials/cash-flow


Quarterly:

https://www.marketwatch.com/investing/stock/MMM/financials/cash-flow/quarter


I am using the following code:

    import requests
    import pandas as pd

    urls = ['https://www.marketwatch.com/investing/stock/AAPL/financials/cash-flow',
            'https://www.marketwatch.com/investing/stock/MMM/financials/cash-flow']


    def main(urls):
        # Reuse one HTTP session for all requests.
        with requests.Session() as req:
            goal = []
            for url in urls:
                r = req.get(url)
                # Take the first table that contains the label, then keep
                # its first row and first three columns.
                df = pd.read_html(
                    r.content, match="Cash Dividends Paid - Total")[0].iloc[[0], 0:3]
                goal.append(df)
            # Stack the per-company rows into a single DataFrame.
            new = pd.concat(goal)
            print(new)


    main(urls)

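This lets me extract the information I need (in the example, the 2015 and 2016 annual figures for two companies), but only for one set at a time, either quarterly or annual.

I would like to merge the annual and quarterly tables. To do that, I came up with this code: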

    import requests
    import pandas as pd
    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import csv

    html = urlopen('https://www.marketwatch.com/investing/stock/MMM/financials/')
    soup = BeautifulSoup(html, 'html.parser')

    ids = ['cash-flow','cash-flow/quarter']

    with open("news.csv", "w", newline="", encoding='utf-8') as f_news:
        csv_news = csv.writer(f_news)
        csv_news.writerow(["A"])

        for id in ids:
            a = soup.find("Cash Dividends Paid - Total", id=id)
            csv_news.writerow([a.text])

However, it raises the following error:

[screenshot of the error message, not preserved]

python python-3.x web-scraping beautifulsoup export-to-csv
1 Answer
A BeautifulSoup element exposes its text through the get_text() method (the text attribute is a shorthand for the same call):

    csv_news.writerow([a.get_text()])
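Note that find() treats its first positional argument as a tag name, so soup.find("Cash Dividends Paid - Total", id=id) most likely returns None, and calling get_text() on None fails in the same way as reading .text. The following is a minimal sketch of searching by text content and guarding against a missing match. The HTML fragment, the value in it, and the text_or_empty helper are made up for illustration and are not taken from the MarketWatch page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for one row of a financials table;
# the value "(3.1B)" is invented for this example.
html = """
<table>
  <tr><td>Cash Dividends Paid - Total</td><td>(3.1B)</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# The first positional argument is a tag name, so this searches for a tag
# called "Cash Dividends Paid - Total" and finds nothing.
missing = soup.find("Cash Dividends Paid - Total")

# Searching by text content instead: string= matches the text inside a tag.
cell = soup.find("td", string="Cash Dividends Paid - Total")


def text_or_empty(tag):
    """Return the tag's text, or an empty string when find() found nothing."""
    return tag.get_text(strip=True) if tag is not None else ""


label = text_or_empty(cell)
# The value sits in the next <td> of the same row.
value = text_or_empty(cell.find_next_sibling("td")) if cell is not None else ""
```

Checking for None before calling get_text() keeps the csv export from crashing when a label is absent from a page.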

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text
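For the underlying goal of merging the annual and quarterly tables into one csv file, one possible approach is to stay with pandas.read_html, as in the question's first snippet, and fetch both URLs per ticker. The sketch below is written under several assumptions: the tidy and collect helpers, the "cash_flow.csv" file name, and the long output layout (one line per ticker, period, and column) are illustrative choices, and it assumes the pages still serve the table as plain HTML with the row label in the first column.

```python
from io import StringIO

import pandas as pd
import requests

BASE = "https://www.marketwatch.com/investing/stock/{ticker}/financials/cash-flow"
ROW_LABEL = "Cash Dividends Paid - Total"


def tidy(table, ticker, period, label=ROW_LABEL):
    """Keep the rows matching `label` and reshape them to long format."""
    first_col = table.columns[0]
    mask = table[first_col].astype(str).str.contains(label, regex=False)
    # Annual and quarterly tables have different column headers, so each
    # header becomes a value in a "column" field instead of its own column.
    long = table[mask].melt(id_vars=[first_col], var_name="column",
                            value_name="value")
    long = long.rename(columns={first_col: "item"})
    long.insert(0, "period", period)
    long.insert(0, "ticker", ticker)
    return long


def collect(tickers, out_path="cash_flow.csv"):
    """Fetch annual and quarterly pages per ticker and write one csv file."""
    frames = []
    with requests.Session() as session:
        for ticker in tickers:
            for period, suffix in (("annual", ""), ("quarterly", "/quarter")):
                resp = session.get(BASE.format(ticker=ticker) + suffix)
                resp.raise_for_status()
                # First table on the page that contains the row label.
                table = pd.read_html(StringIO(resp.text), match=ROW_LABEL)[0]
                frames.append(tidy(table, ticker, period))
    merged = pd.concat(frames, ignore_index=True)
    merged.to_csv(out_path, index=False)
    return merged
```

Calling collect(["AAPL", "MMM"]) would then request four pages and write the combined rows to the csv file. The long layout is used because the annual and quarterly tables do not share column headers; a wide layout is also possible but would leave many empty cells.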