Scraping data with Python and requests-html and exporting it to an Excel file

Question

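I wrote code that scrapes data from a website, and it works fine, but I want to export the results to an Excel file.

I'm new to Python, so I don't know exactly what I should do.

I thought about using pandas, but my output is just a print built with join, so I haven't found a good solution.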

Here is my code:

from requests_html import HTMLSession
import pandas as pd
import tabulate
from tabulate import tabulate
 
matchlink = 'https://www.betexplorer.com/football/serbia/prva-liga/results/'
 
session = HTMLSession()
 
r = session.get(matchlink)

allmatch = r.html.find('.in-match')
results = r.html.find('.h-text-center a')
# search for elements containing "data-odd" attribute
matchodds = r.html.find('[data-odd]')

odds = [matchodd.attrs["data-odd"] for matchodd in matchodds]

idx = 0
for match, res in zip(allmatch, results):
    if res.text == 'POSTP.':
        continue

    print(f"{match.text} {res.text} {', '.join(odds[idx:idx+3])}")
    
    idx += 3
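The loop above advances idx by three only for matches that are not postponed, so each match is paired with the next three values in the flat odds list. A minimal sketch of the same pairing as a pure function, collecting rows into a list instead of printing them, can be tested without hitting the site. The function name pair_rows and the sample inputs are hypothetical, and, like the original loop, it assumes postponed matches have no odds on the page.

```python
from typing import Iterable


def pair_rows(
    matches: Iterable[str], results: Iterable[str], odds: list[str]
) -> list[tuple[str, str, str]]:
    """Pair each match with its result and its three odds, skipping postponed games."""
    rows = []
    idx = 0  # position of the next unused value in the flat odds list
    for match, res in zip(matches, results):
        if res == "POSTP.":
            # Assumption: postponed matches have no odds, so idx is not advanced.
            continue
        rows.append((match, res, ", ".join(odds[idx:idx + 3])))
        idx += 3
    return rows


# Hypothetical sample inputs (plain strings instead of requests-html elements).
matches = ["Dubocica - FK Indjija", "Team A - Team B", "Mladost GAT - Smederevo"]
results = ["2:1", "POSTP.", "1:1"]
odds = ["2.18", "2.93", "3.31", "1.63", "3.37", "5.17"]

rows = pair_rows(matches, results, odds)
print(rows)
```

A list of tuples like rows can be passed straight to pd.DataFrame(rows, columns=["match", "result", "odds"]), which is the step that makes an Excel export possible.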

Thanks for your help.

Answer 1

Sure, use pandas!

Here is some sample output. With a DataFrame in hand, it's easy to call .to_csv(), .to_excel(), or whatever else you need.
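A minimal sketch of that export step, assuming rows shaped like the dictionaries the wrapper yields (the two sample rows are copied from the output shown here; the helper save_xlsx and the sheet name "results" are illustrative). CSV needs no extra package, while .xlsx output through to_excel needs an engine such as openpyxl installed.

```python
import io

import pandas as pd

# Sample rows in the same shape as the wrapper's dictionaries.
rows = [
    {"match": "Dubocica - FK Indjija", "result": "2:1", "odds": "2.18, 2.93, 3.31"},
    {"match": "Mladost GAT - Smederevo", "result": "1:1", "odds": "1.63, 3.37, 5.17"},
]
df = pd.DataFrame(rows).set_index("match")

# CSV: writing to an in-memory buffer keeps the example self-contained.
buf = io.StringIO()
df.to_csv(buf)
csv_text = buf.getvalue()
print(csv_text)


def save_xlsx(frame: pd.DataFrame, path: str) -> None:
    """Hypothetical helper: write the frame to an .xlsx file (requires openpyxl)."""
    frame.to_excel(path, sheet_name="results")
```

Calling save_xlsx(df, "matches.xlsx") after pip install openpyxl produces a workbook with the match names in the first column.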

                                    result              odds
match                                                       
Dubocica - FK Indjija                  2:1  2.18, 2.93, 3.31
Mladost GAT - Smederevo                1:1  1.63, 3.37, 5.17
Graficar Beograd - RFK Novi Sad        2:1  1.41, 4.31, 6.28
Tekstilac Odzaci - Radnicki Beograd    5:0  1.53, 3.79, 5.49
FK Indjija - Vrsac                     2:1  1.72, 3.16, 4.90
...                                    ...               ...
Jedinstvo U. - RFK Novi Sad            4:0  1.45, 4.42, 5.59
Metalac - Graficar Beograd             1:3  2.17, 3.14, 3.11
Sloboda - OFK Beograd                  0:2  1.87, 3.15, 4.02
Smederevo - FK Indjija                 2:0  2.76, 2.83, 2.59
Vrsac - Kolubara                       1:0  2.73, 2.92, 2.57

[160 rows x 2 columns]

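I just put a simple wrapper around your code. (By the way, regarding tabulate: if you do both "import tabulate" and "from tabulate import tabulate", the second statement rebinds the name tabulate, undoing what the first one did. In your code the name ends up pointing at the function, and the module is no longer reachable under that name.)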
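To see the rebinding concretely, here is a small sketch using the standard-library pprint module, which, like tabulate, exports a function with the same name as the module:

```python
import types

import pprint               # binds the name "pprint" to the pprint module
from pprint import pprint   # rebinds the same name to the pprint() function

# After the second import, the module object is no longer reachable as "pprint".
print(isinstance(pprint, types.ModuleType))  # False
print(callable(pprint))                      # True
```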

from typing import Generator

from requests_html import HTMLSession
import pandas as pd

matchlink = "https://www.betexplorer.com/football/serbia/prva-liga/results/"


def _get_rows(url: str) -> Generator[dict[str, str], None, None]:
    session = HTMLSession()

    r = session.get(url)

    allmatch = r.html.find(".in-match")
    results = r.html.find(".h-text-center a")
    # search for elements containing "data-odd" attribute
    matchodds = r.html.find("[data-odd]")

    odds = [matchodd.attrs["data-odd"] for matchodd in matchodds]

    idx = 0
    for match, res in zip(allmatch, results):
        if res.text == "POSTP.":
            continue

        yield {
            "match": match.text,
            "result": res.text,
            "odds": ", ".join(odds[idx : idx + 3]),
        }

        idx += 3


if __name__ == "__main__":
    df = pd.DataFrame(_get_rows(matchlink)).set_index("match")
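    print(df)
    # Write the DataFrame to an Excel file; .xlsx output needs openpyxl (pip install openpyxl).
    df.to_excel("matches.xlsx")  # output filename is arbitrary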
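The odds column in this DataFrame is still one text string per match, so Excel will treat it as text. A sketch of one way to split it into three numeric columns before exporting, assuming the three odds are always separated by ", " and appear in home/draw/away order (the column names odd_1, odd_x, odd_2 and the sample rows are hypothetical):

```python
import pandas as pd

# Sample rows in the same shape as the answer's DataFrame.
df = pd.DataFrame(
    {
        "match": ["Dubocica - FK Indjija", "Mladost GAT - Smederevo"],
        "result": ["2:1", "1:1"],
        "odds": ["2.18, 2.93, 3.31", "1.63, 3.37, 5.17"],
    }
).set_index("match")

# Split "2.18, 2.93, 3.31" into three float columns.
# Assumption: the order is home win / draw / away win.
split = df["odds"].str.split(", ", expand=True).astype(float)
split.columns = ["odd_1", "odd_x", "odd_2"]

# Replace the text column with the numeric ones (aligned on the match index).
df = df.drop(columns="odds").join(split)
print(df)
```

After this step, df.to_excel("matches.xlsx") (with openpyxl installed) writes each odd into its own numeric cell, so it can be sorted or used in formulas directly.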
