Pandas read_html automatically converts every column to str

Question

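I've been trying to scrape a table from a website, but for some reason Pandas automatically converts every column to a string, which makes some of the values completely useless. For example, a value like 0.62 no longer comes through as a usable number. I've been trying to fix this for a while, but I couldn't find anything online. ChatGPT didn't help either.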

I'm fairly new to coding and this is my first time using Pandas, so apologies in advance if I've made a silly mistake.

Here is a small snippet of the code. I basically use Selenium and BeautifulSoup to isolate the part of the page source I need, then parse it with read_html.

   table = soup.find("table", attrs={"id": "stats_shooting"})
   df = pd.read_html(str(table))[0]

This is the URL I've been trying to scrape, in case anyone needs it:

https://fbref.com/it/comp/11/shooting/Statistiche-di-Serie-A#all_stats_shooting

Its columns are multi-indexed, but I've already dealt with that. Thanks in advance.

Answer 1

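The main issue here seems to be that some decimals are written with a comma and others with a period, so try steering pandas.read_html() with the following parameters: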

thousands=None,
decimal=','
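To see what these two parameters change, here is a minimal sketch on an inline table. The HTML, the team names, and the xG column are invented for illustration, and it assumes an HTML parser backend such as lxml is installed. By default read_html uses thousands=',' and decimal='.', so a comma is not treated as a decimal mark.

```python
from io import StringIO

import pandas as pd

# Invented sample data: decimals use a comma, as on the Italian-language page.
html = """
<table id="stats_shooting">
  <thead><tr><th>Squad</th><th>xG</th></tr></thead>
  <tbody>
    <tr><td>Team A</td><td>0,62</td></tr>
    <tr><td>Team B</td><td>1,50</td></tr>
  </tbody>
</table>
"""

# Defaults: thousands=',' and decimal='.', so the comma is not read as a decimal mark.
default_df = pd.read_html(StringIO(html))[0]

# Comma as the decimal mark, no thousands separator.
fixed_df = pd.read_html(StringIO(html), thousands=None, decimal=",")[0]

print(default_df.dtypes)
print(fixed_df.dtypes)
```

Comparing the two dtypes printouts should make it clear whether the separators are the cause on your own data.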

Example

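from io import StringIO

import pandas as pd
import requests

url = 'https://fbref.com/it/comp/11/shooting/Statistiche-di-Serie-A#all_stats_shooting'
# Strip HTML comment markers so that any table inside a comment becomes parseable.
html = requests.get(url).text.replace('<!--', '').replace('-->', '')

pd.read_html(
    StringIO(html),  # recent pandas versions deprecate passing a raw HTML string
    attrs={'id': 'stats_shooting'},
    header=1,
    thousands=None,
    decimal=','
)[0]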
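The replace calls on '<!--' and '-->' in the example suggest that the table arrives wrapped in an HTML comment in the raw response. That is an inference from the code, not something the answer states. A small sketch of that step on its own, using an invented page fragment and a hypothetical Gls column, and again assuming a parser backend such as lxml is installed:

```python
from io import StringIO

import pandas as pd

# Invented page fragment: the table sits inside an HTML comment.
page = """
<div id="all_stats_shooting">
<!--
<table id="stats_shooting">
  <thead><tr><th>Squad</th><th>Gls</th></tr></thead>
  <tbody><tr><td>Team A</td><td>12</td></tr></tbody>
</table>
-->
</div>
"""

# Remove the comment delimiters so the table becomes ordinary markup.
uncommented = page.replace("<!--", "").replace("-->", "")

# attrs narrows the match to the table with the given id.
df = pd.read_html(StringIO(uncommented), attrs={"id": "stats_shooting"})[0]
print(df)
```

If you already get the rendered page from Selenium, this step may be unnecessary, since the table is then likely present as normal markup.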
