Extracting table data from Confluence


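I am trying to extract the tables from a Confluence page and write them into DataFrames using the code below.

I am getting a 401 Unauthorized error, which is probably an access problem, but either way I would like to check whether the code itself is clean.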

from atlassian import Confluence
import os
from bs4 import BeautifulSoup
import pandas as pd

user = "user_name"
api_key = os.environ['OEwfen9FFrerGreer5GRrrdfd']
server = "https://confluence.abc.com/display/Int/%5new_variable"

confluence = Confluence(url=server, username=user, password=api_key)
page = confluence.get_page_by_title("TEST", "page 1", expand="body.storage")
body = page["body"]["storage"]["value"]

tables_raw = [[[cell.text for cell in row("th") + row("td")]
                    for row in table("tr")]
                    for table in BeautifulSoup(body, features="lxml")("table")]

tables_df = [pd.DataFrame(table) for table in tables_raw]
for table_df in tables_df:
    print(table_df)

Tags: confluence, atlassian-python-api

1 Answer

Assuming you are using Confluence Cloud:

from io import StringIO

from atlassian import Confluence
import pandas as pd

api_key = 'OEwfen9FFrerGreer5GRrrdfd'  # PAT from your account; never share this key
server = "https://confluence.abc.com/"  # Base URL only; in my case it was host.com/confluence

confluence = Confluence(url=server, token=api_key)
page = confluence.get_page_by_title("TEST", "page 1", expand="body.storage")
body = page["body"]["storage"]["value"]

# read_html returns a list of DataFrames, one per <table> in the page body
dfs = pd.read_html(StringIO(body))
# If you also want the link targets, do this instead (needs pandas 1.5 or newer)
dfs = pd.read_html(StringIO(body), extract_links="all")
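
The two comments in the code above (keep the key private, pass only the base URL) can be sketched with the standard library alone. This is a minimal sketch, not part of the original answer: the helper names `base_url_from_page_url` and `read_token` and the environment variable name `CONFLUENCE_TOKEN` are hypothetical, and the sketch assumes page URLs of the `/display/<space>/<title>` form shown in the question. As far as I know, atlassian-python-api takes `token=` for a personal access token (the Server/Data Center style) and `username` plus an API token as `password` for Cloud sites; the helpers below work the same either way.

```python
import os
from urllib.parse import urlsplit


def base_url_from_page_url(page_url: str) -> str:
    """Return the part of a Confluence page URL that precedes '/display/'.

    Assumption: the page URL has the form <base>/display/<space>/<title>.
    """
    head, sep, _ = page_url.partition("/display/")
    if sep:
        return head
    # No '/display/' segment: fall back to scheme and host only.
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}"


def read_token(var_name: str = "CONFLUENCE_TOKEN") -> str:
    """Read the token from an environment variable (the name is hypothetical).

    os.environ is keyed by the variable NAME, not by the secret itself.
    """
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token
```

With these, `server = base_url_from_page_url(page_link)` and `api_key = read_token()` would feed the `Confluence(...)` call without a hard-coded secret in the script.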

Then process the DataFrames as needed.
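
As one example of such processing: when links are extracted, pandas returns each affected cell as a `(text, href)` tuple, so a common next step is to split those tuples into separate columns. The sketch below is an illustration under stated assumptions, not the answerer's code: the sample frame is made up and mimics the shape of body cells only, and `split_links` is a hypothetical helper. With `extract_links="all"` the header labels are tuples as well and would need the same unpacking.

```python
import pandas as pd

# Hypothetical frame shaped like the body of a table read with link extraction:
# every cell is a (text, href) tuple, and href is None when the cell has no link.
df = pd.DataFrame({
    "Name": [("Alice", "https://example.com/alice"), ("Bob", None)],
    "Team": [("Core", None), ("Infra", "https://example.com/infra")],
})


def split_links(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a frame where '<col>' holds the text and '<col>_link' the href."""
    out = pd.DataFrame(index=frame.index)
    for col in frame.columns:
        out[col] = frame[col].map(lambda cell: cell[0])
        out[f"{col}_link"] = frame[col].map(lambda cell: cell[1])
    return out


clean = split_links(df)
print(clean)
```

Keeping text and link in adjacent columns leaves the original column order readable while making the hrefs easy to filter or drop.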

Confluence API documentation

Pandas documentation

Read the docs; 90% of the time they will have your answer.
