This one is tricky for me. I'm trying to extract an embedded table from a Google Sheet using Python.
Here is the link.
I don't own the sheet, but it is publicly available.
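Here is my code so far. When I print the headers, all I get is "". Any help would be greatly appreciated. The end goal is to turn this table into a pandas DataFrame. Thanks, everyone.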
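import lxml.html as lh
import pandas as pd
import requests

url = 'https://docs.google.com/spreadsheets/u/0/d/e/2PACX-1vQ--HR_GTaiv2dxaVwIwWYzY2fXTSJJN0dugyQe_QJnZEpKm7bu5o7eh6javLIk2zj0qtnvjJPOyvu2/pubhtml/sheet?headers=false&gid=1503072727'
page = requests.get(url)
doc = lh.fromstring(page.content)

# Every <tr> on the published page, including any non-data rows at the top
tr_elements = doc.xpath('//tr')

# Print the text of each cell in the first row, numbered from 1,
# and start one (name, values) pair per column
col = []
for i, t in enumerate(tr_elements[0], start=1):
    name = t.text_content()
    print('%d:"%s"' % (i, name))
    col.append((name, []))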
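If you want to stay with the lxml approach from the question, the collected rows still have to be turned into a DataFrame. The sketch below assumes the rows have already been extracted as equal-length lists of cell strings (for example `[[cell.text_content() for cell in tr] for tr in tr_elements]`), that the first cell of each row is the sheet's row number, and, going by the `header=1` used in the answer, that the real column names sit in the second row rather than the first. `rows_to_dataframe` is a hypothetical helper, not part of lxml or pandas.

```python
import pandas as pd


def rows_to_dataframe(rows, header_row=1, drop_first_col=True):
    """Build a DataFrame from rows of cell strings (hypothetical helper).

    rows: list of equal-length lists of str, e.g.
        [[cell.text_content() for cell in tr] for tr in tr_elements]
    header_row: index of the row holding the real column names
        (assumed to be 1, mirroring header=1 in the read_html answer)
    drop_first_col: drop the leading row-number column
    """
    if drop_first_col:
        rows = [row[1:] for row in rows]
    header = rows[header_row]
    body = rows[header_row + 1:]
    # Skip rows whose cells are all blank (e.g. padding rows at the bottom)
    body = [row for row in body if any(cell.strip() for cell in row)]
    return pd.DataFrame(body, columns=header)
```

Every cell stays a string here; converting prices, percentages and dates is a separate step.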
Well, if you want the data in a DataFrame, you can load it directly:
import pandas as pd

url = 'https://docs.google.com/spreadsheets/u/0/d/e/2PACX-1vQ--HR_GTaiv2dxaVwIwWYzY2fXTSJJN0dugyQe_QJnZEpKm7bu5o7eh6javLIk2zj0qtnvjJPOyvu2/pubhtml/sheet?headers=false&gid=1503072727'
df = pd.read_html(url, header=1)[0]  # header=1: use the second row as the column names
df.drop(columns='1', inplace=True)  # remove the unnecessary index column called "1"
This gives you:
Target Ticker Acquirer \
0 Acacia Communications Inc Com ACIA Cisco Systems Inc Com
1 Advanced Disposal Services Inc Com ADSW Waste Management Inc Com
2 Allergan Plc Com AGN Abbvie Inc Com
3 Ak Steel Holding Corp Com AKS Cleveland Cliffs Inc Com
4 Td Ameritrade Holding Corp Com AMTD Schwab (Charles) Corp Com
Ticker.1 Current Price Take Over Price Price Diff % Diff Date Announced \
0 CSCO $68.79 $70.00 $1.21 1.76% 7/9/2019
1 WM $32.93 $33.15 $0.22 0.67% 4/15/2019
2 ABBV $197.05 $200.22 $3.17 1.61% 6/25/2019
3 CLF $2.98 $3.02 $0.04 1.34% 12/3/2019
4 SCHW $49.31 $51.27 $1.96 3.97% 11/25/2019
Deal Type
0 Cash
1 Cash
2 C&S
3 Stock
4 Stock
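In that output the price, percentage and date columns are still text ("$68.79", "1.76%", "7/9/2019"). If you need them as numbers, a minimal sketch follows. It assumes the column names exactly as printed above and that the cells are strings in that format; `clean_numeric_columns` is a hypothetical helper, not a pandas function.

```python
import pandas as pd


def clean_numeric_columns(df):
    """Return a copy with '$' and '%' strings converted to floats and dates parsed.

    Hypothetical helper: column names are taken from the output shown above
    and are assumed to hold strings such as '$68.79', '1.76%' and '7/9/2019'.
    """
    out = df.copy()
    for col in ['Current Price', 'Take Over Price', 'Price Diff']:
        # Strip currency symbols and thousands separators, then parse as float
        out[col] = pd.to_numeric(out[col].str.replace(r'[$,]', '', regex=True))
    # '1.76%' -> 1.76 (kept in percentage points, not converted to 0.0176)
    out['% Diff'] = pd.to_numeric(out['% Diff'].str.rstrip('%'))
    # Month/day/year, as in '7/9/2019'
    out['Date Announced'] = pd.to_datetime(out['Date Announced'], format='%m/%d/%Y')
    return out
```

Usage would be `df = clean_numeric_columns(df)` after the `drop` call. The original DataFrame is left untouched because the helper works on a copy.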
Note that read_html returns a list of DataFrames. In this case there is only one, so we can take the first and only element with [0].
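Because `[0]` silently takes the first table, it would keep "working" even if the published page ever contained more than one table. A small guard makes that case fail loudly instead. `only_table` is a hypothetical helper; if several tables do appear, `pd.read_html` also accepts a `match` argument to keep only tables containing a given string or regex.

```python
import pandas as pd


def only_table(tables):
    """Return the single DataFrame from a pd.read_html result, or raise.

    Hypothetical helper: indexing with [0] alone would quietly pick the
    first table if the page ever yielded more than one (or fail with an
    IndexError if it yielded none).
    """
    if len(tables) != 1:
        raise ValueError(f"expected exactly 1 table, got {len(tables)}")
    return tables[0]
```

Usage: `df = only_table(pd.read_html(url, header=1))`.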