How to build a DataFrame from two dicts in Python


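I'm trying to build a DataFrame. In this attempt, both `columns` and `data` are taken from dicts. (I also tried doing this with pd.Series, but kept running into problems there too.)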

import requests
import pandas as pd
from bs4 import BeautifulSoup

# get link and parse
page = requests.get('https://www.finviz.com/screener.ashx?v=111&ft=4')
soup = BeautifulSoup(page.text, 'html.parser')

# return 'Title's for each filter
# to be used as columns in dataframe
titles = soup.find_all('span', attrs={'class': 'screener-combo-title'})
title_list = []
for t in titles:
    t = t.stripped_strings
    t = ' '.join(t)
    title_list.append(t)
title_list = {k: v for k, v in enumerate(title_list)}

# finding filters-cells tag id's
# to be used to build url
filters = soup.find_all('select', attrs={'data-filter': True})
filter_list = []
for f in filters:
    filter_list.append(f.get('data-filter'))

# finding selectable values per cell
# to be used as data in dataframe
final_list = []
for f in filters:
    options = f.find_all('option', attrs={'value': True})
    option_list = []    # list needs to stay inside
    for option in options:
        if option['value'] != "":
            option_list.append(option['value'])
    final_list.append(option_list)
final_list = {k: v for k, v in enumerate(final_list)}


df = pd.DataFrame([final_list], columns=[title_list])
print(df)

This raises TypeError: unhashable type: 'dict'. The output I'm after looks like the example below (the first column is not the index):

Exchange    Index     ...
amex     s&p500     ...
nasd     djia
nyse
python pandas dataframe beautifulsoup
1 answer

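Here is an attempt that builds a dict in which each key is a filter title and each value is the list of options that can be selected for it. Does this fit your needs?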

import requests
import pandas as pd
from bs4 import BeautifulSoup

# get link and parse
page = requests.get('https://www.finviz.com/screener.ashx?v=111&ft=4')
soup = BeautifulSoup(page.text, 'html.parser')

# Map each filter title to the list of its selectable option labels.
all_dict = {}
filters = soup.find_all('td', attrs={'class': 'filters-cells'})
# The loop assumes the cells alternate: a title cell, then the cell holding its options.
for i in range(len(filters) // 2):
    i_title = 2 * i
    i_value = 2 * i + 1
    sct = filters[i_title].find_all('span', attrs={'class': 'screener-combo-title'})
    if len(sct) == 1:
        title = ' '.join(sct[0].stripped_strings)
        values = [v.text for v in filters[i_value].find_all('option', attrs={'value': True}) if v.text]
        all_dict[title] = values

# Pad every list with empty strings to a common length so the columns line up.
max_element = max(len(v) for v in all_dict.values())
for k in all_dict:
    all_dict[k] = all_dict[k] + [''] * (max_element - len(all_dict[k]))
df = pd.DataFrame.from_dict(all_dict) 
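
As for the original error: the question's call passes `columns=[title_list]`, that is, a list holding a single dict, and a dict cannot be hashed as a column label, which is most likely where the TypeError comes from. If you would rather keep the two index-keyed dicts from the question, here is a minimal sketch of an alternative that skips the manual padding by wrapping each list in a `pd.Series`. The sample values below are hypothetical stand-ins for the scraped data, copied from the example output in the question, not actual results from the site.

```python
import pandas as pd

# Hypothetical stand-ins for the question's two dicts:
# position -> column title, and position -> list of option values.
title_list = {0: "Exchange", 1: "Index"}
final_list = {0: ["amex", "nasd", "nyse"], 1: ["s&p500", "djia"]}

# Pair each title with its values by position. Wrapping every list in a
# Series lets pandas align the columns on the row index, so shorter
# columns are filled with NaN instead of raising a length error.
df = pd.DataFrame({title_list[k]: pd.Series(v) for k, v in final_list.items()})

# Replace the NaN padding with empty strings to match the desired output.
df = df.fillna("")
print(df)
```

The trade-off is that the padding happens inside pandas rather than in your own loop. If you want to keep missing values distinguishable from real empty strings, drop the `fillna("")` line.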