How to extract data from an interactive graph

Question

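I need to get the data points from a website that provides aggregated polling numbers. The data is displayed in an interactive graph. How can I get all the data points (date: number pairs) for each candidate? I tried inspecting the page source, but I could not find the data file it points to. A solution in either Python or R would be fine. Any help would be greatly appreciated.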

python r json web-scraping interactive
1 Answer

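As noted above, find the API call in the browser's developer tools. Then it is just a matter of fetching the response and reshaping it as needed: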

import requests
import pandas as pd
import json
import time


# Current time in milliseconds, passed as a bare query key (likely a cache buster)
timestamp = str(int(time.time() * 1000.0))

url = 'https://www.realclearpolitics.com/epolls/json/6730_historical.js'

headers = {
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Mobile Safari/537.36'}

payload = {
    timestamp: '',
    'callback': 'return_json'}


# The response is JSONP, i.e. return_json(...): strip the wrapper and parse the JSON inside
jsonStr = requests.get(url, headers=headers, params=payload).text
jsonData = json.loads(jsonStr.split('(', 1)[-1].rsplit(')', 1)[0])

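# DataFrame.append was removed in pandas 2.0, so collect the per-date frames and concatenate once
df = pd.DataFrame(jsonData['poll']['rcp_avg'])
frames = []
for idx, row in df.iterrows():
    # Each row holds one date and the list of candidate records for that date
    temp_df = pd.DataFrame(row['candidate'])
    temp_df['date'] = row['date']
    frames.append(temp_df)
results = pd.concat(frames, sort=True).reset_index(drop=True)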

Output:

print (results)
     affiliation    color                date  ...        name status value
0                 #009900 2019-11-28 06:00:00  ...       Biden      1  27.0
1                 #457fff 2019-11-28 06:00:00  ...     Sanders      1  18.3
2                 #996600 2019-11-28 06:00:00  ...      Warren      1  15.8
3                 #990099 2019-11-28 06:00:00  ...   Buttigieg      1  11.0
4                 #ff9900 2019-11-28 06:00:00  ...      Harris      1   3.8
5                 #3da882 2019-11-28 06:00:00  ...        Yang      1   3.3
6                 #f2dc0f 2019-11-28 06:00:00  ...   Bloomberg      1   2.5
7                 #000000 2019-11-28 06:00:00  ...   Klobuchar      1   2.2
8                 #66ccff 2019-11-28 06:00:00  ...      Booker      1   1.8
9                 #666666 2019-11-28 06:00:00  ...      Steyer      1   1.7
10                #ff0074 2019-11-28 06:00:00  ...     Gabbard      1   1.3
11                #cc9900 2019-11-28 06:00:00  ...      Castro      1   1.2
12                #9966ff 2019-11-28 06:00:00  ...      Bennet      1   0.6
13                #10671b 2019-11-28 06:00:00  ...     Bullock      3   0.4
14                #990000 2019-11-28 06:00:00  ...     Patrick      3   0.4
15                #6672ff 2019-11-28 06:00:00  ...      Sestak      3   0.3
16                #009900 2019-11-27 06:00:00  ...       Biden      1  28.2
17                #457fff 2019-11-27 06:00:00  ...     Sanders      1  17.8
18                #996600 2019-11-27 06:00:00  ...      Warren      1  16.7
19                #990099 2019-11-27 06:00:00  ...   Buttigieg      1  10.5
20                #ff9900 2019-11-27 06:00:00  ...      Harris      1   3.8
21                #3da882 2019-11-27 06:00:00  ...        Yang      1   3.2
22                #f2dc0f 2019-11-27 06:00:00  ...   Bloomberg      1   2.4
23                #000000 2019-11-27 06:00:00  ...   Klobuchar      1   2.0
24                #66ccff 2019-11-27 06:00:00  ...      Booker      1   1.7
25                #666666 2019-11-27 06:00:00  ...      Steyer      1   1.7
26                #ff0074 2019-11-27 06:00:00  ...     Gabbard      1   1.5
27                #cc9900 2019-11-27 06:00:00  ...      Castro      1   1.0
28                #9966ff 2019-11-27 06:00:00  ...      Bennet      1   0.8
29                #10671b 2019-11-27 06:00:00  ...     Bullock      3   0.4
         ...      ...                 ...  ...         ...    ...   ...
5650              #996600 2018-12-10 06:00:00  ...      Warren      1   6.0
5651              #990099 2018-12-10 06:00:00  ...   Buttigieg      1   NaN
5652              #ff9900 2018-12-10 06:00:00  ...      Harris      1   5.3
5653              #3da882 2018-12-10 06:00:00  ...        Yang      1   NaN
5654              #f2dc0f 2018-12-10 06:00:00  ...   Bloomberg      1   NaN
5655              #000000 2018-12-10 06:00:00  ...   Klobuchar      1   NaN
5656              #66ccff 2018-12-10 06:00:00  ...      Booker      1   4.0
5657              #666666 2018-12-10 06:00:00  ...      Steyer    NaN   NaN
5658              #ff0074 2018-12-10 06:00:00  ...     Gabbard      1   NaN
5659              #cc9900 2018-12-10 06:00:00  ...      Castro      1   NaN
5660              #9966ff 2018-12-10 06:00:00  ...      Bennet      1   NaN
5661              #10671b 2018-12-10 06:00:00  ...     Bullock      3   NaN
5662              #990000 2018-12-10 06:00:00  ...     Patrick    NaN   NaN
5663              #6672ff 2018-12-10 06:00:00  ...      Sestak    NaN   NaN
5664              #009900 2018-12-09 06:00:00  ...       Biden      1  29.0
5665              #457fff 2018-12-09 06:00:00  ...     Sanders      1  17.7
5666              #996600 2018-12-09 06:00:00  ...      Warren      1   6.0
5667              #990099 2018-12-09 06:00:00  ...   Buttigieg      1   NaN
5668              #ff9900 2018-12-09 06:00:00  ...      Harris      1   5.3
5669              #3da882 2018-12-09 06:00:00  ...        Yang      1   NaN
5670              #f2dc0f 2018-12-09 06:00:00  ...   Bloomberg      1   NaN
5671              #000000 2018-12-09 06:00:00  ...   Klobuchar      1   NaN
5672              #66ccff 2018-12-09 06:00:00  ...      Booker      1   4.0
5673              #666666 2018-12-09 06:00:00  ...      Steyer    NaN   NaN
5674              #ff0074 2018-12-09 06:00:00  ...     Gabbard      1   NaN
5675              #cc9900 2018-12-09 06:00:00  ...      Castro      1   NaN
5676              #9966ff 2018-12-09 06:00:00  ...      Bennet      1   NaN
5677              #10671b 2018-12-09 06:00:00  ...     Bullock      3   NaN
5678              #990000 2018-12-09 06:00:00  ...     Patrick    NaN   NaN
5679              #6672ff 2018-12-09 06:00:00  ...      Sestak    NaN   NaN

[5680 rows x 7 columns]
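
The split/rsplit step in the request code works because the endpoint returns JSONP (JSON wrapped in a callback call) rather than plain JSON. Here is a minimal sketch of that unwrapping on a made-up payload. The helper name `unwrap_jsonp` and the sample string are illustrative assumptions, not part of the site's API; only the `poll` / `rcp_avg` / `candidate` nesting mirrors the keys used above.

```python
import json

def unwrap_jsonp(text):
    """Return the parsed JSON inside a JSONP response such as cb({...});"""
    # Keep everything between the first '(' and the last ')'
    body = text.split('(', 1)[-1].rsplit(')', 1)[0]
    return json.loads(body)

# Made-up sample that mimics the nesting used above (poll -> rcp_avg -> candidate)
sample = ('return_json({"poll": {"rcp_avg": [{"date": "2019-11-28", '
          '"candidate": [{"name": "Biden", "value": "27.0"}]}]}});')
data = unwrap_jsonp(sample)
print(data['poll']['rcp_avg'][0]['candidate'][0]['name'])  # Biden
```

Splitting on the first `(` and the last `)` means parentheses that appear inside the JSON values do not break the parse.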

When plotted, the result looks like the graph on the site:

# Convert columns to appropriate type to chart
results['value'] = results['value'].astype(float)
results['date'] = pd.to_datetime(results['date']) 

import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style('darkgrid')
palette = pd.Series(results.color.values,index=results.name).to_dict()

sns.lineplot(data=results, x="date", y="value", hue="name", palette=palette)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
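
To get the date: number pairs per candidate that the question asks for, the long-format `results` frame can be pivoted so that each candidate becomes a column. The sketch below is one possible approach, not the only one; it uses a small stand-in frame with values copied from the output above so that it runs without a network call. The names `wide` and `pairs` are illustrative.

```python
import pandas as pd

# Small stand-in for `results`, with values taken from the output above
results = pd.DataFrame({
    'name':  ['Biden', 'Sanders', 'Biden', 'Sanders'],
    'date':  ['2019-11-28', '2019-11-28', '2019-11-27', '2019-11-27'],
    'value': [27.0, 18.3, 28.2, 17.8],
})
results['date'] = pd.to_datetime(results['date'])

# One row per date, one column per candidate
wide = results.pivot_table(index='date', columns='name', values='value')

# {candidate: {date: value}}, skipping dates where a candidate has no value
pairs = {
    name: wide[name].dropna().to_dict()
    for name in wide.columns
}
print(pairs['Biden'][pd.Timestamp('2019-11-28')])  # 27.0
```

With the real data, `wide.to_csv(...)` would write the same table to disk, one column per candidate.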

