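Noob here. Please bear with my formatting, I'm still learning. I am trying to create a time series (a DataFrame, I think) made up of three columns: a date column, then an inventory column, and finally a price column.
I have pulled two separate series (date and inventory; date and price), and I would like to merge them so that I see three columns instead of two sets of two. Here is my code.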
import json
import numpy as np
import pandas as pd

from urllib.error import URLError, HTTPError
from urllib.request import urlopen


class EIAgov(object):
    def __init__(self, token, series):
        '''
        Purpose:
        Initialise the EIAgov class by requesting:
        - EIA token
        - id code(s) of the series to be downloaded

        Parameters:
        - token: string
        - series: string or list of strings
        '''
        self.token = token
        self.series = series

    '''
    def __repr__(self):
        return str(self.series)
    '''

    def Raw(self, ser):
        # Construct url
        url = 'http://api.eia.gov/series/?api_key=' + self.token + '&series_id=' + ser.upper()

        try:
            # URL request, URL opener, read content
            response = urlopen(url)
            raw_byte = response.read()
            raw_string = str(raw_byte, 'utf-8-sig')
            jso = json.loads(raw_string)
            return jso

        except HTTPError as e:
            print('HTTP error type.')
            print('Error code: ', e.code)

        except URLError as e:
            print('URL type error.')
            print('Reason: ', e.reason)

    def GetData(self):
        # Deal with the date series
        date_ = self.Raw(self.series[0])
        date_series = date_['series'][0]['data']
        endi = len(date_series)  # or len(date_['series'][0]['data'])
        date = []
        for i in range(endi):
            date.append(date_series[i][0])

        # Create dataframe
        df = pd.DataFrame(data=date)
        df.columns = ['Date']

        # Deal with data
        lenj = len(self.series)
        for j in range(lenj):
            data_ = self.Raw(self.series[j])
            data_series = data_['series'][0]['data']
            data = []
            endk = len(date_series)
            for k in range(endk):
                data.append(data_series[k][1])
            df[self.series[j]] = data

        return df

if __name__ == '__main__':
    tok = 'mytoken'

    # Natural Gas - Weekly Storage
    #
    ngstor = ['NG.NW2_EPG0_SWO_R48_BCF.W']  # w/ several series at a time ['ELEC.REV.AL-ALL.M', 'ELEC.REV.AK-ALL.M', 'ELEC.REV.CA-ALL.M']
    stordata = EIAgov(tok, ngstor)
    print(stordata.GetData())

    # Natural Gas - Weekly Prices
    #
    ngpx = ['NG.RNGC1.W']  # w/ several series at a time ['ELEC.REV.AL-ALL.M', 'ELEC.REV.AK-ALL.M', 'ELEC.REV.CA-ALL.M']
    pxdata = EIAgov(tok, ngpx)
    print(pxdata.GetData())

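Note that 'mytoken' needs to be replaced with an eia.gov API key. I can get it to successfully produce output for the two lists... but to merge the lists, I tried adding this at the end: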
joined_frame = pd.concat([ngstor,ngpx],axis = 1,sort = False)
print(joined_frame.GetData())
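But I get an error ("TypeError: cannot concatenate object of type '<class 'list'>'; only Series and DataFrame objs are valid") because apparently I don't know the difference between a list and a series.
How do I merge these lists by date? Thanks so much for any help. (Also feel free to call me out for being bad at formatting the code properly in this post.)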
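The error can be reproduced without the API, which makes the list/DataFrame difference concrete. In the code above, ngstor and ngpx are plain Python lists holding series ID strings; the DataFrames are what GetData() returns. A minimal sketch, assuming only that pandas is installed (the name concat_error is just for illustration):

```python
import pandas as pd

# Same kind of objects as in the question: plain lists of series ID strings
ngstor = ['NG.NW2_EPG0_SWO_R48_BCF.W']
ngpx = ['NG.RNGC1.W']

try:
    # pd.concat only accepts Series / DataFrame objects, so this raises
    pd.concat([ngstor, ngpx], axis=1, sort=False)
    concat_error = None
except TypeError as exc:
    concat_error = exc

print(type(ngstor))   # the type pandas is complaining about
print(concat_error)   # the message quoted in the question
```

So the objects to combine are the frames returned by GetData(), not the ID lists passed to EIAgov.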
If you want to work with them as DataFrames in the rest of your code, you can convert ngstor and ngpx to DataFrames as follows:
import pandas as pd

# I create two lists that look like yours
ngstor = [[1, 2], ["2020-04-03", "2020-05-07"]]
ngpx = [[3, 4], ["2020-04-03", "2020-05-07"]]

# I transform them to DataFrames
ngstor = pd.DataFrame({"value1": ngstor[0],
                       "date_col": ngstor[1]})
ngpx = pd.DataFrame({"value2": ngpx[0],
                     "date_col": ngpx[1]})
You can then use pandas.merge or pandas.concat:
# merge option
joined_framed = pd.merge(ngstor, ngpx, on="date_col",
                         how="outer")

# concat option
ngstor = ngstor.set_index("date_col")
ngpx = ngpx.set_index("date_col")
joined_framed = pd.concat([ngstor, ngpx], axis=1,
                          join="outer").reset_index()
The result will be:
date_col value1 value2
0 2020-04-03 1 3
1 2020-05-07 2 4
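Applied to the objects in the question, the frames to merge would be the ones returned by stordata.GetData() and pxdata.GetData(). Here is a minimal sketch, assuming GetData() returns a 'Date' column plus one column named after the series ID, as the class in the question builds it. stor_df and px_df are hypothetical stand-ins with made-up values so the example runs without an API key, and the YYYYMMDD date strings are an assumption about what the API returns:

```python
import pandas as pd

# Hypothetical stand-ins for stordata.GetData() and pxdata.GetData().
# Values are made up; the date format (YYYYMMDD strings) is an assumption.
stor_df = pd.DataFrame({
    "Date": ["20200403", "20200327", "20200320"],
    "NG.NW2_EPG0_SWO_R48_BCF.W": [100, 110, 120],
})
px_df = pd.DataFrame({
    "Date": ["20200403", "20200327"],
    "NG.RNGC1.W": [1.5, 1.6],
})

# Outer merge on the shared date column keeps dates present in either series
joined_frame = pd.merge(stor_df, px_df, on="Date", how="outer")

# Parse the dates and sort chronologically
joined_frame["Date"] = pd.to_datetime(joined_frame["Date"], format="%Y%m%d")
joined_frame = joined_frame.sort_values("Date").reset_index(drop=True)

print(joined_frame)
```

With how="outer", a date that exists in only one series is kept and the missing value shows up as NaN; how="inner" would keep only the dates both series share.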