Python DataFrame VLOOKUP with 2 criteria

Question (votes: 1, answers: 2)

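I have a DataFrame with many date/time/price rows, and I would like to pull each day's price at 1600 into a new column (Priceat1600). So the lookup has to match on two things: the date, and the time being 1600.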

Original DataFrame

    Date  Time     Price
20090130   955  25641.00
20090130   956  25666.60
20090130   959  25746.10
20090130  1000  25794.80
20090130  1006  26023.10
20090130  1600  26000.00
.
.
.
20160902  1600     35.00
20160902  1903     34.84
20160902  1908     34.85
20160902  1912     34.85
20160902  1914     34.85
20160902  1915     34.83

The output I am looking for

    Date  Time     Price  Priceat1600
20090130   955  25641.00        26000
20090130   956  25666.60        26000
20090130   959  25746.10        26000
20090130  1000  25794.80        26000
20090130  1006  26023.10        26000
20090130  1600  26000.00        26000
.
.
.
20160902  1600     35.00       35.00
20160902  1903     34.84       35.00
20160902  1908     34.85       35.00
20160902  1912     34.85       35.00
20160902  1914     34.85       35.00
20160902  1915     34.83       35.00
python dataframe
2 answers

1 vote

Given your data, mask + groupby + transform + first / min / max works well:

df.Price.mask(~df.Time.eq(1600)).groupby(df.Date).transform('first')

0     26000.0
1     26000.0
2     26000.0
3     26000.0
4     26000.0
5     26000.0
6        35.0
7        35.0
8        35.0
9        35.0
10       35.0
11       35.0
Name: Price, dtype: float64
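  1. Mask every value of Price that was not recorded at 4 PM (Time == 1600), which turns it into NaN
  2. Group by Date and use transform('first') to broadcast the first non-NaN value to every row of its group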

You can assign the result back to df['Priceat1600'].
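For reference, the two steps above can be put together as a small self-contained sketch. The sample frame is a shortened copy of the question's data; the last row (Date 20160905) is a made-up addition, not from the question, included only to show what happens on a day that has no 1600 row.

```python
import pandas as pd

df = pd.DataFrame({
    "Date":  [20090130, 20090130, 20090130, 20160902, 20160902, 20160905],
    "Time":  [955, 1000, 1600, 1600, 1903, 930],
    # The 20160905 row is hypothetical: a day without a 1600 record.
    "Price": [25641.00, 25794.80, 26000.00, 35.00, 34.84, 34.90],
})

# Step 1: blank out (NaN) every Price whose Time is not 1600.
price_at_1600_only = df["Price"].mask(df["Time"].ne(1600))

# Step 2: per Date, take the first non-NaN value and broadcast it
# back to every row of that Date.
df["Priceat1600"] = price_at_1600_only.groupby(df["Date"]).transform("first")

print(df)
```

A day with no 1600 row ends up with NaN in Priceat1600, which is probably what you want, but it is worth checking for before using the column in further calculations.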


0 votes

How about filtering and merging?

import pandas as pd
from io import StringIO

data = StringIO('''Date  Time     Price
20090130   955  25641.00
20090130   956  25666.60
20090130   959  25746.10
20090130  1000  25794.80
20090130  1006  26023.10
20090130  1600  26000.00
20160902  1600     35.00
20160902  1903     34.84
20160902  1908     34.85
20160902  1912     34.85
20160902  1914     34.85
20160902  1915     34.83''')


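# Raw string so that '\s' is passed to the regex engine unchanged
df = pd.read_csv(data, sep=r'\s+')

# Keep only the 1600 rows, as (Date, Price) pairs
price_at_16 = df[df['Time'] == 1600][['Date', 'Price']]

# Left-merge on Date; the right-hand Price column gets the suffix
# and becomes 'Priceat1600', the column name asked for in the question
df = df.merge(price_at_16, on='Date', how='left', suffixes=('', 'at1600'))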
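One thing to watch with the merge: if a day ever has more than one row at 1600, the left merge will duplicate every row of that day. A possible variation, sketched here under that assumption, is to build a Date -> Price lookup and use map instead. The duplicate 1600 row in the sample data and the `lookup` name are hypothetical, added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Date":  [20090130, 20090130, 20090130, 20160902, 20160902],
    "Time":  [955, 1600, 1600, 1600, 1903],
    # The second 1600 row on 20090130 is a hypothetical duplicate.
    "Price": [25641.00, 26000.00, 26001.00, 35.00, 34.84],
})

# One price per Date: filter to 1600, keep the first row per day,
# then index the prices by Date so they can serve as a lookup table.
lookup = (
    df.loc[df["Time"] == 1600]
      .drop_duplicates(subset="Date", keep="first")
      .set_index("Date")["Price"]
)

# map looks each Date up in the index of `lookup`; the row count is unchanged.
df["Priceat1600"] = df["Date"].map(lookup)

print(df)
```

Keeping the first 1600 row per day is an arbitrary choice here; use keep="last" or an aggregation if a different rule fits your data better.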