How to compare two str values in a DataFrame (Python pandas)


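I'm trying to compare two different values in a dataframe. I found related questions and answers, but I wasn't able to apply them to my case.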

import pandas as pd
# from datetime import timedelta

"""
read csv file
clean date column
convert date str to datetime
sort for equity options
replace date str column with datetime column
"""
trade_reader = pd.read_csv('TastyTrades.csv')
trade_reader['Date'] = trade_reader['Date'].replace({'T': ' ', '-0500': ''}, regex=True)
date_converter = pd.to_datetime(trade_reader['Date'], format="%Y-%m-%d %H:%M:%S")
options_frame = trade_reader.loc[(trade_reader['Instrument Type'] == 'Equity Option')]
clean_frame = options_frame.replace(to_replace=['Date'], value='date_converter')

# Separate opening transaction from closing transactions, combine frames
opens = clean_frame[clean_frame['Action'].isin(['BUY_TO_OPEN', 'SELL_TO_OPEN'])]
closes = clean_frame[clean_frame['Action'].isin(['BUY_TO_CLOSE', 'SELL_TO_CLOSE'])]
open_close_set = set(opens['Symbol']) & set(closes['Symbol'])
open_close_frame = clean_frame[clean_frame['Symbol'].isin(open_close_set)]

'''
convert Value to float
sort for trade readability
write
'''
ocf_float = open_close_frame['Value'].astype(float)
ocf_sorted = open_close_frame.sort_values(by=['Date', 'Call or Put'], ascending=True)
# for readability, revert back to ocf_sorted below
ocf_list = ocf_sorted.drop(
    ['Type', 'Instrument Type', 'Description', 'Quantity', 'Average Price', 'Commissions', 'Fees', 'Multiplier'], axis=1
    )
ocf_list.reset_index(drop=True, inplace=True)
ocf_list['Strategy'] = ''
# ocf_list.to_csv('Sorted.csv')

# create strategy list
debit_single = []
debit_vertical = []
debit_calendar = []
credit_vertical = []
iron_condor = []

# shift columns
ocf_list['Symbol Shift'] = ocf_list['Underlying Symbol'].shift(1)
ocf_list['Symbol Check'] = ocf_list['Underlying Symbol'] == ocf_list['Symbol Shift']

# compare symbols, append depending on criteria met
for row in ocf_list:
    if row['Symbol Shift'] is row['Underlying Symbol']:
        debit_vertical.append(row)

print(type(ocf_list['Underlying Symbol']))
ocf_list.to_csv('Sorted.csv')
print(debit_vertical)
# delta = timedelta(seconds=10)

The error I get is:

line 51, in <module>
    if row['Symbol Check'][-1] is row['Underlying Symbol'][-1]:
TypeError: string indices must be integers

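I'm trying to compare the newly created shifted column with the original column and, if the values are the same, append the row to a list. Is there a way to simply compare two string values in Python? I tried checking whether Symbol Check is True, and it still returns the error about str indices needing to be integers. .iterrows() didn't work either.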

python pandas dataframe shift
1 Answer

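Here you are actually iterating over the DataFrame's columns, not its rows. Each `row` is a column label (a plain string), so `row['Symbol Shift']` tries to index a string with a string, which is what raises `TypeError: string indices must be integers`: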

for row in ocf_list:
    if row['Symbol Shift'] is row['Underlying Symbol']:
        debit_vertical.append(row)
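
A minimal sketch of why that loop fails. The frame and its values are made up for illustration; only the column names are taken from the question.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the question.
toy = pd.DataFrame({"Underlying Symbol": ["SPY", "SPY", "QQQ"]})
toy["Symbol Shift"] = toy["Underlying Symbol"].shift(1)

# Iterating a DataFrame directly yields its column labels, which are plain strings.
labels = [item for item in toy]
print(labels)

# Indexing one of those strings with another string raises the TypeError from the question.
try:
    labels[0]["Symbol Shift"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```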

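You can iterate over the rows with either `iterrows` or `itertuples`. Note what each one yields: `iterrows` yields `(index, Series)` pairs, so you have to unpack the pair before indexing the row by column name; `itertuples` yields namedtuples, whose fields are read as attributes or by position rather than with a column-name key.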
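
As a rough sketch of both row iterators, assuming a small made-up frame with the question's column names (the data and the `matches_*` variable names are hypothetical):

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the question.
toy = pd.DataFrame({"Underlying Symbol": ["SPY", "SPY", "QQQ"]})
toy["Symbol Shift"] = toy["Underlying Symbol"].shift(1)

# iterrows() yields (index, Series) pairs; the Series can be indexed by column name.
matches_iterrows = []
for idx, row in toy.iterrows():
    if row["Symbol Shift"] == row["Underlying Symbol"]:
        matches_iterrows.append(idx)

# itertuples(index=False, name=None) yields plain tuples in column order,
# so the fields are unpacked by position (the column names contain spaces).
matches_itertuples = []
for pos, (symbol, shifted) in enumerate(toy.itertuples(index=False, name=None)):
    if shifted == symbol:
        matches_itertuples.append(pos)

print(matches_iterrows, matches_itertuples)
```

Both loops work, but they are slow on large frames compared with a column-wise comparison.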

Second, you should use `==` instead of `is`, because you presumably want to compare values, not object identity.
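
A small illustration of the difference, using two made-up strings with the same contents. Whether two equal strings happen to be the same object is an implementation detail, so the identity result should not be relied on.

```python
# Two strings with equal contents that are built separately.
left = "SPY"
right = "".join(["SP", "Y"])  # assembled at runtime, typically a separate object on CPython

values_equal = left == right   # compares the contents of the strings
same_object = left is right    # compares object identity, not contents

print(values_equal)
print(same_object)
```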

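Finally, since pandas is built around selecting rows by condition, I would skip iterating over the rows altogether. You should be able to replace the loop above with the following: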

debit_vertical = ocf_list[ocf_list['Symbol Shift'] == ocf_list['Underlying Symbol']].values.tolist()
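
To see what that one-liner does step by step, here is a sketch on made-up data (the frame and the intermediate variable names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the question.
toy = pd.DataFrame({"Underlying Symbol": ["SPY", "SPY", "QQQ", "QQQ"]})
toy["Symbol Shift"] = toy["Underlying Symbol"].shift(1)

# Element-wise comparison gives a boolean Series; the missing value
# produced by shift(1) in the first row compares as False.
same_as_previous = toy["Symbol Shift"] == toy["Underlying Symbol"]

# Boolean indexing keeps only the rows where the mask is True.
matching_rows = toy[same_as_previous]

# Convert the selected rows to a list of lists, one inner list per row.
as_lists = matching_rows.values.tolist()
print(as_lists)
```

If you would rather keep working with a DataFrame than with nested lists, stop at `matching_rows`.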