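I have a CSV file with N key columns, plus one column of expressions. Each expression contains references to between 1 and N of the key columns, and I want to replace each reference with the values from the corresponding key column of that row. Hopefully the example below clarifies what I mean.
The key columns below are A, B, C
Desired output: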
20_A
20_B
30_A
30_B
40_C_4
40_C_5
My solution:
import pandas as pd

keys = ['Age', 'Type', 'Delay']
df = pd.read_csv(csv_path)
for index, row in df.iterrows():
    key1_list = row[keys[0]].split(",")
    key2_list = row[keys[1]].split(",")
    key3_list = row[keys[2]].split(",")
    expression = row['Expression']
    # Iterate over all combinations of key column values and export a chart for each one
    for KEY1 in key1_list:
        for KEY2 in key2_list:
            for KEY3 in key3_list:
                string = expression
                string = string.replace("<" + keys[0] + ">", KEY1)
                string = string.replace("<" + keys[1] + ">", KEY2)
                string = string.replace("<" + keys[2] + ">", KEY3)
                print(string)
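However, I want to generalise my code so that it works for any number of key columns and only the list of keys at the top needs updating. That requires looping to a depth of len(keys). I can't figure out how to generalise the loops to an arbitrary depth with flat code. I looked at itertools but couldn't find what I needed. I think recursion might work, but I would prefer to avoid it.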
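Recursion would certainly solve your problem, but you should take another look at itertools before going down that road. What you want is the product of your key lists, which generates every possible combination of key values.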
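To make the idea concrete, here is a minimal sketch showing that itertools.product over a list of lists yields the same tuples, in the same order, as hand-written nested loops. The values in key_lists are made up for illustration and are not taken from the asker's file.

```python
import itertools

# Hypothetical per-row value lists, one list per key column.
key_lists = [["20"], ["A", "B"], ["4", "5"]]

# Hand-written nested loops, fixed at depth 3.
nested = [
    (a, b, c)
    for a in key_lists[0]
    for b in key_lists[1]
    for c in key_lists[2]
]

# The same combinations for any number of lists: unpack them into product.
flat = list(itertools.product(*key_lists))

print(flat)
```

Because the lists are unpacked with *, the call works unchanged whether there are three key columns or thirty.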
One way to achieve this is as follows:
import pandas as pd
import itertools

csv_path = "path/to/file"
df = pd.read_csv(csv_path)
# Find the available keys from the data frame instead of entering them manually:
keys = list(df.keys()[:-1])  # Do not include "Expression" as it is not a key.
for index, row in df.iterrows():
    # Collect the value list of each key in a list of lists
    # (the order needs to be preserved, so a dict is avoided)
    key_list = []
    for key in keys:
        # The code uses ',' as the value separator within each cell.
        # Does this work in a CSV file?
        key_list.append(row[key].split(','))
    expression = row['Expression']
    # All key combinations are then generated with 'itertools.product'
    combos = itertools.product(*key_list)
    # Each combo is then handled separately
    for combo in combos:
        string = expression
        # Replace each key in order
        # Must be done sequentially since the depth is not known in advance
        for key, value in zip(keys, combo):
            string = string.replace('<' + key + '>', value)
        print(string)
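For anyone who wants to try the approach without a data file, below is a self-contained sketch of the same technique wrapped in a function. It is an illustration under assumptions, not the code above verbatim: it swaps pandas for the standard-library csv module so it runs without dependencies, the name expand_expressions is hypothetical, and the sample rows are invented (multi-valued cells are quoted so the commas inside them are not treated as column separators).

```python
import csv
import io
import itertools


def expand_expressions(rows, keys, expr_col="Expression", sep=","):
    """Yield one expanded string per combination of key values in each row."""
    for row in rows:
        # One list of candidate values per key column, in key order.
        value_lists = [row[key].split(sep) for key in keys]
        for combo in itertools.product(*value_lists):
            text = row[expr_col]
            # Substitute every <Key> placeholder with this combo's value.
            for key, value in zip(keys, combo):
                text = text.replace("<" + key + ">", value)
            yield text


# Invented sample data; cells holding several values are quoted.
SAMPLE_CSV = '''Age,Type,Delay,Expression
20,"A,B",1,<Age>_<Type>
40,C,"4,5",<Age>_<Type>_<Delay>
'''

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
# Every column except the expression column is treated as a key.
keys = [name for name in reader.fieldnames if name != "Expression"]
results = list(expand_expressions(reader, keys))
print(results)
```

Since the key names are read from the header, adding a key column to the file requires no change to the code.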
I hope this code is understandable and does what you want. If not, let me know and I will clarify further.