Using itertools.combinations on DataFrame columns while always including a specific column

Question (votes: 0, answers: 2)

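I'm trying to run a best-subset multiple regression using sm.OLS and itertools.combinations. I've added a constant column, but because itertools.combinations iterates over every combination of columns, some combinations leave out the constant term.

To fix this, I tried to get itertools.combinations to always include the constant column alongside every other combination.

The result contains only some combinations that include the constant. How can I make every combination include the constant column?

An example of what I'm looking for:

[('const', 'B', 'C'), ('const', 'B', 'D'), ('const', 'B', 'E'), ('const', 'B', 'F'), ('const', 'A', 'B'), ...]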

Here is an example of what I currently have:

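import itertools

import numpy as np
import pandas as pd

cols = ['A', 'B', 'C', 'D', 'E', 'F']
const = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

# 12 rows of random data, plus a constant column of ones for the intercept
ran = np.random.rand(12, 6)
df = pd.DataFrame(data=ran, columns=cols)
df['const'] = const
results = []
print(df)
# Combinations over all 7 columns: only some of the triples contain 'const'
for combo in itertools.combinations(df.columns, 3):
    results.append(combo)

print(results)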


python linear-regression python-itertools
2 Answers
0
votes

Why not build a list comprehension over the combinations of the columns other than 'const', and prepend 'const' to each pair?

results = [('const', *x) for x in
           itertools.combinations(df.columns.difference(['const']), 2)]

Output:

[('const', 'A', 'B'),
 ('const', 'A', 'C'),
 ('const', 'A', 'D'),
 ('const', 'A', 'E'),
 ('const', 'A', 'F'),
 ('const', 'B', 'C'),
 ('const', 'B', 'D'),
 ('const', 'B', 'E'),
 ('const', 'B', 'F'),
 ('const', 'C', 'D'),
 ('const', 'C', 'E'),
 ('const', 'C', 'F'),
 ('const', 'D', 'E'),
 ('const', 'D', 'F'),
 ('const', 'E', 'F')]
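
As a rough sketch of the same idea without pandas, the helper below works on a plain list of column names. The name combos_with_fixed and the hard-coded columns list are made up for illustration; they are not part of the question or of any library. Unlike Index.difference, which returns the labels sorted, the helper keeps the remaining columns in their original order. It also counts how many triples the question's original loop produces with and without 'const'.

```python
import itertools


def combos_with_fixed(columns, fixed, size):
    """Return every `size`-tuple of column names that starts with `fixed`."""
    # Drop the fixed column, keeping the remaining columns in their original order.
    others = [c for c in columns if c != fixed]
    # Choose size - 1 of the remaining columns, then prepend the fixed one.
    return [(fixed, *rest) for rest in itertools.combinations(others, size - 1)]


# Assumed to mirror df.columns from the question.
columns = ['A', 'B', 'C', 'D', 'E', 'F', 'const']
results = combos_with_fixed(columns, 'const', 3)

# Plain combinations over all 7 columns give C(7, 3) = 35 triples,
# of which only C(6, 2) = 15 contain 'const'.
all_triples = list(itertools.combinations(columns, 3))
with_const = [t for t in all_triples if 'const' in t]
print(len(all_triples), len(with_const), len(results))  # 35 15 15
```

This is why the original loop returned only some combinations with the constant: 20 of its 35 triples are drawn purely from A to F. Note that each result is a tuple, so it should be converted with list(combo) before being used to select columns from a DataFrame.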

0
votes

IIUC, you can do this:

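results = []  # start from an empty list so earlier results are not mixed in
# [:-1] drops the last column ("const"), which is then prepended to every pair
for combo in itertools.combinations(df.columns[:-1], 2):
    results.append(["const", *combo])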

print(results)

Prints:

[
    ["const", "A", "B"],
    ["const", "A", "C"],
    ["const", "A", "D"],
    ["const", "A", "E"],
    ["const", "A", "F"],
    ["const", "B", "C"],
    ["const", "B", "D"],
    ["const", "B", "E"],
    ["const", "B", "F"],
    ["const", "C", "D"],
    ["const", "C", "E"],
    ["const", "C", "F"],
    ["const", "D", "E"],
    ["const", "D", "F"],
    ["const", "E", "F"],
]
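
The slice df.columns[:-1] relies on "const" being the last column. A possible variation, sketched here on a plain list of names assumed to match df.columns, filters by name instead and loops over every subset size, which is what a best-subset search would typically need. The variable names predictors and subsets are illustrative only.

```python
import itertools

# Assumed to mirror df.columns from the question.
columns = ['A', 'B', 'C', 'D', 'E', 'F', 'const']

# Select the predictors by name, so the position of 'const' does not matter.
predictors = [c for c in columns if c != 'const']

subsets = []
# Subset sizes 1..6: every non-empty set of predictors, each with 'const' in front.
for k in range(1, len(predictors) + 1):
    for combo in itertools.combinations(predictors, k):
        subsets.append(['const', *combo])

print(len(subsets))  # 63, i.e. 2**6 - 1
```

Because each entry is a list of column labels, it can be used directly as df[subset] to pick the columns for one candidate model.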