Interpreting a random forest in PySpark

Question

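Good evening,

I am trying to find a way to interpret a random forest in Spark. By interpret, I mean finding out which variables are the most influential for a specific row.

In plain Python, I used to do this: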

from treeinterpreter import treeinterpreter as ti
# rfc is a fitted scikit-learn random forest, X holds the rows to explain
prediction, bias, contributions = ti.predict(rfc, X)

The contributions array has all the information I need, and I can then manipulate it to get the result I want. Is there a way to do this with Spark from Python?
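To make the goal concrete, here is a minimal sketch of the kind of manipulation I mean. It assumes a classifier, where the contributions for one row are shaped (n_features, n_classes). The helper name `top_contributions`, the feature names, and the numbers are all made up for illustration.

```python
def top_contributions(row_contributions, feature_names, class_index, k=3):
    """Rank the features of one row by the magnitude of their contribution."""
    # Keep only each feature's contribution towards the chosen class.
    per_feature = [contrib[class_index] for contrib in row_contributions]
    # Sort by absolute value so strong negative contributions rank high too.
    ranked = sorted(
        zip(feature_names, per_feature),
        key=lambda pair: abs(pair[1]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical contributions for a single row: 3 features, 2 classes.
row = [[0.10, -0.10], [-0.25, 0.25], [0.02, -0.02]]
names = ["age", "income", "height"]
print(top_contributions(row, names, class_index=1, k=2))
```

With the real output of ti.predict, `row` would be `contributions[i]` for the row of interest.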

Answer 1

I think you are talking about feature importance. When you use a Pipeline object, you can get it in PySpark like this:

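# model is the fitted PipelineModel; the random forest is taken to be its last stage
tree = model.stages[-1]
# Feature importances of the fitted forest (one value per feature, for the model as a whole)
print(tree.featureImportances)

# You can also print the structure of every tree, node by node:
print('Trees with Nodes: {}'.format(tree.toDebugString))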
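Note that featureImportances is a model-wide measure, so it does not give per-row contributions the way treeinterpreter does. To read it more easily, you can pair each value with its feature name. The following is only a sketch: `rank_importances` is a hypothetical helper, the sample values are made up, and in a real pipeline the names would typically come from whichever stage assembled the feature vector (for example a VectorAssembler's getInputCols(), assuming you used one).

```python
def rank_importances(importances, feature_names):
    """Pair each importance with its feature name, highest importance first."""
    pairs = zip(feature_names, (float(value) for value in importances))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up values standing in for tree.featureImportances.toArray()
sample_importances = [0.2, 0.7, 0.1]
sample_names = ["age", "income", "height"]

for name, importance in rank_importances(sample_importances, sample_names):
    print("{}: {:.2f}".format(name, importance))
```

In practice you would pass `tree.featureImportances.toArray()` as the first argument instead of the sample list.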
