How to export a CatBoost model to text for later parsing into an if-else decision tree?

Question

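I'm currently working with the new CatBoost algorithm (Python version) and trying to export my model to a txt file so that I can port it to a C/Java implementation. Looking through the documentation, I only found the save_model method, which accepts just two file formats: 1. binary 2. Apple's CoreML

Neither format suits me, so is there perhaps another way to do this?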

Answer 1

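There is no direct way to do this: CatBoost doesn't yet support serializing a model to a human-readable format.

However, CatBoost can already convert a model to CoreML, and there is a CoreML tool that serializes the model to JSON-like text. Here is a minimal example: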

from sklearn import datasets
iris = datasets.load_iris()

import catboost
# the shortest possible model specification
cls = catboost.CatBoostClassifier(loss_function='MultiClass', iterations=1, depth=1)
cls.fit(iris.data, iris.target)

# save model to CoreML format
cls.save_model(
    "iris.mlmodel",
    format="coreml", 
    export_parameters={
        'prediction_type': 'probability'
    }
)

# there is a CoreML tool for model serialization
import coremltools
model = coremltools.models.model.MLModel("iris.mlmodel")
# print the protobuf spec explicitly; a bare expression only echoes in a REPL
print(model.get_spec())

You may need to read the coremltools documentation to fully understand what this code prints, but you can read the output like this:

"There is an ensemble of a single tree with 2 leaves - in the leaf 0, class 0 dominates, in the leaf 1 - classes 1 and 2. Go to the leaf 1, if feature 3 is larger than 0.8, otherwise go to leaf 0"

specificationVersion: 1
description {
  input {
    name: "feature_3"
    type {
      doubleType {
      }
    }
  }
  output {
    name: "prediction"
    type {
      multiArrayType {
        shape: 3
        dataType: DOUBLE
      }
    }
  }
  predictedFeatureName: "prediction"
  predictedProbabilitiesName: "prediction"
  metadata {
    shortDescription: "Catboost model"
    versionString: "1.0.0"
    author: "Mr. Catboost Dumper"
  }
}
treeEnsembleRegressor {
  treeEnsemble {
    nodes {
      nodeBehavior: LeafNode
      evaluationInfo {
        evaluationValue: 0.05084745649058943
      }
      evaluationInfo {
        evaluationIndex: 1
        evaluationValue: -0.025423728245294732
      }
      evaluationInfo {
        evaluationIndex: 2
        evaluationValue: -0.025423728245294732
      }
    }
    nodes {
      nodeId: 1
      nodeBehavior: LeafNode
      evaluationInfo {
        evaluationValue: -0.02752293516463098
      }
      evaluationInfo {
        evaluationIndex: 1
        evaluationValue: 0.01376146758231549
      }
      evaluationInfo {
        evaluationIndex: 2
        evaluationValue: 0.013761467582315471
      }
    }
    nodes {
      nodeId: 2
      nodeBehavior: BranchOnValueGreaterThan
      branchFeatureIndex: 3
      branchFeatureValue: 0.800000011920929
      trueChildNodeId: 1
    }
    numPredictionDimensions: 3
    basePredictionValue: 0.0
    basePredictionValue: 0.0
    basePredictionValue: 0.0
  }
  postEvaluationTransform: Classification_SoftMax
}
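
To make the reading above concrete, here is a minimal sketch of turning those nodes into if-else text. It does not call coremltools: the nodes are transcribed by hand from the printed spec into plain dicts, and the key names (`behavior`, `values`, `feature`, `threshold`, `trueChild`, `falseChild`) are hypothetical shorthand for the spec fields, not a coremltools API. Fields that the printout omits because they hold the default value 0 (nodeId, evaluationIndex, and the false child of node 2) are written out explicitly. The sketch assumes the root is the one node that no other node references as a child, and it handles only `BranchOnValueGreaterThan`.

```python
# Nodes copied by hand from the spec above (key names are illustrative shorthand).
NODES = [
    {"nodeId": 0, "behavior": "LeafNode",
     "values": [0.05084745649058943, -0.025423728245294732, -0.025423728245294732]},
    {"nodeId": 1, "behavior": "LeafNode",
     "values": [-0.02752293516463098, 0.01376146758231549, 0.013761467582315471]},
    {"nodeId": 2, "behavior": "BranchOnValueGreaterThan",
     "feature": 3, "threshold": 0.800000011920929,
     "trueChild": 1, "falseChild": 0},
]


def find_root(nodes):
    """Return the id of the single node that is nobody's child."""
    child_ids = set()
    for node in nodes:
        if node["behavior"] != "LeafNode":
            child_ids.update((node["trueChild"], node["falseChild"]))
    roots = [n["nodeId"] for n in nodes if n["nodeId"] not in child_ids]
    if len(roots) != 1:
        raise ValueError("expected exactly one root, found %d" % len(roots))
    return roots[0]


def render(nodes, node_id=None, indent=0):
    """Return the tree as a list of if-else source lines."""
    by_id = {n["nodeId"]: n for n in nodes}
    if node_id is None:
        node_id = find_root(nodes)
    node = by_id[node_id]
    pad = "    " * indent
    if node["behavior"] == "LeafNode":
        return [f"{pad}return {node['values']}"]
    if node["behavior"] != "BranchOnValueGreaterThan":
        raise NotImplementedError(node["behavior"])
    lines = [f"{pad}if feature_{node['feature']} > {node['threshold']}:"]
    lines += render(nodes, node["trueChild"], indent + 1)
    lines.append(f"{pad}else:")
    lines += render(nodes, node["falseChild"], indent + 1)
    return lines


def predict(nodes, features):
    """Walk the tree for one sample and return the raw leaf values."""
    by_id = {n["nodeId"]: n for n in nodes}
    node = by_id[find_root(nodes)]
    while node["behavior"] != "LeafNode":
        go_true = features[node["feature"]] > node["threshold"]
        node = by_id[node["trueChild"] if go_true else node["falseChild"]]
    return node["values"]


print("\n".join(render(NODES)))
```

The values returned are the raw per-class scores of this one tree. For a real model with several trees you would sum the leaf values across trees, add the base prediction values, and apply the softmax named in `postEvaluationTransform` to get probabilities.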

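This approach has one drawback: CoreML doesn't support the way CatBoost handles categorical features. So if you want to serialize a model with categorical features, you need to one-hot encode them before training.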

Answer 2

If you switch to the command-line program, you can use the

--print-trees

option. However, it only shows the trees of the model being trained, so you can't use it to get the trees of an existing model.

Answer 3

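I'm leaving this here so that other people on the internet don't take the detour I did. As far as I understand, the answers above are somewhat out of date. According to the current documentation (2024), a model can be saved directly in Python and C++ formats. The exported code doesn't use any third-party libraries, yet it implements the full functionality of the model. There are some limitations, but in my case they didn't get in the way. The exported code can in turn be translated into other languages (I'm currently translating the C++ to JavaScript). I hope this helps someone. Good luck!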
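
For reference, a minimal sketch of what that export could look like. It relies on `save_model` accepting `format="python"` and `format="cpp"`. `format="json"` is included as well because it is also a text format, although this answer doesn't mention it. The helper name `export_text_formats` and the file stem are made up for illustration. The demo trains a binary classifier rather than the iris model from the first answer, on the assumption that multiclass models may be among the restrictions the documentation lists for standalone code export; check the documentation for your CatBoost version.

```python
from pathlib import Path

# save_model format name -> file suffix (the suffixes are my own choice)
TEXT_FORMATS = {"json": ".json", "python": ".py", "cpp": ".cpp"}


def export_text_formats(model, stem):
    """Save `model` once per text format and return the written paths."""
    paths = []
    for fmt, suffix in TEXT_FORMATS.items():
        path = str(Path(stem).with_suffix(suffix))
        model.save_model(path, format=fmt)
        paths.append(path)
    return paths


def demo():
    """Train a tiny binary model and export it (needs catboost and scikit-learn)."""
    import catboost
    from sklearn import datasets

    data = datasets.load_breast_cancer()
    cls = catboost.CatBoostClassifier(iterations=1, depth=1, verbose=False)
    cls.fit(data.data, data.target)
    return export_text_formats(cls, "model")
```

Calling `demo()` should leave `model.json`, `model.py` and `model.cpp` in the working directory. The `.py` and `.cpp` files contain the trees as plain data plus an apply function, which is the part you would port to another language.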
