java.lang.AssertionError when trying to train on certain datasets with h2o


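When I try to detect anomalies with the Isolation Forest method, I get the error below for the dataset I need. But I have another, completely different dataset where it works fine. What could be causing this?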

isolationforest Model Build progress: | (failed) | 0%
Traceback (most recent call last):
  File "h2o_test.py", line 149, in <module>
    isoforest.train(x=iso_forest.col_names[0:65], training_frame=iso_forest)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/h2o/estimators/estimator_base.py", line 107, in train
    self._train(parms, verbose=verbose)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/h2o/estimators/estimator_base.py", line 199, in _train
    job.poll(poll_updates=self._print_model_scoring_history if verbose else None)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/h2o/job.py", line 89, in poll
    "\n{}".format(self.job_key, self.exception, self.job["stacktrace"]))
OSError: Job with key $03017f00000132d4ffffffff$_92ee3e892f7bc86460e80153eaec4b70 failed with an exception: java.lang.AssertionError
stacktrace:
java.lang.AssertionError
    at hex.tree.DHistogram.init(DHistogram.java:350)
    at hex.tree.DHistogram.init(DHistogram.java:343)
    at hex.tree.ScoreBuildHistogram2$ComputeHistoThread.computeChunk(ScoreBuildHistogram2.java:427)
    at hex.tree.ScoreBuildHistogram2$ComputeHistoThread.map(ScoreBuildHistogram2.java:408)
    at water.LocalMR.compute2(LocalMR.java:89)
    at water.LocalMR.compute2(LocalMR.java:81)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1704)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.popAndExecAll(ForkJoinPool.java:906)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:979)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

# Relevant part of h2o_test.py; text_stream is defined earlier in the script (not shown).
import h2o
from h2o.estimators import H2OIsolationForestEstimator

# Use one full path for both writing and importing the CSV.
temp_name = '/home/webapp/flask-api/tmp_rows/temp_file2.csv'
with open(temp_name, 'w+') as tmp_file:
    # The with block closes the file on exit, so no explicit close() is needed.
    tmp_file.write(text_stream.getvalue())

h2o.init()
print("TEMP_NAME", temp_name)
iso_forest = h2o.import_file(temp_name)
seed = 12345
ntrees = 100
isoforest = H2OIsolationForestEstimator(ntrees=ntrees, seed=seed)
# Train on the first 65 columns of the imported frame.
isoforest.train(x=iso_forest.col_names[0:65], training_frame=iso_forest)
predictions = isoforest.predict(iso_forest)
print(predictions)
h2o.cluster().shutdown()

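The CSV is created correctly, so that does not seem to be the problem. What is causing this Java error? I even increased the EC2 instance size to get more RAM, but that did not fix it either.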

python java h2o
1 Answer

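I'm guessing this will get close votes, because it is going to be the data that causes the problem, and no data was given. But maybe your data can't be shared, or there is too much of it.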

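So I'd suggest trying with just the first half or the second half of the data. If only one of the two triggers the error, keep repeating the split on that half and see whether you can narrow it down to a single row.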
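A minimal sketch of that halving search, in case it helps. The function names `find_failing_row` and `h2o_training_fails` are hypothetical, and the sketch assumes the failure is triggered by one row on its own. Very small slices may fail, or stop failing, for unrelated reasons, so treat the result as a hint rather than proof.

```python
def find_failing_row(n_rows, fails):
    """Bisect the half-open row range [0, n_rows) using fails(lo, hi).

    Assumes fails(0, n_rows) is True. Stops early and returns the current
    range if neither half fails on its own.
    """
    lo, hi = 0, n_rows
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(lo, mid):
            hi = mid          # the first half still triggers the error
        elif fails(mid, hi):
            lo = mid          # the second half still triggers the error
        else:
            break             # only the combination fails; cannot narrow further
    return lo, hi


def h2o_training_fails(frame, lo, hi):
    """Hypothetical helper: True if training on rows [lo, hi) raises."""
    # Imported lazily so the bisection itself can be tried without H2O.
    from h2o.estimators import H2OIsolationForestEstimator

    model = H2OIsolationForestEstimator(ntrees=100, seed=12345)
    try:
        model.train(x=frame.col_names[0:65], training_frame=frame[lo:hi, :])
    except Exception:
        # The traceback in the question shows the job failure surfacing
        # as a Python exception (OSError).
        return True
    return False
```

With the frame from the question this could be called as `find_failing_row(iso_forest.nrows, lambda lo, hi: h2o_training_fails(iso_forest, lo, hi))`. Each probe trains a model, so expect roughly log2(number of rows) training runs.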

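The same goes for columns: try, say, 10-15 columns at a time and see whether a single column, or perhaps a certain type of column, is what triggers it.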
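The column check could be sketched like this. The helpers `column_batches`, `failing_batches` and `h2o_columns_fail` are hypothetical names, and the batch size of 10 is just the lower end of the range suggested above.

```python
def column_batches(columns, batch_size=10):
    """Split a list of column names into consecutive batches."""
    return [columns[i:i + batch_size]
            for i in range(0, len(columns), batch_size)]


def failing_batches(columns, fails, batch_size=10):
    """Return every batch for which fails(batch) is True."""
    return [batch for batch in column_batches(columns, batch_size)
            if fails(batch)]


def h2o_columns_fail(frame, columns):
    """Hypothetical helper: True if training on only these columns raises."""
    # Imported lazily so the batching logic can be tried without H2O.
    from h2o.estimators import H2OIsolationForestEstimator

    model = H2OIsolationForestEstimator(ntrees=100, seed=12345)
    try:
        model.train(x=columns, training_frame=frame)
    except Exception:
        return True
    return False
```

For the frame in the question, something like `failing_batches(iso_forest.col_names[0:65], lambda cols: h2o_columns_fail(iso_forest, cols))` would list the suspicious batches. Printing `iso_forest.types` for those columns may then show whether they share a type.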

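Of course, once you have that, you also have your workaround: exclude the troublesome columns or rows. But you then also have enough to file a bug report with H2O (it looks like that can be done at https://github.com/h2oai/h2o-3/issues).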
