CUDA issue with NER (Named Entity Recognition) for ML prediction

Question

I am trying to use NamedEntityRecognition (NER) (https://github.com/dotnet/machinelearning/issues/630) to predict the categories of words and phrases in a large body of text.

I am currently using 3 NuGet packages to try to achieve this:

Microsoft.ML (3.0.0-preview.23511.1)

Microsoft.ML.TorchSharp (0.21.0-preview.23511.1)

TorchSharp-cpu (0.101.1)

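When training the model [estimator.Fit(dataView)], I get the following error:

Field not found: 'TorchSharp.torch.CUDA'.

I may be misunderstanding something here, but I should be processing on the CPU via the TorchSharp-cpu package, and I am not sure where the CUDA reference comes from. It also looks like a package reference rather than a field?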

using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.TorchSharp;
using System;
using System.Collections.Generic;
using System.Windows.Forms;

namespace NerTester
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private class TestSingleSentenceData
        {
            public string Sentence;
            public string[] Label;
        }

        private class Label
        {
            public string Key { get; set; }
        }

        private void startButton_Click(object sender, EventArgs e)
        {
            try
            {
                var context = new MLContext();
                // Attempt to force CPU execution instead of GPU (CUDA).
                context.FallbackToCpu = true;
                context.GpuDeviceId = null;

                var labels = context.Data.LoadFromEnumerable(
                    new[]
                    {
                        new Label { Key = "PERSON" },
                        new Label { Key = "CITY" },
                        new Label { Key = "COUNTRY" }
                    });

                var dataView = context.Data.LoadFromEnumerable(
                    new List<TestSingleSentenceData>
                    {
                        new TestSingleSentenceData
                        {
                            // One label per word in the sentence.
                            Sentence = "Alice and Bob live in the USA",
                            Label = new string[] { "PERSON", "0", "PERSON", "0", "0", "0", "COUNTRY" }
                        },
                        new TestSingleSentenceData
                        {
                            Sentence = "Alice and Bob live in the USA",
                            Label = new string[] { "PERSON", "0", "PERSON", "0", "0", "0", "COUNTRY" }
                        },
                    });

                var chain = new EstimatorChain<ITransformer>();
                var estimator = chain.Append(context.Transforms.Conversion.MapValueToKey("Label", keyData: labels))
                    .Append(context.MulticlassClassification.Trainers.NameEntityRecognition(outputColumnName: "outputColumn"))
                    .Append(context.Transforms.Conversion.MapKeyToValue("outputColumn"));

                // The "Field not found" error is thrown by this call.
                var transformer = estimator.Fit(dataView);
                transformer.Dispose();

                MessageBox.Show("Success!");
            }
            catch (Exception ex)
            {
                MessageBox.Show($"Error: {ex.Message}");
            }
        }
    }
}

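The application runs on x64, and the documentation for NER seems limited.

Any help would be greatly appreciated.

I tried changing the NuGet packages I reference, including using the libtorch packages.

I tried running the application in both x86 and x64 configurations.

I added code to try to force the use of the CPU instead of the GPU (CUDA).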

Answer 1

You only need to reference 2 packages for this experiment:

<ItemGroup>
   <PackageReference Include="Microsoft.ML.TorchSharp" Version="0.21.0-preview.23511.1" />
   <PackageReference Include="libtorch-cpu-<your-platform>" Version="2.1.0.1" />
</ItemGroup>

because Microsoft.ML.TorchSharp already contains all the references you need.
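The original post does not confirm the cause of the "Field not found" error, but that kind of error generally means code compiled against one version of an assembly is running against a different one. As a rough diagnostic sketch (the class name LoadedAssemblyReport and its method are hypothetical helpers, not part of ML.NET or TorchSharp), the following lists the name and version of every loaded assembly whose name contains a given fragment:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical helper for illustration only; not part of ML.NET or TorchSharp.
public static class LoadedAssemblyReport
{
    // Returns "Name Version" for every assembly loaded in the current
    // AppDomain whose simple name contains nameFragment (case-insensitive).
    public static List<string> Describe(string nameFragment)
    {
        var lines = new List<string>();
        foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
        {
            AssemblyName name = assembly.GetName();
            if (name.Name != null &&
                name.Name.IndexOf(nameFragment, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                lines.Add($"{name.Name} {name.Version}");
            }
        }
        return lines;
    }
}
```

Calling LoadedAssemblyReport.Describe("TorchSharp") from the catch block after Fit fails should show which TorchSharp assemblies were loaded by that point, which may help when comparing against the version Microsoft.ML.TorchSharp was built for.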

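Now the bad news.
At runtime you will get a bunch of errors about missing files or DLLs. I spent a lot of time trying to figure out what I was missing, but I think it simply comes down to the versions of some libraries.

In the end, I cloned the entire repo, compiled it for my platform (Win-x64), and looked for files whose sizes differed (some have no version information, so size was my only option). It boiled down to 7 libraries.

The DLLs that the build brings in are all there, just not the ones ML.NET expects.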
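The size comparison described above could be automated along these lines. This is a minimal sketch, not the exact procedure used: the class name NativeLibraryDiff is made up for illustration, and it only reports DLLs that exist in both folders with different byte lengths.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration only.
public static class NativeLibraryDiff
{
    // Returns the file names of *.dll files present in both folders
    // whose sizes in bytes differ, sorted by name.
    public static List<string> FindSizeMismatches(string deployedDir, string referenceDir)
    {
        var mismatches = new List<string>();
        foreach (string deployedPath in Directory.EnumerateFiles(deployedDir, "*.dll"))
        {
            string fileName = Path.GetFileName(deployedPath);
            string referencePath = Path.Combine(referenceDir, fileName);

            // Skip files that have no counterpart in the reference folder.
            if (!File.Exists(referencePath))
            {
                continue;
            }

            if (new FileInfo(deployedPath).Length != new FileInfo(referencePath).Length)
            {
                mismatches.Add(fileName);
            }
        }
        mismatches.Sort(StringComparer.OrdinalIgnoreCase);
        return mismatches;
    }
}
```

Here deployedDir would be the application's runtimes\win-x64\native folder and referenceDir the folder holding the native DLLs produced by the local ML.NET build (its location depends on how the repo was built). Equal sizes do not prove two files are identical, so this only narrows down the candidates.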

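I replaced those DLLs with the ones from the ML.NET repository build, copied them into the folder \bin\Debug\net7.0\runtimes\win-x64\native, and everything worked.

Maybe there is a smarter solution, but I could not find one.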
