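I'm trying to use the C# Form Recognizer SDK from Azure Cognitive Services. I have PDFs stored in Azure Blob Storage, and I need to extract text and tables from those PDF files using the C# SDK.
I see that the "AnalyzeWithCustomModelAsync" method takes a "Stream" as its input parameter, but in practice it seems to accept only a "FileStream". If I pass a "MemoryStream" instead, I get the following error: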
{"value":{"error":{"code":"UnsupportedMediaType","message":"For HTML form data, the multipart request must contain a document with a media type of - 'application/pdf', 'image/jpeg' or 'image/png'."}},"formatters":[],"contentTypes":[],"statusCode":415}
Is there any way I can use my blob files directly, without first saving them locally?
Regards, Madhu
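The following code snippet works by getting a reference to the blob (as a CloudBlockBlob) and then loading it into a MemoryStream. Once you have that, you can pass it to Form Recognizer for analysis.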
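    using System;
    using System.Collections.Generic;
    using System.IO;
    // CloudBlockBlob comes from the Azure Storage SDK for .NET.

    // Note: config, log, blobContainer, tableClient and searchResults are assumed to be
    // defined by the surrounding code. FormRecognizer here appears to be a helper class
    // from the answerer's own project, not the SDK client itself.

    // Populate with the names of the blobs to analyze
    List<string> blobsToAnalyze = new List<string>();

    // Get the ID of the latest trained Form Recognizer model
    Guid aiTrainModelId = Guid.Empty;
    ModelResult latestModel = await FormRecognizer.GetModelAsync(config, log);
    if (latestModel != null)
    {
        aiTrainModelId = latestModel.ModelId;
    }

    // Iterate through all blobs
    foreach (string strBlob in blobsToAnalyze)
    {
        CloudBlockBlob blob = blobContainer.GetBlockBlobReference(strBlob);
        using (MemoryStream ms = new MemoryStream())
        {
            // Load the blob into a MemoryStream
            await blob.DownloadToStreamAsync(ms);
            // Rewind: the download leaves the position at the end of the stream
            ms.Position = 0;
            // Send to Form Recognizer for analysis
            AnalyzeResult results = await FormRecognizer.AnalyzeFormAsync(config, aiTrainModelId, ms, log);
            searchResults = FormRecognizer.AnalyzeResults(config, tableClient, results, log);
        }
    }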
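If you still hit the UnsupportedMediaType error with an in-memory stream, two things are worth checking: the stream position and the declared media type. The following is a minimal sketch, not the SDK's own code, and it assumes you are building the multipart request yourself with HttpClient. The names BlobStreamUploadSketch and BuildPdfContent, the field name "form" and the file name "invoice.pdf" are hypothetical and chosen only for illustration. No request is actually sent.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;

public static class BlobStreamUploadSketch
{
    // Wraps an in-memory document as a multipart part with an explicit media type.
    // The field name "form" is an assumption for illustration only.
    public static MultipartFormDataContent BuildPdfContent(Stream document, string fileName)
    {
        // A stream that was just written to (for example by DownloadToStreamAsync)
        // is positioned at its end; rewind so the reader sees the bytes.
        if (document.CanSeek)
        {
            document.Position = 0;
        }

        var part = new StreamContent(document);
        // Nothing about a MemoryStream says it holds a PDF, so state the media type.
        part.Headers.ContentType = new MediaTypeHeaderValue("application/pdf");

        var content = new MultipartFormDataContent();
        content.Add(part, "form", fileName);
        return content;
    }

    public static void Main()
    {
        using (var ms = new MemoryStream())
        {
            // Stand-in for blob.DownloadToStreamAsync(ms): write a few bytes.
            byte[] fakePdf = { 0x25, 0x50, 0x44, 0x46 }; // the ASCII bytes "%PDF"
            ms.Write(fakePdf, 0, fakePdf.Length);
            Console.WriteLine("Position after write: " + ms.Position);

            using (MultipartFormDataContent content = BuildPdfContent(ms, "invoice.pdf"))
            {
                Console.WriteLine("Position after rewind: " + ms.Position);
                foreach (HttpContent p in content)
                {
                    Console.WriteLine("Part media type: " + p.Headers.ContentType);
                }
            }
        }
    }
}
```

In the loop above, the MemoryStream filled from the blob would take the place of the stand-in bytes. Whether the SDK method lets you set the content type directly depends on the SDK version you are using, so check its signature before falling back to a hand-built request.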