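I am trying to open multiple PDF files in a folder and save each of them as a .txt file.
I have already tried the approach from the following thread
As suggested in the answer to that thread, I tried the following code, but it fails with the error "User-defined type not defined"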
Sub ONLYConvertPDF()
Dim AcroXApp As Acrobat.AcroApp
Dim AcroXAVDoc As Acrobat.AcroAVDoc
Dim AcroXPDDoc As Acrobat.AcroPDDoc
Dim Filename As String, DFilename As String, jsObj As Object
Filename = "C:\MyPath\MyFile.pdf"
DFilename = "C:\MyPath\MyFile.txt"
Set AcroXApp = CreateObject("AcroExch.App")
AcroXApp.Show
Set AcroXAVDoc = CreateObject("AcroExch.AVDoc")
AcroXAVDoc.Open Filename, "Acrobat"
Set AcroXPDDoc = AcroXAVDoc.GetPDDoc
Set jsObj = AcroXPDDoc.GetJSObject
jsObj.SaveAs DFilename, "com.adobe.acrobat.plain-text"
AcroXAVDoc.Close False
AcroXApp.Hide
AcroXApp.Exit
End Sub
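I have tried other, similar threads involving Acrobat.AcroApp, Acrobat.AcroAVDoc and Acrobat.AcroPDDoc, but the same error keeps appearing at one point or another.
I also tried opening the PDF document with the hyperlink approach suggested in a thread on this forum, but that approach does not seem to work if you want to close the file after working with it. (I don't know how to close the file.)
I have added the following libraries
When I try to add a) PDFPrevHndlr 1.0 Type Library and b) PDFShellServer 1.0 Type Library (I don't know whether either of them is necessary), I get the error "Error in loading DLL"
Is there anything else I need to add? I have Adobe Acrobat Reader DC installed.
I don't know much about working with libraries, DLLs and so on. Can anyone help? Thank you very much in advance.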
Xpdf provides a set of command-line tools; one of them, pdftotext.exe, exports the text of a PDF to a file.
Private Sub PdfToText(ByVal PdfPath As String, ByVal TextPath As String)
Const PathToPdfToText As String = "" 'folder containing pdftotext.exe, including the trailing "\"; leave empty if the exe is on the Windows PATH
With CreateObject("wscript.shell")
.Run Chr(34) & PathToPdfToText & "pdftotext.exe" & Chr(34) & " " & Chr(34) & PdfPath & Chr(34) & " " & Chr(34) & TextPath & Chr(34), 1, 1
End With
End Sub
Use it like this:
PdfToText "path\to\pdfdoc.pdf", "path\to\textfile.txt"
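Since the question is about a whole folder, the call above can be wrapped in a loop. The following is only a minimal sketch, assuming the PdfToText routine above sits in the same module; the name ConvertFolderToText is hypothetical, and each text file is simply written next to its PDF.

```vba
'Hypothetical helper: converts every PDF in FolderPath to a .txt file next to it.
'Assumes PdfToText (defined above) is in the same module.
Public Sub ConvertFolderToText(ByVal FolderPath As String)
    Dim PdfName As String
    Dim BaseName As String

    'Make sure the folder path ends with a backslash.
    If Right$(FolderPath, 1) <> "\" Then FolderPath = FolderPath & "\"

    'Dir returns the first match, then the next match on each call without arguments.
    PdfName = Dir(FolderPath & "*.pdf")
    Do While Len(PdfName) > 0
        'Strip the extension and reuse the base name for the text file.
        BaseName = Left$(PdfName, InStrRev(PdfName, ".") - 1)
        PdfToText FolderPath & PdfName, FolderPath & BaseName & ".txt"
        PdfName = Dir
    Loop
End Sub
```

It could then be called as ConvertFolderToText "C:\MyPath". Because PdfToText passes 1 as the last argument to .Run, each conversion finishes before the next file is started.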
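An older suggestion based on the Reader control (it may be useful to someone), which lacks a SaveAs method:
Getting the Adobe Reader ActiveX control in Office x86
Run the following in an elevated PowerShell: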
& "$env:SystemRoot\SysWOW64\regsvr32" "C:\Program Files (x86)\Common Files\Adobe\Acrobat\ActiveX\AcroPDFImpl.dll"
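This registers the control.
After registering, you have an "Adobe PDF Reader Imp" ActiveX control that can be used on x86. Credit goes to Nouba.
But I am not sure whether Reader offers saving as text.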
You need to have Adobe Acrobat Professional installed! Try this code.
Option Explicit
Option Private Module
Sub SavePDFAsOtherFormat(PDFPath As String, FileExtension As String)
'Saves a PDF file as another format using Adobe Professional.
'By Christos Samaras
'https://myengineeringworld.net/
'In order to use the macro you must enable the Acrobat library from VBA editor:
'Go to Tools -> References -> Adobe Acrobat xx.0 Type Library, where xx depends
'on your Acrobat Professional version (i.e. 9.0 or 10.0) you have installed to your PC.
'Alternatively you can find it Tools -> References -> Browse and check for the path
'C:\Program Files\Adobe\Acrobat xx.0\Acrobat\acrobat.tlb
'where xx is your Acrobat version (i.e. 9.0 or 10.0 etc.).
Dim objAcroApp As Acrobat.AcroApp
Dim objAcroAVDoc As Acrobat.AcroAVDoc
Dim objAcroPDDoc As Acrobat.AcroPDDoc
Dim objJSO As Object
Dim boResult As Boolean
Dim ExportFormat As String
Dim NewFilePath As String
'Check if the file exists.
If Dir(PDFPath) = "" Then
MsgBox "Cannot find the PDF file!" & vbCrLf & "Check the PDF path and retry.", _
vbCritical, "File Path Error"
Exit Sub
End If
'Check if the input file is a PDF file.
If LCase(Right(PDFPath, 3)) <> "pdf" Then
MsgBox "The input file is not a PDF file!", vbCritical, "File Type Error"
Exit Sub
End If
'Initialize Acrobat by creating App object.
Set objAcroApp = CreateObject("AcroExch.App")
'Set AVDoc object.
Set objAcroAVDoc = CreateObject("AcroExch.AVDoc")
'Open the PDF file.
boResult = objAcroAVDoc.Open(PDFPath, "")
'Set the PDDoc object.
Set objAcroPDDoc = objAcroAVDoc.GetPDDoc
'Set the JS Object - Java Script Object.
Set objJSO = objAcroPDDoc.GetJSObject
'Check the type of conversion.
Select Case LCase(FileExtension)
Case "eps": ExportFormat = "com.adobe.acrobat.eps"
Case "html", "htm": ExportFormat = "com.adobe.acrobat.html"
Case "jpeg", "jpg", "jpe": ExportFormat = "com.adobe.acrobat.jpeg"
Case "jpf", "jpx", "jp2", "j2k", "j2c", "jpc": ExportFormat = "com.adobe.acrobat.jp2k"
Case "docx": ExportFormat = "com.adobe.acrobat.docx"
Case "doc": ExportFormat = "com.adobe.acrobat.doc"
Case "png": ExportFormat = "com.adobe.acrobat.png"
Case "ps": ExportFormat = "com.adobe.acrobat.ps"
Case "rtf": ExportFormat = "com.adobe.acrobat.rtf"
Case "xlsx": ExportFormat = "com.adobe.acrobat.xlsx"
Case "xls": ExportFormat = "com.adobe.acrobat.spreadsheet"
Case "txt": ExportFormat = "com.adobe.acrobat.accesstext"
Case "tiff", "tif": ExportFormat = "com.adobe.acrobat.tiff"
Case "xml": ExportFormat = "com.adobe.acrobat.xml-1-00"
Case Else: ExportFormat = "Wrong Input"
End Select
'Check if the format is correct and there are no errors.
If ExportFormat <> "Wrong Input" And Err.Number = 0 Then
'Format is correct and no errors.
'Set the path of the new file. Note that Adobe instead of xls uses xml files.
'That's why here the xls extension changes to xml.
'Replace only the trailing "pdf", so that paths ending in ".PDF" are handled as well.
If LCase(FileExtension) <> "xls" Then
NewFilePath = Left(PDFPath, Len(PDFPath) - 3) & LCase(FileExtension)
Else
NewFilePath = Left(PDFPath, Len(PDFPath) - 3) & "xml"
End If
'Save PDF file to the new format.
boResult = objJSO.SaveAs(NewFilePath, ExportFormat)
'Close the PDF file without saving the changes.
boResult = objAcroAVDoc.Close(True)
'Close the Acrobat application.
boResult = objAcroApp.Exit
'Inform the user that the conversion was successful.
MsgBox "The PDF file:" & vbNewLine & PDFPath & vbNewLine & vbNewLine & _
"Was saved as: " & vbNewLine & NewFilePath, vbInformation, "Conversion finished successfully"
Else
'Something went wrong, so close the PDF file and the application.
'Close the PDF file without saving the changes.
boResult = objAcroAVDoc.Close(True)
'Close the Acrobat application.
boResult = objAcroApp.Exit
'Inform the user that something went wrong.
MsgBox "Something went wrong!" & vbNewLine & "The conversion of the following PDF file FAILED:" & _
vbNewLine & PDFPath, vbInformation, "Conversion failed"
End If
'Release the objects.
Set objAcroPDDoc = Nothing
Set objAcroAVDoc = Nothing
Set objAcroApp = Nothing
End Sub
See the link below for more details.
https://myengineeringworld.net/2013/03/vba-macro-to-convert-pdf-files-into.html
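The "User-defined type not defined" error from the question is what VBA reports when a declaration such as Acrobat.AcroApp cannot be resolved because the Acrobat type library is not referenced. One way around it is late binding, where every Acrobat object is declared As Object. The following is only a sketch under that assumption: the name ConvertPdfToTextLateBound is hypothetical, the calls are the same ones used in the code above, and a full Acrobat installation is still assumed, since the code relies on CreateObject("AcroExch.App") succeeding.

```vba
'Late-bound sketch: no reference to the Acrobat type library is needed,
'but a full Acrobat installation is still assumed to be present.
Sub ConvertPdfToTextLateBound(ByVal PdfPath As String, ByVal TextPath As String)
    Dim oApp As Object
    Dim oAVDoc As Object
    Dim oPDDoc As Object
    Dim oJS As Object

    Set oApp = CreateObject("AcroExch.App")
    Set oAVDoc = CreateObject("AcroExch.AVDoc")

    If oAVDoc.Open(PdfPath, "") Then
        Set oPDDoc = oAVDoc.GetPDDoc
        Set oJS = oPDDoc.GetJSObject
        'Same conversion ID as in the question's code.
        oJS.SaveAs TextPath, "com.adobe.acrobat.plain-text"
        'True = close without saving changes to the PDF.
        oAVDoc.Close True
    Else
        MsgBox "Could not open: " & PdfPath, vbExclamation
    End If

    oApp.Exit
    Set oJS = Nothing
    Set oPDDoc = Nothing
    Set oAVDoc = Nothing
    Set oApp = Nothing
End Sub
```

A call might look like ConvertPdfToTextLateBound "C:\MyPath\MyFile.pdf", "C:\MyPath\MyFile.txt". The trade-off of late binding is that IntelliSense and compile-time checking for the Acrobat objects are lost.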
Alternatively, if you do not have Acrobat installed, try the solution below and, of course, change the code to suit your needs.
Sub ChangeDocsToTxtOrRTFOrHTML()
'with export to PDF in Word 2007
Dim fs As Object
Dim oFolder As Object
Dim tFolder As Object
Dim oFile As Object
Dim strDocName As String
Dim intPos As Integer
Dim locFolder As String
Dim fileType As String
On Error Resume Next
locFolder = InputBox("Enter the folder path to DOCs", "File Conversion", "C:\Users\your_path_here\")
Select Case Application.Version
Case Is < 12
Do
fileType = UCase(InputBox("Change DOC to TXT, RTF, HTML", "File Conversion", "TXT"))
Loop Until (fileType = "TXT" Or fileType = "RTF" Or fileType = "HTML")
Case Is >= 12
Do
fileType = UCase(InputBox("Change DOC to TXT, RTF, HTML or PDF(2007+ only)", "File Conversion", "TXT"))
Loop Until (fileType = "TXT" Or fileType = "RTF" Or fileType = "HTML" Or fileType = "PDF")
End Select
Application.ScreenUpdating = False
Set fs = CreateObject("Scripting.FileSystemObject")
Set oFolder = fs.GetFolder(locFolder)
Set tFolder = fs.CreateFolder(locFolder & "Converted")
Set tFolder = fs.GetFolder(locFolder & "Converted")
For Each oFile In oFolder.Files
Dim d As Document
Set d = Application.Documents.Open(oFile.Path)
strDocName = ActiveDocument.Name
intPos = InStrRev(strDocName, ".")
strDocName = Left(strDocName, intPos - 1)
ChangeFileOpenDirectory tFolder
Select Case fileType
Case Is = "TXT"
strDocName = strDocName & ".txt"
ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatText
Case Is = "RTF"
strDocName = strDocName & ".rtf"
ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatRTF
Case Is = "HTML"
strDocName = strDocName & ".html"
ActiveDocument.SaveAs FileName:=strDocName, FileFormat:=wdFormatFilteredHTML
Case Is = "PDF"
strDocName = strDocName & ".pdf"
ActiveDocument.ExportAsFixedFormat OutputFileName:=strDocName, ExportFormat:=wdExportFormatPDF
End Select
d.Close
ChangeFileOpenDirectory oFolder
Next oFile
Application.ScreenUpdating = True
End Sub