Getting data from an HTML table into an Access database

Question (votes: 0, answers: 4)

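How can I dynamically populate a database from an HTML table (for example, S&P 500 market data)?

I have a Yahoo! Finance account, where I can view financial data in HTML format.

I need a simple tool that populates a database (Access) from an HTML table. Where can I find such a tool?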

html sql ms-access html-parsing
4 Answers
1
votes

You can export the historical data from Yahoo as CSV and link that CSV file directly in Access as an MS Access table. http://office.microsoft.com/en-ca/access-help/import-or-link-to-data-in-a-text-file-HA001232227.aspx
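The linking step can also be done from code instead of the import wizard. Below is a minimal sketch using the built-in DoCmd.TransferText method; the table name and file path are placeholders chosen for illustration, not values from the answer.

```vba
' Sketch: link a downloaded CSV file as a table in the current Access database.
' LinkedTableName and CsvPath are hypothetical placeholders.
Sub LinkCsvAsTable()
    Const LinkedTableName As String = "YahooHistory"
    Const CsvPath As String = "C:\SomeFolder\table.csv"

    ' acLinkDelim links (rather than copies) a delimited text file.
    ' The final True says the first row of the file holds the field names.
    DoCmd.TransferText acLinkDelim, , LinkedTableName, CsvPath, True
End Sub
```

Because the table is linked rather than imported, overwriting the CSV file with a fresh export should be enough to refresh the data the next time the table is opened.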

If you want to work with the HTML page source instead, this link may help.

http://www.access-programmers.co.uk/forums/showthread.php?p=1145646


0
votes

ACE/Jet OLEDB can be used to import data directly from an HTML file. For example, given an existing Access table [DataFromHtml]

ID  LastName
--  --------

and an HTML file containing a table

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
    <title>
        Test Data
    </title>
</head>
<body>
<table>
    <tr>
        <th>
            ID
        </th>
        <th>
            LastName
        </th>
    </tr>
    <tr>
        <td>
            1
        </td>
        <td>
            Thompson
        </td>
    </tr>
    <tr>
        <td>
            2
        </td>
        <td>
            O'Rourke
        </td>
    </tr>
</table>
</body>
</html>

The following VBA code clears the Access table (DELETE FROM) and then imports the HTML table data into it.

Sub ImportFromHtml()
Const LocalTableName = "DataFromHtml"
Dim con As Object, rstHtml As Object, fld As Object, _
        cdb As DAO.Database, rstAccdb As DAO.Recordset, _
        recCount As Long

Set con = CreateObject("ADODB.Connection")
con.Open _
        "Provider=Microsoft.ACE.OLEDB.12.0;" & _
        "Data Source=C:\Users\Gord\Documents\table.htm;" & _
        "Extended Properties=""HTML Import;HDR=YES;IMEX=1"";"
Set rstHtml = CreateObject("ADODB.Recordset")
' The table is addressed by the page's <title> text ("Test Data" in the sample file)
rstHtml.Open "SELECT * FROM [Test Data]", con

Set cdb = CurrentDb
cdb.Execute "DELETE FROM [" & LocalTableName & "]", dbFailOnError
Set rstAccdb = cdb.OpenRecordset(LocalTableName, dbOpenTable)

recCount = 0
Do While Not rstHtml.EOF
    recCount = recCount + 1
    rstAccdb.AddNew
    For Each fld In rstHtml.Fields
        rstAccdb.Fields(Trim(fld.Name)).Value = Trim(fld.Value)
    Next
    Set fld = Nothing
    rstAccdb.Update
    rstHtml.MoveNext
Loop

rstAccdb.Close
Set rstAccdb = Nothing
Set cdb = Nothing

rstHtml.Close
Set rstHtml = Nothing
con.Close
Set con = Nothing

Debug.Print recCount & " record(s) imported"
End Sub
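For comparison, Access also has a built-in HTML import via DoCmd.TransferText with the acImportHTML transfer type. This is a different technique from the ADO loop above, shown only as a sketch: it reuses the sample table name, file path, and title from this answer, and it appends rows rather than clearing the table first.

```vba
' Sketch: import the sample HTML table with the built-in TransferText method.
' This is an alternative to the ADO/DAO loop above, not part of the original answer.
Sub ImportHtmlWithTransferText()
    ' Arguments: transfer type, import specification (none), destination table,
    ' source file, first row holds field names, name of the HTML table.
    ' The HTML table name is the <caption> text if present, otherwise the <title>.
    DoCmd.TransferText acImportHTML, , "DataFromHtml", _
        "C:\Users\Gord\Documents\table.htm", True, "Test Data"
End Sub
```

If the destination table should be emptied first, the same DELETE FROM statement used above can be run before the call.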

0
votes

Assuming the HTML structure from Gord Thompson's solution, there is a very fast way to do this with ADO.

Public Function GetTitle(ByVal HtmlFile As String) As String
    Dim DOM As Object

    Set DOM = CreateObject("MSXML2.DOMDocument")
    DOM.Load HtmlFile
    GetTitle = DOM.getElementsByTagName("title")(0).Text
End Function

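Public Sub Import(ByVal Filename As String, ByVal Tablename As String)
    Dim SQL As String
    Dim Title As String
    On Error GoTo Import_Error

    ' The HTML table is addressed by the page title
    Title = GetTitle(Filename)

    ' Drop the target table if it exists; ignore the error raised when it does not
    On Error Resume Next
    CurrentProject.Connection.Execute "DROP TABLE [" & Tablename & "]"
    On Error GoTo Import_Error

    ' Create the table and fill it in a single statement
    SQL = "SELECT * INTO [" & Tablename & "]" & _
          " FROM [HTML Import;HDR=YES;IMEX=1;DATABASE=" & Filename & "].[" & Title & "]"
    CurrentProject.Connection.Execute SQL

    Exit Sub
Import_Error:
    ' Report the failure instead of swallowing it silently
    MsgBox "Import failed: " & Err.Description
End Sub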

So, to import the HTML file "C:\SomeFolder\MyFile.html" into the table "MyImport", use:

Import "C:\SomeFolder\MyFile.html", "MyImport"

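One additional tip: if the title of the HTML file contains special characters such as . or :, the import will fail. You will have to experiment to find out which special characters cause problems and which do not.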


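0
votes

I know this is an old question, but I hope my solution helps someone. I recently received a large number of tables as individual HTML files. The files were exported from an Oracle/Unix system, from a version that predates the one with CSV export. I run MS Access 365 on Windows 10 with an SSD.
I first tried the solutions above, but they seemed as slow as importing each HTML file by hand.
So I tried a different approach. I added a reference to the "Microsoft HTML Object Library".
The only caveat: it is not dynamic. I did not add code to map each column, although a table could be created on the fly.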

' import a table from a local html file into a similar Access table
Function Import(FileName As String, TableName As String) As Long ' returns number of rows
Dim FSO As New FileSystemObject ' needs a reference to Microsoft Scripting Runtime; faster and proper read of \n
Dim ts As TextStream
Dim fld As Long, rows As Long, bt As Long, tm As Single
Dim rs As DAO.Recordset
Dim doc As New MSHTML.HTMLDocument
Dim trTag As Object
Dim tdTag As Object
On Error GoTo errH
    Debug.Print "Loading " & FileName
If Dir(FileName) > " " Then
    tm = Timer
    bt = FileLen(FileName)
    Set ts = FSO.OpenTextFile(FileName, ForReading)
    doc.body.innerHTML = ts.ReadAll
    Do Until doc.ReadyState = "complete"  ' 4
        DoEvents
    Loop
    ts.Close
    Debug.Print "Loaded " & bt & " bytes in " & (Timer - tm) & " seconds. "
Else
    MsgBox "File does not exist"
    Exit Function
End If
CurrentDb.Execute "DELETE FROM [" & TableName & "]", dbFailOnError
Set rs = CurrentDb.OpenRecordset(TableName, dbOpenTable)
    Debug.Print "Destination has " & rs.Fields.Count & " fields (starting at 0)."
For Each trTag In doc.getElementsByTagName("tr")
    If rows > 0 Then
        rs.AddNew ' first row is the header; Access rejects some of its column names
        fld = 0
    End If
    For Each tdTag In trTag.childNodes()
        fld = fld + 1 'field counter
        If rows = 0 Then
            Debug.Print tdTag.innerText; "|"; ' field names from source
        Else
            ' add field
            rs.Fields(fld) = Trim(tdTag.innerText) ' When null use Resume Next - Nz not working here
        End If
    Next
    'add row
    If rows > 0 Then
        rs.Update
    Else
        Debug.Print
        Debug.Print "Source has " & fld & " fields."
        ' you could create a table here on the fly
        ' with field(0) being an autonumber key field, all others dbtext(255)
        ' then you re-open the Recordset
    End If
    rows = rows + 1: fld = 0
    If rows Mod 1000 = 0 Then
        DoEvents ' allows the import to be interrupted
    End If
Next
Import = rows
rs.Close
Set rs = Nothing
Debug.Print "File imported in " & (Timer - tm) & " seconds. "
Exit Function
errH:
Resume Next
End Function

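The data file I used for this test is 31 MB, with 46K+ rows and 22 fields. I also have other files of more than 200 MB. This function loaded the file in 52 seconds, versus 170 seconds for the OLEDB version. That suggests the MSHTML.HTMLDocument DLL parses the HTML file faster than Microsoft.ACE.OLEDB.12.0 does.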
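The comment in the function mentions creating the destination table on the fly, with field 0 as an autonumber key and every other field as text(255). A minimal DAO sketch of that idea follows; the procedure name and the generated field names (ID, F1, F2, ...) are assumptions for illustration and are not part of the original answer.

```vba
' Sketch: build a destination table with an autonumber key plus N text fields.
' CreateImportTable and the F1..Fn field names are hypothetical.
Sub CreateImportTable(ByVal TableName As String, ByVal FieldCount As Long)
    Dim db As DAO.Database
    Dim tdf As DAO.TableDef
    Dim fld As DAO.Field
    Dim i As Long

    Set db = CurrentDb
    Set tdf = db.CreateTableDef(TableName)

    ' Field 0: autonumber key, which the import loop skips by starting at index 1
    Set fld = tdf.CreateField("ID", dbLong)
    fld.Attributes = fld.Attributes Or dbAutoIncrField
    tdf.Fields.Append fld

    ' One text(255) field per source column
    For i = 1 To FieldCount
        tdf.Fields.Append tdf.CreateField("F" & i, dbText, 255)
    Next i

    ' Appending the TableDef is what actually creates the table
    db.TableDefs.Append tdf
End Sub
```

Calling it once the header row has been counted (for example CreateImportTable "MyImport", 22) and then re-opening the recordset would match the steps the comment describes. Cells longer than 255 characters would need a memo field (dbMemo) instead.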
