How to get data from HTML stored in a SQL Server column

Question · Votes: 0 · Answers: 4

I have some HTML content in a SQL Server column, and I want to read the content out of that HTML.

For example:

<ektdesignns_choices ektdesignns_nodetype="element" title="How many gigs do you play each month?" ektdesignns_caption="How many gigs do you play each month?" name="ektpoll1303074024421" ektdesignns_name="ektpoll1303074024421" id="ektpoll1303074024421">
  <ol contenteditable="false" onkeypress="design_validate_choice(1, -1, this, 'Options are required.')" onclick="design_validate_choice(1, -1, this, 'Options are required.')" onblur="design_validate_choice(1, -1, this, 'Options are required.')" ektdesignns_validation="choice-req" ektdesignns_maxoccurs="1" ektdesignns_minoccurs="1" unselectable="on" title="How many gigs do you play each month?" class="design_list_vertical">
    <li>
      <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="1 or fewer_1" title="1 or fewer" id="ID2504263" />
      <label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label>
    </li>
    <li>
       <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="2-4_2" title="2-4" id="ID5115606" />
       <label contenteditable="true" unselectable="off" for="ID5115606">2-4</label>
    </li>
    <li>
        <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="5-7_3" title="5-7" id="ID477116" />
        <label contenteditable="true" unselectable="off" for="ID477116">5-7</label>
    </li>
    <li>
        <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="8 or more_4" title="8 or more" id="ID5515606" />
        <label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label>
    </li>
  </ol>
</ektdesignns_choices><input type="submit" value="Vote" />

I want to read all of the labels in this HTML. Does anyone have an idea how I can do this?

html sql-server-2008 filtering
4 answers

1 vote

If your HTML really is XHTML-compliant and it is stored in an XML column of a SQL Server table, you can use XQuery in T-SQL to retrieve your labels:

DECLARE @HtmlTbl TABLE (ID INT IDENTITY, Html XML)

INSERT INTO @HtmlTbl(Html) VALUES('<ektdesignns_choices ektdesignns_nodetype="element" title="How many gigs do you play each month?" ektdesignns_caption="How many gigs do you play each month?" name="ektpoll1303074024421" ektdesignns_name="ektpoll1303074024421" id="ektpoll1303074024421">
  <ol contenteditable="false" onkeypress="design_validate_choice(1, -1, this, ''Options are required.'')" onclick="design_validate_choice(1, -1, this, ''Options are required.'')" onblur="design_validate_choice(1, -1, this, ''Options are required.'')" ektdesignns_validation="choice-req" ektdesignns_maxoccurs="1" ektdesignns_minoccurs="1" unselectable="on" title="How many gigs do you play each month?" class="design_list_vertical">
    <li>
      <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="1 or fewer_1" title="1 or fewer" id="ID2504263" />
      <label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label>
    </li>
    <li>
       <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="2-4_2" title="2-4" id="ID5115606" />
       <label contenteditable="true" unselectable="off" for="ID5115606">2-4</label>
    </li>
    <li>
        <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="5-7_3" title="5-7" id="ID477116" />
        <label contenteditable="true" unselectable="off" for="ID477116">5-7</label>
    </li>
    <li>
        <input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="8 or more_4" title="8 or more" id="ID5515606" />
        <label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label>
    </li>
  </ol></ektdesignns_choices><input type="submit" value="Vote" />')

This retrieves all <label> elements from your (X)HTML as a single XML fragment:

SELECT
    Html.query('//label')
FROM @HtmlTbl 
WHERE ID = 1

Output:

<label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label>
<label contenteditable="true" unselectable="off" for="ID5115606">2-4</label>
<label contenteditable="true" unselectable="off" for="ID477116">5-7</label>
<label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label>

Or this selects the text content of every <label> tag, one per row:

SELECT
    C.value('(.)[1]', 'varchar(1000)')
FROM @HtmlTbl 
CROSS APPLY Html.nodes('//label') AS T(C)
WHERE ID = 1

Output:

1 or fewer
2-4
5-7
8 or more
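If you also need to know which radio option each label belongs to, the same nodes() approach can read attributes and walk to the parent <li>. This is a minimal sketch, assuming the HTML sits in an untyped XML column as in the answer above; the markup is a trimmed copy of the question's sample (only two options), and the column aliases are illustrative.

```sql
-- Sketch: pair each <label> with its "for" id and the value of the
-- radio <input> in the same <li>. Markup trimmed from the question.
DECLARE @HtmlTbl TABLE (ID INT IDENTITY, Html XML);

INSERT INTO @HtmlTbl (Html) VALUES (
'<ektdesignns_choices id="ektpoll1303074024421">
  <ol>
    <li>
      <input type="radio" name="ektpoll1303074024421" value="1 or fewer_1" title="1 or fewer" id="ID2504263" />
      <label for="ID2504263">1 or fewer</label>
    </li>
    <li>
      <input type="radio" name="ektpoll1303074024421" value="2-4_2" title="2-4" id="ID5115606" />
      <label for="ID5115606">2-4</label>
    </li>
  </ol>
</ektdesignns_choices><input type="submit" value="Vote" />');

SELECT
    L.value('@for', 'varchar(50)')                  AS InputId,     -- label's "for" attribute
    L.value('(./text())[1]', 'varchar(1000)')       AS LabelText,   -- text inside <label>
    L.value('(../input/@value)[1]', 'varchar(100)') AS OptionValue  -- sibling <input>'s value
FROM @HtmlTbl
CROSS APPLY Html.nodes('//li/label') AS T(L)  -- one row per <label> inside an <li>
WHERE ID = 1;
```

This should return two rows: ID2504263 / 1 or fewer / 1 or fewer_1 and ID5115606 / 2-4 / 2-4_2. The trailing submit <input> is outside any <li>, so it is not matched.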

0 votes

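Pull the data out of the database, then use an HTML parser to extract the information you need. It will make your life much easier.

Whatever you do, don't try to use regexes unless the data you are looking for can genuinely be matched by a regular expression. (HTML is not a regular language, so this often causes more problems than it solves.)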


0 votes

If all of your HTML is as well-formed as this, you can convert it to XML and use some XQuery to find the label nodes:

select T.N.value('.', 'nvarchar(100)')
from YourTable  -- placeholder name; "Table" itself is a reserved word in T-SQL
    cross apply XMLCol.nodes('//label') as T(N)  -- XMLCol: your XML column
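If the HTML is stored as text (NVARCHAR) rather than XML, the conversion can be done inline with CAST before calling nodes(). This is a sketch under that assumption; the table variable @Pages and its HtmlText column are hypothetical stand-ins for your own table.

```sql
-- Sketch: HTML stored in an NVARCHAR(MAX) column, converted to XML per row.
-- @Pages / HtmlText are hypothetical names.
DECLARE @Pages TABLE (ID INT IDENTITY, HtmlText NVARCHAR(MAX));

INSERT INTO @Pages (HtmlText) VALUES
(N'<ol><li><label for="a">1 or fewer</label></li><li><label for="b">2-4</label></li></ol>');

SELECT
    p.ID,
    T.N.value('.', 'nvarchar(100)') AS LabelText
FROM @Pages AS p
    -- CAST raises an error if the text is not well-formed XML
    -- (for example an unclosed <br> or an &nbsp; entity)
    CROSS APPLY (SELECT CAST(p.HtmlText AS XML) AS X) AS c
    CROSS APPLY c.X.nodes('//label') AS T(N);
```

Because a single malformed row makes the whole query fail, this only suits HTML you know is well-formed, which is the same caveat the answer states.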

0 votes

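If you want to extract a value from a well-defined tag, you can use PATINDEX and SUBSTRING. The queries below assume the HTML shown in the comment is held in an NVARCHAR variable @webViewData: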

 /*
    <HTML><head><meta name='viewport' content='width=device-width, initial-scale=1.0'>
    </head>
    <BODY onload='document.frmLaunch.submit();'> Redirecting... 
    <FORM name='frmLaunch' method='POST' action='https://acsabsatest.bankserv.co.za/mdpayacs/pareq'>
    <input type=hidden name='PaReq' value='eJxVUctuwjAQ/JWID8jaJjy1tRRKJXKg4iWQuLnOtkSUJDgJ0H597ZCU9pSZ2ex4dxY3B0M0XZOuDEmcU1GoD/KS+KmzWC1HnHWDHu9IXIQrOku8kCmSLJXcZ75AaKntM/qg0lKi0udJ9Co55wgNxhOZaCq56CLcIabqRHKi0mNB5hK+m0Qrr/VAqKuosyotzZcccIbQEqzMpzyUZV6MAa7Xq//WmPg6878VgqsjPOZZVA4V1u+WxHK3iWfrlzlf87lYH6NgyfP9bhvddtvwCcH9gbEqSQrGRywQgceGYz4aswCh1lGd3CByH6484TM7WCNg7t4J70S4wl8BbbqGUt0u0zKkW56l5FoQfjHGVGi7RPN5bPA8c/nq0iY47AeDXn9Qh1wLziqxAQnOerWXIwiuBZrjQXNdi/5d/Qf60asq'>
    <input type=hidden name='TermUrl' value='http://TermUrl'>
    <input type=hidden name='MD' value='469695'></FORM></BODY></HTML>
    */
-- Find the start of the tag
SELECT PATINDEX('%<input type=hidden name=''PaReq'' value=%', @webViewData);
-- (Answer is 246)
-- Find the end of the value (the closing '>), counted from the start of the tag
SELECT PATINDEX('%''>%', SUBSTRING(@webViewData, PATINDEX('%<input type=hidden name=''PaReq'' value=%', @webViewData), LEN(@webViewData)));
-- (Answer is 468)
-- Get the value: skip the 38-character search string plus the opening quote (+39);
-- the value then runs up to, but not including, the closing quote (468 - 40 characters)
SELECT SUBSTRING(@webViewData, 246 + 39, 468 - 40);
-- Everything combined:
SELECT SUBSTRING(@webViewData, PATINDEX('%<input type=hidden name=''PaReq'' value=%', @webViewData) + 39, PATINDEX('%''>%', SUBSTRING(@webViewData, PATINDEX('%<input type=hidden name=''PaReq'' value=%', @webViewData), LEN(@webViewData))) - 40);
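The constants 39 and 40 only hold for that exact search string. A variant that derives the offsets from the marker length might look like the sketch below. It assumes the value is wrapped in single quotes and immediately followed by '>; the sample HTML is a shortened version of the one in the answer (the PaReq value is truncated). It uses CHARINDEX, which matches literally, whereas PATINDEX treats %, _ and [ as wildcards.

```sql
-- Sketch: extract the PaReq value without hard-coded offsets.
DECLARE @webViewData NVARCHAR(MAX) =
    N'<FORM name=''frmLaunch'' method=''POST''>'
  + N'<input type=hidden name=''PaReq'' value=''eJxVUctuwjAQ''>'
  + N'<input type=hidden name=''MD'' value=''469695''></FORM>';

DECLARE @startMarker NVARCHAR(100) = N'<input type=hidden name=''PaReq'' value=''';
DECLARE @endMarker   NVARCHAR(10)  = N'''>';

-- Position of the first character of the value (0 if the tag is missing)
DECLARE @start INT = CHARINDEX(@startMarker, @webViewData);
IF @start > 0 SET @start = @start + LEN(@startMarker);

-- Position of the closing quote of '>, searching only after the value starts
DECLARE @end INT = CASE WHEN @start > 0
                        THEN CHARINDEX(@endMarker, @webViewData, @start)
                        ELSE 0 END;

-- NULL when either marker is not found
SELECT CASE WHEN @start > 0 AND @end > 0
            THEN SUBSTRING(@webViewData, @start, @end - @start)
       END AS PaReqValue;
```

For the sample above this should return eJxVUctuwjAQ. It is still string matching, so it breaks if the attribute order or quoting changes.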