I have some HTML content in a SQL Server column, and I want to read content from that HTML.
For example:
<ektdesignns_choices ektdesignns_nodetype="element" title="How many gigs do you play each month?" ektdesignns_caption="How many gigs do you play each month?" name="ektpoll1303074024421" ektdesignns_name="ektpoll1303074024421" id="ektpoll1303074024421">
<ol contenteditable="false" onkeypress="design_validate_choice(1, -1, this, 'Options are required.')" onclick="design_validate_choice(1, -1, this, 'Options are required.')" onblur="design_validate_choice(1, -1, this, 'Options are required.')" ektdesignns_validation="choice-req" ektdesignns_maxoccurs="1" ektdesignns_minoccurs="1" unselectable="on" title="How many gigs do you play each month?" class="design_list_vertical">
<li>
<input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="1 or fewer_1" title="1 or fewer" id="ID2504263" />
<label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label>
</li>
<li>
<input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="2-4_2" title="2-4" id="ID5115606" />
<label contenteditable="true" unselectable="off" for="ID5115606">2-4</label>
</li>
<li>
<input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="5-7_3" title="5-7" id="ID477116" />
<label contenteditable="true" unselectable="off" for="ID477116">5-7</label>
</li>
<li>
<input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="8 or more_4" title="8 or more" id="ID5515606" />
<label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label>
</li>
</ol>
</ektdesignns_choices><input type="submit" value="Vote" />
I want to read all the labels (the <label> elements) in this HTML. Does anyone have an idea how I can do this?
If your HTML really is valid XHTML and it is stored in an XML column of a SQL Server table, then you can use XQuery in T-SQL to retrieve your labels from it:
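-- Sample table with the HTML stored in an (untyped) XML column.
-- Untyped XML accepts fragments, so the <input> after the root element is allowed.
-- Single quotes inside the string literal are escaped by doubling them ('').
DECLARE @HtmlTbl TABLE (ID INT IDENTITY, Html XML)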
INSERT INTO @HtmlTbl(Html) VALUES('<ektdesignns_choices ektdesignns_nodetype="element" title="How many gigs do you play each month?" ektdesignns_caption="How many gigs do you play each month?" name="ektpoll1303074024421" ektdesignns_name="ektpoll1303074024421" id="ektpoll1303074024421">
<ol contenteditable="false" onkeypress="design_validate_choice(1, -1, this, ''Options are required.'')" onclick="design_validate_choice(1, -1, this, ''Options are required.'')" onblur="design_validate_choice(1, -1, this, ''Options are required.'')" ektdesignns_validation="choice-req" ektdesignns_maxoccurs="1" ektdesignns_minoccurs="1" unselectable="on" title="How many gigs do you play each month?" class="design_list_vertical">
<li>
<input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="1 or fewer_1" title="1 or fewer" id="ID2504263" />
<label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label>
</li>
<li>
<input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="2-4_2" title="2-4" id="ID5115606" />
<label contenteditable="true" unselectable="off" for="ID5115606">2-4</label>
</li>
<li>
<input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="5-7_3" title="5-7" id="ID477116" />
<label contenteditable="true" unselectable="off" for="ID477116">5-7</label>
</li>
<li>
<input type="radio" ektdesignns_nodetype="item" name="ektpoll1303074024421" value="8 or more_4" title="8 or more" id="ID5515606" />
<label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label>
</li>
</ol></ektdesignns_choices><input type="submit" value="Vote" />')
This retrieves all <label> elements from your (X)HTML as a single XML value:
SELECT
Html.query('//label')
FROM @HtmlTbl
WHERE ID = 1
Output:
<label contenteditable="true" unselectable="off" for="ID2504263">1 or fewer</label>
<label contenteditable="true" unselectable="off" for="ID5115606">2-4</label>
<label contenteditable="true" unselectable="off" for="ID477116">5-7</label>
<label contenteditable="true" unselectable="off" for="ID5515606">8 or more</label>
Or this selects the contents of each <label> tag, one per row:
SELECT
C.value('(.)[1]', 'varchar(1000)')
FROM @HtmlTbl
CROSS APPLY Html.nodes('//label') AS T(C)
WHERE ID = 1
Output:
1 or fewer
2-4
5-7
8 or more
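Because each radio <input> and its <label> sit inside the same <li>, the same nodes()/value() approach can also read attributes and pair each label with its input's value. This is a minimal sketch that assumes the <li>/<input>/<label> layout from the question; the hypothetical @html variable holds a trimmed, illustrative copy of the sample HTML, not the full original.

```sql
-- Trimmed, illustrative copy of the question's HTML (hypothetical @html variable)
DECLARE @html XML = N'<ektdesignns_choices title="How many gigs do you play each month?">
<ol>
<li><input type="radio" value="1 or fewer_1" id="ID2504263" /><label for="ID2504263">1 or fewer</label></li>
<li><input type="radio" value="2-4_2" id="ID5115606" /><label for="ID5115606">2-4</label></li>
</ol>
</ektdesignns_choices>';

SELECT
    -- The question text, read from the title attribute of the root element
    @html.value('(/ektdesignns_choices/@title)[1]', 'nvarchar(200)') AS Question,
    -- Text and "for" attribute of the <label> in this <li>
    LI.value('(label/text())[1]', 'nvarchar(100)') AS LabelText,
    LI.value('(label/@for)[1]', 'nvarchar(50)') AS ForId,
    -- value attribute of the sibling radio <input>
    LI.value('(input/@value)[1]', 'nvarchar(100)') AS InputValue
FROM @html.nodes('//li') AS T(LI);
```

With this sample it returns two rows, one per <li>, e.g. LabelText 2-4 with ForId ID5115606 and InputValue 2-4_2. value() needs a singleton, which is why every path is wrapped in (...)[1].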
Pull the data out of the database, then use an HTML parser to extract the information you need. It will make your life much easier.
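Whatever you do, don't try to use regexes unless you are only looking for data that actually matches a regular expression. (HTML is not a regular language, so regexes often cause more problems than they solve.)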
If all the HTML you have is as well-formed as this, you can convert it to XML and use some XQuery to find the label nodes:
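-- YourTable and XMLCol are placeholders for your table and its XML column
-- ("Table" itself is a reserved word and cannot be used unquoted as a table name)
SELECT T.N.value('.', 'nvarchar(100)')
FROM YourTable
CROSS APPLY XMLCol.nodes('//label') AS T(N)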
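If the HTML is stored in a plain NVARCHAR column rather than an XML column, it has to be converted before nodes() can be used. A minimal sketch, assuming a hypothetical table variable @Pages with an NVARCHAR(MAX) column HtmlText; note that CAST raises an error for any row that is not well-formed XML (an unclosed <br>, for example), so this only suits data as clean as the sample above.

```sql
-- Hypothetical table holding the HTML as text rather than XML
DECLARE @Pages TABLE (ID INT IDENTITY, HtmlText NVARCHAR(MAX));
INSERT INTO @Pages (HtmlText)
VALUES (N'<ol><li><label for="a">1 or fewer</label></li><li><label for="b">2-4</label></li></ol>');

SELECT P.ID, T.N.value('.', 'nvarchar(100)') AS LabelText
FROM @Pages AS P
-- Convert the text to XML once per row (fails if the row is not well-formed)
CROSS APPLY (SELECT CAST(P.HtmlText AS XML) AS X) AS C
-- Shred every <label> in the converted document into its own row
CROSS APPLY C.X.nodes('//label') AS T(N);
```

With this sample the query returns two rows for ID 1: 1 or fewer and 2-4.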
If you want to extract a value from a tag with a well-defined, fixed format, you can use PATINDEX and SUBSTRING:
/*
<HTML><head><meta name='viewport' content='width=device-width, initial-scale=1.0'>
</head>
<BODY onload='document.frmLaunch.submit();'> Redirecting...
<FORM name='frmLaunch' method='POST' action='https://acsabsatest.bankserv.co.za/mdpayacs/pareq'>
<input type=hidden name='PaReq' value='eJxVUctuwjAQ/JWID8jaJjy1tRRKJXKg4iWQuLnOtkSUJDgJ0H597ZCU9pSZ2ex4dxY3B0M0XZOuDEmcU1GoD/KS+KmzWC1HnHWDHu9IXIQrOku8kCmSLJXcZ75AaKntM/qg0lKi0udJ9Co55wgNxhOZaCq56CLcIabqRHKi0mNB5hK+m0Qrr/VAqKuosyotzZcccIbQEqzMpzyUZV6MAa7Xq//WmPg6878VgqsjPOZZVA4V1u+WxHK3iWfrlzlf87lYH6NgyfP9bhvddtvwCcH9gbEqSQrGRywQgceGYz4aswCh1lGd3CByH6484TM7WCNg7t4J70S4wl8BbbqGUt0u0zKkW56l5FoQfjHGVGi7RPN5bPA8c/nq0iY47AeDXn9Qh1wLziqxAQnOerWXIwiuBZrjQXNdi/5d/Qf60asq'>
<input type=hidden name='TermUrl' value='http://TermUrl'>
<input type=hidden name='MD' value='469695'></FORM></BODY></HTML>
*/
-- Assumes @webViewData (e.g. NVARCHAR(MAX)) already holds the HTML shown above
--Find the start of the tag
SELECT PATINDEX('%<input type=hidden name=''PaReq'' value=%', @webViewData);
--(Answer is 246)
--Find the end of the tag: position of the closing '> counted from the start of the tag
SELECT PATINDEX('%''>%', substring(@webViewData,PATINDEX('%<input type=hidden name=''PaReq'' value=%', @webViewData),len(@webViewData)));
--(Answer is 468)
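--Get the value content of the tag
--(246 + 39 skips the 38-character prefix and the opening quote;
-- 468 - 40 is the closing quote's relative position minus the value's relative start, i.e. the value's length)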
select substring(@webViewData,246+39,468-40)
--Everything combined:
select substring(@webViewData,PATINDEX('%<input type=hidden name=''PaReq'' value=%', @webViewData)+39,PATINDEX('%''>%', substring(@webViewData,PATINDEX('%<input type=hidden name=''PaReq'' value=%', @webViewData),len(@webViewData)))-40)
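The hard-coded 39 and 40 break as soon as the prefix changes. Below is a sketch of the same substring technique that derives the offsets from the prefix length instead. It swaps in CHARINDEX because, unlike PATINDEX, it accepts a start position, which removes the nested SUBSTRING. The names @html, @prefix, @pos, @start, @end and the shortened PaReq value are illustrative assumptions, not the original data.

```sql
-- Illustrative, shortened HTML (single quotes doubled inside the literal)
DECLARE @html NVARCHAR(MAX) =
    N'<FORM name=''frmLaunch'' method=''POST''>'
  + N'<input type=hidden name=''PaReq'' value=''abc123''>'
  + N'<input type=hidden name=''MD'' value=''469695''></FORM>';

-- Everything up to and including the opening quote of the value
DECLARE @prefix NVARCHAR(100) = N'<input type=hidden name=''PaReq'' value=''';
DECLARE @pos INT = CHARINDEX(@prefix, @html);

IF @pos = 0
    SELECT CAST(NULL AS NVARCHAR(MAX)) AS PaReqValue;  -- tag not found
ELSE
BEGIN
    -- First character of the value, just past the prefix
    DECLARE @start INT = @pos + LEN(@prefix);
    -- Closing quote: the first single quote at or after @start
    DECLARE @end INT = CHARINDEX(N'''', @html, @start);
    SELECT SUBSTRING(@html, @start, @end - @start) AS PaReqValue;
END
```

With this sample it returns abc123. Like the original, it still assumes the value itself never contains a single quote.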