This question has been asked before, but I suspect the API has changed and the answer no longer works.
URL url = new URL("http://www.example.com");
StringWebResponse response = new StringWebResponse("<html><head><title>Test</title></head><body></body></html>", url);
HtmlPage page = HTMLParser.parseHtml(response, new TopLevelWindow("top", new WebClient()));
System.out.println(page.getTitleText());
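This doesn't work because TopLevelWindow's constructor is protected, and extending or implementing a window class just for this would be absurd :)
Does anyone know how to do this? It seems strange to me that it isn't easy.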
This code works in GroovyConsole:
@Grapes(
@Grab(group='net.sourceforge.htmlunit', module='htmlunit', version='2.8')
)
import com.gargoylesoftware.htmlunit.*
import com.gargoylesoftware.htmlunit.html.*
URL url = new URL("http://www.example.com");
StringWebResponse response = new StringWebResponse("<html><head><title>Test</title></head><body></body></html>", url);
WebClient client = new WebClient()
HtmlPage page = HTMLParser.parseHtml(response, client.getCurrentWindow());
System.out.println(page.getTitleText());
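With HtmlUnit 2.40, Grooveek's code no longer compiles; you get "Cannot make a static reference to the non-static method parseHtml(WebResponse, WebWindow) from the type HTMLParser". However, there is now a class HtmlUnitNekoHtmlParser that implements the HTMLParser interface, so the following code works: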
StringWebResponse response = new StringWebResponse(
"<html><head><title>Test</title></head><body></body></html>",
new URL("http://www.example.com"));
HtmlPage page = new HtmlUnitNekoHtmlParser().parseHtml(
response, new WebClient().getCurrentWindow());
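A self-contained version of the snippet above, offered as a sketch: it assumes HtmlUnit 2.40 is on the classpath (the version this answer names) and that HtmlUnitNekoHtmlParser lives in the package com.gargoylesoftware.htmlunit.html.parser.neko in that release. The class name ParseHtmlString and the helper parseTitle are hypothetical. Unlike the snippet, it closes the WebClient when done.

```java
import java.io.IOException;
import java.net.URL;

import com.gargoylesoftware.htmlunit.StringWebResponse;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
// Assumed package for HtmlUnit 2.40; adjust if your version differs.
import com.gargoylesoftware.htmlunit.html.parser.neko.HtmlUnitNekoHtmlParser;

public class ParseHtmlString {

    // Hypothetical helper: parses an HTML string into an HtmlPage and
    // returns the text of its <title> element.
    // baseUrl only serves as the page's address; the HTML itself is not downloaded from it.
    static String parseTitle(String html, URL baseUrl) throws IOException {
        StringWebResponse response = new StringWebResponse(html, baseUrl);
        // try-with-resources closes the WebClient (and its windows) afterwards
        try (WebClient client = new WebClient()) {
            HtmlPage page = new HtmlUnitNekoHtmlParser()
                    .parseHtml(response, client.getCurrentWindow());
            return page.getTitleText();
        }
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Test</title></head><body></body></html>";
        System.out.println(parseTitle(html, new URL("http://www.example.com")));
    }
}
```

Later HtmlUnit releases reorganized this API (3.x moved everything to the org.htmlunit package), so check the imports against the version you actually use.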
The FAQ has some sample code: https://htmlunit.sourceforge.io/faq.html#HowToParseHtmlString
For example:
final String htmlCode = "<html>"
+ " <head>"
+ " <title>Title</title>"
+ " </head>"
+ " <body>"
+ " content..."
+ " </body>"
+ "</html> ";
// the browser to emulate; any BrowserVersion constant works here
try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
final HtmlPage page = webClient.loadHtmlCodeIntoCurrentWindow(htmlCode);
// work with the html page
}
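WebClient webClient = new WebClient();
String htmlAsString = "<body />";
// use the URL the HTML should be treated as coming from
StringWebResponse response = new StringWebResponse(htmlAsString, new URL("http://www.example.com"));
HtmlPage htmlPage = new HtmlPage(response, webClient.getCurrentWindow());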