HtmlUnit is a Java library used to simulate a web browser without the use of an actual browser GUI. The WebClient class is the central class within HtmlUnit, providing an interface to use the capabilities of the library. It is used to create a virtual browser, make requests, and interact with web pages programmatically. Below are some common methods provided by the WebClient class:
Navigation and Page Retrieval
getPage(String url): Loads a web page from the specified URL and returns a Page object that represents the loaded page.
getPage(URL url): Similar to the above but takes a URL object instead of a string.
getPage(WebRequest request): Loads a page based on a WebRequest object that allows for more detailed configuration of the request.
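As a rough illustration of the getPage(WebRequest request) overload, the sketch below builds a POST request with one form parameter and one extra header. The URL http://example.com/search, the parameter name q, and the class and method names are placeholders invented for this example. The imports assume the HtmlUnit 2.x package names used elsewhere in this text.

```java
import com.gargoylesoftware.htmlunit.HttpMethod;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.util.NameValuePair;

import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class WebRequestSketch {

    // Builds a POST request; the endpoint and parameter name are hypothetical.
    static WebRequest buildSearchRequest(String query) {
        try {
            WebRequest request =
                    new WebRequest(new URL("http://example.com/search"), HttpMethod.POST);

            // Form parameters sent in the request body.
            List<NameValuePair> params = new ArrayList<>();
            params.add(new NameValuePair("q", query));
            request.setRequestParameters(params);

            // A header that applies to this request only.
            request.setAdditionalHeader("Accept-Language", "en-US");
            return request;
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        WebRequest request = buildSearchRequest("htmlunit");
        System.out.println(request.getHttpMethod() + " " + request.getUrl());
    }
}
```

The resulting object can then be passed to webClient.getPage(request) in place of a plain URL string.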
Configuration and Settings
getOptions(): Returns the WebClientOptions object that holds the WebClient's options/settings, allowing for the modification of settings like JavaScript and CSS support, timeouts, and proxy settings.
getCookieManager(): Returns the CookieManager used by this WebClient, which allows for manipulation of cookies.
getCache(): Returns the cache used by this WebClient.
getJavaScriptEngine(): Returns the JavaScript engine used by this WebClient.
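A minimal sketch of how these accessors might be combined when setting up a client. The specific values (a 10 second timeout, a cookie named session for example.com) and the class and method names are assumptions chosen for illustration, not recommended defaults.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebClientOptions;
import com.gargoylesoftware.htmlunit.util.Cookie;

public class ClientConfigSketch {

    // Creates a client tuned for fetching static HTML; all values are illustrative.
    static WebClient newConfiguredClient() {
        WebClient webClient = new WebClient(BrowserVersion.CHROME);

        WebClientOptions options = webClient.getOptions();
        options.setJavaScriptEnabled(false);                  // skip script execution
        options.setCssEnabled(false);                         // skip stylesheet processing
        options.setTimeout(10_000);                           // timeout in milliseconds
        options.setThrowExceptionOnFailingStatusCode(false);  // do not throw on 4xx/5xx

        // Pre-seed a cookie (hypothetical domain, name, and value).
        webClient.getCookieManager()
                .addCookie(new Cookie("example.com", "session", "abc123"));
        return webClient;
    }

    public static void main(String[] args) {
        try (WebClient webClient = newConfiguredClient()) {
            System.out.println("JavaScript enabled: "
                    + webClient.getOptions().isJavaScriptEnabled());
        }
    }
}
```

Disabling JavaScript and CSS is a common choice when only the static markup matters, since it avoids work the caller does not need.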
JavaScript and Ajax
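waitForBackgroundJavaScript(long timeoutMillis): Blocks until background JavaScript jobs have finished or the timeout elapses, and returns the number of jobs still pending. This is useful for pages whose AJAX calls complete after the initial page load.
isJavaScriptEnabled(): Checks whether JavaScript execution is enabled. The flag itself is stored in WebClientOptions.
getOptions().setJavaScriptEnabled(boolean enabled): Enables or disables JavaScript execution. In current HtmlUnit releases this setter lives on WebClientOptions rather than on WebClient itself; only very old releases offered it directly on WebClient.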
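The waiting behaviour can be sketched without network access by serving a canned page through HtmlUnit's MockWebConnection. The HTML below, the http://localhost/ URL, and the class and method names are stand-ins made up for this example; a real page would load its data through AJAX instead of a timer.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.io.IOException;
import java.io.UncheckedIOException;

public class BackgroundJavaScriptSketch {

    // Stand-in page: a timer changes the title 200 ms after the page has loaded.
    static final String HTML =
            "<html><head><title>loading</title></head><body>"
            + "<script>setTimeout(function () { document.title = 'ready'; }, 200);</script>"
            + "</body></html>";

    static String titleAfterWaiting() {
        try (WebClient webClient = new WebClient()) {
            // Serve the canned HTML for every request instead of using the network.
            MockWebConnection connection = new MockWebConnection();
            connection.setDefaultResponse(HTML);
            webClient.setWebConnection(connection);

            HtmlPage page = webClient.getPage("http://localhost/");

            // Give pending background jobs (here, the timer) up to 2 seconds to finish.
            webClient.waitForBackgroundJavaScript(2_000);
            return page.getTitleText();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Title after waiting: " + titleAfterWaiting());
    }
}
```

Without the wait, the title would typically be read before the timer has fired.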
Event Listeners and Handlers
setAlertHandler(AlertHandler alertHandler): Sets the handler that will handle JavaScript alert() calls.
setConfirmHandler(ConfirmHandler confirmHandler): Sets the handler that will handle JavaScript confirm() calls.
setPromptHandler(PromptHandler promptHandler): Sets the handler that will handle JavaScript prompt() calls.
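One possible way to wire up these handlers is with lambdas, as in the sketch below. It answers every confirm() with "cancel" and records alert() messages in a list. The page content is served through MockWebConnection so the example runs offline; the HTML, URL, and class and method names are invented for illustration.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class DialogHandlerSketch {

    // Stand-in page: asks for confirmation, then reports the answer via alert().
    static final String HTML =
            "<html><head><title>dialogs</title></head><body>"
            + "<script>var ok = confirm('Proceed?'); alert('confirmed=' + ok);</script>"
            + "</body></html>";

    static List<String> collectAlerts() {
        List<String> alerts = new ArrayList<>();
        try (WebClient webClient = new WebClient()) {
            MockWebConnection connection = new MockWebConnection();
            connection.setDefaultResponse(HTML);
            webClient.setWebConnection(connection);

            // Answer every confirm() dialog as if the user clicked "Cancel".
            webClient.setConfirmHandler((page, message) -> false);
            // Record alert() messages instead of discarding them.
            webClient.setAlertHandler((page, message) -> alerts.add(message));

            webClient.getPage("http://localhost/");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return alerts;
    }

    public static void main(String[] args) {
        System.out.println(collectAlerts());
    }
}
```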
Headers and Responses
addRequestHeader(String name, String value): Adds a request header that will be sent with all future requests.
removeRequestHeader(String name): Removes a previously added request header.
getCurrentWindow(): Returns the WebWindow that represents the current window or frame.
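The sketch below shows, under a few assumptions, how client-wide headers might be added and removed and then inspected on the outgoing request. The header names X-Api-Key and X-Debug, their values, and the class and method names are hypothetical. MockWebConnection is used so the request can be examined without contacting a server.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

public class RequestHeaderSketch {

    static Map<String, String> headersOfLastRequest() {
        try (WebClient webClient = new WebClient()) {
            MockWebConnection connection = new MockWebConnection();
            connection.setDefaultResponse(
                    "<html><head><title>ok</title></head><body></body></html>");
            webClient.setWebConnection(connection);

            // Headers registered here are applied to every later request.
            webClient.addRequestHeader("X-Api-Key", "demo-key");
            webClient.addRequestHeader("X-Debug", "1");
            // Removing a header stops it from being sent on future requests.
            webClient.removeRequestHeader("X-Debug");

            webClient.getPage("http://localhost/");

            // Inspect the headers attached to the request the client issued.
            return connection.getLastWebRequest().getAdditionalHeaders();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(headersOfLastRequest().get("X-Api-Key"));
    }
}
```

For a header that should apply to a single request only, WebRequest.setAdditionalHeader is the narrower alternative.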
Miscellaneous
close(): Closes the WebClient and all associated windows, which is important to free resources.
getWebConnection(): Returns the WebConnection object that is used to send requests to the server.
Here's an example of how you might use the WebClient class to navigate to a web page and print its title in Java:
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class WebClientExample {
    public static void main(String[] args) {
        // Create a new instance of WebClient
        try (final WebClient webClient = new WebClient()) {
            // Navigate to a web page and get the Page object
            HtmlPage page = webClient.getPage("http://example.com");
            // Print the title of the page
            System.out.println("Page title: " + page.getTitleText());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Make sure to handle exceptions and close the WebClient properly to avoid leaking resources. The try-with-resources statement in the example above ensures that the WebClient is closed automatically.