OVERVIEW
WebDriver is a simple and concise remote control interface that can be used to control, or in other words drive, a browser either locally or on a remote machine. It is a platform-neutral and programming-language-neutral wire protocol that can be used to remotely instruct the behavior of the browser, such as discovering DOM elements, manipulating DOM elements, and controlling the behavior of the user agent.
Now part of the Selenium project, Selenium WebDriver encompasses the language bindings as well as the implementations of the code that controls individual browsers, which are now often simply referred to as WebDriver. The tool is especially useful for browser automation testing across various browsers and operating systems.
The online world has evolved quickly, and with each new application, a higher standard is set for user experience. When it comes to developing websites and web apps, it's important to ensure a seamless end-user experience. That's why automation testing is an effective way to test your product across various browser and operating system combinations.
Because it offers support for a wide variety of programming languages, including Java, C#, Ruby, JavaScript, and more, Selenium can be an effective tool for large organizations that wish to automate their software testing process.
This WebDriver tutorial explores what WebDriver is, its features, how it works, best practices, and more.
Let's begin!
WebDriver is a browser automation technology that allows users to control web browsers as if they were using them directly. Whether it's on a local machine or a remote server using the Selenium server, WebDriver empowers users with seamless control over web browsers, making it an essential tool for web automation tasks.
This specification provides a set of interfaces to discover and manipulate the DOM, focusing on web compatibility. This specification is primarily intended for use in automated testing of user agents but may also be used in such a way as to allow in-browser scripts to control a browser.
Selenium is an open-source test automation framework that allows web apps to be tested across different browsers & operating systems. It supports compatibility with multiple programming languages such as Java, JavaScript, Python, C#, and more, so testers can automate their website testing in any programming language they are comfortable with.
The Selenium framework lets testers shorten test cycles by automating repetitive test cases. It also integrates with CI/CD pipelines, which helps keep the release deployment pipeline sturdy.
Selenium WebDriver, often referred to as WebDriver, is a robust web testing framework that enables automation of browser activities across various browsers. This tool is instrumental in validating that your web application performs as anticipated under different scenarios.
An added advantage of Selenium WebDriver is its flexibility - it allows for the creation of test scripts in multiple programming languages. In essence, WebDriver is both a language binding and a unique implementation of browser-controlling code, making it a powerful tool for cross-browser testing.
At the time of writing, Selenium 4, launched in 2021, is the latest version of Selenium. Notable topics in this release include the Selenium 4 Grid architecture, relative locators, and W3C compliance in Selenium WebDriver.
WebDriver is a set of standards used by different browsers. Browser vendors such as those behind Chrome, Firefox, and Edge use these standards to build their respective browser drivers, such as ChromeDriver and GeckoDriver. The testing community widely uses the WebDriver framework to perform automation testing on web applications and native mobile applications. Wondering why? Because tests written with WebDriver are simple and concise. These reasons have led testers to adopt WebDriver for their browser testing needs.
If you're a developer who's passionate about quality assurance, then this is the right place for you.
Whatever your level of WebDriver skill, this Selenium WebDriver tutorial unleashes the full potential of test automation. This will help you get everything up and running and give you all the information (and code) you need to create powerful test automation solutions.
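The WebDriver protocol allows for communication between a local end, which is the client side of the protocol (for example, a language binding), and a remote end, which is the server side of the protocol.
Remote ends are classified into two broad conformance classes called node types: intermediary nodes, which sit between a local end and another remote end and pass commands along, and endpoint nodes, which ultimately control a user agent.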
To communicate with WebDriver, endpoints must provide an HTTP-compliant wire protocol that maps to different commands.
This standard does not constrain how local ends interact with their users. Local ends are only expected to be compatible with the Remote End Protocol; they're not required to expose a user-facing API. The WebDriver protocol defines the following endpoints:
Method | URI Template | Command |
---|---|---|
POST | /session | New Session |
DELETE | /session/{session id} | Delete Session |
GET | /status | Status |
GET | /session/{session id}/timeouts | Get Timeouts |
POST | /session/{session id}/timeouts | Set Timeouts |
POST | /session/{session id}/url | Navigate To |
GET | /session/{session id}/url | Get Current URL |
POST | /session/{session id}/back | Back |
POST | /session/{session id}/forward | Forward |
POST | /session/{session id}/refresh | Refresh |
GET | /session/{session id}/title | Get Title |
GET | /session/{session id}/window | Get Window Handle |
DELETE | /session/{session id}/window | Close Window |
POST | /session/{session id}/window | Switch To Window |
GET | /session/{session id}/window/handles | Get Window Handles |
POST | /session/{session id}/window/new | New Window |
POST | /session/{session id}/frame | Switch To Frame |
POST | /session/{session id}/frame/parent | Switch To Parent Frame |
GET | /session/{session id}/window/rect | Get Window Rect |
POST | /session/{session id}/window/rect | Set Window Rect |
POST | /session/{session id}/window/maximize | Maximize Window |
POST | /session/{session id}/window/minimize | Minimize Window |
POST | /session/{session id}/window/fullscreen | Fullscreen Window |
GET | /session/{session id}/element/active | Get Active Element |
GET | /session/{session id}/element/{element id}/shadow | Get Element Shadow Root |
POST | /session/{session id}/element | Find Element |
POST | /session/{session id}/elements | Find Elements |
POST | /session/{session id}/element/{element id}/element | Find Element From Element |
POST | /session/{session id}/element/{element id}/elements | Find Elements From Element |
POST | /session/{session id}/shadow/{shadow id}/element | Find Element From Shadow Root |
POST | /session/{session id}/shadow/{shadow id}/elements | Find Elements From Shadow Root |
GET | /session/{session id}/element/{element id}/selected | Is Element Selected |
GET | /session/{session id}/element/{element id}/attribute/{name} | Get Element Attribute |
GET | /session/{session id}/element/{element id}/property/{name} | Get Element Property |
GET | /session/{session id}/element/{element id}/css/{property name} | Get Element CSS Value |
GET | /session/{session id}/element/{element id}/text | Get Element Text |
GET | /session/{session id}/element/{element id}/name | Get Element Tag Name |
GET | /session/{session id}/element/{element id}/rect | Get Element Rect |
GET | /session/{session id}/element/{element id}/enabled | Is Element Enabled |
GET | /session/{session id}/element/{element id}/computedrole | Get Computed Role |
GET | /session/{session id}/element/{element id}/computedlabel | Get Computed Label |
POST | /session/{session id}/element/{element id}/click | Element Click |
POST | /session/{session id}/element/{element id}/clear | Element Clear |
POST | /session/{session id}/element/{element id}/value | Element Send Keys |
GET | /session/{session id}/source | Get Page Source |
POST | /session/{session id}/execute/sync | Execute Script |
POST | /session/{session id}/execute/async | Execute Async Script |
GET | /session/{session id}/cookie | Get All Cookies |
GET | /session/{session id}/cookie/{name} | Get Named Cookie |
POST | /session/{session id}/cookie | Add Cookie |
DELETE | /session/{session id}/cookie/{name} | Delete Cookie |
DELETE | /session/{session id}/cookie | Delete All Cookies |
POST | /session/{session id}/actions | Perform Actions |
DELETE | /session/{session id}/actions | Release Actions |
POST | /session/{session id}/alert/dismiss | Dismiss Alert |
POST | /session/{session id}/alert/accept | Accept Alert |
GET | /session/{session id}/alert/text | Get Alert Text |
POST | /session/{session id}/alert/text | Send Alert Text |
GET | /session/{session id}/screenshot | Take Screenshot |
GET | /session/{session id}/element/{element id}/screenshot | Take Element Screenshot |
POST | /session/{session id}/print | Print Page |
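As a rough illustration of how a local end might turn a row of the table above into a concrete HTTP request line, here is a minimal sketch. The `ENDPOINTS` dictionary and the `resolve` helper are hypothetical names invented for this example; only the methods and URI templates come from the table.

```python
from urllib.parse import quote

# A small subset of the endpoint table: command name -> (HTTP method, URI template).
ENDPOINTS = {
    "New Session": ("POST", "/session"),
    "Navigate To": ("POST", "/session/{session id}/url"),
    "Get Title": ("GET", "/session/{session id}/title"),
    "Get Element Text": ("GET", "/session/{session id}/element/{element id}/text"),
    "Delete Session": ("DELETE", "/session/{session id}"),
}


def resolve(command, **variables):
    """Return (method, path) for a command, filling in the template variables.

    Keyword names use underscores (session_id) and map to the
    space-separated names used in the templates ({session id}).
    """
    method, template = ENDPOINTS[command]
    path = template
    for name, value in variables.items():
        placeholder = "{" + name.replace("_", " ") + "}"
        if placeholder not in path:
            raise KeyError(f"{command} has no variable {placeholder}")
        # Percent-encode the value so it is safe inside a URL path segment.
        path = path.replace(placeholder, quote(str(value), safe=""))
    if "{" in path:
        raise ValueError(f"unfilled variable in {path}")
    return method, path
```

For example, `resolve("Get Title", session_id="abc123")` yields `("GET", "/session/abc123/title")`, which a client could then send to whatever host and port the remote end listens on.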
WebDriver capabilities are used to communicate what features the implementation supports. The local end can use capabilities to describe the features it requires the remote end to satisfy when creating a new session. Likewise, the remote end can use capabilities to describe its full feature set for a session.
The following table lists the capabilities that each implementation must support. Implementations may also define their own extension capabilities.
Capability | Key | Value Type | Description |
---|---|---|---|
Browser name | "browserName" | string | Identifies the user agent. |
Browser version | "browserVersion" | string | Identifies the version of the user agent. |
Platform name | "platformName" | string | Identifies the operating system of the endpoint node. |
Accept insecure TLS certificates | "acceptInsecureCerts" | boolean | Indicates whether untrusted and self-signed TLS certificates are implicitly trusted on navigation for the duration of the session. |
Page load strategy | "pageLoadStrategy" | string | Defines the current session's page load strategy. |
Proxy configuration | "proxy" | JSON Object | Defines the current session's proxy configuration. |
Window dimensioning/positioning | "setWindowRect" | boolean | Indicates whether the remote end supports all of the resizing and repositioning commands. |
Session timeouts | "timeouts" | JSON Object | Describes the timeouts imposed on certain session operations. |
Strict file interactability | "strictFileInteractability" | boolean | Defines the current session's strict file interactability. |
Unhandled prompt behavior | "unhandledPromptBehavior" | string | Describes the current session's user prompt handler. Defaults to the dismiss and notify state. |
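To make the table more concrete, the sketch below builds a request body for POST /session and checks each requested capability against the value type listed above. The `STANDARD_CAPABILITIES` mapping and the `new_session_body` helper are hypothetical names for this example. The `capabilities` / `alwaysMatch` wrapper follows the W3C WebDriver request format as I understand it, and keys containing a colon are treated as extension capabilities and passed through unchecked.

```python
import json

# Expected Python types for the standard capabilities listed in the table.
STANDARD_CAPABILITIES = {
    "browserName": str,
    "browserVersion": str,
    "platformName": str,
    "acceptInsecureCerts": bool,
    "pageLoadStrategy": str,
    "proxy": dict,
    "setWindowRect": bool,
    "timeouts": dict,
    "strictFileInteractability": bool,
    "unhandledPromptBehavior": str,
}


def new_session_body(requested):
    """Validate requested capabilities and return a JSON body for POST /session."""
    for key, value in requested.items():
        if ":" in key:
            continue  # extension capability (vendor-prefixed); not type-checked here
        expected = STANDARD_CAPABILITIES.get(key)
        if expected is None:
            raise KeyError(f"unknown standard capability: {key}")
        if not isinstance(value, expected):
            raise TypeError(f"{key} must be of type {expected.__name__}")
    return json.dumps({"capabilities": {"alwaysMatch": requested}})


# Example: request Firefox and accept self-signed certificates.
body = new_session_body({"browserName": "firefox", "acceptInsecureCerts": True})
```

Checking types on the client side is a design choice for this sketch; a remote end performs its own validation when it processes the capabilities.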
A session is equivalent to a single user agent instance, including all its child browsers. WebDriver provides each session with a unique identifier that can be used to differentiate one session from another, allowing multiple user agents to be controlled from a single HTTP server and allowing sessions to be routed via a multiplexer (known as an intermediary node).
A WebDriver session is an instance of the connection between a local end and a specific remote end.
HTTP Method | URI Template |
---|---|
POST | /session |
The New Session command creates a new WebDriver session with the endpoint node. If the creation fails, the remote end returns a session not created error.
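The sketch below shows one way a client could interpret the body returned by New Session. It assumes the W3C response shape, where a success carries `sessionId` and `capabilities` under `value` and a failure carries `error` and `message` under `value`. `WebDriverError` and `parse_new_session` are hypothetical names, and the sample strings are made up for illustration.

```python
import json


class WebDriverError(Exception):
    """Raised when a response body carries a WebDriver error object."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def parse_new_session(response_text):
    """Return (session_id, capabilities) or raise WebDriverError."""
    value = json.loads(response_text)["value"]
    if isinstance(value, dict) and "error" in value:
        raise WebDriverError(value["error"], value.get("message", ""))
    return value["sessionId"], value["capabilities"]


# Illustrative response bodies, not captured from a real driver.
ok = '{"value": {"sessionId": "abc123", "capabilities": {"browserName": "firefox"}}}'
failed = '{"value": {"error": "session not created", "message": "no browser found"}}'

session_id, capabilities = parse_new_session(ok)
```

The returned session ID is what later commands substitute into the `{session id}` part of their URI templates.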
HTTP Method | URI Template |
---|---|
DELETE | /session/{session id} |
The remote end steps are:
HTTP Method | URI Template |
---|---|
GET | /status |
The Status command returns information about the remote end's ability to create new sessions. It may additionally include meta information specific to the implementation.
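A common use of the Status command is to wait for a remote end to become available before creating a session. The sketch below assumes the W3C response shape, with a boolean `ready` field under `value`. `is_ready` and `wait_until_ready` are hypothetical helpers, and `fetch_status` stands in for whatever function performs the actual GET /status request and returns the response body as text.

```python
import json
import time


def is_ready(status_body):
    """Return True if a GET /status response says new sessions can be created."""
    value = json.loads(status_body)["value"]
    return bool(value.get("ready", False))


def wait_until_ready(fetch_status, attempts=5, delay=0.0):
    """Poll fetch_status() up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if is_ready(fetch_status()):
            return True
        time.sleep(delay)
    return False


# Stand-in for real HTTP calls: the remote end becomes ready on the third poll.
_responses = iter([
    '{"value": {"ready": false, "message": "starting"}}',
    '{"value": {"ready": false, "message": "starting"}}',
    '{"value": {"ready": true, "message": "ok"}}',
])
became_ready = wait_until_ready(lambda: next(_responses), attempts=5)
```

Injecting `fetch_status` keeps the polling logic independent of any particular HTTP library.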
The following table lists some useful and common WebDriver commands and their syntax:
S.No. | Command and Description |
---|---|
1. | driver.get("URL"); To navigate to an application. |
2. | element.sendKeys("inputtext"); Enter some text into an input box. |
3. | element.clear(); Clear the contents from the input box. |
4. | select.deselectAll(); Deselect all OPTIONs from the first SELECT on the page. |
5. | select.selectByVisibleText("some text"); Select the OPTION whose visible text matches the given text. |
6. | driver.switchTo().window("windowName"); Move the focus from one window to another. |
7. | driver.switchTo().frame("frameName"); Move the focus to the named frame. |
8. | driver.switchTo().alert(); Helps in handling alerts. |
9. | driver.navigate().to("URL"); Navigate to the URL. |
10. | driver.navigate().forward(); To navigate forward. |
11. | driver.navigate().back(); To navigate back. |
12. | driver.close(); Closes the current browser window associated with the driver. |
13. | driver.quit(); Quits the driver and closes all the associated windows of that driver. |
14. | driver.navigate().refresh(); Refreshes the current page. |
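Several of the Java calls in the table correspond closely to a single endpoint in the protocol table shown earlier. The lookup below is a simplified sketch of that correspondence, not a description of how the Selenium bindings are implemented internally. `WIRE_COMMAND` and `describe` are hypothetical names, and helper classes such as Select are left out because they typically combine several protocol commands.

```python
# Java call from the table -> (HTTP method, URI template) from the protocol table.
WIRE_COMMAND = {
    'driver.get("URL")': ("POST", "/session/{session id}/url"),
    'driver.navigate().to("URL")': ("POST", "/session/{session id}/url"),
    "driver.navigate().forward()": ("POST", "/session/{session id}/forward"),
    "driver.navigate().back()": ("POST", "/session/{session id}/back"),
    'element.sendKeys("inputtext")': ("POST", "/session/{session id}/element/{element id}/value"),
    "element.clear()": ("POST", "/session/{session id}/element/{element id}/clear"),
    'driver.switchTo().window("windowName")': ("POST", "/session/{session id}/window"),
    'driver.switchTo().frame("frameName")': ("POST", "/session/{session id}/frame"),
    "driver.close()": ("DELETE", "/session/{session id}/window"),
    "driver.quit()": ("DELETE", "/session/{session id}"),
}


def describe(java_call):
    """Return a one-line summary of the endpoint a Java call is assumed to use."""
    method, template = WIRE_COMMAND[java_call]
    return f"{java_call} -> {method} {template}"
```

Note how `driver.close()` and `driver.quit()` differ at the protocol level: the first closes one window, while the second deletes the whole session.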
Screenshots are a great way to provide visual diagnostic information. The screenshot commands take a snapshot of the initial viewport's frame buffer as a lossless PNG image and return it to the local end as a Base64-encoded string.
The WebDriver's Take Screenshot command captures the top-level browsing context's initial viewport, and the Take Element Screenshot command allows you to capture an element's visible region after it has been scrolled into view.
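Because the image arrives as a Base64-encoded string, the local end has to decode it before saving or comparing it. The following sketch assumes the encoded string sits under `value` in the JSON response, as in other W3C WebDriver responses. `decode_screenshot` is a hypothetical helper, and the sample payload is a stand-in rather than a real image.

```python
import base64
import json

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(response_text):
    """Decode a Take Screenshot response body into raw PNG bytes."""
    encoded = json.loads(response_text)["value"]
    data = base64.b64decode(encoded, validate=True)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return data


# Stand-in payload: the PNG signature followed by placeholder bytes.
fake_png = PNG_SIGNATURE + b"placeholder-image-bytes"
sample_response = json.dumps({"value": base64.b64encode(fake_png).decode("ascii")})
png_bytes = decode_screenshot(sample_response)
```

The resulting bytes can be written to disk with `open(path, "wb")` or handed to an image library for comparison.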
HTTP Method | URI Template |
---|---|
GET | /session/{session id}/screenshot |
Remote end steps are:
HTTP Method | URI Template |
---|---|
GET | /session/{session id}/element/{element id}/screenshot |
Remote end steps are:
Selenium WebDriver allows us to create cross-browser tests using a programming language of our choice.
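The Selenium WebDriver architecture consists of four major components: the Selenium client libraries (language bindings), the JSON wire protocol, the browser drivers, and the browsers themselves.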
Selenium developers have built language bindings to support the use of the program in multiple languages. For example, if you are writing your tests in Java, you can use the Java bindings. Client libraries can be downloaded from the official Selenium website.
JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy to read and write. It supports data structures such as objects and arrays, which makes it well suited to transferring data between clients and servers.
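To show what this JSON traffic can look like, here is a small sketch that serializes two command bodies and unwraps a response. The field names (`url`, `using`, `value`) and the element reference key follow the W3C WebDriver specification as I understand it. The helper function names are hypothetical, and the sample response is made up for illustration.

```python
import json

# Key under which the W3C protocol returns a web element reference.
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def navigate_to_body(url):
    """JSON body for POST /session/{session id}/url."""
    return json.dumps({"url": url})


def find_element_body(css_selector):
    """JSON body for POST /session/{session id}/element using a CSS selector."""
    return json.dumps({"using": "css selector", "value": css_selector})


def element_id_from(response_text):
    """Extract the element ID from a Find Element response body."""
    return json.loads(response_text)["value"][ELEMENT_KEY]


# Illustrative response; a real driver generates its own element IDs.
sample = json.dumps({"value": {ELEMENT_KEY: "e-42"}})
element_id = element_id_from(sample)
```

Each request and response is a plain JSON object, which is why any language with an HTTP client and a JSON library can act as a local end.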
To establish a secure connection with the browser, Selenium uses drivers. Each driver is specific to one browser and is responsible for handling all the logic needed to control that particular browser. In addition, each automation language has its own corresponding driver. The following series of actions occurs when a Selenium automation test is triggered:
ChromeDriver, GeckoDriver, and Microsoft Edge WebDriver are some examples of browser drivers.
Browsers act as the end-point of our test executions. Here are the supported browsers:
Selenium is a widely used open source testing framework that comes with a lot of features:
WebDriver is a feature-rich framework among the tester community. But, it also has some limitations of its own:
Here are some of the best practices of Selenium WebDriver to make your life easier:
In this section of the Selenium WebDriver tutorial, you will learn how to run advanced use cases using WebDriver.
If you are just starting with Selenium automation testing of your product, the first page you would probably want to automate would be the SignUp or Login Page. This tutorial will help you automate the testing of user sign-up forms.
Wondering why you are asked to fill up your credentials while accessing a few websites? This chapter mainly focuses on introducing authentication pop-ups on the website and the different ways to handle them using Selenium.
To achieve digital security, Captcha is one of the various ways. Captcha is easy for humans to solve but hard for "bots" and other malicious software to figure out. Check out this chapter on how to handle Captcha in Selenium using Java.
During the process of Selenium automation testing of the website, you can realise specific scenarios by automating low-level interactions such as keypresses and mouse button actions (e.g. click, double click, right-click, etc.) with the WebElement(s) in the DOM. These interactions, also called Actions, play an integral part in testing an application using the Selenium framework. Check out this chapter to learn more.
A cookie is a packet of data received by your machine and sent back to the server without being altered or changed. Cookies tend to contain information allowing websites to track your visits and activities. Check out this chapter on how to handle cookies in Selenium WebDriver using Java.
Identifying the elements may be an easy task, but your tests might fail due to the state of the WebElement (e.g., not visible, not clickable, etc.). As a test automation engineer, it is crucial to consider these things to build a fool-proof test automation strategy. Learn how to deal with the Element is not clickable at point exception using Selenium and Java.
ID locator in Selenium WebDriver is one of the most reliable web locators. IDs are unique for each element, thereby making the ID locator a dependable choice of locating WebElements.
Locating dynamic elements has always been a pain area when you want to automate the scripts. XPath locator in Selenium WebDriver helps in dynamically searching for an element within a web page.
During the process of test automation, you would come across a number of scenarios where a decision needs to be taken regarding "What if the test(s) result in a failure?" If the error (or issue) being encountered is a minor one, you might want the test execution to continue. In case of serious errors, it is better to abort the execution of the test case (or test suite). This can be achieved using "Assert and Verify in Selenium WebDriver". Check out this chapter to learn how you can use Assert and Verify to your benefit!
In this Selenium xUnit tutorial, we take a quick look at how to install Selenium WebDriver in Visual Studio for performing automation testing with C#.
Selenium is a popular choice among web developers and testers looking for an automation testing tool. Through this extensive Selenium WebDriver tutorial, we hope to answer every question you have regarding WebDriver testing.
You can also use LambdaTest's cloud platform to run your first Selenium WebDriver test script.
Happy Testing!
The Selenium WebDriver tool is a popular solution for automating web application testing. It supports many browsers such as Firefox, Chrome, Internet Explorer, and Safari. But because it is limited to testing web applications, we may not want to use it for every project.
The WebDriver interface enables introspection and control of user agents (browsers). WebDriver's methods fall into three categories: controlling the browser itself, selecting WebElements, and evaluating expressions.
WebDriver is a web automation tool that is used to test web applications across many browsers. ChromeDriver is a standalone server that implements the W3C WebDriver standard for Chrome.
The WebDriver API allows developers to write tests that automate a browser from a separate controlling process. It can also be implemented in such a way as to allow in-browser scripts to control a (possibly separate) browser.
Reviewer's Profile
Shahzeb Hoda
Shahzeb currently holds the position of Senior Product Marketing Manager at LambdaTest and brings over a decade of experience in the Quality Engineering, Security, and E-Learning domains. Over the course of his 3-year tenure at LambdaTest, he has actively contributed to the review process of blogs, learning hubs, and product updates. He holds a Master's degree (M.Tech) in Computer Science and has extensive knowledge spanning diverse areas of web development and software testing, including automation testing, DevOps, continuous testing, and beyond.