This Python Selenium tutorial covers how to use XPath with Selenium. Selenium is an open source framework for automating web browser interactions. With Selenium you can control browsers such as Chrome, Firefox and Safari, and automate tasks like form submissions, UI testing and data scraping. Selenium also lets you identify and manipulate elements on a web page using several locator strategies; one of them is XPath, which is the focus of this tutorial.
Python Selenium XPath
XPath is a query language for navigating XML documents, and browsers apply it to HTML as well. It selects elements based on their properties or their relationships in the document structure. XPath expressions can locate elements precisely, which makes XPath a useful tool for web scraping and automation.
Python Selenium and XPath Usage
To use XPath with Python Selenium, we first need to install Selenium, which we can do with pip.
    pip install selenium
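Selenium's Python API differs between major releases, so it can help to confirm which version pip installed. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of Selenium.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "selenium"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or None when Selenium is not installed.
    print(installed_version())
```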
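XPath provides flexible element identification and allows us to target specific elements on a web page. Selenium offers two methods for locating elements with XPath expressions:
- find_element(By.XPATH, expression): Finds the first element that matches the XPath expression.
- find_elements(By.XPATH, expression): Finds all elements that match the XPath expression.
Older tutorials use find_element_by_xpath and find_elements_by_xpath. These were deprecated in Selenium 4 and removed in Selenium 4.3, so the By.XPATH form is the one to use with current releases.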
XPath expressions can target elements based on their tag names, attributes, text content and relationships.
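To make those four categories concrete without launching a browser, here is a small sketch that uses Python's standard-library `xml.etree.ElementTree` rather than Selenium. ElementTree supports only a limited subset of XPath, and the sample markup and variable names are invented for illustration. With Selenium, the same ideas would typically be written with a leading `//` and passed to `find_element`.

```python
import xml.etree.ElementTree as ET

# Invented sample markup, well-formed so the XML parser accepts it.
PAGE = """
<html>
  <body>
    <div class="example"><a href="/first">First</a></div>
    <div class="other"><a href="/second">Second</a></div>
    <p>Plain paragraph</p>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# Tag name: every <a> element anywhere below the root.
links = root.findall(".//a")

# Attribute: <div> elements whose class attribute is exactly "example".
example_divs = root.findall(".//div[@class='example']")

# Text content: the <a> element whose complete text is "Second".
second = root.find(".//a[.='Second']")

# Relationship: step from that <a> up to its parent <div> with "..".
parent_of_second = root.find(".//a[.='Second']/..")

print([a.get("href") for a in links])
print(len(example_divs))
print(second.get("href"))
print(parent_of_second.get("class"))
```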
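    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    # Selenium 4 takes the driver path through a Service object
    driver = webdriver.Chrome(service=Service("path/to/chromedriver"))
    driver.get("https://example.com")

    # Find a specific element using XPath
    element = driver.find_element(By.XPATH, "//div[@class='example']")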
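The expression above hard-codes the class name. When the value comes from a variable, quoting matters, because XPath 1.0 string literals have no escape character. The sketch below is one possible way to build such an expression. `xpath_literal` and `by_attribute` are hypothetical helper names, not Selenium functions, and the tag and attribute names are assumed to be trusted.

```python
def xpath_literal(value: str) -> str:
    """Quote `value` as an XPath 1.0 string literal."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # XPath 1.0 has no escape character, so a value containing both quote
    # types is split on single quotes and reassembled with concat().
    pieces = [f"'{part}'" for part in value.split("'")]
    return "concat(" + ", \"'\", ".join(pieces) + ")"


def by_attribute(tag: str, attribute: str, value: str) -> str:
    """Build //tag[@attribute=<quoted value>]; tag and attribute are not validated."""
    return f"//{tag}[@{attribute}={xpath_literal(value)}]"


print(by_attribute("div", "class", "example"))  # //div[@class='example']
```

The resulting string can be passed as the second argument to `find_element` or `find_elements`.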
This is a complete practical example for this article:
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys

    # Launch Chrome browser
    driver = webdriver.Chrome()

    # Navigate to Google
    driver.get("https://www.google.com")

    # Find the search input field and enter a query
    search_input = driver.find_element("name", "q")
    search_input.send_keys("Web automation with Python Selenium")
    search_input.send_keys(Keys.RETURN)

    # Wait for the search results to load
    driver.implicitly_wait(5)

    # Extract the search results
    results = driver.find_elements("xpath", "//div[@class='r']/a")
    for result in results:
        print(result.get_attribute("href"))

    # Close the browser
    driver.quit()
First, we import the required modules from Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
Here the code initializes a Chrome browser instance using the Chrome() constructor from the webdriver module. This assumes the ChromeDriver executable is on your system PATH or in your working directory. Selenium 4.6 and later can also download a matching driver automatically through Selenium Manager.
    driver = webdriver.Chrome()
This line instructs the browser to open the specified URL, which in this case is https://www.google.com.
    driver.get("https://www.google.com")
These lines locate the search input field on the Google page using the find_element() method with the locator strategy name and the value q, which is the name attribute of the input field on the Google search page. The send_keys() method then types the query, Web automation with Python Selenium, into the field. Finally, send_keys() is called with Keys.RETURN to simulate pressing the Enter key.
    search_input = driver.find_element("name", "q")
    search_input.send_keys("Web automation with Python Selenium")
    search_input.send_keys(Keys.RETURN)
This line sets an implicit wait of 5 seconds. The implicitly_wait() method does not pause the script. Instead, it sets a session-wide timeout so that subsequent element lookups, such as find_elements(), keep polling for up to 5 seconds before giving up when no matching element is present yet.
    driver.implicitly_wait(5)
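For contrast, an explicit wait polls for one specific condition and stops as soon as it holds. The sketch below shows that polling idea in plain Python so it runs without a browser. `wait_until` and `fake_lookup` are hypothetical names for illustration, and this is not Selenium's implementation. In Selenium itself the usual tool for this is `WebDriverWait` from `selenium.webdriver.support.ui`.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.1):
    """Call `condition` repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a lookup such as driver.find_elements(...): it returns an
# empty list on the first two calls, then a list of hrefs.
calls = {"count": 0}


def fake_lookup():
    calls["count"] += 1
    return ["https://example.com/result"] if calls["count"] >= 3 else []


links = wait_until(fake_lookup, timeout=2.0, poll_interval=0.01)
print(links)
```

Waiting on a specific condition keeps the timeout local to the step that needs it, whereas an implicit wait applies to every lookup in the session.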
These lines use an XPath expression to locate the search result links on the page. The find_elements() method returns a list of matching elements, or an empty list if nothing matches. The XPath expression selects all <a> elements that are direct children of <div> elements whose class attribute is exactly r, a class name Google has used for search result links. Google's markup changes frequently, so the expression may need to be adapted to the current page. The loop iterates over the results and prints the href attribute of each link.
    results = driver.find_elements("xpath", "//div[@class='r']/a")
    for result in results:
        print(result.get_attribute("href"))
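Two details of that expression are easy to miss: `/a` matches only direct children, and `@class='r'` matches only when the whole attribute value is `r`. The sketch below checks both on invented markup with the standard-library `xml.etree.ElementTree`, which supports a limited XPath subset and is used here instead of a browser.

```python
import xml.etree.ElementTree as ET

# Invented markup: only the first link is a direct child of a div whose
# class attribute is exactly "r".
PAGE = """
<body>
  <div class="r"><a href="https://example.com/direct">Direct child</a></div>
  <div class="r"><span><a href="https://example.com/nested">Nested</a></span></div>
  <div class="r extra"><a href="https://example.com/other">Other class</a></div>
</body>
"""

root = ET.fromstring(PAGE)

# Child step: <a> elements directly inside <div class="r">.
direct = root.findall(".//div[@class='r']/a")

# Descendant step: <a> elements at any depth inside <div class="r">.
nested_too = root.findall(".//div[@class='r']//a")

for link in direct:
    print(link.get("href"))
```

Here `direct` contains only the first link, while `nested_too` also picks up the link wrapped in a `<span>`. The third link is skipped by both because its class attribute is `r extra`.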
Note: You can download the drivers from here.