This article covers Python regex for pattern matching. Python is a powerful programming language with built-in support for regular expressions (regex). Regex lets you search and manipulate strings using complex patterns, which makes it a useful tool for text processing and data cleaning. Below we go over the basics of Python regex and how you can use it for pattern matching.
What is Regex?
A regular expression, or regex, is a sequence of characters that defines a search pattern. In Python, you work with regular expressions through the re module, which provides functions for searching and manipulating strings with regex patterns.
What are Python Regex Patterns?
Regex patterns are made up of ordinary characters and special characters. Ordinary characters represent themselves and match the same character in the string you are searching. Special characters have a specific meaning: they can match whole classes of characters, such as digits, letters, or whitespace, or control how many times something is matched.
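To make the difference between ordinary and special characters concrete, here is a minimal sketch. The sample string and variable names are made up for illustration; only re.findall() from the standard re module is used.

```python
import re

# Hypothetical sample text, chosen only for this illustration.
text = "cat, c.t, cot"

# Ordinary characters match themselves: "cat" matches only the exact text "cat".
literal_matches = re.findall(r"cat", text)

# The special character "." matches any single character except a newline,
# so "c.t" matches "cat", "c.t" and "cot".
dot_matches = re.findall(r"c.t", text)

# A backslash turns a special character back into an ordinary one:
# "c\.t" matches only the literal text "c.t".
escaped_matches = re.findall(r"c\.t", text)

print(literal_matches)   # ['cat']
print(dot_matches)       # ['cat', 'c.t', 'cot']
print(escaped_matches)   # ['c.t']
```

The same escaping idea appears later in the email pattern, where \. is used to match a literal dot.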
These are some commonly used special characters in Python regex:
. | matches any character except a newline
* | matches zero or more occurrences of the preceding character
+ | matches one or more occurrences of the preceding character
? | matches zero or one occurrence of the preceding character
\d | matches any digit (0-9)
\w | matches any word character (a-z, A-Z, 0-9, and the underscore _)
\s | matches any whitespace character (space, tab, newline)
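The special characters listed above can be tried out one at a time. The following is a small sketch with made-up sample strings, using the patterns ., *, +, ?, \d, \w and \s with re.findall():

```python
import re

# "." matches any single character except a newline.
dot = re.findall(r"a.c", "abc a-c ac")            # ['abc', 'a-c']

# "*" allows zero or more of the preceding character.
star = re.findall(r"ab*", "a ab abbb")            # ['a', 'ab', 'abbb']

# "+" requires one or more of the preceding character.
plus = re.findall(r"ab+", "a ab abbb")            # ['ab', 'abbb']

# "?" makes the preceding character optional (zero or one).
optional = re.findall(r"colou?r", "color colour") # ['color', 'colour']

# "\d" matches a digit; with "+" it matches runs of digits.
digits = re.findall(r"\d+", "room 12, floor 3")   # ['12', '3']

# "\w" matches letters, digits and the underscore, but not "-".
words = re.findall(r"\w+", "user_1 logged-in")    # ['user_1', 'logged', 'in']

# "\s" matches whitespace such as space, tab and newline.
spaces = re.findall(r"\s", "a b\tc\nd")           # [' ', '\t', '\n']

for result in (dot, star, plus, optional, digits, words, spaces):
    print(result)
```

Comparing the * and + results on the same input shows the difference between "zero or more" and "one or more": only * accepts the lone "a".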
What are Python Regex Functions?
The re module provides several functions for working with regex patterns. These are some commonly used ones:
re.search(pattern, string) | searches the string for the first occurrence of the pattern and returns a match object, or None if there is no match
re.findall(pattern, string) | returns a list of all non-overlapping occurrences of the pattern in the string
re.sub(pattern, repl, string) | replaces all occurrences of the pattern in the string with the replacement string
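As a quick illustration of the three behaviours described here (first match, all matches, replacement), this sketch runs re.search(), re.findall() and re.sub() against the same text. The sample text, the pattern and the replacement string are assumptions chosen only for the example.

```python
import re

# Hypothetical sample text and pattern: digits, a hyphen, then more digits.
text = "Call 555-1234 or 555-9876 before 5 pm"
pattern = r"\d+-\d+"

# First occurrence only: returns a match object, or None if nothing matches.
first = re.search(pattern, text)
first_number = first.group() if first else None

# No match in this string, so re.search() returns None.
missing = re.search(pattern, "no numbers here")

# All non-overlapping occurrences, returned as a list of strings.
all_numbers = re.findall(pattern, text)

# Replace every occurrence with a fixed replacement string.
masked = re.sub(pattern, "XXX-XXXX", text)

print(first_number)   # 555-1234
print(missing)        # None
print(all_numbers)    # ['555-1234', '555-9876']
print(masked)         # Call XXX-XXXX or XXX-XXXX before 5 pm
```

Checking the result of a search for None before calling .group() avoids an AttributeError when the pattern is not found.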
Now let's look at an example. Suppose we want to extract the email addresses from the following string. We can use the re.findall() function to do this:
import re

emails = "firstname.lastname@example.org, email@example.com, firstname.lastname@example.org"
pattern = r'\w+@\w+\.\w+'
matches = re.findall(pattern, emails)
print(matches)
In the above example we used the regex pattern r'\w+@\w+\.\w+' to match email addresses in the string. This pattern matches one or more word characters (letters, digits, or underscores), followed by an at sign (@), followed by one or more word characters, followed by a literal dot (written as \. because an unescaped dot is a special character), and finally one or more word characters.
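This will be the result stored in matches:
['lastname@example.org', 'email@example.com', 'lastname@example.org']
Note that "firstname." is missing from two of the matches, because \w does not match a dot before the @ sign.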
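The pattern above can be widened so that dots before the @ sign are also captured. The following is a minimal sketch, assuming a character class [\w.] in place of the first \w+. This variant is not part of the original example, and it is not a full email validator.

```python
import re

emails = "firstname.lastname@example.org, email@example.com, firstname.lastname@example.org"

# Pattern from the article: \w+ cannot cross the dot in "firstname.lastname".
narrow = re.findall(r"\w+@\w+\.\w+", emails)

# Assumed variant: [\w.]+ accepts word characters and dots before the @ sign.
broad = re.findall(r"[\w.]+@\w+\.\w+", emails)

# Drop repeated addresses while keeping the order in which they first appear.
unique = list(dict.fromkeys(broad))

print(narrow)  # ['lastname@example.org', 'email@example.com', 'lastname@example.org']
print(broad)   # ['firstname.lastname@example.org', 'email@example.com', 'firstname.lastname@example.org']
print(unique)  # ['firstname.lastname@example.org', 'email@example.com']
```

Real-world addresses can contain other characters (for example hyphens or plus signs), so a pattern like this is best treated as a starting point for simple text cleaning.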