Author: Alfie Grace
Data Scientist

Python String Contains – See if String Contains a Substring


An easy way to check if a string contains a particular phrase is by using an if ... in statement. We can do this as follows:

if 'apples' in 'This string has apples':
    print('Apples in string')
else:
    print('Apples not in string')
Out:
Apples in string

Today we'll take a look at the various options you've got for checking if a string contains a substring. We'll start by exploring the use of if ... in statements, followed by the find() method. Towards the end, there is also a section on using regular expressions (regex) with re.search() to search strings.


Option 1: if ... in

The example above demonstrated a quick way to find a substring within another string using an if ... in statement. The in expression evaluates to True if the string contains what we're looking for and False if not, and the if statement branches on that result. See below for an extension of the example used previously:

strings = ['This string has apples', 'This string has oranges', 'This string has neither']

for s in strings:
    if 'apples' in s:
        print('Apples in string')
    else:
        print('Apples not in string')
Out:
Apples in string
Apples not in string
Apples not in string

The output shows that the expression 'apples' in s only evaluated to True for the first item in strings, which is correct.
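Because the in operator evaluates to a plain True or False, it doesn't have to live inside an if statement. Here's a small sketch that stores the result for each item in strings with a list comprehension (the results name is just our choice for illustration), along with the negated not in form:

```python
strings = ['This string has apples', 'This string has oranges', 'This string has neither']

# 'apples' in s evaluates to True or False for each item
results = ['apples' in s for s in strings]
print(results)  # [True, False, False]

# not in is the negated form of the same check
print('apples' not in strings[1])  # True
```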

It's worth mentioning that the in operator is case-sensitive. The line if 'apples' in s: wouldn't detect 'Apples'. One way of correcting this is by using the lower() method, which returns a copy of the string with all characters converted to lowercase.

We can utilize the lower() method with the change below:

strings = ['This string has apples', 'This string has oranges', 'This string has Apples']

for s in strings:
    if 'apples' in s.lower():
        print('Apples in string')
    else:
        print('Apples not in string')
Out:
Apples in string
Apples not in string
Apples in string

Alternatively, we could use the upper() method and search for 'APPLES' instead.
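A minimal sketch of the upper() variant, reusing the same list as the lower() example. Both sides of the comparison need to be in the same case, so the search term is written in capitals:

```python
strings = ['This string has apples', 'This string has oranges', 'This string has Apples']

for s in strings:
    # Convert the string to uppercase, then look for the uppercase substring
    if 'APPLES' in s.upper():
        print('Apples in string')
    else:
        print('Apples not in string')
```

This prints the same three lines as the lower() version. Python also offers str.casefold(), a more aggressive form of lowercasing intended for caseless comparisons, which can matter for some non-English characters.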

The if ... in approach is typically the fastest of the three options covered here. It also has excellent readability, making it easy for other developers to understand what a script does.
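If you'd like to check the performance claim on your own machine, the standard library's timeit module makes a rough comparison easy. This is only a sketch: the 100,000-run count is an arbitrary choice, each check is wrapped in a lambda (which adds the same small overhead to every option), and the actual timings will depend on your hardware, Python version, and input strings, so we haven't included output here.

```python
import re
import timeit

text = 'This string has apples'
runs = 100_000

# Time the same membership check done three different ways
t_in = timeit.timeit(lambda: 'apples' in text, number=runs)
t_find = timeit.timeit(lambda: text.find('apples') != -1, number=runs)
t_re = timeit.timeit(lambda: re.search('apples', text) is not None, number=runs)

print(f'in:        {t_in:.4f} s')
print(f'find():    {t_find:.4f} s')
print(f're.search: {t_re:.4f} s')
```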

Of the three options listed in this article, using if ... in is usually the best approach for seeing if a string contains a substring. Remember that the simplest solution is quite often the best one!

Option 2: find()

Another option you've got for searching a string is the find() method. If the argument we pass to find() exists in the string, the method returns the index at which the substring starts. If not, it returns -1. Each character in a string is assigned an index, starting at 0 for the first character.
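To see how characters map to indexes, we can pair each character with its position using enumerate(). Here's a short sketch that shows only the tail end of the string, where 'apples' sits:

```python
s = 'This string has apples'

# enumerate() pairs each character with its index, starting from 0
print(list(enumerate(s))[16:])
# [(16, 'a'), (17, 'p'), (18, 'p'), (19, 'l'), (20, 'e'), (21, 's')]

# The character at index 16 is where 'apples' begins
print(s[16])             # a
print(s.find('apples'))  # 16
```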

We can apply find() to the first if ... in example as follows:

strings = ['This string has apples', 'This string has oranges', 'This string has neither']

for s in strings:
    apples_index = s.find('apples')
    if apples_index < 0:
        print('Apples not in string')
    else:
        print(f'Apples in string starting at index {apples_index}')
Out:
Apples in string starting at index 16
Apples not in string
Apples not in string

For the first list item, 'apples' started at index 16, so find('apples') returns 16. 'apples' isn't in the string for the other two items, so find('apples') returns -1.
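One thing to watch out for: find() returns 0 when the substring sits at the very start of the string, and 0 is falsy in Python. Using the return value directly as the condition therefore gives the wrong answer in that case. The sketch below uses a made-up string that begins with 'apples' to show the problem and the fix, which is to compare against -1:

```python
s = 'apples are in this string'

# 'apples' starts at the first character, so find() returns 0
print(s.find('apples'))  # 0

# 0 is falsy, so this condition fails even though 'apples' is present
if s.find('apples'):
    print('Apples in string')
else:
    print('Apples not in string')  # this branch runs, which is wrong

# Comparing against -1 gives the same result as 'apples' in s
print(s.find('apples') != -1)  # True
```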

The index() method works similarly and also returns the starting index of its argument. The disadvantage of index() is that it raises ValueError: substring not found if the substring isn't present. Both find() and index() are also case-sensitive.
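If you prefer index(), the usual way to handle a missing substring is to catch the ValueError. Here's a sketch applying that to the same list used in the find() example:

```python
strings = ['This string has apples', 'This string has oranges', 'This string has neither']

for s in strings:
    try:
        # index() raises ValueError instead of returning -1 when the substring is missing
        apples_index = s.index('apples')
        print(f'Apples in string starting at index {apples_index}')
    except ValueError:
        print('Apples not in string')
```

The output matches the find() version: index 16 for the first string, and "Apples not in string" for the other two.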

Option 3: Regex search()

Regex is short for regular expression, which is a kind of mini-language for describing text patterns. With re.search(), we can check whether a pattern matches anywhere in a string. If it does, re.search() returns a Match object.

Here's an example:

import re

re.search('apples', 'This string has apples')
Out:
<re.Match object; span=(16, 22), match='apples'>

Looking at the Match object, span gives us the start index of 'apples' and the end index, which is one past its last character. Slicing the string using 'This string has apples'[16:22] returns the substring 'apples'. The match field shows us the part of the string that matched, which can be helpful when a pattern allows a range of possible substrings.

We can access the span and the matched text using the span() and group() methods, as follows:

print(re.search('apples', 'This string has apples').span())

print(re.search('apples', 'This string has apples').group())
Out:
(16, 22)
apples
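The Match object also has start() and end() methods, which return the two halves of span() individually. As a quick sketch, slicing the original string with them recovers exactly the text that group() reports:

```python
import re

text = 'This string has apples'
match = re.search('apples', text)

# start() and end() are the two values from span()
print(match.start(), match.end())  # 16 22

# Slicing with those indexes gives the same text as group()
print(text[match.start():match.end()])                    # apples
print(text[match.start():match.end()] == match.group())  # True
```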

If the pattern doesn't match, re.search() returns None instead of a Match object. See the example below for how we can apply regex to the string problem we've been using:

strings = ['This string has apples', 'This string has oranges', 'This string has neither']

for s in strings:
    if re.search('apples', s):
        print('Apples in string')
    else:
        print('Apples not in string')
Out:
Apples in string
Apples not in string
Apples not in string

In this case, the if statement determines if re.search() returns anything other than None.
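This works because a Match object always counts as True in a condition, while None counts as False. A small sketch showing the None case directly, using one of the strings without 'apples':

```python
import re

# No match, so re.search() returns None rather than a Match object
result = re.search('apples', 'This string has oranges')
print(result)          # None
print(result is None)  # True
```

Writing if re.search(...) is not None: is a more explicit alternative if you find it clearer.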

We could argue that regex might be overkill for simple functionality like this. But something like the example above is a great starting point for regex, which has plenty of other capabilities.

For instance, we could change the first argument of re.search() to 'apples|oranges', where | is the alternation ("OR") operator. In this case, re.search() would return a Match object for any string containing the substring 'apples' or 'oranges'.

The following demonstrates an example of this:

strings = ['This string has apples', 'This string has oranges', 'This string has neither']

for s in strings:
    if re.search('apples|oranges', s):
        print('Apples or oranges in string')
    else:
        print('Neither fruit is in string')
Out:
Apples or oranges in string
Apples or oranges in string
Neither fruit is in string
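Since either fruit can satisfy the pattern, group() is handy for finding out which one actually matched. The sketch below also passes the re.IGNORECASE flag, which addresses the case-sensitivity issue from earlier. Note that we've capitalised 'Oranges' in the second string purely to show the flag in action:

```python
import re

strings = ['This string has apples', 'This string has Oranges', 'This string has neither']

for s in strings:
    # re.IGNORECASE lets the pattern match regardless of letter case
    match = re.search('apples|oranges', s, re.IGNORECASE)
    if match:
        # group() returns the matched text exactly as it appears in s
        print(f'Found {match.group()} in string')
    else:
        print('Neither fruit is in string')
```

This prints "Found apples in string", "Found Oranges in string", and "Neither fruit is in string".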

Summary

The easiest and most effective way to see if a string contains a substring is an if ... in statement, where the in expression evaluates to True if the substring is found. Alternatively, the find() method returns the index at which a substring starts, or -1 if the substring isn't present. Regex is also an option: re.search() returns a Match object if the pattern (the first argument) is found in the string (the second argument), and None otherwise.


Meet the Authors


Alfie graduated with a Master's degree in Mechanical Engineering from University College London. He's currently working as a Data Scientist at Square Enix. Find him on LinkedIn.
