Author: Alfie Grace
Data Scientist

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

This error is usually triggered when selecting the rows of a dataframe that match one or more conditions. Let's consider the example dataframe below:

import pandas as pd

df = pd.DataFrame.from_dict({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'], 
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700]
    })

If we want to retrieve the cars with prices less than 20,000, we might try the following:

if df['price'] < 20000:
    print(df)
Out:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-2357a7362348> in <module>
----> 1 if df['price'] < 20000:
      2     print(df)
~\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1476 
   1477     def __nonzero__(self):
-> 1478         raise ValueError(
   1479             f"The truth value of a {type(self).__name__} is ambiguous. "
   1480             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This error occurs because the if statement requires a truth value, i.e., a statement evaluating to True or False. In the above example, the < operator is applied to a dataframe column (a series), so it returns a boolean series containing a combination of True and False values. Here's what the result actually looks like:

df['price'] < 20000
Out:
0    False
1     True
2    False
3    False
Name: price, dtype: bool

Since a series is returned, Python doesn't know which value to use, meaning that the series has an ambiguous truth value.
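
The same failure can be reproduced without an if statement, since if effectively calls bool() on its condition. The following is a minimal sketch; the standalone prices series is a stand-in for df['price'] and is introduced here only for illustration:

```python
import pandas as pd

# Hypothetical stand-in for df['price'] from the example dataframe
prices = pd.Series([28000, 12500, 30000, 26500], name='price')
mask = prices < 20000

try:
    # `if mask:` asks for the same single truth value as bool(mask)
    bool(mask)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```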

Instead, we can pass this statement into dataframe brackets to get the desired values:

df[df['price'] < 20000]
Out:
  manufacturer model  price mileage
1          Kia   Rio  12500    4500

We can also match multiple conditions using | for or and & for and:

df[(df['price'] < 30000) & (df['mileage'] < 2000)]
Out:
  manufacturer     model  price mileage
0          BMW  1 Series  28000    1800
3         Audi        A3  26500     700

Let's go further in depth on different solutions for this error.

Cause 1: Looking for rows that meet a single condition

Let's say we want to get all cars priced under 30,000 using the following boolean series:

df['price'] < 30000
Out:
0     True
1     True
2    False
3     True
Name: price, dtype: bool

A boolean series like this is known as a mask. By passing this mask to the same dataframe, we get back only the rows of the dataframe that have a True value at the matching index in our boolean series.

df[df['price'] < 30000]
Out:
  manufacturer     model  price mileage
0          BMW  1 Series  28000    1800
1          Kia       Rio  12500    4500
3         Audi        A3  26500     700

The rows with indexes of 0, 1, and 3 all have a True value in our mask. Therefore, these are the rows our statement above returns.
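
Since a mask is just a series, it can also be stored in a variable and reused. The sketch below rebuilds the example dataframe so that it runs on its own; the variable names (is_under_30k and so on) are illustrative, and df.loc is shown as one alternative to plain brackets when only some columns are needed:

```python
import pandas as pd

df = pd.DataFrame({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'],
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700],
})

# Name the mask so the condition is readable and reusable
is_under_30k = df['price'] < 30000

# Plain brackets keep every column of the matching rows
under_30k = df[is_under_30k]

# .loc accepts the same mask and can select columns at the same time
under_30k_models = df.loc[is_under_30k, ['manufacturer', 'model']]

print(under_30k.index.tolist())
print(under_30k_models)
```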

Using any() and all()

any() and all() are two ways to obtain a single truth value based on a mask.

For example, we can also use the method .any() to return True if any of the values in a mask are True:

(df['price'] < 30000).any()
Out:
True

Similarly, we can use .all(), which will return True only when all of the values in a mask are True:

(df['price'] < 30000).all()
Out:
False
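
Because .any() and .all() each reduce a mask to one value, they are what makes an if statement like the one at the top of this article work. A minimal sketch, assuming the goal was to check whether any car costs less than 20,000 (the message strings are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'],
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700],
})

cheap = df['price'] < 20000

# .any() collapses the mask to a single truth value, so `if` is happy
if cheap.any():
    # Summing a boolean mask counts its True values
    message = f"{cheap.sum()} car(s) under 20,000"
else:
    message = "no cars under 20,000"

# .all() is only True when every row satisfies the condition
all_cheap = bool(cheap.all())

print(message)
print(all_cheap)
```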

Cause 2: Looking for rows that meet multiple conditions

Building on our example from the previous section, let's try to find cars that cost less than 30,000 and have mileage under 2,000. Extending the solution from the first section, we might try joining the two conditions with and:

df[(df['price'] < 30000) and (df['mileage'] < 2000)]
Out:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-188d1a4841b0> in <module>
----> 1 df[(df['price'] < 30000) and (df['mileage'] < 2000)]

~\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1476 
   1477     def __nonzero__(self):
-> 1478         raise ValueError(
   1479             f"The truth value of a {type(self).__name__} is ambiguous. "
   1480             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Notice that we're getting the error again. This time, it's because and needs a single truth value from each of its operands: to evaluate (df['price'] < 30000) and (df['mileage'] < 2000), Python first has to decide whether df['price'] < 30000 as a whole is True or False. We know that df['price'] < 30000 and df['mileage'] < 2000 both return a mask, so the truth value is ambiguous here.

To resolve this issue, we need to replace and with &:

df[(df['price'] < 30000) & (df['mileage'] < 2000)]
Out:
  manufacturer     model  price mileage
0          BMW  1 Series  28000    1800
3         Audi        A3  26500     700

The & symbol is Python's bitwise AND operator. pandas overloads it so that, applied to two masks, it combines them element by element: a row is True in the result only when it is True in both masks. Using & therefore returns a copy of the dataframe containing only the rows that satisfy both conditions.
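
The parentheses around each condition matter, because & binds more tightly than comparison operators such as <. The sketch below shows, under that reading of Python's operator precedence, how dropping the parentheses brings the same ValueError back; the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'],
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700],
})

try:
    # Without parentheses this parses as
    #   df['price'] < (30000 & df['mileage']) < 2000
    # which is a chained comparison, and Python evaluates chained
    # comparisons with an implicit `and` between the two halves
    df[df['price'] < 30000 & df['mileage'] < 2000]
    unparenthesized_error = None
except ValueError as err:
    unparenthesized_error = str(err)

# With parentheses, each comparison produces its mask before & combines them
result = df[(df['price'] < 30000) & (df['mileage'] < 2000)]

print(unparenthesized_error)
print(result.index.tolist())
```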

By using the | operator in place of or, we can return a copy containing rows that have a True value in the mask generated by either condition, as shown:

df[(df['price'] < 20000) | (df['mileage'] < 1000)]
Out:
  manufacturer    model  price mileage
1          Kia      Rio  12500    4500
2     Mercedes  A-Class  30000     400
3         Audi       A3  26500     700

Furthermore, we can also use the ~ operator, which is the bitwise equivalent of not:

df[~((df['price'] < 30000) & (df['mileage'] < 2000))]
Out:
  manufacturer    model  price mileage
1          Kia      Rio  12500    4500
2     Mercedes  A-Class  30000     400

The ~ operator inverts the mask that follows it, which here is the compound mask in the parentheses: every True becomes False and every False becomes True.
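
Negating a compound mask is equivalent to negating each condition and swapping & for |. The following sketch checks this on the example dataframe; the mask names cheap and low_mileage are introduced here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'manufacturer': ['BMW', 'Kia', 'Mercedes', 'Audi'],
    'model': ['1 Series', 'Rio', 'A-Class', 'A3'],
    'price': [28000, 12500, 30000, 26500],
    'mileage': [1800, 4500, 400, 700],
})

cheap = df['price'] < 30000
low_mileage = df['mileage'] < 2000

# Negate the combined mask in one go...
negated_whole = ~(cheap & low_mileage)

# ...or negate each mask and join them with | instead of &
negated_parts = ~cheap | ~low_mileage

same_mask = negated_whole.equals(negated_parts)
selected_rows = df[negated_parts].index.tolist()

print(same_mask)
print(selected_rows)
```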

Summary:

This value error is caused by using a mask (boolean series) in the place of a truth value. A mask has values that are either True or False, varying from row to row. As a result, Python can't determine whether a series as a whole is True or False - it is ambiguous.

When searching for dataframe rows that only match a single condition, we can avoid the error with masking, using df[] and placing the statement generating the mask within the brackets, for example, df[df['price'] < 30000].

If looking for rows that match multiple conditions, we must replace the keywords and, or and not with their respective bitwise operators, &, | and ~, wrapping each condition in parentheses, to avoid the error.

Meet the Authors

Alfie graduated with a Master's degree in Mechanical Engineering from University College London. He's currently working as a top-rated data scientist on Upwork. Find him on LinkedIn.

Editor: Brendan Martin
Founder of LearnDataSci
