ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
This error is usually triggered when creating a copy of a dataframe that matches either a single or multiple conditions. Let's consider the example dataframe below:
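A small stand-in dataframe is enough to follow along. The column names price and mileage come from the text; the model labels and every value below are made up for illustration, chosen so that rows 0, 1, and 3 are priced under 30,000.

```python
import pandas as pd

# Hypothetical example data: all values are invented for illustration.
df = pd.DataFrame(
    {
        "model": ["hatchback", "sedan", "suv", "compact", "coupe"],
        "price": [15000, 25000, 45000, 18000, 32000],
        "mileage": [1500, 3000, 1000, 1800, 2500],
    }
)

print(df)
```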
If we want to retrieve the cars with prices less than 20,000, we might try the following:
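A minimal sketch of such an attempt, assuming a dataframe with a price column (the values are hypothetical). The comparison is wrapped in try/except only so the script keeps running and the message can be printed.

```python
import pandas as pd

# Hypothetical prices, invented for illustration.
df = pd.DataFrame({"price": [15000, 25000, 45000, 18000, 32000]})

try:
    # The comparison yields a boolean Series, and `if` then asks for
    # a single truth value from it, which pandas refuses to guess.
    if df["price"] < 20000:
        print(df)
except ValueError as err:
    message = str(err)
    print(f"ValueError: {message}")
```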
This error occurs because the if statement requires a truth value, i.e., a statement evaluating to True or False. In the above example, the < operator used against a dataframe column returns a boolean series containing a combination of True and False values. Here's what the result actually looks like:
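With hypothetical prices standing in for the example data, the comparison on its own produces one boolean per row rather than a single True or False:

```python
import pandas as pd

# Hypothetical prices, invented for illustration.
df = pd.DataFrame({"price": [15000, 25000, 45000, 18000, 32000]})

# Comparing a column to a scalar is applied row by row.
result = df["price"] < 20000

print(type(result).__name__)  # Series
print(result.tolist())        # [True, False, False, True, False]
```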
Since a series is returned, Python doesn't know which value to use, meaning that the series has an ambiguous truth value.
Instead, we can pass this statement into dataframe brackets to get the desired values:
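A sketch of the bracket form, again with made-up prices. The comparison goes inside df[...], so pandas uses it to select rows instead of Python trying to reduce it to one truth value.

```python
import pandas as pd

# Hypothetical prices, invented for illustration.
df = pd.DataFrame({"price": [15000, 25000, 45000, 18000, 32000]})

# Keep only the rows where the comparison is True.
cheap = df[df["price"] < 20000]

print(cheap.index.tolist())     # [0, 3]
print(cheap["price"].tolist())  # [15000, 18000]
```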
We can also match multiple conditions using bitwise operators such as &.
Let's go further in depth on different solutions for this error.
Cause 1: Looking for rows that meet a single condition
Let's say we want to get all cars priced under 30,000 using the following boolean series:
A boolean series like this is known as a mask. By passing this mask to the same dataframe, we get back only the rows of the dataframe that have a True value at the matching index in our boolean series.
The rows with indexes of 0, 1, and 3 all have a True value in our mask. Therefore, these are the rows our statement above returns.
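The mask and the filtering step can be sketched as follows. The prices are hypothetical, picked so that rows 0, 1, and 3 fall under 30,000 as described above.

```python
import pandas as pd

# Hypothetical prices, invented for illustration.
df = pd.DataFrame({"price": [15000, 25000, 45000, 18000, 32000]})

# Step 1: build the mask, one boolean per row.
mask = df["price"] < 30000
print(mask.tolist())  # [True, True, False, True, False]

# Step 2: pass the mask back to the dataframe to keep the True rows.
under_30k = df[mask]
print(under_30k.index.tolist())  # [0, 1, 3]
```

Naming the mask first is optional; df[df["price"] < 30000] does the same thing in one step.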
Using any() and all()
any() and all() are two ways to obtain a single truth value based on a mask.
For example, we can use the method .any() to return True if any of the values in a mask are True.
Similarly, we can use .all(), which will return True only when all of the values in a mask are True.
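A short sketch of both reductions on a mask built from hypothetical prices. Because each call collapses the mask to one value, the result is safe to use in an if statement.

```python
import pandas as pd

# Hypothetical prices, invented for illustration.
df = pd.DataFrame({"price": [15000, 25000, 45000, 18000, 32000]})
mask = df["price"] < 30000

# bool() converts the NumPy boolean that pandas returns into a plain Python bool.
has_any = bool(mask.any())  # True: at least one car is under 30,000
has_all = bool(mask.all())  # False: not every car is under 30,000

if mask.any():
    print("At least one car costs less than 30,000")
if not mask.all():
    print("Not every car costs less than 30,000")
```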
Cause 2: Looking for rows that meet multiple conditions
Building on our example from the previous section, let's try and find cars that cost less than 30000 and have mileage under 2000. Using the solution from the first section, we could build upon this:
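A sketch of that attempt, with hypothetical price and mileage values. The try/except is there only so the message can be printed without stopping the script.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame(
    {
        "price": [15000, 25000, 45000, 18000, 32000],
        "mileage": [1500, 3000, 1000, 1800, 2500],
    }
)

try:
    # `and` needs one truth value from the left-hand mask, so this raises.
    cheap_low_mileage = df[df["price"] < 30000 and df["mileage"] < 2000]
except ValueError as err:
    message = str(err)
    print(f"ValueError: {message}")
```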
Notice that we're getting an error again. This time, it's because the keyword and needs a single truth value for df['price'] < 30000 before it can decide whether to evaluate df['mileage'] < 2000. We know that df['price'] < 30000 and df['mileage'] < 2000 both return a mask, so the truth value is ambiguous here.
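To resolve this issue, we need to replace and with &. The & symbol is a bitwise operator; applied to two masks, it combines them element by element, row by row. Because & binds more tightly than comparison operators such as <, each condition must be wrapped in its own parentheses. Using & will return a copy of the dataframe containing rows with a True value in the masks generated by both conditions.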
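A sketch of the corrected filter, using hypothetical values. Each comparison sits in its own parentheses so that it is evaluated before & combines the two masks.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame(
    {
        "price": [15000, 25000, 45000, 18000, 32000],
        "mileage": [1500, 3000, 1000, 1800, 2500],
    }
)

# Rows must satisfy both conditions.
both = df[(df["price"] < 30000) & (df["mileage"] < 2000)]

print(both.index.tolist())  # [0, 3]
```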
By using the | operator in place of or, we can return a copy containing rows that have a True value in the mask generated by either condition, as shown:
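The same hypothetical data with | instead of & keeps every row that satisfies at least one of the two conditions:

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame(
    {
        "price": [15000, 25000, 45000, 18000, 32000],
        "mileage": [1500, 3000, 1000, 1800, 2500],
    }
)

# Rows may satisfy either condition (or both).
either = df[(df["price"] < 30000) | (df["mileage"] < 2000)]

print(either.index.tolist())  # [0, 1, 2, 3]
```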
Furthermore, we can also use the ~ operator, which is the bitwise equivalent of not. The ~ operator essentially reverses what comes after it, which is the compound bitmask in the parentheses.
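A sketch of negating a compound mask. Which compound condition to negate is an assumption here: this example inverts the "under 30,000 and under 2,000 miles" mask, on hypothetical data.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame(
    {
        "price": [15000, 25000, 45000, 18000, 32000],
        "mileage": [1500, 3000, 1000, 1800, 2500],
    }
)

# The outer parentheses group the compound mask so ~ flips all of it.
not_both = df[~((df["price"] < 30000) & (df["mileage"] < 2000))]

print(not_both.index.tolist())  # [1, 2, 4]
```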
This ValueError is caused by using a mask (boolean series) in place of a truth value. A mask has values that are either True or False, varying from row to row. As a result, Python can't determine whether the series as a whole is True or False, so its truth value is ambiguous.
When searching for dataframe rows that match a single condition, we can avoid the error with masking: place the statement that generates the mask within the brackets of df, for example, df[df['price'] < 30000].
If looking for rows that match multiple conditions, we must replace and, or, and not with their respective bitwise operators, &, |, and ~, wrapping each condition in parentheses.