Author: Cansin Guler
Software Engineer

Python Pandas TypeError: unhashable type: 'Series'

TypeError: unhashable type: 'Series'

Why does this happen?

This error occurs when attempting to use a Pandas Series object in a place where a hashable object is expected.

For example, if you were to try to use a Series as a dictionary key:

import pandas as pd

series = pd.Series(['a', 'b', 'c', 'd'])

data = {
    series: ['dictionary', 'values', 'can', 'be', 'unhashable']
}
Out:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_31660/779148665.py in <module>
      3 series = pd.Series(['a', 'b', 'c', 'd'])
      4 
----> 5 data = {
      6     series: ['dictionary', 'values', 'can', 'be', 'unhashable']
      7 }  
TypeError: unhashable type: 'Series'

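The error occurs because dictionary keys must be hashable: Python must be able to compute a hash value for the key, and that value must never change while the key is in use. In practice, this usually means the key must be immutable (unchanging).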

The most notable places in Python where you must use a hashable object are dictionary keys, set elements, and Pandas Index values, including DataFrame columns. Since a Series object is not hashable, it won't work for any of these cases.

Below, we'll explore two main situations:

  1. Intentional use of a Series, where we'll look at ways to convert a Series into something hashable
  2. Accidental use of a Series, where we'll explore two cases where unintentional Series objects are commonly produced: slicing DataFrames and using iterrows.

Before getting to the possible causes, let's understand hashability. If you're comfortable with hashing already, feel free to skip to the solutions section.

Hashability and why Series aren't hashable

A hash code is an integer that Python derives from an object's value. Objects can be translated into hash codes by passing them to the built-in hash() function, like so:

print(hash(1))
print(hash('One'))
Out:
1
-8044073179068880476

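The hash function returns an integer that Python uses to store and find the object in dictionaries and sets. For this to work reliably, the value being hashed must not change, which is why the hashable built-in types are immutable. The hash code of an object stays the same for its whole lifetime: within a single Python session, the hash of "One" will always be the same integer. String hashes are randomized between sessions by default, so the number you see will likely differ from the one above.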
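As a small illustration of that consistency (a sketch added here, not part of the original walkthrough), the snippet below checks that hashing the same string twice in one process gives the same integer, and that two values which compare equal also share a hash code.

```python
# Hash codes are stable for the lifetime of a single Python process.
first = hash("One")
second = hash("One")
same_string_hash = first == second

# Objects that compare equal must also produce equal hash codes.
int_float_match = hash(1) == hash(1.0)

print(same_string_hash)  # True
print(int_float_match)   # True
```

The exact integer printed for a string is not shown here on purpose, since it changes from one interpreter session to the next unless the PYTHONHASHSEED environment variable is fixed.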

Immutable objects

Dictionaries, sets, lists, and Series are mutable and, therefore, cannot be hashed. Conversely, numeric types, booleans, and strings are immutable, so they can all be hashed. Tuples are also immutable but can only be hashed if all of their elements, including nested elements, are hashable.
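The rules above can be checked with a few lines of standard-library Python. The helper name is_hashable below is an illustrative choice, not something the article defines; it simply reports whether hash() succeeds.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


results = {
    "int": is_hashable(42),
    "str": is_hashable("text"),
    "flat tuple": is_hashable((1, 2, 3)),
    "tuple holding a list": is_hashable((1, [2, 3])),
    "list": is_hashable([1, 2, 3]),
    "dict": is_hashable({"a": 1}),
    "frozenset": is_hashable(frozenset({1, 2})),
}

for label, ok in results.items():
    print(f"{label}: {ok}")
```

Note how the tuple that contains a list fails the check even though the tuple itself is immutable.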

We can test whether any object is hashable by passing it to hash(). Let's try it on a Series object:

import pandas as pd

a_series = pd.Series(
    [0.1, 0.2, 0.3, 0.4],
    index=['a', 'b', 'c', 'd']
)

hash(a_series)
Out:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_31660/4023393776.py in <module>
      6 )
      7 
----> 8 hash(a_series)

TypeError: unhashable type: 'Series'

Since a Series object is mutable, Pandas does not allow it to be hashed, so hash() raises a TypeError.
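To see why mutable objects are kept out of hash-based containers, here is a minimal sketch using a hypothetical class, MutableKey, whose hash depends on a value that can change. It is not how Pandas is implemented; it only shows the kind of breakage that refusing to hash avoids.

```python
class MutableKey:
    """Hypothetical class whose hash depends on a value that can change."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, MutableKey) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


key = MutableKey(1)
lookup = {key: "stored"}

key.value = 2  # mutate the key after it was inserted

found_after_mutation = key in lookup          # searched under the new hash
found_by_old_value = MutableKey(1) in lookup  # old hash matches, equality fails

print(found_after_mutation, found_by_old_value, len(lookup))  # False False 1
```

The dictionary still holds one entry, but no lookup can reach it anymore.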

For a more detailed description of how hash codes are used in Python, check out Brandon Craig Rhodes' PyCon 2010 talk, The Mighty Dictionary.

Now that we understand hashability, we can discuss the possible causes of the unhashable type: 'Series' error.

Cause 1: Assigning Series Objects to Dictionary Keys, Set Elements, or Pandas Index Values

Dictionary keys, set elements, and Pandas Index values are all required to be of a hashable type. As mentioned, using a Pandas Series in any of these places will cause an error.

Bringing back the intro example, let's try using a Series object as a key in a dictionary:

series = pd.Series(['a', 'b', 'c', 'd'])

data = {
    series: ['dictionary', 'values', 'can', 'be', 'unhashable']
}
Out:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_31660/2872028360.py in <module>
      1 series = pd.Series(['a', 'b', 'c', 'd'])
      2 
----> 3 data = {
      4     series: ['dictionary', 'values', 'can', 'be', 'unhashable']
      5 }  
TypeError: unhashable type: 'Series'

Dictionaries and sets quickly raise the error, but DataFrames and Series may overlook such mistakes at first. Take a look at the two Series objects below:

series = pd.Series(['a', 'b', 'c', 'd'])

correct_series = pd.Series(
    data=['a', 'b', 'c', 'd'], 
    index=series
)

print('CORRECT SERIES:', correct_series, sep='\n') 


faulty_series = pd.Series(
    data=['a', 'b', 'c', 'd'], 
    index=[1, 2, 3, a_series]  # note the difference here
)

print('FAULTY SERIES:', faulty_series, sep='\n')
Out:
CORRECT SERIES:
a    a
b    b
c    c
d    d
dtype: object
FAULTY SERIES:
1                       a
2                       b
3                       c
[0.1, 0.2, 0.3, 0.4]    d
dtype: object

In correct_series, each element of series became one index label, whereas in faulty_series, we placed the entire a_series object (defined in the previous section) inside the index as a single label. The latter should be forbidden, yet no error is raised.

However, when we interact with the index in some way, we see an error:

# Rename the index 1 to 'one'
renamed = faulty_series.rename({1: 'one'})
Out:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_31660/3747663594.py in <module>
      1 # Rename the index 1 to 'one'
----> 2 renamed = faulty_series.rename({1: 'one'})
...

TypeError: unhashable type: 'Series'

Attempting to rename the first index label raised an error. The faulty index doesn't cause any problem until we attempt to use it.

Solution

We have to replace our Series object with something hashable. A named tuple is an ideal hashable alternative to the Series since it also uses key-value pairing.

Below, we convert each Series into a named tuple before using it as a dictionary key:

from collections import namedtuple

s1 = pd.Series(data=[1, 2, 3], index=["one", "two", "three"])
s2 = pd.Series(data=[4, 5, 6], index=["four", "five", "six"])

# names the tuple 'series1' and matches s1's indices with its values.
nt1 = namedtuple("series1", s1.index)(*s1)

# names the tuple 'series2' and matches s2's indices with its values.
nt2 = namedtuple("series2", s2.index)(*s2)

d = {nt1: "random value", nt2: "another random value"}

print(d)
Out:
{series1(one=1, two=2, three=3): 'random value', series2(four=4, five=5, six=6): 'another random value'}

We've essentially frozen our Series into named tuples, allowing them to be hashed and used as dictionary keys.

Now, we can access values in the dictionary using one of the named tuples:

val = d[nt1]

print(val)
Out:
random value
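Named tuples require every index label to be a valid Python identifier, so they won't suit every Series. As a hedged alternative (a sketch, not the article's own method), a plain tuple can serve as the key: either the values alone, or (label, value) pairs when the index matters.

```python
import pandas as pd

s1 = pd.Series(data=[1, 2, 3], index=["one", "two", "three"])

# Option 1: keep only the values.
values_key = tuple(s1)

# Option 2: keep (label, value) pairs so the index is preserved.
items_key = tuple(s1.items())

d = {values_key: "values only", items_key: "labels and values"}

print(d[(1, 2, 3)])
print(d[(("one", 1), ("two", 2), ("three", 3))])
```

Both keys are snapshots: changing s1 afterwards does not change the keys already stored in the dictionary.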

Cause 2: Slicing the DataFrame Wrong

You may be accidentally using a Series where a hashable object is expected. One common scenario is trying to extract a scalar from a DataFrame but ending up with a Series due to incorrect slicing.

To demonstrate, let's make a simple movies DataFrame:

movies = pd.DataFrame(
    data=[
        ["War Dogs", "Todd Phillips", 2016],
        ["Money Ball", "Bennett Miller", 2011],
        ["The Irishman", "Martin Scorsese", 2019],
        ["Joker", "Todd Phillips", 2019],
        ["The Wolf of Wall Street", "Martin Scorsese", 2013],
    ],
    columns=["Name", "Director", "Year"],
)
movies
Out:
   Name                     Director         Year
0  War Dogs                 Todd Phillips    2016
1  Money Ball               Bennett Miller   2011
2  The Irishman             Martin Scorsese  2019
3  Joker                    Todd Phillips    2019
4  The Wolf of Wall Street  Martin Scorsese  2013

The code below intends to count the times each director's name was mentioned in the movies and report it in dictionary format.

from collections import defaultdict

# Use defaultdict to initialize every value with a 0
mentions = defaultdict(int)

for i in range(len(movies)):
    director_name = movies.loc[i, ["Director"]]  # attempts to get the director's name for every row i
    mentions[director_name] += 1                 # for every mention of a director, add 1 to its count

print(mentions)
Out:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_31660/4137918548.py in <module>
      6 for i in range(len(movies)):
      7     director_name = movies.loc[i, ["Director"]]  # gets the Director value for every row i
----> 8     mentions[director_name] += 1                 # for every mention of a director, add 1 to its count
      9 
     10 print(mentions)
TypeError: unhashable type: 'Series'

The error message points to a problem with using director_name as a key in the mentions dictionary. Even though we meant to extract a string, director_name is actually a Series object.

Solution

Let's squeeze in a print statement before the erroneous line and look at the director_name.

for i in range(len(movies)):
    director_name = movies.loc[i, ["Director"]]
    print(director_name)
    mentions[director_name] += 1
Out:
Director    Todd Phillips
Name: 0, dtype: object
Out:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_31660/172541356.py in <module>
      2     director_name = movies.loc[i, ["Director"]]
      3     print(director_name)
----> 4     mentions[director_name] += 1

TypeError: unhashable type: 'Series'

We are indeed getting the one value, but wrapped in a Series. This is because of the brackets surrounding our column selection.

Even though the brackets have only one label inside (['Director']), loc treats them as a list of labels and returns a container that could hold several values. It, therefore, creates a Series object of length one rather than a scalar.
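The difference is easy to confirm by inspecting the returned types. The sketch below uses a shortened, two-row version of the movies DataFrame as an assumption to stay self-contained.

```python
import pandas as pd

# Shortened stand-in for the movies DataFrame
movies = pd.DataFrame(
    data=[
        ["War Dogs", "Todd Phillips", 2016],
        ["Joker", "Todd Phillips", 2019],
    ],
    columns=["Name", "Director", "Year"],
)

as_scalar = movies.loc[0, "Director"]    # single label -> single value
as_series = movies.loc[0, ["Director"]]  # list of labels -> Series of length 1

print(type(as_scalar).__name__)
print(type(as_series).__name__)

# If a one-element Series has already been produced, .iloc[0] extracts the scalar.
recovered = as_series.iloc[0]
print(recovered)
```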

Let's get rid of the brackets and rerun the same code:

for i in range(len(movies)):
    director_name = movies.loc[i, "Director"]  # stripped the ['Director'] from the brackets
    mentions[director_name] += 1

print(mentions)
Out:
defaultdict(<class 'int'>, {'Todd Phillips': 2, 'Bennett Miller': 1, 'Martin Scorsese': 2})

We now get the expected output.
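When the goal is only to count occurrences, the loop can be avoided entirely, which also removes the chance of an accidental Series key. This is offered as an alternative sketch rather than a replacement for the approach above; it relies on the value_counts() and to_dict() methods of a Series.

```python
import pandas as pd

movies = pd.DataFrame(
    data=[
        ["War Dogs", "Todd Phillips", 2016],
        ["Money Ball", "Bennett Miller", 2011],
        ["The Irishman", "Martin Scorsese", 2019],
        ["Joker", "Todd Phillips", 2019],
        ["The Wolf of Wall Street", "Martin Scorsese", 2013],
    ],
    columns=["Name", "Director", "Year"],
)

# Count each distinct director, then convert the result to a plain dict.
director_counts = movies["Director"].value_counts().to_dict()

print(director_counts)
```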

Cause 3: Not Unpacking Iterrows

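Another common way to end up with an accidental Series is to loop with iterrows() without unpacking the tuples it yields. Let's simulate this scenario using the movies DataFrame again: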

movies = pd.DataFrame(
    data=[
        ["War Dogs", "Todd Phillips", 2016],
        ["Money Ball", "Bennett Miller", 2011],
        ["The Irishman", "Martin Scorsese", 2019],
        ["Joker", "Todd Phillips", 2019],
        ["The Wolf of Wall Street", "Martin Scorsese", 2013],
    ],
    columns=["Name", "Director", "Year"],
)

Like before, we'll try to count each time a director was mentioned in movies and report it in dictionary format. This time, we'll use iterrows() to iterate through the DataFrame.

mentions = defaultdict(int) 

# Using iterrows now
for row in movies.iterrows():
    director_name = movies.loc[row, "Director"]
    mentions[director_name] += 1

print(mentions)
Out:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_31660/3090072551.py in <module>
      3 # Using iterrows now
      4 for row in movies.iterrows():
----> 5     director_name = movies.loc[row, "Director"]
      6     mentions[director_name] += 1
      7 
...

TypeError: unhashable type: 'Series'

Here, the error claims we have passed an unhashable value to loc[]. Since Indexes can only hold hashable values, loc expects a hashable selector, so row seems to be a problem.

Solution

Let's squeeze in a print statement before the erroneous line and look at the row.

for row in movies.iterrows():
    print(row)
    director_name = movies.loc[row, "Director"]
    mentions[director_name] += 1
Out:
(0, Name             War Dogs
Director    Todd Phillips
Year                 2016
Name: 0, dtype: object)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_31660/291743484.py in <module>
      1 for row in movies.iterrows():
      2     print(row)
----> 3     director_name = movies.loc[row, "Director"]
      4     mentions[director_name] += 1
...

TypeError: unhashable type: 'Series'

We are getting a tuple with two values: an index of 0 and the row itself.

This is because iterrows() yields a tuple of the form (index, Series) for each row it iterates through. The first element is the row's index label, which is hashable, and the second is a Series holding the row's data.
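A quick way to inspect that structure is to pull a single item from the iterator and unpack it. The sketch below assumes a shortened, two-row version of the movies DataFrame.

```python
import pandas as pd

# Shortened stand-in for the movies DataFrame
movies = pd.DataFrame(
    data=[
        ["War Dogs", "Todd Phillips", 2016],
        ["Money Ball", "Bennett Miller", 2011],
    ],
    columns=["Name", "Director", "Year"],
)

# Take just the first (index, Series) pair from the iterator.
first_item = next(movies.iterrows())
index_label, row = first_item

print(type(first_item).__name__)  # tuple
print(index_label)                # the row's index label
print(row["Director"])            # a value read from the row Series
```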

The proper use of iterrows requires us to unpack each tuple like so:

for _, row in movies.iterrows():
    director_name = row["Director"]
    mentions[director_name] += 1

print(mentions)
Out:
defaultdict(<class 'int'>, {'Todd Phillips': 2, 'Bennett Miller': 1, 'Martin Scorsese': 2})

Since we don't need the index label, we use an underscore (_) to throw it away and read the director's name directly from the row. We now have the same mentions count as before.

Summary

Python enforces hashability on dictionary keys and set elements, and Pandas expects it of Index values. Since a Series is unhashable, it can't be used in any of these places.

Furthermore, the Series causing the error may be unintended. Slicing a DataFrame incorrectly or using iterrows without unpacking its return value can produce a Series where a scalar was intended.
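When the source of an accidental Series is hard to spot, a small guard can fail early with a more descriptive message. The function require_hashable below is a hypothetical helper written for this illustration; it relies on pd.api.types.is_hashable, and the two-row DataFrame is a shortened stand-in for movies.

```python
import pandas as pd


def require_hashable(key):
    """Hypothetical guard: raise a descriptive error for unhashable keys."""
    if not pd.api.types.is_hashable(key):
        raise TypeError(
            f"expected a hashable key, got {type(key).__name__}; "
            "did a slice or iterrows() return a Series?"
        )
    return key


movies = pd.DataFrame(
    data=[
        ["War Dogs", "Todd Phillips", 2016],
        ["Joker", "Todd Phillips", 2019],
    ],
    columns=["Name", "Director", "Year"],
)

good = require_hashable(movies.loc[0, "Director"])  # scalar passes through

try:
    require_hashable(movies.loc[0, ["Director"]])   # Series is rejected
except TypeError as exc:
    message = str(exc)

print(good)
print(message)
```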


Meet the Authors


Software engineer, technical writer and trainer.

Brendan Martin
Editor: Brendan
Founder of LearnDataSci
