Author: Cansin Guler
Software Engineer

Pandas df.explode(): Unnesting Series and DataFrame Columns

Pandas' explode() flattens nested Series objects and DataFrame columns by unfurling the list-like values and spreading their content to multiple rows.

Let's have a quick look. Take the DataFrame below:

import pandas as pd

data = [['Kelly Osborn', ['Calculus', 'Programming']], ['Jade Reed', ['Biology', 'Chemistry', 'Physics']]]
df = pd.DataFrame(data, columns=['Student', 'Subject'])

df

    
Out:

	Student	Subject
0	Kelly Osborn	[Calculus, Programming]
1	Jade Reed	[Biology, Chemistry, Physics]

We can call explode() to unpack the values under Subject, like so:

df.explode('Subject')

    
Out:

	Student	Subject
0	Kelly Osborn	Calculus
0	Kelly Osborn	Programming
1	Jade Reed	Biology
1	Jade Reed	Chemistry
1	Jade Reed	Physics

How does explode() work?

Having multiple values bunched up in one cell (in a _list-like_ form) can create a challenge for analysis. explode() (adopted by Pandas in version 0.25.0) tackles this particular problem.

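DataFrame.explode() has two parameters (Series.explode() takes only ignore_index):

column - specifies the column(s) to be exploded. It is either a single column name or a bracketed list of column names (lists are accepted since version 1.3.0).
ignore_index - decides whether the original index labels are replaced with 0, 1, ..., n - 1. It is False by default (available since version 1.1.0).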

Let's work on an example. Take the DataFrame below:

midterm_data = {'name': ['Jack', 'David'], 'math101': [[35, 67], []], 'comp101': [[50, 73], [20, 40]]}
midterms = pd.DataFrame(midterm_data)

midterms

    
Out:

	name	math101	comp101
0	Jack	[35, 67]	[50, 73]
1	David	[]	[20, 40]

Let's separate the comp101 column as a Series of its own:

midterms['comp101']

    
Out:

0    [50, 73]
1    [20, 40]
Name: comp101, dtype: object

    
We can apply explode() to it directly without any parameters:

midterms['comp101'].explode()

    
Out:

0    50
0    73
1    20
1    40
Name: comp101, dtype: object

    
Notice that even though we now have numbers populating the cells, the dtype stays object. explode() returns every exploded column with dtype object, so cast the result afterwards if a numeric dtype is needed.
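
If numeric operations are needed on the exploded values, the result can be cast explicitly. The snippet below is a minimal sketch, assuming every exploded value is an integer; the variable name scores is introduced here only for illustration.

```python
import pandas as pd

midterms = pd.DataFrame({
    'name': ['Jack', 'David'],
    'comp101': [[50, 73], [20, 40]],
})

# explode() hands back object dtype even though every value is an int
scores = midterms['comp101'].explode()
print(scores.dtype)  # object

# cast explicitly once the values are known to be integers
scores = scores.astype('int64')
print(scores.dtype)   # int64
print(scores.mean())  # 45.75
```

If some cells could be missing or non-numeric, astype('int64') would raise, so this cast only fits data that is known to be clean.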

Now, let's run explode() on the DataFrame itself. In this case, we have to specify the column to be exploded:

midterms.explode('math101')

    
Out:

	name	math101	comp101
0	Jack	35	[50, 73]
0	Jack	67	[50, 73]
1	David	NaN	[20, 40]

As you can see, the first row repeated itself for each value of math101, and the empty list of the second row got replaced with NaN.
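
Whether the NaN row should stay depends on the analysis. As a rough sketch, assuming rows without any math101 score are not needed, they can be dropped after exploding and the remaining values cast to integers; the names exploded and cleaned are illustrative.

```python
import pandas as pd

midterms = pd.DataFrame({
    'name': ['Jack', 'David'],
    'math101': [[35, 67], []],
})

exploded = midterms.explode('math101')
print(len(exploded))  # 3 (David's empty list became one NaN row)

# drop rows that came from empty lists, then cast the remaining scores
cleaned = exploded.dropna(subset=['math101']).astype({'math101': 'int64'})
print(cleaned)
```

Keeping the NaN row instead preserves the fact that David exists in the data but has no math101 scores.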

In this DataFrame, the row labels do not hold any particular information. We can reset them by passing ignore_index=True:

midterms.explode('comp101', ignore_index=True)

    
Out:

	name	math101	comp101
0	Jack	[35, 67]	50
1	Jack	[35, 67]	73
2	David	[]	20
3	David	[]	40

Setting ignore_index=True relabels the rows from 0 to n - 1, where n is the number of rows in the result.
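
To make the relabeling concrete, the sketch below compares the index with and without ignore_index, and against calling reset_index(drop=True) on the exploded result. The variable names kept, relabeled, and manual are illustrative only.

```python
import pandas as pd

midterms = pd.DataFrame({
    'name': ['Jack', 'David'],
    'comp101': [[50, 73], [20, 40]],
})

kept = midterms.explode('comp101')                          # original labels repeated
relabeled = midterms.explode('comp101', ignore_index=True)  # fresh labels
manual = kept.reset_index(drop=True)                        # same idea, done by hand

print(list(kept.index))       # [0, 0, 1, 1]
print(list(relabeled.index))  # [0, 1, 2, 3]
print(list(manual.index))     # [0, 1, 2, 3]
```

Keeping the repeated labels is useful when the exploded rows need to be traced back to, or joined with, the original rows.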

Exploding List-Looking Strings

The previous section described explode() as unpacking list-like values. More precisely, explode() works on Python lists, tuples, and sets, Pandas Series, and NumPy n-dimensional arrays. It leaves strings untouched, even when they look like lists, and this is known to cause problems.
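
A small experiment shows the difference. The Series below is made up for illustration and mixes several kinds of values; sets are left out because the order in which their elements come out is not guaranteed.

```python
import numpy as np
import pandas as pd

mixed = pd.Series([
    [1, 2],            # list: exploded
    (3, 4),            # tuple: exploded
    np.array([5, 6]),  # NumPy array: exploded
    'ab',              # string: kept as a single value
    7,                 # scalar: kept as a single value
], dtype=object)

exploded = mixed.explode()
print(exploded)
print(list(exploded.index))  # [0, 0, 1, 1, 2, 2, 3, 4]
```

The string 'ab' keeps its single row (label 3) instead of being split into characters.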

Take the DataFrame below:

movie_data = {'movie': ['Memento', 'Casablanca'], 'genre':["['Thriller', 'Mistery']", "['Drama', 'Romance', 'War']"]}
movies = pd.DataFrame(movie_data)

movies

    
Out:

	movie	genre
0	Memento	['Thriller', 'Mistery']
1	Casablanca	['Drama', 'Romance', 'War']

Let's try to explode movies' genre column:

movies.explode('genre')

    
Out:

	movie	genre
0	Memento	['Thriller', 'Mistery']
1	Casablanca	['Drama', 'Romance', 'War']

explode() returns the DataFrame unchanged here and raises no error, because each genre cell holds a single string rather than a list.

This often happens when working with imported data. Here, the problem is easy to pinpoint since we intentionally filled the genre column with list-looking strings.
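
One common way to end up with such strings is a round trip through a CSV file. The sketch below uses an in-memory buffer in place of a real file; the names original, buffer, and reloaded are illustrative.

```python
import io

import pandas as pd

original = pd.DataFrame({
    'movie': ['Casablanca'],
    'genre': [['Drama', 'Romance', 'War']],
})

# write to an in-memory CSV and read it back
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(type(original.loc[0, 'genre']))  # <class 'list'>
print(type(reloaded.loc[0, 'genre']))  # <class 'str'>
print(reloaded.loc[0, 'genre'])        # ['Drama', 'Romance', 'War']
```

Checking the type of a single cell like this is a quick way to tell real lists from list-looking strings before calling explode().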

We must convert these string values to lists before running explode(). Since the strings under genre have the exact form of a Python list literal, we can use ast.literal_eval like so:

from ast import literal_eval

movies['genre'] = movies['genre'].apply(literal_eval)

movies

    
Out:

	movie	genre
0	Memento	[Thriller, Mistery]
1	Casablanca	[Drama, Romance, War]

And now, we should be able to explode the genre column:

movies.explode('genre')

    
Out:

	movie	genre
0	Memento	Thriller
0	Memento	Mistery
1	Casablanca	Drama
1	Casablanca	Romance
1	Casablanca	War

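However, literal_eval only helps when the strings have the _literal form_ of a Python list, set, or tuple. Anything else either raises an error (ValueError or SyntaxError) or does not produce a list-like value.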
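
When a column mixes well-formed list literals with other text, a tolerant wrapper can keep apply() from stopping at the first bad value. The helper parse_list below is hypothetical, written only for this illustration, and falls back to the original string when parsing fails.

```python
from ast import literal_eval

import pandas as pd


def parse_list(value):
    """Hypothetical helper: parse a list-like literal, else return the value unchanged."""
    try:
        parsed = literal_eval(value)
    except (ValueError, SyntaxError):
        return value
    return parsed if isinstance(parsed, (list, tuple, set)) else value


raw = pd.Series(
    ["['Drama', 'War']", "classics,romance", "[unquoted, names]"],
    dtype=object,
)
parsed = raw.apply(parse_list)
print(parsed.tolist())
# [['Drama', 'War'], 'classics,romance', '[unquoted, names]']
```

Only the first value is a valid list literal; the other two come back as plain strings and would need a different treatment, such as splitting on a delimiter.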

Let's take a DataFrame where the data is slightly different in structure:

book_data = {'book': ['Little Women', 'Jane Eyre'], 'tags':["classics,historical,young adult", "classics,romance, gothic"]}
books = pd.DataFrame(book_data)

books

    
Out:

	book	tags
0	Little Women	classics,historical,young adult
1	Jane Eyre	classics,romance, gothic

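Here, we can use str.split to transform the values in the tags column. Because one of the commas is followed by a space, we split on a comma plus any whitespace after it, so that no tag keeps a leading space:

books = books.assign(tags=books.tags.str.split(r",\s*"))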

books

    
Out:

	book	tags
0	Little Women	[classics, historical, young adult]
1	Jane Eyre	[classics, romance, gothic]

And now we explode it:

books.explode("tags")

    
Out:

	book	tags
0	Little Women	classics
0	Little Women	historical
0	Little Women	young adult
1	Jane Eyre	classics
1	Jane Eyre	romance
1	Jane Eyre	gothic

Exploding Multiple Columns at Once

We can explode more than one column at a time, provided that the list-like values in each row have matching lengths across those columns.

Let's create a new midterms DataFrame:

midterm_data = {'name': ['Nia', 'Millie'], 'calc': [[80, 88], [45, 50]], 'bio': [[80, 43], [78, 50]], 'chem': [[], [50, 67]]}
midterms = pd.DataFrame(midterm_data)

midterms

    
Out:

	name	calc	bio	chem
0	Nia	[80, 88]	[80, 43]	[]
1	Millie	[45, 50]	[78, 50]	[50, 67]

Since calc and bio entries in each row match in length, we can explode these columns together:

midterms.explode(['calc', 'bio'])

    
Out:

	name	calc	bio	chem
0	Nia	80	80	[]
0	Nia	88	43	[]
1	Millie	45	78	[50, 67]
1	Millie	50	50	[50, 67]

However, we cannot explode the chem column along with the others since its first value is an empty list, while others in the same row hold two elements each.

Let's try it, though:

midterms.explode(['calc', 'bio', 'chem'])

    
Out:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [17], in <cell line: 1>()
----> 1 midterms.explode(['calc', 'bio', 'chem'])
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py:8351, in DataFrame.explode(self, column, ignore_index)
   8349     for c in columns[1:]:
   8350         if not all(counts0 == self[c].apply(mylen)):
-> 8351             raise ValueError("columns must have matching element counts")
   8352     result = DataFrame({c: df[c].explode() for c in columns})
   8353 result = df.drop(columns, axis=1).join(result)
ValueError: columns must have matching element counts
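
To find the offending rows before calling explode(), the list lengths can be compared column by column. The helper mismatched_rows below is hypothetical and only a sketch: it assumes every cell in the given columns is a list and compares raw len() values, which may not match Pandas' internal check exactly.

```python
import pandas as pd

midterms = pd.DataFrame({
    'name': ['Nia', 'Millie'],
    'calc': [[80, 88], [45, 50]],
    'bio': [[80, 43], [78, 50]],
    'chem': [[], [50, 67]],
})


def mismatched_rows(df, columns):
    """Hypothetical helper: labels of rows whose list lengths differ across `columns`."""
    lengths = df[columns].apply(lambda col: col.map(len))
    return lengths[lengths.nunique(axis=1) > 1].index.tolist()


print(mismatched_rows(midterms, ['calc', 'bio']))          # []
print(mismatched_rows(midterms, ['calc', 'bio', 'chem']))  # [0]
```

Row 0 is reported for the three-column case because chem holds an empty list there while calc and bio hold two elements each.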

    
Note that the names must be enclosed in brackets when exploding multiple columns. Otherwise, the second name is passed as the ignore_index argument, so Pandas raises no error yet explodes only the first column whose name was given:

midterms.explode('calc', 'bio')

    
Summary

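explode() unnests the multi-value cells in a given Series or DataFrame column, producing one row per element and repeating the other column values and index labels. In effect, it turns nested, wide-format data into long-format data.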

Meet the Authors

Cansin Guler Software Engineer

Software engineer, technical writer and trainer.

Editor: Brendan
Founder of LearnDataSci
