Author: Alfie Grace
Data Scientist

Python Check if Files Exist – os.path, Pathlib, try/except

A simple way of checking if a file exists is by using the exists() function from the os.path module, which is part of the os library. The function is shown below with example_file.txt:

import os

os.path.exists('example_file.txt')
Out:
True

In this case, the file exists, so the exists() function has returned True. If the file didn't exist, the function would return False. Today we'll look at some of the reasons you may want to check if a file exists.
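One detail worth knowing is that exists() returns True for directories as well as files. The sketch below, which assumes a throwaway temporary directory rather than any real project layout, compares exists() with os.path.isfile(), which only returns True for regular files:

```python
import os
import tempfile

# A temporary directory keeps the demonstration self-contained
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, 'example_file.txt')
    with open(file_path, 'w') as f:
        f.write('hello\n')

    dir_exists = os.path.exists(tmp_dir)       # directories count as existing
    dir_is_file = os.path.isfile(tmp_dir)      # but a directory is not a file
    file_is_file = os.path.isfile(file_path)   # a regular file passes both checks
    missing_exists = os.path.exists(os.path.join(tmp_dir, 'missing.txt'))

print(dir_exists, dir_is_file, file_is_file, missing_exists)
```

If your program specifically needs a file rather than a folder of the same name, isfile() is usually the safer check.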

We'll also look at a few different methods for checking for files and some practical examples of where each method would be beneficial.

Why Check if Files Exist

Many advanced Python programs rely on files for some functionality; this may include using log files for recording significant events, using a file containing program settings, or even using image files for an interface.

Looking more specifically at data science, you could be looking for a file with import data, a pre-trained machine learning model, or a backup file for recording results before export.

The presence of a file could signal what phase the program is in and influence what action your script needs to take next. Another critical factor to consider is that attempting to open a file that doesn't exist can cause your program to crash. As a result, you want to make sure you've got some processes that catch these problems before they happen.

Option 1: os

Example 1

This section will explore the use of the os library for checking if files exist. os is a good place to start for people who are new to Python and object-oriented programming. For the first example, let's build on the quick example shown in the introduction and discuss the exists() function in more detail. Let's say we're developing an application with a simple GUI.

As part of the program, we could keep a file called log.txt, which tracks everything we do in the interface. The first time we run the program, the log file might not exist yet, so we'll need to check if it exists before performing any operations on the file.

Using the exists() function, we could do this as follows:

import os

if not os.path.exists('log.txt'):
    # log.txt doesn't exist, create a blank one
    with open('log.txt', 'w') as f:
        f.write('Program Log\n')

If this is the first time running the script, then no log exists, and our if block will create a fresh log for us. This feature will prevent a crash from happening during log operations.

Note

Note that we've used a context manager for operating on the file. For more examples, see the python close file solution.

Example 2

For a more complex example, imagine that we're deploying an automated machine learning model to predict the weather. Since our model relies on historical weather data, we've also developed a web scraper that automatically runs once a day and stores the data in a file called weather_data_today.csv inside the input directory.

As part of our model deployment, we could add a function check_for_new_data(), which uses exists() to see when a new file gets uploaded. Our function will then move the file to a different directory using the shutil.move() function. We can then use the data to update our weather predictions.

We could schedule the function to run every hour, so the program constantly checks for new data. With this setup, the next time our scraper uploads weather_data_today.csv to the input folder, our script will detect the change and update the predictions again, creating an automated solution. This process would look something like the code below:

import os
import shutil

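def check_for_new_data():
    if os.path.exists('input/weather_data_today.csv'):
        # make sure the destination folder exists; otherwise shutil.move()
        # would rename the file to 'model' instead of moving it into a folder
        os.makedirs('model', exist_ok=True)
        # new data has been uploaded, move data to model folder
        shutil.move('input/weather_data_today.csv', 'model')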

A major limitation of using os.path.exists() is that after checking if a file exists, another process running in the background could delete it.

Large programs are often a combination of moving parts, with different scripts running at the same time. Suppose our if os.path.exists() line returns True and then another function deletes the file. The file operation that follows our if statement would then fail and could cause our program to crash. This sequence of events is known as a race condition. Note that switching to pathlib does not remove this gap, because its exists() method is also a separate check. The way around the issue is to attempt the file operation directly and handle the error, as covered in the exception handling section.
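The race condition can be reproduced deterministically. In the sketch below, read_if_exists() and its interfere parameter are hypothetical names introduced only for illustration: the callback stands in for a concurrent process that deletes the file after the check has passed but before the file is opened.

```python
import os
import tempfile

def read_if_exists(path, interfere=None):
    """Check-then-open pattern; `interfere` simulates another process."""
    if os.path.exists(path):
        if interfere is not None:
            interfere()  # hypothetical hook standing in for a concurrent delete
        with open(path) as f:
            return f.read()
    return None

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'weather_data_today.csv')
    with open(path, 'w') as f:
        f.write('temp,12\n')

    try:
        read_if_exists(path, interfere=lambda: os.remove(path))
        outcome = 'read succeeded'
    except FileNotFoundError:
        outcome = 'file vanished after the check'

print(outcome)
```

Even though exists() returned True, the open() call still fails, which is why the check alone cannot guarantee that the following operation is safe.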

Option 2: pathlib

An alternative to os is the pathlib library. pathlib is a more object-oriented way of working with files, and it offers many options for file interaction.

Here, we'll check for the existence of the log.txt file in a similar way to the first example of the os section, but this time using pathlib to do the job:

from pathlib import Path

log_file = Path('log.txt')

if not log_file.exists():
    # log file doesn't exist, create a blank one
    with open(log_file, 'w') as f:
        f.write('Program Log\n')

In this example, we've created the object log_file using the Path() class. Similar to the os example, using exists() here will return True if the file exists. If the file doesn't exist, this is covered by our if branch, creating a new log file for us.

A significant advantage of using Path() is that the path becomes an object with its own methods, such as exists(), is_file(), and read_text(), so there is no need to pass strings between separate functions. Keep in mind that a Path object stores only the location of the file, not a copy of its contents. If a concurrent process deletes the file between our if statement and the file operation, the operation can still fail, exactly as it can with os.path.exists().

In terms of performance, os can execute slightly faster than pathlib, so there are some situations where it may have an edge. Despite this, pathlib is often considered a better choice than os because it lets you chain methods and build paths with readable, object-oriented code. Note that pathlib does not prevent the race condition described for os: any existence check followed by a separate file operation leaves a gap in which the file can change.
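As a rough illustration of what working with Path objects looks like, the sketch below builds a nested path with the / operator and then calls a few Path methods on it. The logs folder and the temporary directory are assumptions made for the example, not part of the program described above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # The / operator joins path segments into a new Path object
    log_file = Path(tmp_dir) / 'logs' / 'log.txt'

    exists_before = log_file.exists()  # nothing has been created yet

    # Create the parent folder, then write the file in one call
    log_file.parent.mkdir(parents=True, exist_ok=True)
    log_file.write_text('Program Log\n')

    is_file = log_file.is_file()        # True only for regular files
    is_dir = log_file.parent.is_dir()   # True only for directories
    first_line = log_file.read_text().splitlines()[0]

print(exists_before, is_file, is_dir, first_line)
```

As with os.path, is_file() and is_dir() are useful when exists() alone is too broad.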

Option 3: Exception Handling

If you aren't planning to do much with a file and only want to check if a file exists, then using the previous options is appropriate. In cases where you'd like to perform a file operation immediately after checking if the file exists, using these approaches could be viewed as inefficient.

In these situations, exception handling is preferable. With exception handling, your script can detect the FileNotFoundError: [Errno 2] No such file or directory error as it occurs and react to it accordingly.

Building on the log.txt example used in the os section, let's say that we'd like to update the log to have a line reading 'New Session' every time we start our program. We could do this as shown below:

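try:
    # 'r+' opens an existing file for reading and writing and raises
    # FileNotFoundError if it is missing ('a' would silently create it)
    with open('log.txt', 'r+') as f:
        f.seek(0, 2)  # move to the end of the file before writing
        f.write('New Session\n')
except FileNotFoundError:
    # The log file doesn't exist, create a new one
    with open('log.txt', 'w') as f:
        f.write('Program Log\n')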

The try-except branches work similarly to an if-else statement. Python tries to open the file and, if that works, appends 'New Session'. Python enters the except block if a FileNotFoundError is raised. This means the file doesn't exist, so we create a new one.

As mentioned previously, in situations where you'd like to complete an operation on the file if it exists, using exception handling is the preferred approach. The exception handling solution performs the same job as the os solution but doesn't require a library import. The os approach is also clumsier: it requires two interactions with the file (checking if it exists, then appending or writing), compared with a single attempt to open the file in our exception handling solution.

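This solution also helps us to avoid race conditions. Because there is no separate existence check, there is no gap between checking and opening in which a concurrent process could delete the file. If the file is missing at the moment open() runs, the except block handles it instead of the program crashing.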
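A related sketch, offered as one possible variation rather than the only approach, uses the built-in open() mode 'x'. This mode creates the file only if it does not already exist and raises FileExistsError otherwise, so creating the log needs no separate check. The start_session() name is hypothetical.

```python
import os
import tempfile

def start_session(log_path):
    """Create the log on first run, otherwise append a session marker."""
    try:
        # 'x' fails with FileExistsError if the file is already there
        with open(log_path, 'x') as f:
            f.write('Program Log\n')
        return 'created'
    except FileExistsError:
        with open(log_path, 'a') as f:
            f.write('New Session\n')
        return 'appended'

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, 'log.txt')
    first_run = start_session(log_path)
    second_run = start_session(log_path)
    with open(log_path) as f:
        log_contents = f.read()

print(first_run, second_run)
```

The first call creates the log with its header, and the second call finds the file already present and appends to it.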

Attempting an operation and handling the failure is a common Python style, often summarized as "easier to ask forgiveness than permission". The risk lies in catching too much: a broad except block can swallow errors you didn't anticipate. It'll be up to you to find the correct balance between a more efficient piece of code and too much exception handling, which could cause you to miss vital errors further down the line.
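One way to keep that balance, sketched below under the assumption that a missing log is the only failure we expect, is to catch FileNotFoundError specifically and let every other error propagate. The read_log() function is a hypothetical helper used only for this illustration.

```python
import tempfile
from pathlib import Path

def read_log(path):
    """Return the log text, or an empty string if no log exists yet."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        # Expected on a first run, so it is safe to handle here
        return ''
    # Any other OSError (for example PermissionError) is left to propagate

with tempfile.TemporaryDirectory() as tmp_dir:
    missing_result = read_log(Path(tmp_dir) / 'log.txt')

    try:
        read_log(tmp_dir)  # a directory, not a file
        other_error = None
    except OSError as exc:
        other_error = type(exc).__name__

print(repr(missing_result), other_error)
```

The missing file is handled quietly, while trying to read a directory still surfaces as an error, so an unexpected problem is not mistaken for a first run.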

Summary

Checking for the existence of particular files can be fundamental to preventing your program from crashing. We've explored the uses of the os and pathlib libraries, along with exception handling, looking at the situations each approach is best suited to and weighing up the pros and cons of each.

We recommend the os library for beginners. It's a lot clunkier, but it may be easier to wrap your head around if you're new to file operations. For more advanced users with a better understanding of object-oriented programming, pathlib is the way to go. When you intend to operate on the file straight away, exception handling can be an excellent alternative to checking if files exist, although you should use this approach with caution!


Meet the Authors


Alfie graduated with a Master's degree in Mechanical Engineering from University College London. He's currently working as a top-rated data scientist on Upwork. Find him on LinkedIn.

Editor: Brendan Martin
Founder of LearnDataSci
