Python Check if Files Exist – os.path, Pathlib, try/except
A simple way of checking if a file exists is to use the exists() function from the os library, called as os.path.exists().
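As a minimal sketch of this call, the snippet below checks one file that exists and one that doesn't. The file names example.txt and missing.txt are hypothetical, and a temporary directory is used only so the example can run anywhere without leaving files behind.

```python
import os
import tempfile

# Work in a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    existing_file = os.path.join(tmp_dir, "example.txt")  # hypothetical name
    missing_file = os.path.join(tmp_dir, "missing.txt")   # never created

    # Create the first file so there is something to find.
    with open(existing_file, "w") as f:
        f.write("hello\n")

    file_found = os.path.exists(existing_file)
    file_missing = os.path.exists(missing_file)

print(file_found)    # True
print(file_missing)  # False
```

Note that os.path.exists() also returns True for directories; os.path.isfile() is the stricter check if only regular files should count.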
When the file exists, the exists() function returns True. If the file didn't exist, the function would return False. Today we'll look at some of the reasons you may want to check if a file exists.
We'll also look at a few different methods for opening files and some practical examples of where those methods would be beneficial.
Why Check if Files Exist
Many advanced Python programs rely on files for some functionality; this may include using log files for recording significant events, using a file containing program settings, or even using image files for an interface.
Looking more specifically at data science, you could be looking for a file with import data, a pre-trained machine learning model, or a backup file for recording results before export.
The presence of a file could signal what phase the program is in and influence what action your script needs to take next. Another critical factor to consider is that attempting to open a file that doesn't exist can cause your program to crash. As a result, you want to make sure you've got some processes that catch these problems before they happen.
Option 1: os
This section will explore the use of the
os library for checking if files exist.
os is a good place to start for people who are new to Python and object-oriented programming. For the first example, let's build on the quick example shown in the introduction, discussing the exists() function in more detail. Let's say we've built a simple GUI for an application that we're developing.
As part of the program, we could keep a file called
log.txt, which tracks everything we do in the interface. The first time we run the program, the log file might not exist yet, so we'll need to check if it exists before performing any operations on the file.
Using the exists() function, we could do this as follows:
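A minimal sketch of this check might look like the following. The helper name ensure_log_exists() and the "Log created" line are assumptions made for illustration, and the demo runs in a temporary directory so that no real log.txt is touched.

```python
import os
import tempfile


def ensure_log_exists(path="log.txt"):
    """Create a fresh log file if one does not exist yet.

    Returns True when a new file was created, False otherwise.
    (Hypothetical helper, not part of any library.)
    """
    if not os.path.exists(path):
        # First run: there is no log yet, so create one.
        with open(path, "w") as f:
            f.write("Log created\n")
        return True
    return False


# Demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "log.txt")
    created_first_run = ensure_log_exists(log_path)   # no log yet
    created_second_run = ensure_log_exists(log_path)  # log now exists

print(created_first_run, created_second_run)  # True False
```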
If this is the first time running the script, then no log exists, and our
if block will create a fresh log for us. This feature will prevent a crash from happening during log operations.
For a more complex example, imagine that we're deploying an automated machine learning model to predict the weather. Since our model relies on historical weather data, we've also developed a web scraper that automatically runs once a day and stores the data in a file called
weather_data_today.csv inside the input directory.
As part of our model deployment, we could add a function
check_for_new_data(), which uses
exists() to see when a new file gets uploaded. Our function will then move the file to a different directory using the
shutil.move() function. We can then use the data to update our weather predictions.
We could schedule the function to run every hour, so the program constantly checks for new data. With this setup, the next time our scraper uploads
weather_data_today.csv to the
input folder, our script will detect the change and update the predictions again, creating an automated solution. This process would look something like the code below:
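Here is one way check_for_new_data() could be sketched. The processing destination folder and the placeholder file contents are assumptions, and the hourly scheduling (for example with cron or a task scheduler) and the prediction update are left out.

```python
import os
import shutil
import tempfile

DATA_FILE = "weather_data_today.csv"  # file name from the text


def check_for_new_data(input_dir="input", processing_dir="processing"):
    """Move today's weather file out of input_dir if it has arrived.

    Returns the new file path, or None when there is nothing to process.
    processing_dir is a hypothetical destination folder.
    """
    source = os.path.join(input_dir, DATA_FILE)
    if not os.path.exists(source):
        return None  # nothing uploaded yet

    os.makedirs(processing_dir, exist_ok=True)
    destination = os.path.join(processing_dir, DATA_FILE)
    shutil.move(source, destination)
    # A real deployment would update the predictions here.
    return destination


# Demo in a temporary directory, simulating the scraper's upload.
with tempfile.TemporaryDirectory() as tmp_dir:
    input_dir = os.path.join(tmp_dir, "input")
    processing_dir = os.path.join(tmp_dir, "processing")
    os.makedirs(input_dir)

    first_check = check_for_new_data(input_dir, processing_dir)

    with open(os.path.join(input_dir, DATA_FILE), "w") as f:
        f.write("placeholder,data\n")  # stand-in for scraped data

    second_check = check_for_new_data(input_dir, processing_dir)
    moved_ok = os.path.exists(second_check)
    input_emptied = not os.path.exists(os.path.join(input_dir, DATA_FILE))

print(first_check)  # None
```

Because the file is moved out of the input folder once it has been found, the next scheduled run only reacts when the scraper uploads a new file.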
A major limitation of using os.path.exists() is that after checking if a file exists, another process running in the background could delete it.
Large programs are often a combination of moving parts, with different scripts running at the same time. Suppose our if os.path.exists() line returns True, and then another function deletes the file. The file operation that follows our if statement could then cause our program to crash. This sequence of events is known as a race condition. Switching to the pathlib library, covered in the next section, does not close this gap on its own; the more robust way around it is exception handling, covered in Option 3.
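The race condition can be simulated in a single script. In this sketch, the os.remove() call stands in for a concurrent process deleting the file between the check and the file operation; the file name and contents are placeholders.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "log.txt")
    with open(path, "w") as f:
        f.write("session data\n")

    outcome = "check failed"
    if os.path.exists(path):       # the check passes...
        os.remove(path)            # ...then "another process" deletes the file
        try:
            with open(path) as f:  # ...so the file operation fails anyway
                f.read()
            outcome = "read succeeded"
        except FileNotFoundError:
            outcome = "file vanished after the check"

print(outcome)  # file vanished after the check
```

Without the try/except around open(), the FileNotFoundError would propagate and stop the program, even though the existence check had just passed.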
Option 2: pathlib
An alternative approach to
os is to use the
pathlib library instead.
pathlib is a more object-oriented way of checking files, which offers many options for file interaction.
Here, we'll check for the existence of the log.txt file in a similar way to the first example of the
os section, but this time using
pathlib to do the job:
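A minimal sketch of the pathlib version, assuming the same log.txt file, could look like this. The temporary directory and the "Log created" line are only there for the demo; in a real script, log_file could simply be Path("log.txt").

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build the path object; no file is opened or created at this point.
    log_file = Path(tmp_dir) / "log.txt"

    existed_before = log_file.exists()

    if not log_file.exists():
        # First run: create a fresh log.
        log_file.write_text("Log created\n")

    existed_after = log_file.exists()

print(existed_before, existed_after)  # False True
```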
In this example, we've created the object
log_file using the
Path() class. Similar to the
os example, using
exists() here will return
True if the file exists. If the file doesn't exist, this is covered by our
if branch, creating a new log file for us.
Note that a Path() object stores only the path, not a copy of the file, so Path.exists() has the same limitation as os.path.exists(): if a concurrent process deletes the file somewhere between our if statement and the file operation, the file operation can still fail. If that matters, combine pathlib with the exception handling shown in Option 3.
In terms of performance, os executes slightly faster than pathlib, so there are some situations where it may have an edge. Despite this, pathlib is often considered a better choice than os, because it lets you chain methods and properties together quickly.
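To illustrate the kind of chaining pathlib allows, here is a small sketch. The logs/session directory layout and the .bak suffix are hypothetical and chosen only for the example.

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # The / operator joins path segments.
    log_file = Path(tmp_dir) / "logs" / "session" / "log.txt"

    # Properties and methods chain naturally from the same object.
    log_file.parent.mkdir(parents=True, exist_ok=True)
    log_file.write_text("New session\n")

    first_line = log_file.read_text().splitlines()[0]
    file_name = log_file.name                        # "log.txt"
    suffix = log_file.suffix                         # ".txt"
    backup_name = log_file.with_suffix(".bak").name  # "log.bak"

print(first_line, file_name, suffix, backup_name)
```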
Option 3: Exception Handling
If you aren't planning to do much with a file and only want to check if a file exists, then using the previous options is appropriate. In cases where you'd like to perform a file operation immediately after checking if the file exists, using these approaches could be viewed as inefficient.
In these situations, exception handling is preferable. With exception handling, your script can detect the
FileNotFoundError: [Errno 2] No such file or directory error as it occurs and react to it accordingly.
Building on the
log.txt example used in the
os section, let's say that we'd like to update the log to have a line reading 'New session' every time we start our program. We could do this as shown below:
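One way to sketch this is shown here. The helper name start_session() and its return values are assumptions for illustration. The file is opened in "r+" mode because that mode raises FileNotFoundError when the file is missing, whereas append mode ("a") would silently create it.

```python
import os
import tempfile


def start_session(path="log.txt"):
    """Append 'New session' to the log, creating the log if it is missing.

    Returns "appended" or "created" so the caller can see which branch ran.
    (Hypothetical helper, not part of any library.)
    """
    try:
        # "r+" requires the file to exist already.
        with open(path, "r+") as f:
            f.seek(0, os.SEEK_END)  # jump to the end before writing
            f.write("New session\n")
        return "appended"
    except FileNotFoundError:
        # No log yet, so create one.
        with open(path, "w") as f:
            f.write("New session\n")
        return "created"


# Demo in a temporary directory so no real log.txt is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "log.txt")
    first_start = start_session(log_path)
    second_start = start_session(log_path)
    with open(log_path) as f:
        log_lines = f.read().splitlines()

print(first_start, second_start)  # created appended
```

If appending is all that is needed, open(path, "a") on its own would also work, since append mode creates a missing file; the try/except form is useful when the two cases need different handling.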
The try-except branches work similarly to an if-else statement. Python will try to open the file and, if that works, append 'New session'. Python will enter the except block if a FileNotFoundError is raised. This means the file doesn't exist, so we create a new one.
As mentioned previously, in situations where you'd like to complete an operation on the file if it exists, using exception handling is the preferred approach. The exception handling solution performs the same functionality as an os solution but doesn't require a library import. The os approach also seems clumsy by comparison: it requires two interactions with the file (checking if it exists, then appending/writing), compared with just appending/writing in our exception handling solution.
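This solution also helps us to avoid race conditions. Because the existence check and the file operation happen in a single open() call, there is no gap between the two in which a concurrent process can delete the file. If the file is deleted before the call, we get a FileNotFoundError that our except block already handles, rather than a crash.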
Python's own glossary describes this "easier to ask for forgiveness than permission" (EAFP) approach as a common coding style, so exception handling is an idiomatic choice for file operations. Even so, it can be overused. It'll be up to you to find the correct balance between having a more efficient piece of code versus too much exception handling, which could cause you to miss vital errors further down the line.
Checking for the existence of particular files can be fundamental to preventing your program from crashing. We've explored the uses of the os and pathlib libraries, along with exception handling, looking at the different situations that these approaches are better suited to and weighing up the pros and cons of each.
We recommend using the os library for beginners. It's a lot clunkier, but it may be easier for you to wrap your head around if you're new to file operations. For more advanced users with a better understanding of object-oriented programming, pathlib is the way to go. For certain situations where these libraries aren't required, exception handling can be an excellent alternative for checking if files exist, although you should use this approach with caution!