Efficiently Managing Temporary Files in Python

Discover best practices for handling temporary files in Python

Posted by Pierre on March 1, 2024, 11:05 a.m.

Developers often create temporary files to store data that cannot fit in memory, or to share data between threads or with external programs. This can sound trivial, since creating a file in the /tmp/ directory is easy with Python, but doing it correctly is harder than it seems. Temporary files have to conform to the following rules:

  • The file name has to be unique and unpredictable: a predictable file name is a security risk, because a malicious user who can guess it may manipulate the file and trick your program. A unique name also protects you from bugs when several processes of your software attempt to read or write the same file at the same time.
  • Temporary files should generally be created on the local file system. Many remote file systems do not support the open flags needed to safely create temporary files (cf. file status flags on GNU/Linux).
  • Files must be cleaned up even in the face of errors: otherwise your app will use more and more disk space. For a client-side application, this is disrespectful to the user; for a server-side application, you may eventually fill the disk. Temporary folders are usually cleaned up during shutdown, but servers can have very long uptimes.
  • Temporary directories are often shared between many users and programs.
  • A directory is secure with respect to a particular user if only that user and the system administrator can create, move, or delete files inside it.
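As a small illustration of the first rule, the sketch below creates two temporary files with the tempfile module (introduced later in this article) and compares their names. The variable names are only illustrative.

```python
import os
import tempfile

# Create two temporary files that exist at the same time and compare their names.
with tempfile.NamedTemporaryFile() as first, tempfile.NamedTemporaryFile() as second:
    first_name = os.path.basename(first.name)
    second_name = os.path.basename(second.name)
    print(first_name, second_name)

    # Each name contains its own random part, so the two names never collide.
    names_differ = first_name != second_name
    print(names_differ)
```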

Tests for this article were performed using Python 3.12.

About the tempfile module

To help us do things right, Python provides the tempfile module.

This module exposes both high-level and low-level interfaces that work on all supported platforms and ensure that temporary files are created safely. The high-level interfaces (TemporaryFile, NamedTemporaryFile, TemporaryDirectory, SpooledTemporaryFile) can be used as context managers and provide automatic cleanup, while mkstemp() and mkdtemp() are low-level functions that require manual cleanup.
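To make "manual cleanup" concrete, here is a minimal sketch of the low-level mkstemp() function. It is only meant to show what the high-level interfaces do for you; the variable names are illustrative.

```python
import os
import tempfile

# mkstemp() returns an already-open OS-level file descriptor and the file path.
file_descriptor, path = tempfile.mkstemp()
try:
    # Wrap the descriptor in a file object; closing it closes the descriptor too.
    with os.fdopen(file_descriptor, "wb") as temporary_file:
        temporary_file.write(b"whatever")
    existed_before_cleanup = os.path.isfile(path)
finally:
    # Nothing removes the file for us: we have to do it ourselves.
    os.remove(path)

exists_after_cleanup = os.path.isfile(path)
print(existed_before_cleanup, exists_after_cleanup)
```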

In this article, we’ll focus on the high-level interfaces provided by the tempfile module.

Creating temporary files

The tempfile module provides two interfaces for working with temporary files:

  • TemporaryFile: This interface is straightforward to use but does not guarantee that the created file will have a visible name in the file system.
  • NamedTemporaryFile: Similar to TemporaryFile, but it ensures that the created file has a visible name in the file system, allowing you to reopen a previously closed file.

All files created using the tempfile module are readable and writable only by the user who created them.
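One way to check this on your own system is to look at the permission bits of a freshly created file. This is only a sketch: on POSIX systems such as Linux the result is typically 0o600 (read and write for the owner only), while on Windows the mode reported by os.stat() does not reflect access rights in the same way.

```python
import os
import stat
import tempfile

with tempfile.NamedTemporaryFile() as temporary_file:
    # Keep only the permission bits of the file mode.
    mode = stat.S_IMODE(os.stat(temporary_file.name).st_mode)
    print(oct(mode))
```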

The TemporaryFile interface

TemporaryFile returns a file-like object. By using it as a context manager, we ensure that the file is removed at the end of the operation:

import tempfile

# The file is opened in binary mode ('w+b') by default, so we write bytes.
with tempfile.TemporaryFile() as temporary_file:
    temporary_file.write(b'whatever')

The behavior of this function may vary depending on the platform: on Unix systems, the directory entry for the file is either not created at all or removed immediately after creation. Other platforms do not support this behavior, so your code should not assume that a temporary file created with this function has, or does not have, a visible name in the file system.

Please note that TemporaryFile accepts an optional dir argument. If you need to create a temporary file in a specific directory, specify the directory using the dir argument. If no directory is specified, TemporaryFile will choose from a platform-dependent list of common directories (such as /tmp on Linux systems).
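The sketch below shows the dir argument in use. The working_directory name is hypothetical: a temporary directory stands in for whatever location your program would really use. The last line prints the default directory that tempfile would pick on your platform.

```python
import tempfile

# `working_directory` is a stand-in for an application-specific directory.
with tempfile.TemporaryDirectory() as working_directory:
    with tempfile.TemporaryFile(dir=working_directory) as temporary_file:
        temporary_file.write(b"whatever")
        # Go back to the beginning of the file before reading it.
        temporary_file.seek(0)
        content = temporary_file.read()

print(content)
# Default directory used when no `dir` argument is given.
print(tempfile.gettempdir())
```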

The NamedTemporaryFile interface

NamedTemporaryFile behaves much like TemporaryFile, with the following differences:

  • The function returns a file that is guaranteed to have a visible name in the file system.
  • It exposes two optional arguments, delete (default True) and delete_on_close (default True, added in Python 3.12), which control when the file is deleted and make it possible to reopen a closed temporary file by its name.

The easiest way to manipulate a NamedTemporaryFile is to use a context manager, just as with TemporaryFile:

import os
import tempfile

# Create a temporary file with a visible name in file system.
with tempfile.NamedTemporaryFile() as temporary_file:
    print(temporary_file.name)
    print(os.path.isfile(temporary_file.name))
    temporary_file.write(b'whatever')

print('--- Context manager exit ---')

# File has been removed at this point.
print(os.path.isfile(temporary_file.name))

Where the output is:

/tmp/tmp87yk7rwz
True
--- Context manager exit ---
False

We can see that the file name includes a string of random characters, which makes it unique and hard to guess. You can manipulate the file until you reach the end of the with block.

You can achieve the same thing in a more manual way:

import os
import tempfile

temporary_file = tempfile.NamedTemporaryFile()

print(temporary_file.name)
print(os.path.isfile(temporary_file.name))

try:
    temporary_file.write(b'whatever')
finally:
    # File is deleted on close.
    temporary_file.close()

print(os.path.isfile(temporary_file.name))

Closing the file in the finally clause is important in order to ensure that the file is removed even in the face of errors.

Reopening a closed temporary file

If you need more control, for example to reopen the temporary file by its name after closing it, you can use the delete and delete_on_close arguments.

  • Setting delete to False prevents the file from being deleted at all (neither on close nor when exiting the context manager block). When delete is False, the value of delete_on_close is ignored.
  • Setting delete_on_close to False prevents the file from being deleted on close, but the file is still deleted at the end of the with block (when used as a context manager).

With delete_on_close set to False and a context manager, the temporary file is still cleaned up automatically, whereas with delete set to False you have to clean up the file on your own.

Setting delete to False

import os
import tempfile

temporary_file = tempfile.NamedTemporaryFile(delete=False)

print(temporary_file.name)
print(os.path.isfile(temporary_file.name))

# File is not deleted on close.
temporary_file.close()

print(os.path.isfile(temporary_file.name))

Where the output is:

/tmp/tmpq2wkbxbv
True
True

It is then your responsibility to remove the file manually, as you won't get any assistance with automatic cleanup.
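A minimal sketch of that manual cleanup, assuming the file only needs to live for the duration of a try block, could look like this:

```python
import os
import tempfile

temporary_file = tempfile.NamedTemporaryFile(delete=False)
try:
    temporary_file.write(b"whatever")
    temporary_file.close()

    # The file survives close(), so it can be reopened by its name.
    with open(temporary_file.name, "rb") as reopened_file:
        content = reopened_file.read()
finally:
    # Closing twice is harmless; removing the file is our job here.
    temporary_file.close()
    os.remove(temporary_file.name)

still_exists = os.path.isfile(temporary_file.name)
print(content, still_exists)
```

Putting the removal in the finally clause keeps the "clean up even in the face of errors" rule intact.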

Setting delete_on_close to False

import os
import tempfile

with tempfile.NamedTemporaryFile(delete_on_close=False) as temporary_file:
    print(temporary_file.name)
    temporary_file.close()
    # File is not deleted on close.
    print(os.path.isfile(temporary_file.name))

print('--- Context manager exit ---')

# File is deleted at context manager exit.
print(os.path.isfile(temporary_file.name))

Where the output is:

/tmp/tmparbrvy0h
True
--- Context manager exit ---
False

This time, the file is not deleted when we close it (which allows us to reuse it later on), but it is still removed automatically at context manager exit. This approach is safer than using delete, and it is the one recommended by the documentation.

The special case of Windows

Reopening a temporary file by its name requires additional conditions on Windows, as described in the NamedTemporaryFile documentation. To make it easier to work with temporary files on Windows, Django provides its own implementation of NamedTemporaryFile in the django.core.files.temp module. This implementation allows you to reopen a temporary file without having to handle all the conditions listed in the documentation yourself.

Creating temporary directories

Finally, the tempfile module exposes the TemporaryDirectory interface, which allows you to create temporary directories. This time, when it is used as a context manager, the value bound by the with statement is a string that represents the path to the directory:

import os
import tempfile

with tempfile.TemporaryDirectory() as temporary_directory_path:
    print(f"Directory path: {temporary_directory_path}")
    print(f"Directory exists: {os.path.exists(temporary_directory_path)}")

    temporary_file_path = os.path.join(temporary_directory_path, 'file.txt')
    # Create an empty file and close it right away.
    with open(temporary_file_path, 'w') as temporary_file:
        pass

    print(f"Temporary file exists: {os.path.exists(temporary_file_path)}")

print('--- Context manager exit ---')

print(f"Directory exists: {os.path.exists(temporary_directory_path)}")
print(f"Temporary file exists: {os.path.exists(temporary_file_path)}")

Where the output is:

Directory path: /tmp/tmpxgamn3z7
Directory exists: True
Temporary file exists: True
--- Context manager exit ---
Directory exists: False
Temporary file exists: False

We observe that the directory and the files it contains are removed at the exit of the context manager.

Automatic cleanup can be disabled by setting the delete parameter (added in Python 3.12) to False:

import os
import tempfile

# Create the temporary directory with `delete` to `False`.
with tempfile.TemporaryDirectory(delete=False) as temporary_directory_path:
    print(f"Directory path: {temporary_directory_path}")
    print(f"Directory exists: {os.path.exists(temporary_directory_path)}")

    temporary_file_path = os.path.join(temporary_directory_path, 'file.txt')
    # Create an empty file and close it right away.
    with open(temporary_file_path, 'w') as temporary_file:
        pass

    print(f"Temporary file exists: {os.path.exists(temporary_file_path)}")

print('--- Context manager exit ---')

# Directory and files are not deleted on context manager exit.
print(f"Directory exists: {os.path.exists(temporary_directory_path)}")
print(f"Temporary file exists: {os.path.exists(temporary_file_path)}")

Where the output is:

Directory path: /tmp/tmpd_n9eso5
Directory exists: True
Temporary file exists: True
--- Context manager exit ---
Directory exists: True
Temporary file exists: True

Setting delete to False disables the automatic cleanup. Use it carefully, as it becomes your responsibility to clean up the directory on your own.

Conclusion

As we saw, safely creating, using and removing temporary files requires a bit of attention. By using the high-level interfaces of the tempfile module correctly, you mitigate the risks, because the temporary files you create:

  • have appropriate flags and permissions (readable and writable by the creator only)
  • live in an appropriate folder for your system
  • have unique and unpredictable names
  • come with assistance for automatic cleanup

So the tempfile module should be used any time your program manipulates temporary files.

If you want to reduce the risk of manual temporary file handling creeping into your program, consider using bandit. Bandit is a Python linter specialized in finding common security issues in Python code. Its B108 check finds insecure usage of temporary files and directories by looking for commonly used paths (/tmp, /var/tmp, etc.) in your code. You can use it with pre-commit and in your CI.
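To illustrate the kind of code such a check targets, the sketch below contrasts a hardcoded path with a tempfile-based alternative. The INSECURE_REPORT_PATH constant and the write_report() helper are hypothetical names used only for this example.

```python
import os
import tempfile

# Pattern a check such as B108 looks for: a hardcoded, predictable path.
INSECURE_REPORT_PATH = "/tmp/report.txt"  # hypothetical, never opened here


def write_report(content: bytes) -> str:
    """Write content to a uniquely named temporary file and return its path."""
    with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as report_file:
        report_file.write(content)
        return report_file.name


report_path = write_report(b"whatever")
try:
    with open(report_path, "rb") as report_file:
        report_content = report_file.read()
finally:
    # delete=False means the caller is responsible for removing the file.
    os.remove(report_path)

print(report_content)
```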