Administration - Paperless-ngx (2024)

Making backups

Multiple options exist for making backups of your paperless instance, depending on how you installed paperless.

Before making a backup, it's probably best to make sure that paperless is not actively consuming documents at that time.

Options available to any installation of paperless:

Use the document exporter. The document exporter exports all your documents, thumbnails, metadata, and database contents to a specific folder. You may import your documents and settings into a fresh instance of paperless again or store your documents in another DMS with this export.
The document exporter is also able to update an already existing export. Therefore, incremental backups with rsync are entirely possible.

Caution

You cannot import the export generated with one version of paperless in a different version of paperless. The export contains an exact image of the database, and migrations may change the database layout.
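The exporter-plus-rsync approach described above could be scripted roughly as follows. This is a minimal sketch rather than an official script: PAPERLESS_DIR and BACKUP_DEST are placeholder paths, the run helper and its DRY_RUN switch are illustrative additions, and the sketch assumes the stock Docker Compose setup in which the container's ../export is mounted on the host's export folder.

```shell
#!/bin/sh
# Placeholder locations (assumptions): adjust both to your installation.
PAPERLESS_DIR="${PAPERLESS_DIR:-/path/to/paperless}"
BACKUP_DEST="${BACKUP_DEST:-/mnt/backup/paperless-export}"

# Illustrative helper: with DRY_RUN=1, print each command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Update the export in place, then mirror it to the backup location.
# The body runs in a subshell so the caller's working directory is untouched.
backup_export() (
    run cd "$PAPERLESS_DIR" || exit 1
    # -T avoids "The input device is not a TTY" errors when run from cron.
    run docker compose exec -T webserver document_exporter ../export || exit 1
    # -a preserves timestamps and permissions; --delete mirrors removals.
    run rsync -a --delete "$PAPERLESS_DIR/export/" "$BACKUP_DEST/"
)
```

Calling backup_export with DRY_RUN=1 set prints the three commands without touching anything, which is a cheap way to check the paths before scheduling the function from cron.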

Options available to docker installations:

Back up the docker volumes. These usually reside within /var/lib/docker/volumes on the host, and you need to be root in order to access them.
Paperless uses 4 volumes:
- paperless_media: This is where your documents are stored.
- paperless_data: This is where auxiliary data is stored. This folder also contains the SQLite database, if you use it.
- paperless_pgdata: Exists only if you use PostgreSQL and contains the database.
- paperless_dbdata: Exists only if you use MariaDB and contains the database.
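One common way to archive named volumes without reading /var/lib/docker directly is to mount each volume into a short-lived container and tar its contents. The sketch below assumes the volume names listed above (check `docker volume ls`, since Compose may prefix them differently), uses the alpine image purely as a convenient source of tar, and treats BACKUP_DIR, VOLUMES, and the DOCKER override as illustrative names. Stop paperless first so the database files are not changing mid-copy.

```shell
#!/bin/sh
# Assumptions: BACKUP_DIR is a placeholder; add paperless_pgdata or
# paperless_dbdata to VOLUMES if you use PostgreSQL or MariaDB.
BACKUP_DIR="${BACKUP_DIR:-$PWD/volume-backups}"
VOLUMES="${VOLUMES:-paperless_media paperless_data}"
DOCKER="${DOCKER:-docker}"   # set DOCKER="echo docker" to preview the commands

# Mount one volume read-only and write a tarball of it into BACKUP_DIR.
backup_volume() {
    $DOCKER run --rm \
        -v "$1":/source:ro \
        -v "$BACKUP_DIR":/backup \
        alpine tar czf "/backup/$1.tar.gz" -C /source .
}

# Archive every volume named in VOLUMES, stopping at the first failure.
backup_all_volumes() {
    mkdir -p "$BACKUP_DIR" || return 1
    for volume in $VOLUMES; do
        backup_volume "$volume" || return 1
    done
}
```

Restoring would be the reverse operation (extracting each tarball into an empty volume of the same name); that step is not shown here.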

Options available to bare-metal and non-docker installations:

Back up the entire paperless folder. This ensures that if your paperless instance crashes at some point or your disk fails, you can simply copy the folder back into place and it works.
When using PostgreSQL or MariaDB, you'll also have to back up the database.
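For the database part, the standard dump tools of each database can be used. The following is a hedged sketch only: it assumes the database and its user are both named paperless, that OUT_DIR is a writable backup folder, and that credentials are supplied in your usual way (for example a .pgpass file or the interactive prompt). On older MariaDB installations the tool may be called mysqldump instead of mariadb-dump. The run helper and DRY_RUN switch are illustrative.

```shell
#!/bin/sh
# Assumed names and paths: change them to match your installation.
DB_NAME="${DB_NAME:-paperless}"
DB_USER="${DB_USER:-paperless}"
OUT_DIR="${OUT_DIR:-/path/to/backups}"

# Illustrative helper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# PostgreSQL: write a custom-format dump, restorable with pg_restore.
dump_postgres() {
    run pg_dump --username="$DB_USER" --format=custom \
        --file="$OUT_DIR/paperless.pgdump" "$DB_NAME"
}

# MariaDB: --password prompts for the password; --single-transaction
# takes a consistent snapshot of InnoDB tables without locking them.
dump_mariadb() {
    run mariadb-dump --user="$DB_USER" --password --single-transaction \
        --result-file="$OUT_DIR/paperless.sql" "$DB_NAME"
}
```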

Restoring

If you've backed up Paperless-ngx using the document exporter, restoring can simply be done with the document importer.

Other backup strategies require restoring any volumes, folders and database copies you created in the steps above.

Updating Paperless

Docker Route

If a new release of paperless-ngx is available, upgrading depends on how you installed paperless-ngx in the first place. The releases are available at the release page.

First of all, make sure no active processes (like consumption) are running, then make a backup.

After that, ensure that paperless is stopped:

$ cd /path/to/paperless
$ docker compose down

If you pull the image from the docker hub, all you need to do is:
```
$ docker compose pull
$ docker compose up
```
The Docker Compose files refer to the latest version, which is always the latest stable release.

If you built the image yourself, do the following:

$ git pull
$ docker compose build
$ docker compose up

Running docker compose up will also apply any new database migrations. If you see everything working, press CTRL+C once to gracefully stop paperless. Then you can start paperless-ngx with -d to have it run in the background.
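For the pulled-image case, the sequence could be wrapped in a small function. This sketch is a variant of the procedure above, not a replacement for it: instead of a foreground start followed by CTRL+C, it starts detached and then follows the webserver log so you can still watch the migrations. The COMPOSE variable is an illustrative override, and the function is meant to be called from the folder containing your docker-compose.yml, after making a backup.

```shell
#!/bin/sh
# Set COMPOSE="echo docker compose" to preview the commands without running them.
COMPOSE="${COMPOSE:-docker compose}"

update_paperless() {
    $COMPOSE down &&            # stop the running instance
    $COMPOSE pull &&            # fetch the newer image
    $COMPOSE up -d &&           # start in the background; migrations run on startup
    $COMPOSE logs -f webserver  # follow the log; CTRL+C stops following only
}
```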

Note

In version 0.9.14, the update process was changed. In 0.9.13 and earlier, the Docker Compose files specified exact versions and pull won't automatically update to newer versions. In order to enable updates as described above, either get the new docker-compose.yml file from here or edit the docker-compose.yml file, find the line that says

image: ghcr.io/paperless-ngx/paperless-ngx:0.9.x

and replace the version with latest:

image: ghcr.io/paperless-ngx/paperless-ngx:latest

Note

In version 1.7.1 and onwards, the Docker image can be pinned to a release series. This is often combined with automatic updaters such as Watchtower to allow safer unattended upgrading to new bugfix releases only. It is still recommended to always review release notes before upgrading. To pin your install to a release series, edit the docker-compose.yml file and find the line that says

image: ghcr.io/paperless-ngx/paperless-ngx:latest

and replace the version with the series you want to track, for example:

image: ghcr.io/paperless-ngx/paperless-ngx:1.7

Bare Metal Route

After grabbing the new release and unpacking the contents, do the following:

Database Upgrades

In general, paperless does not require a specific version of PostgreSQL or MariaDB and it is safe to update them to newer versions. However, you should always take a backup and follow the instructions from your database's documentation for how to upgrade between major versions.

For PostgreSQL, refer to Upgrading a PostgreSQL Cluster.

For MariaDB, refer to Upgrading MariaDB.

You may also use the exporter and importer with the --data-only flag, after creating a new database with the updated version of PostgreSQL or MariaDB.

Warning

You should not change any settings, especially paths, when doing this or there is a risk of data loss.

Management utilities

Paperless comes with some management commands that perform various maintenance tasks on your paperless instance. You can invoke these commands in the following way:

With Docker Compose, while paperless is running:

$ cd /path/to/paperless
$ docker compose exec webserver <command> <arguments>

With docker, while paperless is running:

$ docker exec -it <container-name> <command> <arguments>

Bare metal:

$ cd /path/to/paperless/src
$ python3 manage.py <command> <arguments> # (1)

(1) Including sudo -Hu <paperless_user> may be required.

All commands have built-in help, which can be accessed by executing them with the argument --help.

Document exporter

The document exporter exports all your data (including your settings and database contents) from paperless into a folder for backup or migration to another DMS.

If you use the document exporter within a cronjob to back up your data, you might use the -T flag after exec to suppress "The input device is not a TTY" errors. For example:

docker compose exec -T webserver document_exporter ../export

document_exporter target [-c] [-d] [-f] [-na] [-nt] [-p] [-sm] [-z]

optional arguments:
-c, --compare-checksums
-d, --delete
-f, --use-filename-format
-na, --no-archive
-nt, --no-thumbnail
-p, --use-folder-prefix
-sm, --split-manifest
-z, --zip
-zn, --zip-name
--data-only

target is a folder to which the data gets written. This includes documents, thumbnails and a manifest.json file. The manifest contains all metadata from the database (correspondents, tags, etc).

When you use the provided docker compose script, specify ../export as the target. This path inside the container is automatically mounted on your host on the folder export.

If the target directory already exists and contains files, paperless will assume that the contents of the export directory are a previous export and will attempt to update the previous export. Paperless will only export changed and added files. Paperless determines whether a file has changed by inspecting the file attributes "date/time modified" and "size". If that does not work out for you, specify -c or --compare-checksums and paperless will attempt to compare file checksums instead. This is slower.
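The difference between the two comparison strategies can be illustrated with a pair of shell functions. This is a conceptual sketch only and not how paperless implements the check internally: it relies on GNU stat and sha256sum (both part of coreutils on Linux), and sha256sum is merely a stand-in for whichever checksum paperless uses.

```shell
#!/bin/sh
# Cheap comparison: same size in bytes and same modification time.
# Uses GNU stat syntax (%s = size, %Y = mtime as seconds since the epoch).
same_by_attributes() {
    [ "$(stat -c '%s %Y' "$1")" = "$(stat -c '%s %Y' "$2")" ]
}

# Thorough comparison: hash both files. Slower, because every byte is read.
same_by_checksum() {
    [ "$(sha256sum < "$1")" = "$(sha256sum < "$2")" ]
}
```

Two files with different contents but identical size and timestamp pass same_by_attributes and fail same_by_checksum. That is the situation in which the default check would miss a change and -c would catch it, at the cost of reading every file.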

Paperless will not remove any existing files in the export directory. If you want paperless to also remove files that do not belong to the current export such as files from deleted documents, specify -d or --delete. Be careful when pointing paperless to a directory that already contains other files.

The filenames generated by this command follow the format [date created] [correspondent] [title].[extension]. If you want paperless to use PAPERLESS_FILENAME_FORMAT for exported filenames instead, specify -f or --use-filename-format.

If -na or --no-archive is provided, no archive files will be exported, only the original files.

If -nt or --no-thumbnail is provided, thumbnail files will not be exported.

Note

When using the -na/--no-archive or -nt/--no-thumbnail options, the exporter will not output these files for backup. After importing, the sanity checker will warn about missing thumbnails and archive files until they are regenerated with document_thumbnails or document_archiver. It can make sense to omit these files from backup, as their content and checksum can change (for example, with a new archiver algorithm), which may then consume additional space in a deduplicated backup.

If -p or --use-folder-prefix is provided, files will be exported in dedicated folders according to their nature: archive, originals, thumbnails or json.

If -sm or --split-manifest is provided, information about each document will be placed in individual JSON files, instead of a single JSON file. The main manifest.json will still contain application-wide information (e.g. tags, correspondent, document type, etc).

Document importer

The document importer takes the export produced by the Document exporter and imports it into paperless.

The importer works just like the exporter. You point it at a directory, and the script does the rest of the work:

document_importer source

Option	Required	Default	Description
source	Yes	N/A	The directory containing an export
--data-only	No	False	If provided, only import data, do not import document files or thumbnails

When you use the provided docker compose script, put the export inside the export folder in your paperless source directory. Specify ../export as the source.

Note that .zip files (as can be generated from the exporter) are not supported.

Note

Importing from a previous version of Paperless may work, but for best results it is suggested to match the versions.

Warning

The importer should be run against a completely empty installation (database and directories) of Paperless-ngx.
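Before starting an import, a quick pre-flight check of the source folder can catch the most obvious mistakes. The helper below is hypothetical and not part of paperless: it only checks the two conditions stated in this section (a directory rather than a .zip file) plus the presence of manifest.json, and it assumes the manifest sits at the top level of the export.

```shell
#!/bin/sh
# Returns 0 if the path looks like an importable export, 1 otherwise.
# Diagnostic messages are written to stderr.
check_import_source() {
    case "$1" in
        *.zip)
            echo "zip exports are not supported by the importer: $1" >&2
            return 1
            ;;
    esac
    if [ ! -d "$1" ]; then
        echo "not a directory: $1" >&2
        return 1
    fi
    if [ ! -f "$1/manifest.json" ]; then
        echo "no manifest.json found in: $1" >&2
        return 1
    fi
    return 0
}
```

On the host, it might be used as a guard in front of the real command, for example: check_import_source ./export && docker compose exec webserver document_importer ../export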

Document retagger

Say you've imported a few hundred documents and now want to introduce a tag or set up a new correspondent, and apply its matching to all of the currently-imported docs. This problem is common enough that there are tools for it.

document_retagger [-h] [-c] [-T] [-t] [-i] [--id-range] [--use-first] [-f]

optional arguments:
-c, --correspondent
-T, --tags
-t, --document_type
-s, --storage_path
-i, --inbox-only
--id-range
--use-first
-f, --overwrite

Run this after changing or adding matching rules. It'll loop over all of the documents in your database and attempt to match documents according to the new rules.

Specify any combination of -c, -T, -t and -s to have the retagger perform matching of the specified metadata type. If you don't specify any of these options, the document retagger won't do anything.

Specify -i to have the document retagger work on documents tagged with inbox tags only. This is useful when you don't want to mess with your already processed documents.

Specify --id-range 1 100 to have the document retagger work only on a specific range of document IDs. This can be useful if you have a lot of documents and want to test the matching rules only on a subset of documents.

When multiple document types or correspondents match a single document, the retagger won't assign these to the document. Specify --use-first to override this behavior and just use the first correspondent or type it finds. This option does not apply to tags, since any number of tags can be applied to a document.

Finally, -f specifies that you wish to overwrite already assigned correspondents, types and/or tags. The default behavior is to not assign correspondents and types to documents that have this data already assigned. -f works differently for tags: By default, only additional tags get added to documents, no tags will be removed. With -f, tags that don't match a document anymore get removed as well.
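The options above combine naturally into a cautious workflow: try new rules on a small ID range first, then widen the scope. The sketch below assumes a Docker Compose installation with the service named webserver, as used elsewhere on this page. The function names and the COMPOSE override are illustrative only.

```shell
#!/bin/sh
# Set COMPOSE="echo docker compose" to preview the commands without running them.
COMPOSE="${COMPOSE:-docker compose}"

# Trial run: match correspondents, tags and document types on IDs 1 to 100 only.
retag_trial() {
    $COMPOSE exec -T webserver document_retagger -c -T -t --id-range 1 100
}

# Apply the same matching, but only to documents that carry an inbox tag.
retag_inbox() {
    $COMPOSE exec -T webserver document_retagger -c -T -t -i
}

# Re-evaluate tags on all documents; -f also removes tags that no longer match.
retag_tags_overwrite() {
    $COMPOSE exec -T webserver document_retagger -T -f
}
```

Because -f removes tags that no longer match, the last function is the only destructive one here and is best run after the trial output looks right.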

Managing the Automatic matching algorithm

The Auto matching algorithm requires a trained neural network to work. This network needs to be updated whenever something in your data changes. The docker image takes care of that automatically with the task scheduler. You can manually renew the classifier by invoking the following management command:

document_create_classifier

This command takes no arguments.

Document thumbnails

Use this command to re-create document thumbnails. Optionally include the --document {id} option to generate thumbnails for a specific document only.

You may also specify --processes to control the number of processes used to generate new thumbnails. The default is to utilize a quarter of the available processors.

document_thumbnails

Managing the document search index

The document search index is responsible for delivering search results for the website. The document index is automatically updated whenever documents get added to, changed, or removed from paperless. However, if the search yields non-existing documents or won't find anything, you may need to recreate the index manually.

document_index {reindex,optimize}

Specify reindex to have the index created from scratch. This may take some time.

Specify optimize to optimize the index. This updates certain aspects of the index and usually makes queries faster and also ensures that the autocompletion works properly. This command is regularly invoked by the task scheduler.

Managing filenames

If you use paperless' feature to assign custom filenames to your documents, you can use this command to move all your files after changing the naming scheme.

Warning

Since this command moves your documents, it is advised to do a backup beforehand. The renaming logic is robust and will never overwrite or delete a file, but you can never be too careful.

document_renamer

The command takes no arguments and processes all your documents at once.

Learn how to use Management Utilities.

Sanity checker

Paperless has a built-in sanity checker that inspects your document collection for issues.

The issues detected by the sanity checker are as follows:

Missing original files.
Missing archive files.
Inaccessible original files due to improper permissions.
Inaccessible archive files due to improper permissions.
Corrupted original documents by comparing their checksum against what is stored in the database.
Corrupted archive documents by comparing their checksum against what is stored in the database.
Missing thumbnails.
Inaccessible thumbnails due to improper permissions.
Documents without any content (warning).
Orphaned files in the media directory (warning). These are files that are not referenced by any document in paperless.

document_sanity_checker

The command takes no arguments. Depending on the size of your document archive, this may take some time.

Fetching e-mail

Paperless automatically fetches your e-mail every 10 minutes by default. If you want to invoke the email consumer manually, call the following management command:

mail_fetcher

The command takes no arguments and processes all your mail accounts and rules.

Tip

To use OAuth access tokens for mail fetching, select the box to indicate the password is actually a token when creating or editing a mail account. The details for creating a token depend on your email provider.

Creating archived documents

Paperless stores archived PDF/A documents alongside your original documents. These archived documents will also contain selectable text for image-only originals. These documents are derived from the originals, which are always stored unmodified. If coming from an earlier version of paperless, your documents won't have archived versions.

This command creates PDF/A documents for your documents.

document_archiver --overwrite --document <id>

This command will only attempt to create archived documents when no archived document exists yet, unless --overwrite is specified. If --document <id> is specified, the archiver will only process that document.

Note

This command essentially performs OCR on all your documents again, according to your settings. If you run this with PAPERLESS_OCR_MODE=redo, it will potentially run for a very long time. You can cancel the command at any time, since this command will skip already archived versions the next time it is run.

Note

Some documents will cause errors and cannot be converted into PDF/A documents, such as encrypted PDF documents. The archiver will skip over these documents each time it sees them.

Managing encryption

Documents can be stored in Paperless using GnuPG encryption.

Warning

Encryption is deprecated since paperless-ng 0.9 and doesn't really provide any additional security, since you have to store the passphrase in a configuration file on the same system as the encrypted documents for paperless to work. Furthermore, the entire text content of the documents is stored in plain text in the database, even if your documents are encrypted. Filenames are not encrypted either.

Also, the web server provides transparent access to your encrypted documents.

Consider running paperless on an encrypted filesystem instead, which will then at least provide security against physical hardware theft.

Enabling encryption

Enabling encryption is no longer supported.

Disabling encryption

Basic usage to disable encryption of your document store:

(Note: If PAPERLESS_PASSPHRASE isn't set already, you need to specify it here)

decrypt_documents [--passphrase SECR3TP4SSPHRA$E]

Detecting duplicates

Paperless already catches and prevents upload of exactly matching documents. However, a new scan of an existing document may not produce an exact bit-for-bit duplicate. The content should still be identical or close, which allows detection.

This tool does a fuzzy match over document content, looking for those which look close according to a given ratio.

At this time, other metadata (such as correspondent or type) is not taken into account by the detection.

document_fuzzy_match [--ratio] [--processes N]

Option	Required	Default	Description
--ratio	No	85.0	a number between 0 and 100, setting how similar a document must be for it to be reported. Higher numbers mean more similarity.
--processes	No	1/4 of system cores	Number of processes to use for matching. Setting 1 disables multiple processes
--delete	No	False	If provided, one document of a matched pair above the ratio will be deleted.

Warning

If providing the --delete option, it is highly recommended to have a backup. While every effort has been taken to ensure proper operation, there is always the chance of deletion of a file you want to keep.
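Given that warning, one cautious pattern is to run the matcher in report-only mode first and add --delete only after reviewing the output and taking a backup. The sketch below assumes a Docker Compose installation with the service named webserver. The RATIO value of 95, the function names, and the COMPOSE override are illustrative choices, not recommendations from the project.

```shell
#!/bin/sh
# Set COMPOSE="echo docker compose" to preview the commands without running them.
COMPOSE="${COMPOSE:-docker compose}"
# Stricter than the default of 85.0 (an arbitrary example value).
RATIO="${RATIO:-95}"

# Step 1: only report document pairs at or above the ratio; nothing is changed.
report_duplicates() {
    $COMPOSE exec -T webserver document_fuzzy_match --ratio "$RATIO" --processes 2
}

# Step 2: same match, but one document of each reported pair is deleted.
# Run this only after a backup and after reviewing the report from step 1.
delete_duplicates() {
    $COMPOSE exec -T webserver document_fuzzy_match --ratio "$RATIO" --delete
}
```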