Auditing deletes with Ecto
17 Mar 2025
If you work in web applications you’ll most likely have implemented soft deletes. Soft deletes mean that the data is not actually deleted from the database, it is just hidden.
The benefit of this approach is obvious: since information is not lost, you can always get it back if you need it later. Unfortunately this comes with a bunch of downsides that are not evident at the beginning, but become so over time.
I could get in detail about those, but Brandur summarizes it better than I could in Soft deletion probably isn’t worth it:
[…] soft deletion logic bleeds out into all parts of your code.
[…] foreign keys are effectively lost. […] A customer may be soft deleted with its
deleted_at
flag set, but we’re now back to being able to forget to do the same for its invoices. […] so you can be left with your customer being “deleted”, but its invoices still live.Pruning data is hard. […] a record can’t be removed without also making sure that all its dependencies are removed as well.
This matches my own experience very closely. Soft deletes expand across your codebase very quickly and is very easy to forget filtering those soft-deleted records (although some ORMs help with this). I also want to add a couple points to Brandur’s list:
- Performance. When the application grows you will have more and mored soft deleted content. This content still lives in the same tables as the non deleted content. It makes dumps bigger and slower. Indexes are heavier and affect query performance.
- Usefulness. In my experience having to recover deleted data is very rare and, if it has been soft-deleted for some time, it may not be even possible (newer things may already exist with the same name, or the may reference things that have been deleted for real).
Having a nice UX that allows users to undo deletions is a much better solution.
Thankfully Brandur himself proposes an alternative to soft deletion which he calls deleted_record_insert:
[…] a separate schemaless table that gets a full dump of deleted data, but which doesn’t interfere with mainline code (no need to include a deleted_at IS NULL predicate in every live query, no foreign key problems), and without the expectation that it’ll be used to undelete data
A few days ago Dan Schultzer wrote a great article that shows how to do this with Ecto.
His approach sets up the database triggers that Brandur proposed to record deletions into their own audit table. He then proceeds to write an Elixir function that sets the required context for the deletion. This allows us to record extra metadata such as who is deleting the data and why.
I really think that this is the most balanced solution for deleting data. You can still keep track of it (and maybe even recover it) but it is separate from your actual data, so you don’t need to remember setting the deleted_at is null
condition on every query.
Also, since the deleted data is backed up in a separate table, you can easily write recurrent tasks that clean it up after some time.