How To Set Up Automated Backups For Plausible
Are you interested in setting up automated backups for Plausible Analytics? In this post, I will show you why and how to do it! The setup is based on the self-hosted Docker version that I explained in an earlier post.
Basics
The foundation for this post is the Plausible setup in Docker from this post here. In addition to the containers there, we need to create a companion container for the PostgreSQL database and configure the ClickHouse container to take care of the backups.
But why should you back up your statistics data? First of all, because data loss can happen at any time, whether by accident or because someone wants to delete your data. With the help of backups, you can recover from such a data loss, most of the time to a state only a short time back.
In the next section, we will set up the companion container and configure the data backup the way we want it.
Automated Backups for Plausible
Before we start setting up the databases and the backup of the data, we will create the container configurations that are needed for Plausible. For this, we create a docker-compose.yml file in a directory of our choice. Inside the file, we configure the containers plausible and plausible_mail:
services:
  plausible_mail:
    container_name: plausible_mail
    image: bytemark/smtp
    restart: unless-stopped

  plausible:
    container_name: plausible
    image: plausible/analytics:latest
    restart: unless-stopped
    command: sh -c "sleep 10 && /entrypoint.sh db createdb && /entrypoint.sh db migrate && /entrypoint.sh db init-admin && /entrypoint.sh run"
    ports:
      - "8000:8000"
    environment:
      ADMIN_USER_EMAIL: mail@programonaut.com
      ADMIN_USER_NAME: admin
      ADMIN_USER_PWD: admin
      BASE_URL: http://localhost:8000
      SECRET_KEY_BASE: <key>
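In case you do not have a value for SECRET_KEY_BASE yet, you can generate a random secret and paste it in place of the <key> placeholder, for example with OpenSSL on your host:

# generate a random secret for SECRET_KEY_BASE
openssl rand -base64 64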
After that, we will first create the PostgreSQL database that stores the Plausible configuration, like users and websites, and set up the automated backups for it. For this, we create the plausible_db container and the pg-backup container. The latter is based on the postgres-backup-local image, which allows for automated and scheduled backups.
  plausible_db:
    container_name: plausible_db
    image: postgres:12
    restart: always
    volumes:
      - ./plausible/data:/var/lib/postgresql/data
      - ./plausible/backup:/var/lib/postgresql/backup
    environment:
      - POSTGRES_PASSWORD=postgres

  pg-backup:
    container_name: pg-backup
    image: prodrigestivill/postgres-backup-local
    restart: always
    volumes:
      - ./plausible/backup/postgres/:/backups/
    links:
      - plausible_db:plausible_db
    depends_on:
      - plausible_db
    environment:
      - POSTGRES_HOST=plausible_db
      - POSTGRES_DB=plausible_db
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_EXTRA_OPTS=-Z9 --schema=public --blobs -a
      - SCHEDULE=@daily
      - BACKUP_KEEP_DAYS=14
      - BACKUP_KEEP_WEEKS=4
      - BACKUP_KEEP_MONTHS=6
      - HEALTHCHECK_PORT=81
With the environment variables of the pg-backup container, you can easily configure the automated backups. With the current configuration, for example, a backup is created once a day and stored for 14 days. In addition to the daily backups, the image also automatically generates weekly and monthly backups. You can find a list of all available variables here.
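To test the setup without waiting for the first scheduled run, you can trigger a backup manually. As far as I know, the postgres-backup-local image ships a /backup.sh script for exactly this (check the image documentation if the path differs for your version):

# trigger a one-off backup inside the pg-backup container
docker exec pg-backup /backup.sh
# the new dump should then appear on the host
ls ./plausible/backup/postgres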
Next up is the ClickHouse database. When setting up the system, I figured out that using a companion container did not work very well because of the way ClickHouse stores its data: ClickHouse keeps part of the data in "shadow" directories that are not retrievable through a companion container. Thus I decided to extend the ClickHouse server base image with some files and the backup functionality.
For that, I first got myself the entrypoint.sh script from the ClickHouse server repository and added the following two lines right before the last line of the script:
echo "Running crond"
crond -b -c /etc/crontabs
The final file looks like this:
#!/bin/bash
set -eo pipefail
shopt -s nullglob

DO_CHOWN=1
if [ "${CLICKHOUSE_DO_NOT_CHOWN:-0}" = "1" ]; then
    DO_CHOWN=0
fi

CLICKHOUSE_UID="${CLICKHOUSE_UID:-"$(id -u clickhouse)"}"
CLICKHOUSE_GID="${CLICKHOUSE_GID:-"$(id -g clickhouse)"}"

# support --user
if [ "$(id -u)" = "0" ]; then
    USER=$CLICKHOUSE_UID
    GROUP=$CLICKHOUSE_GID
    if command -v gosu &> /dev/null; then
        gosu="gosu $USER:$GROUP"
    elif command -v su-exec &> /dev/null; then
        gosu="su-exec $USER:$GROUP"
    else
        echo "No gosu/su-exec detected!"
        exit 1
    fi
else
    USER="$(id -u)"
    GROUP="$(id -g)"
    gosu=""
    DO_CHOWN=0
fi

# set some vars
CLICKHOUSE_CONFIG="${CLICKHOUSE_CONFIG:-/etc/clickhouse-server/config.xml}"

if ! $gosu test -f "$CLICKHOUSE_CONFIG" -a -r "$CLICKHOUSE_CONFIG"; then
    echo "Configuration file '$CLICKHOUSE_CONFIG' isn't readable by user with id '$USER'"
    exit 1
fi

# get CH directories locations
DATA_DIR="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=path || true)"
TMP_DIR="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=tmp_path || true)"
USER_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=user_files_path || true)"
LOG_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=logger.log || true)"
LOG_DIR=""
if [ -n "$LOG_PATH" ]; then LOG_DIR="$(dirname "$LOG_PATH")"; fi
ERROR_LOG_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=logger.errorlog || true)"
ERROR_LOG_DIR=""
if [ -n "$ERROR_LOG_PATH" ]; then ERROR_LOG_DIR="$(dirname "$ERROR_LOG_PATH")"; fi
FORMAT_SCHEMA_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=format_schema_path || true)"

CLICKHOUSE_USER="${CLICKHOUSE_USER:-default}"
CLICKHOUSE_PASSWORD="${CLICKHOUSE_PASSWORD:-}"
CLICKHOUSE_DB="${CLICKHOUSE_DB:-}"
CLICKHOUSE_ACCESS_MANAGEMENT="${CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT:-0}"

for dir in "$DATA_DIR" \
           "$ERROR_LOG_DIR" \
           "$LOG_DIR" \
           "$TMP_DIR" \
           "$USER_PATH" \
           "$FORMAT_SCHEMA_PATH"
do
    # check if variable not empty
    [ -z "$dir" ] && continue
    # ensure directories exist
    if ! mkdir -p "$dir"; then
        echo "Couldn't create necessary directory: $dir"
        exit 1
    fi

    if [ "$DO_CHOWN" = "1" ]; then
        # ensure proper directories permissions
        # but skip it if the directory already has proper permissions, because recursive chown may be slow
        if [ "$(stat -c %u "$dir")" != "$USER" ] || [ "$(stat -c %g "$dir")" != "$GROUP" ]; then
            chown -R "$USER:$GROUP" "$dir"
        fi
    elif ! $gosu test -d "$dir" -a -w "$dir" -a -r "$dir"; then
        echo "Necessary directory '$dir' isn't accessible by user with id '$USER'"
        exit 1
    fi
done

# if a clickhouse user is defined - create it (user "default" already exists out of the box)
if [ -n "$CLICKHOUSE_USER" ] && [ "$CLICKHOUSE_USER" != "default" ] || [ -n "$CLICKHOUSE_PASSWORD" ]; then
    echo "$0: create new user '$CLICKHOUSE_USER' instead of 'default'"
    cat <<EOT > /etc/clickhouse-server/users.d/default-user.xml
    <yandex>
      <!-- Docs: https://clickhouse.tech/docs/en/operations/settings/settings_users/ -->
      <users>
        <!-- Remove default user -->
        <default remove="remove">
        </default>
        <${CLICKHOUSE_USER}>
          <profile>default</profile>
          <networks>
            <ip>::/0</ip>
          </networks>
          <password>${CLICKHOUSE_PASSWORD}</password>
          <quota>default</quota>
          <access_management>${CLICKHOUSE_ACCESS_MANAGEMENT}</access_management>
        </${CLICKHOUSE_USER}>
      </users>
    </yandex>
EOT
fi

if [ -n "$(ls /docker-entrypoint-initdb.d/)" ] || [ -n "$CLICKHOUSE_DB" ]; then
    # port is needed to check if clickhouse-server is ready for connections
    HTTP_PORT="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=http_port)"

    # listen only on localhost until the initialization is done
    $gosu /usr/bin/clickhouse-server --config-file="$CLICKHOUSE_CONFIG" -- --listen_host=127.0.0.1 &
    pid="$!"

    # check if clickhouse is ready to accept connections
    # will try to ping clickhouse via http_port (max 12 retries by default, with 1 sec timeout and 1 sec delay between retries)
    tries=${CLICKHOUSE_INIT_TIMEOUT:-12}
    while ! wget --spider -T 1 -q "http://127.0.0.1:$HTTP_PORT/ping" 2>/dev/null; do
        if [ "$tries" -le "0" ]; then
            echo >&2 'ClickHouse init process failed.'
            exit 1
        fi
        tries=$(( tries-1 ))
        sleep 1
    done

    clickhouseclient=( clickhouse-client --multiquery --host "127.0.0.1" -u "$CLICKHOUSE_USER" --password "$CLICKHOUSE_PASSWORD" )

    echo

    # create default database, if defined
    if [ -n "$CLICKHOUSE_DB" ]; then
        echo "$0: create database '$CLICKHOUSE_DB'"
        "${clickhouseclient[@]}" -q "CREATE DATABASE IF NOT EXISTS $CLICKHOUSE_DB";
    fi

    for f in /docker-entrypoint-initdb.d/*; do
        case "$f" in
            *.sh)
                if [ -x "$f" ]; then
                    echo "$0: running $f"
                    "$f"
                else
                    echo "$0: sourcing $f"
                    # shellcheck source=/dev/null
                    . "$f"
                fi
                ;;
            *.sql)    echo "$0: running $f"; "${clickhouseclient[@]}" < "$f" ; echo ;;
            *.sql.gz) echo "$0: running $f"; gunzip -c "$f" | "${clickhouseclient[@]}"; echo ;;
            *)        echo "$0: ignoring $f" ;;
        esac
        echo
    done

    if ! kill -s TERM "$pid" || ! wait "$pid"; then
        echo >&2 'Finishing of ClickHouse init process failed.'
        exit 1
    fi
fi

# if no args are passed to `docker run` or the first argument starts with `--`, then the user is passing clickhouse-server arguments
if [[ $# -lt 1 ]] || [[ "$1" == "--"* ]]; then
    exec $gosu /usr/bin/clickhouse-server --config-file="$CLICKHOUSE_CONFIG" "$@"
fi

# otherwise, we assume the user wants to run their own process, for example a `bash` shell to explore this image

# start crond so the scheduled backup cron job is executed (added for the backup setup)
echo "Running crond"
crond -b -c /etc/crontabs

exec "$@"
This is required so that cron jobs are run. In addition to that, I created a backup.sh script containing the following code:
#!/bin/bash

# name the backup after the prefix from the BACKUP_PRE environment variable and the current UTC timestamp
BACKUP_NAME=$BACKUP_PRE-$(date -u +%Y-%m-%dT%H-%M-%S)

clickhouse-backup create "$BACKUP_NAME"
rc=$?
if [[ $rc != 0 ]]; then
    echo "clickhouse-backup create $BACKUP_NAME FAILED and returned exit code $rc"
fi
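Once the plausible_events_db container described further below is up and running, you can verify the script works by invoking it manually and then listing the existing backups with clickhouse-backup:

docker exec plausible_events_db sh /var/lib/backup.sh
docker exec plausible_events_db clickhouse-backup list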
And a crontab file called cron that runs the backup.sh script every day at 1 AM:
# min hour day month weekday command
0 1 * * * sh /var/lib/backup.sh
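The schedule uses standard cron syntax, so you can adapt it to your needs. For example, to create a backup every six hours instead of once a day, the line would look like this:

# min hour day month weekday command
0 */6 * * * sh /var/lib/backup.sh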
To incorporate all these changes, I created a Dockerfile:
FROM yandex/clickhouse-server:21.3.20.1-alpine

# busybox-suid provides crond on Alpine
RUN apk update && apk add --no-cache --update busybox-suid

# download and install clickhouse-backup
RUN wget https://github.com/AlexAkulov/clickhouse-backup/releases/download/v1.5.2/clickhouse-backup-linux-amd64.tar.gz
RUN tar -xzf clickhouse-backup-linux-amd64.tar.gz
RUN cd build/linux/amd64/ && cp clickhouse-backup /bin/clickhouse-backup
RUN cd ~ && rm -rf clickhouse-backup-linux-amd64.tar.gz build

# add the cron schedule, the backup script, and the modified entrypoint
COPY ./cron /etc/crontabs/root
COPY ./backup.sh /var/lib/backup.sh
COPY ./entrypoint.sh /entrypoint.sh
Disclaimer: All three files have to be executable; you can achieve this, for example, with chmod 777, as shown below.
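From the directory containing the three files:

# make the files executable
chmod 777 entrypoint.sh backup.sh cron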
This Dockerfile is then the base for the ClickHouse database image. To build it, we will add the following container configuration to our docker-compose.yml:
  plausible_events_db:
    container_name: plausible_events_db
    build: ./
    image: clickhouse-server
    restart: unless-stopped
    volumes:
      - ./plausible/event-data:/var/lib/clickhouse
    environment:
      BACKUPS_TO_KEEP_LOCAL: 14
      BACKUP_PRE: plausible
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
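With all services in place, you can build the custom ClickHouse image and start the whole stack from the directory containing the docker-compose.yml:

docker compose up -d --build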
With this configuration of the cron file and the environment variable BACKUPS_TO_KEEP_LOCAL, we create one backup a day and keep it for 14 days.
The backups for PostgreSQL can be found in ./plausible/backup/postgres, and the backups for ClickHouse can be found in ./plausible/event-data/backup.
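After the first backups ran, you can quickly check on the host that they were actually created:

ls ./plausible/backup/postgres
ls ./plausible/event-data/backup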
With this, we have set up automated backups for Plausible. Now let's have a look at how to recover the data in case of a data loss!
Recover the data from Backups
In this section, we will have a look at how to recover the data of the two different databases.
PostgreSQL
- docker stop plausible_db
- docker rm plausible_db
- rename the old data folder
- docker compose up -d plausible_db
- docker restart plausible
- check in the browser if everything works as expected (no websites there)
- docker exec -it plausible_db bash -c "zcat /var/lib/postgresql/backup/postgres/<backup-dir>/<backup-file>.sql.gz | psql --username=postgres --dbname=plausible_db -W"
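As a concrete sketch of the sequence above (the data.old folder name is just my choice for keeping the old data around, and <backup-dir>/<backup-file> has to be replaced with the backup you want to restore):

docker stop plausible_db && docker rm plausible_db
# keep the old data folder around, just in case
mv ./plausible/data ./plausible/data.old
docker compose up -d plausible_db
docker restart plausible
# restore the dump into the fresh database (asks for the postgres password)
docker exec -it plausible_db bash -c "zcat /var/lib/postgresql/backup/postgres/<backup-dir>/<backup-file>.sql.gz | psql --username=postgres --dbname=plausible_db -W"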
ClickHouse
- docker stop plausible_events_db
- docker rm plausible_events_db
- move the backups directory out of the event-data folder
- rename the old event-data folder
- docker compose up -d plausible_events_db
- docker restart plausible
- check in the browser if everything works as expected (websites there, but no data)
- move the backup folder back into the new event-data folder
- docker exec plausible_events_db bash -c "clickhouse-backup restore <backup-dir-name>"
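And the same as a concrete sketch (backup-tmp and event-data.old are hypothetical folder names for temporarily parking the data; replace <backup-dir-name> with the backup you want to restore):

docker stop plausible_events_db && docker rm plausible_events_db
# move the backups out of the way and park the old data
mv ./plausible/event-data/backup ./backup-tmp
mv ./plausible/event-data ./plausible/event-data.old
docker compose up -d plausible_events_db
docker restart plausible
# move the backups back into the new event-data folder
mv ./backup-tmp ./plausible/event-data/backup
docker exec plausible_events_db bash -c "clickhouse-backup restore <backup-dir-name>"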
With this, you can recover data after a data loss!
Conclusion
In this post, we created automated backups for Plausible by creating a companion container for PostgreSQL and by extending the ClickHouse instance required by Plausible. We learned how to set up the backup creation, and we also learned how to recover the data in case of a data loss!
I hope this post helped you set up automated backups and that it keeps you safe from the struggle I had when I lost my data.
In case you liked this post, consider subscribing to my newsletter to get monthly updates on all of my posts!