Add support for rust-synapse-compress-state

development
Slavi Pantaleev 4 years ago
parent 073c96a3fd
commit daf13107a0

@@ -1,3 +1,12 @@
# 2020-08-21
## rust-synapse-compress-state support
The playbook can now help you use [rust-synapse-compress-state](https://github.com/matrix-org/rust-synapse-compress-state) to compress the state groups in your Synapse database.
See our [Compressing state with rust-synapse-compress-state](docs/maintenance-synapse.md#compressing-state-with-rust-synapse-compress-state) documentation page to get started.
# 2020-07-22
## Synapse Admin support

@@ -22,6 +22,8 @@ If you are using an [external Postgres server](configuring-playbook-external-pos
## Vacuuming PostgreSQL
Deleting lots of data from Postgres does not make it release disk space until you perform a `VACUUM` operation.
To perform a `FULL` Postgres [VACUUM](https://www.postgresql.org/docs/current/sql-vacuum.html), run the playbook with `--tags=run-postgres-vacuum`.
Example:
@@ -42,7 +44,7 @@ docker run \
--rm \
--network=matrix \
--env-file=/matrix/postgres/env-postgres-psql \
postgres:12.1-alpine \
postgres:12.4-alpine \
pg_dumpall -h matrix-postgres \
| gzip -c \
> /postgres.sql.gz
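Since the maintenance docs recommend making Postgres backups before compressing state, a quick sanity check of the resulting `/postgres.sql.gz` can be worthwhile. The sketch below is an illustration and not part of the playbook: the `verify_sql_gz_backup` name is hypothetical, and it assumes `pg_dumpall` ends its output with its usual `-- PostgreSQL database cluster dump complete` comment. That trailer check matters because a dump cut short midway still produces a valid gzip stream, which `gzip -t` alone would accept.

```bash
# Hypothetical helper: check that a gzip-compressed pg_dumpall backup looks complete.
verify_sql_gz_backup() {
  local file=$1
  # An empty or missing file means the dump never produced any output.
  if [ ! -s "$file" ]; then
    echo "$file is missing or empty" >&2
    return 1
  fi
  # Verify the gzip stream itself (CRC and length).
  gzip -t "$file" || return 1
  # An interrupted dump still gzips cleanly, so also look for the trailer comment
  # (assumed to be pg_dumpall's usual closing line) near the end of the output.
  if ! gzip -dc "$file" | tail -n 5 | grep -q 'PostgreSQL database cluster dump complete'; then
    echo "$file does not end with the pg_dumpall completion comment" >&2
    return 1
  fi
}
```

Usage: `verify_sql_gz_backup /postgres.sql.gz && echo "backup looks complete"`.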

@@ -9,75 +9,74 @@ Table of contents:
- [Purging old data with the Purge History API](#purging-old-data-with-the-purge-history-api), for when you wish to delete in-use (but old) data from the Synapse database
- [Synapse maintenance](#synapse-maintenance)
- [Purging unused data with synapse-janitor](#purging-unused-data-with-synapse-janitor)
- [Vacuuming Postgres](#vacuuming-postgres)
- [Purging old data with the Purge History API](#purging-old-data-with-the-purge-history-api)
- [Compressing state with rust-synapse-compress-state](#compressing-state-with-rust-synapse-compress-state)
- [Purging unused data with synapse-janitor](#purging-unused-data-with-synapse-janitor)
- [Browse and manipulate the database](#browse-and-manipulate-the-database)
- [Browse and manipulate the database](#browse-and-manipulate-the-database), for when you really need to take matters into your own hands
## Purging unused data with synapse-janitor
**NOTE**: There are [reports](https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/465) that **synapse-janitor is dangerous to use and causes database corruption**. You may wish to refrain from using it.
## Purging old data with the Purge History API
When you **leave** and **forget** a room, Synapse can clean up its data, but currently doesn't.
You can use the **Purge History API** to delete in-use (but old) data.
This **unused and unreachable data** remains in your database forever.
There are external tools (like [synapse-janitor](https://github.com/xwiki-labs/synapse_scripts)), which are meant to solve this problem.
**This is destructive** (especially for non-federated rooms), because it means **people will no longer have access to history past a certain point**.
To ask the playbook to run synapse-janitor, execute:
Synapse's [Purge History API](https://github.com/matrix-org/synapse/blob/master/docs/admin_api/purge_history_api.rst) can be used to purge on a per-room basis.
```bash
To make use of this API, **you'll need an admin access token** first. You can find your access token in the settings of some clients (like Element).
ansible-playbook -i inventory/hosts setup.yml --tags=run-postgres-synapse-janitor,start
Alternatively, you can log in and obtain a new access token like this:
```
**Note**: this will automatically stop Synapse temporarily and restart it later.
```
curl \
--data '{"identifier": {"type": "m.id.user", "user": "YOUR_MATRIX_USERNAME" }, "password": "YOUR_MATRIX_PASSWORD", "type": "m.login.password", "device_id": "Synapse-Purge-History-API"}' \
https://matrix.DOMAIN/_matrix/client/r0/login
```
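If you obtain the token with the `curl` login call above, it is the `access_token` field of the JSON response. A minimal way to pull it out without extra tools, assuming the token itself contains no double quotes (the `extract_access_token` name is ours, not the playbook's):

```bash
# Hypothetical helper: print the "access_token" value from a Matrix /login
# JSON response read on stdin. Prints nothing if the field is absent
# (for example, when the login failed and the response is an error object).
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example (not executed here):
#   TOKEN=$(curl -s --data '{...login body as above...}' \
#     https://matrix.DOMAIN/_matrix/client/r0/login | extract_access_token)
```

If `jq` is installed, `jq -r .access_token` is the sturdier choice.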
Follow the [Purge History API](https://github.com/matrix-org/synapse/blob/master/docs/admin_api/purge_history_api.rst) documentation page for the actual purging instructions.
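As a rough illustration of the request described on that page, the hypothetical helpers below compute a `purge_up_to_ts` cutoff in milliseconds and build a request body that keeps events sent by local users. Treat the field names and the `/_synapse/admin/v1/purge_history/<room_id>` path as our reading of the Purge History API page, and check them against the Synapse version you run.

```bash
# Hypothetical helper: "N days ago" as milliseconds since the Unix epoch,
# the unit used by purge_up_to_ts. The optional 2nd argument fixes "now"
# (in seconds) so the result is reproducible.
purge_ts_days_ago() {
  local days=$1 now=${2:-$(date +%s)}
  echo $(( (now - days * 86400) * 1000 ))
}

# Hypothetical helper: JSON body that purges up to the given timestamp while
# keeping events sent by local users (delete_local_events: false).
purge_history_body() {
  printf '{"delete_local_events": false, "purge_up_to_ts": %s}\n' "$1"
}

# Percent-encode the "!" and ":" that appear in room IDs (only those two).
encode_room_id() {
  printf '%s' "$1" | sed -e 's/!/%21/g' -e 's/:/%3A/g'
}

# Example (not executed here), using an admin access token in $TOKEN:
#   curl -X POST -H "Authorization: Bearer $TOKEN" \
#     --data "$(purge_history_body "$(purge_ts_days_ago 90)")" \
#     "https://matrix.DOMAIN/_synapse/admin/v1/purge_history/$(encode_room_id '!ROOM_ID:DOMAIN')"
```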
### Vacuuming Postgres
After deleting data, you may wish to run a [`FULL` Postgres `VACUUM`](./maintenance-postgres.md#vacuuming-postgresql).
Running synapse-janitor potentially deletes a lot of data from the Postgres database.
However, disk space only ever gets released after a [`FULL` Postgres `VACUUM`](./maintenance-postgres.md#vacuuming-postgresql).
It's easiest if you ask the playbook to run both synapse-janitor and a `VACUUM FULL` in one call:
## Compressing state with rust-synapse-compress-state
```bash
[rust-synapse-compress-state](https://github.com/matrix-org/rust-synapse-compress-state) can be used to optimize some `_state` tables used by Synapse.
ansible-playbook -i inventory/hosts setup.yml --tags=run-postgres-synapse-janitor,run-postgres-vacuum,start
```
**Note**: this will automatically stop Synapse temporarily and restart it later. You'll also need plenty of available disk space in your Postgres data directory (usually `/matrix/postgres/data`).
This tool should be safe to use (even when Synapse is running), but it's always a good idea to [make Postgres backups](./maintenance-postgres.md#backing-up-postgresql) first.
To ask the playbook to run rust-synapse-compress-state, execute:
## Purging old data with the Purge History API
```
ansible-playbook -i inventory/hosts setup.yml --tags=rust-synapse-compress-state
```
If [purging unused and unreachable data](#purging-unused-data-with-synapse-janitor) is not enough for you, you can start deleting in-use (but old) data.
By default, all rooms with more than `100000` state group rows will be compressed.
If you need to adjust this, pass: `--extra-vars='matrix_synapse_rust_synapse_compress_state_min_state_groups_required=SOME_NUMBER_HERE'` to the command above.
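Because the threshold ends up verbatim in the SQL `HAVING count(*) > ...` clause that finds eligible rooms, it is worth passing only plain integers. A small wrapper sketch (hypothetical, not shipped with the playbook) that validates the value and prints the full command:

```bash
# Hypothetical helper: print the rust-synapse-compress-state playbook command
# with a custom minimum state-group threshold.
build_compress_state_command() {
  # The value is templated straight into SQL, so accept digits only.
  case "$1" in
    ''|*[!0-9]*)
      echo "threshold must be a non-negative integer, got: '$1'" >&2
      return 1
      ;;
  esac
  printf '%s\n' "ansible-playbook -i inventory/hosts setup.yml --tags=rust-synapse-compress-state --extra-vars='matrix_synapse_rust_synapse_compress_state_min_state_groups_required=$1'"
}
```

You can paste the printed command, or run it with `eval "$(build_compress_state_command 500000)"`.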
**This is destructive** (especially for non-federated rooms), because it means **people will no longer have access to history past a certain point**.
After state compression, you may wish to run a [`FULL` Postgres `VACUUM`](./maintenance-postgres.md#vacuuming-postgresql).
Synapse provides a [Purge History API](https://github.com/matrix-org/synapse/blob/master/docs/admin_api/purge_history_api.rst) that you can use to purge on a per-room basis.
To make use of this API, **you'll need an admin access token** first. You can find your access token in the settings of some clients (like Element).
## Purging unused data with synapse-janitor
Alternatively, you can log in and obtain a new access token like this:
```
**NOTE**: There are [reports](https://github.com/spantaleev/matrix-docker-ansible-deploy/issues/465) that **synapse-janitor is dangerous to use and causes database corruption**. You may wish to refrain from using it.
curl \
--data '{"identifier": {"type": "m.id.user", "user": "YOUR_MATRIX_USERNAME" }, "password": "YOUR_MATRIX_PASSWORD", "type": "m.login.password", "device_id": "Synapse-Purge-History-API"}' \
https://matrix.DOMAIN/_matrix/client/r0/login
```
Follow the [Purge History API](https://github.com/matrix-org/synapse/blob/master/docs/admin_api/purge_history_api.rst) documentation page for the actual purging instructions.
When you **leave** and **forget** a room, Synapse can clean up its data, but currently doesn't.
This **unused and unreachable data** remains in your database forever.
Don't forget that disk space only ever gets released after a [`FULL` Postgres `VACUUM`](./maintenance-postgres.md#vacuuming-postgresql), something the playbook can help you with.
There are external tools (like [synapse-janitor](https://github.com/xwiki-labs/synapse_scripts)), which are meant to solve this problem.
To ask the playbook to run synapse-janitor, execute:
## Compressing state with rust-synapse-compress-state
```bash
ansible-playbook -i inventory/hosts setup.yml --tags=run-postgres-synapse-janitor,start
```
[rust-synapse-compress-state](https://github.com/matrix-org/rust-synapse-compress-state) can be used to optimize some `_state` tables used by Synapse.
**Note**: this will automatically stop Synapse temporarily and restart it later.
Unfortunately, at this time the playbook can't help you run this **experimental tool**.
Running synapse-janitor potentially deletes a lot of data from the Postgres database.
You may wish to run a [`FULL` Postgres `VACUUM`](./maintenance-postgres.md#vacuuming-postgresql) after that.
Since it's also experimental, you may wish to stay away from it, or at least [make Postgres backups](./maintenance-postgres.md#backing-up-postgresql) first.
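On the disk space needed for `VACUUM FULL`: PostgreSQL writes a new copy of each table being processed (rebuilding its indexes) and only releases the old copy once that completes, so free space needs to cover roughly the largest table plus its indexes. A pre-flight check sketch; the function names and the 20 GiB figure in the usage line are hypothetical:

```bash
# Print the space available (in MiB) on the filesystem holding directory $1.
free_mib() {
  # -P: POSIX one-line-per-filesystem output; -k: 1024-byte blocks.
  # Field 4 of the data row is the "Available" column.
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# has_free_mib DIR MIB: succeed only if DIR's filesystem has at least MIB free.
has_free_mib() {
  local avail
  avail=$(free_mib "$1")
  # Fail when df could not report on the path (empty output) or space is short.
  [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}
```

Usage: `has_free_mib /matrix/postgres/data 20480 || echo "less than 20 GiB free, VACUUM FULL may fail"`.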
## Browse and manipulate the database

@@ -101,6 +101,7 @@ run_postgres_vacuum: true
run_synapse_register_user: true
run_synapse_update_user_password: true
run_synapse_import_media_store: true
run_synapse_rust_synapse_compress_state: true
run_setup: true
run_self_check: true
run_start: true

@@ -364,6 +364,13 @@ matrix_synapse_redaction_retention_period: 7d
matrix_synapse_user_ips_max_age: 28d
matrix_synapse_rust_synapse_compress_state_docker_image: "devture/rust-synapse-compress-state:v0.1.0"
matrix_synapse_rust_synapse_compress_state_docker_image_force_pull: "{{ matrix_synapse_rust_synapse_compress_state_docker_image.endswith(':latest') }}"
matrix_synapse_rust_synapse_compress_state_base_path: "{{ matrix_base_data_path }}/rust-synapse-compress-state"
# Default Synapse configuration template which covers the generic use case.
# You can customize it by controlling the various variables inside it.
#
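The `_force_pull` default above forces a re-pull only when the image reference ends in `:latest`; the pinned `v0.1.0` tag is pulled once and then reused. The same test, expressed in shell for illustration (`should_force_pull` is a hypothetical name):

```bash
# Mirrors the Jinja expression
#   matrix_synapse_rust_synapse_compress_state_docker_image.endswith(':latest')
# Only a mutable ":latest" reference is force-pulled; pinned tags are not.
should_force_pull() {
  case "$1" in
    *:latest) echo true ;;
    *) echo false ;;
  esac
}
```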

@@ -43,6 +43,11 @@
  tags:
    - update-user-password
- import_tasks: "{{ role_path }}/tasks/rust-synapse-compress-state/main.yml"
  when: run_synapse_rust_synapse_compress_state|bool
  tags:
    - rust-synapse-compress-state
- name: Mark matrix-synapse role as executed
  set_fact:
    matrix_synapse_role_executed: true

@@ -0,0 +1,48 @@
- debug:
    msg: "Compressing room `{{ room_details.room_id }}` having {{ room_details.count }} state group rows"
- name: Generate rust-synapse-compress-state room compression command
  set_fact:
    matrix_synapse_rust_synapse_compress_state_compress_room_command: >-
      {{ matrix_host_command_docker }} run --rm --name matrix-rust-synapse-compress-state-compress-room
      --user={{ matrix_user_uid }}:{{ matrix_user_gid }}
      --cap-drop=ALL
      --network={{ matrix_docker_network }}
      -v {{ matrix_synapse_rust_synapse_compress_state_base_path }}:/work
      {{ matrix_synapse_rust_synapse_compress_state_docker_image }}
      /synapse-compress-state -t -o /work/state-compressor.sql
      -p "host={{ matrix_synapse_database_host }} user={{ matrix_synapse_database_user }} password={{ matrix_synapse_database_password }} dbname={{ matrix_synapse_database_database }}"
      -r '{{ room_details.room_id }}'
- name: Run rust-synapse-compress-state room compression command (SQL generation)
  command: "{{ matrix_synapse_rust_synapse_compress_state_compress_room_command }}"
  async: "{{ matrix_synapse_rust_synapse_compress_state_compress_room_time }}"
  poll: 10
  register: matrix_synapse_rust_synapse_compress_state_compress_room_command_result
- debug: var="matrix_synapse_rust_synapse_compress_state_compress_room_command_result"
- name: Generate Postgres compression SQL import command
  set_fact:
    matrix_synapse_rust_synapse_compress_state_psql_import_command: >-
      {{ matrix_host_command_docker }} run --rm --name matrix-rust-synapse-compress-state-psql-import
      --user={{ matrix_user_uid }}:{{ matrix_user_gid }}
      --cap-drop=ALL
      --network={{ matrix_docker_network }}
      --env-file={{ matrix_postgres_base_path }}/env-postgres-psql
      -v {{ matrix_synapse_rust_synapse_compress_state_base_path }}:/work:ro
      --entrypoint=/bin/sh
      {{ matrix_postgres_docker_image_latest }}
      -c "cat /work/state-compressor.sql |
      psql -v ON_ERROR_STOP=1 -h matrix-postgres -d {{ matrix_synapse_database_database }}"
- name: Import compression SQL into Postgres
  command: "{{ matrix_synapse_rust_synapse_compress_state_psql_import_command }}"
  async: "{{ matrix_synapse_rust_synapse_compress_state_psql_import_time }}"
  poll: 10
  register: matrix_synapse_rust_synapse_compress_state_psql_import_command_result
- name: Clean up
  file:
    path: "{{ matrix_synapse_rust_synapse_compress_state_base_path }}/state-compressor.sql"
    state: absent
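The `-p` argument in the compression command interpolates `matrix_synapse_database_password` and the other values into a libpq keyword/value string without quoting. That works for typical generated passwords, but libpq separates parameters on whitespace, so a value containing a space, a single quote or a backslash would be misread. libpq accepts such values when wrapped in single quotes, with `\'` and `\\` escapes inside. A sketch of that quoting rule (helper names are hypothetical; the surrounding shell and `command` quoting is a separate concern):

```bash
# Quote one value for libpq's keyword/value connection string format:
# escape backslashes first, then single quotes, then wrap in single quotes.
conninfo_quote() {
  local escaped
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g")
  printf "'%s'" "$escaped"
}

# Hypothetical helper: build a string like the one passed via -p.
# Arguments: host user password dbname
build_conninfo() {
  printf 'host=%s user=%s password=%s dbname=%s\n' \
    "$(conninfo_quote "$1")" "$(conninfo_quote "$2")" \
    "$(conninfo_quote "$3")" "$(conninfo_quote "$4")"
}
```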

@@ -0,0 +1,118 @@
# Pre-checks
- name: Fail if Postgres not enabled
  fail:
    msg: "Postgres via the matrix-postgres role is not enabled (`matrix_postgres_enabled`). Cannot use rust-synapse-compress-state."
  when: "not matrix_postgres_enabled|bool"
# Defaults
- name: Set matrix_synapse_rust_synapse_compress_state_find_rooms_command_wait_time, if not provided
  set_fact:
    matrix_synapse_rust_synapse_compress_state_find_rooms_command_wait_time: 15
  when: "matrix_synapse_rust_synapse_compress_state_find_rooms_command_wait_time|default('') == ''"
- name: Set matrix_synapse_rust_synapse_compress_state_compress_room_time, if not provided
  set_fact:
    matrix_synapse_rust_synapse_compress_state_compress_room_time: 1800
  when: "matrix_synapse_rust_synapse_compress_state_compress_room_time|default('') == ''"
- name: Set matrix_synapse_rust_synapse_compress_state_psql_import_time, if not provided
  set_fact:
    matrix_synapse_rust_synapse_compress_state_psql_import_time: 1800
  when: "matrix_synapse_rust_synapse_compress_state_psql_import_time|default('') == ''"
- name: Set matrix_synapse_rust_synapse_compress_state_min_state_groups_required, if not provided
  set_fact:
    # The minimum number of state groups we're looking for before we consider a room eligible for compression.
    # Rooms with a smaller state groups count will not be compressed.
    matrix_synapse_rust_synapse_compress_state_min_state_groups_required: 100000
  when: "matrix_synapse_rust_synapse_compress_state_min_state_groups_required|default('') == ''"
# Actual compression work
- name: Ensure rust-synapse-compress-state paths exist
  file:
    path: "{{ matrix_synapse_rust_synapse_compress_state_base_path }}"
    state: directory
    mode: 0750
    owner: "{{ matrix_user_username }}"
    group: "{{ matrix_user_groupname }}"
- name: Ensure rust-synapse-compress-state image is pulled
  docker_image:
    name: "{{ matrix_synapse_rust_synapse_compress_state_docker_image }}"
    source: "{{ 'pull' if ansible_version.major > 2 or ansible_version.minor > 7 else omit }}"
    force_source: "{{ matrix_synapse_rust_synapse_compress_state_docker_image_force_pull if ansible_version.major > 2 or ansible_version.minor >= 8 else omit }}"
    force: "{{ omit if ansible_version.major > 2 or ansible_version.minor >= 8 else matrix_synapse_rust_synapse_compress_state_docker_image_force_pull }}"
- name: Generate rust-synapse-compress-state room find command
  set_fact:
    matrix_synapse_rust_synapse_compress_state_find_rooms_command: >-
      {{ matrix_host_command_docker }} run --rm --name matrix-rust-synapse-compress-state-find-rooms
      --user={{ matrix_user_uid }}:{{ matrix_user_gid }}
      --cap-drop=ALL
      --network={{ matrix_docker_network }}
      --env-file={{ matrix_postgres_base_path }}/env-postgres-psql
      {{ matrix_postgres_docker_image_latest }}
      psql -v ON_ERROR_STOP=1 -h matrix-postgres {{ matrix_synapse_database_database }} -c
      'SELECT array_to_json(array_agg(row_to_json (r))) FROM (SELECT room_id, count(*) AS count FROM state_groups_state GROUP BY room_id HAVING count(*) > {{ matrix_synapse_rust_synapse_compress_state_min_state_groups_required }} ORDER BY count DESC) r;'
- name: Find rooms eligible for compression with rust-synapse-compress-state
  command: "{{ matrix_synapse_rust_synapse_compress_state_find_rooms_command }}"
  async: "{{ matrix_synapse_rust_synapse_compress_state_find_rooms_command_wait_time }}"
  poll: 10
  register: matrix_synapse_rust_synapse_compress_state_find_rooms_command_result
# We expect the output to be like this:
#
# "stdout_lines": [
# " array_to_json ",
# "----------------------------------------------------------------------------------------------------------------------------",
# " [{\"room_id\":\"!some-id\",\"count\":2461329},{\"room_id\":\"!another-id\",\"count\":512017}]",
# "(1 row)"
# ]
#
# Row 3 (out of 4) contains the actual result.
#
# Row 3 contains a space when there's no result.
- block:
    - debug: var="matrix_synapse_rust_synapse_compress_state_find_rooms_command_result"
    - name: Fail if room find result is not what we expect
      fail:
        msg: >-
          Expecting 4 lines in the "find rooms" result.
  when: "matrix_synapse_rust_synapse_compress_state_find_rooms_command_result.failed or matrix_synapse_rust_synapse_compress_state_find_rooms_command_result.stdout_lines|length != 4"
- block:
    # matrix_synapse_rust_synapse_compress_state_eligible_rooms is a list
    # of dictionaries like this: {'room_id': '!some-id', 'count': 2461329}
    - set_fact:
        matrix_synapse_rust_synapse_compress_state_eligible_rooms: "{{ matrix_synapse_rust_synapse_compress_state_find_rooms_command_result.stdout_lines[2] | from_json }}"
    - name: Display rooms that will be compressed
      debug:
        msg: >-
          The following rooms contain more than {{ matrix_synapse_rust_synapse_compress_state_min_state_groups_required }} state group rows
          (configurable via `matrix_synapse_rust_synapse_compress_state_min_state_groups_required`)
          and will be compressed:
          {{ matrix_synapse_rust_synapse_compress_state_eligible_rooms }}
    - name: Compress room state
      include_tasks: "{{ role_path }}/tasks/rust-synapse-compress-state/compress_room.yml"
      with_items: "{{ matrix_synapse_rust_synapse_compress_state_eligible_rooms }}"
      loop_control:
        loop_var: room_details
  when: "matrix_synapse_rust_synapse_compress_state_find_rooms_command_result.stdout_lines[2] != ' '"
- name: Show notice about lack of rooms to compress
  debug:
    msg: >-
      No rooms were found to contain more than {{ matrix_synapse_rust_synapse_compress_state_min_state_groups_required }} state group rows
      (configurable via `matrix_synapse_rust_synapse_compress_state_min_state_groups_required`),
      so there's nothing to compress.
  when: "matrix_synapse_rust_synapse_compress_state_find_rooms_command_result.stdout_lines[2] == ' '"
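The result handling in these tasks relies on `psql`'s aligned output: exactly 4 lines, with the JSON array on line 3, or a lone space there when no room qualifies. The same checks can be mirrored in a small shell function for experimenting with sample output (a sketch; `find_rooms_json` is a hypothetical name). Running `psql` with `-t -A` (tuples only, unaligned) would drop the header and footer lines and could simplify this parsing, at the cost of diverging from what the tasks currently expect.

```bash
# Read the "find rooms" psql output on stdin.
# Exit 1: unexpected shape (not exactly 4 lines).
# Exit 2: no eligible rooms (line 3 is a lone space).
# Otherwise print the JSON array from line 3 without its leading space.
find_rooms_json() {
  awk '
    { line[NR] = $0 }
    END {
      if (NR != 4) {
        print "expecting 4 lines in the \"find rooms\" result, got " NR > "/dev/stderr"
        exit 1
      }
      if (line[3] == " ") exit 2
      value = line[3]
      sub(/^ /, "", value)
      print value
    }'
}
```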