- cherry-pick "Ensure worker config exists in systemd service (#7528)"
from synapse d74cdc1a42e8b487d74c214b1d0ca575429d546a:
"check that the worker config file exists instead of silently failing."
If the SQLite database was from an older version of Synapse, it appears
that Synapse would try to run migrations on it first, before importing.
This was failing, because the file wasn't writable.
Hopefully, this fixes the problem.
Interestingly, no one has reported this failure before #662 (Github
Issue).
It doesn't make sense to keep saying that we support such old Ansible
versions, when we're not even testing on anything close to those.
Time is also passing and such versions are getting more and more
ancient. It's time we bumped our requirements to something that is more
likely to work.
showLabsSettings is the new enableLabs I guess. enableLabs doesn't seem to do anything anymore. It had been deprecated for a while.
This PR also removes @riot-bot:matrix.org as the default welcome_user_id since it doesn't exist anymore.
We recently had a report of the Postgres backup container's log file
growing the size of /var/lib/docker until it ran out of disk space.
Trying to prevent similar problems in the future.
Certain more-minimal Debian installations may not have
lsb-release installed, which makes the playbook fail.
We need lsb-release on Debian, so that ansible_lsb
could tell us if this is Debian or Raspbian.
In #628 I proposed a CORS change that turns out not to be the root of the issue. Caffeine-addled diagnosis leads to sloppy thinking, and this change should be reverted. In fact, if left it will cause problems for new installations.
Even with the v2 updates listed in #503 and partially addressed in #614, this is still needed to enable identity services to function with Element Desktop/Web. Testing on multiple clients with a clean config has confirmed this, at least for my installation.
Fix regression since 2a50b8b6bb (#597).
Dimension is intended to be embedded in various clients,
be it the Element service that we host (at element.DOMAIN),
some other Element (element-desktop running locally), etc.
The when statement is supposed to be on the block, not on the individual task.
It affects all tasks within the block (they're all to be executed when ma1sd is enabled and self-building is requested0.
The tag format used in the `ma1sd` repo have change. Versions no longer
start with 'v', and when building for non-amd64, we also need to strip
off the '-$arch' bit from the Docker image name.
Further, when building the .jar file, `ma1sd` currently names the .jar
based on the project's directory, which we call 'docker-src'. This means
other parts of the `ma1sd` build can't find the .jar file. Remedy this
by ensuring that the dir is called `docker-src/ma1sd`.
`matrix_container_images_self_build` was not really doing anything
anymore. It previously was influencing `matrix_*_self_build` variables,
but it's no longer the case since some time ago.
Individual `matrix_*_self_build` variables are still available.
People that would like to toggle self-building for a specific component
ought to use those.
These variables are also controlled automatically (via
`group_vars/matrix_servers`) depending on `matrix_architecture`.
In other words, self-building is being done automatically for
all components when they don't have a prebuilt image for the specified
architecture. Some components only support `amd64`, while others also
have images for other architectures.
There's no change in the source code. Just a release bump for packaing
reasons. It doesn't matter much for us here, but let's be on the latest
tag anyway.
Postgres setup like
matrix_mautrix_telegram_configuration_extension_yaml: |
appservice:
database: "postgres://XXX:XXX@matrix-postgres:5432/mxtg"
will fail without the right Dockernetwork
`/etc/nginx/conf.d/default.conf` was previously causing
some issues when used with our `--user`.
It's not the case anymore, so we can remove it.
Fixes#369 (Github Issue).
Depending on the distro, common commands like sleep and chown may either
be located in /bin or /usr/bin.
Systemd added path lookup to ExecStart in v239, allowing only the
command name to be put in unit files and not the full path as
historically required. At least Ubuntu 18.04 LTS is however still on
v237 so we should maintain portability for a while longer.
the current version fails the import, because the volume for the media is missing. It still fails if you have the optional shared secret password provider is enabled, so that might need another mount. Commenting out the password provider in the hoimeserver.yaml during the run works as well.
This is mostly here to guard against problems happening
due to server migration and doing `chown -R matrix:matrix /matrix`.
Normally, the file is owned by `1000:1000`, as expected.
If ownership changes, Dimension could still start, but it will fail the
first time it tries to write to the database. Explicitly chowning
before startup guards against this.
Related to #485 and #486 (Github Pull Requests).
Also related to ccc7aaf0ce.
Dimension runs as the `node` user in the container (`1000:1000`).
It doesn't seem like we have a way around it. Thus, its configuration
must also be readable by that user (or group, in this case).
We don't really need to fail in such a spectactular way,
but it's probably good to do. It will only happen for people
who are defining their own user/group id, which is rare.
It seems like a good idea to tell them that this doesn't work
as they expect anymore and to ask them to remove these variables,
which otherwise give them a fake sense of hope.
Related to #486 (Github Pull Request).
If one runs the playbook with `--tags=setup-all`, it would have been
fine.
But running with a specific tag (e.g. `--tags=setup-riot-web`) would
have made that initialization be skipped, and the `matrix-riot-web` role
would fail, due to missing variables.
Ansible will migrate the ownership of the base path and config path, but
manual intervention will be required in order to migrate the ownership
of files in those directories (i.e. dimension.db).
Stop the services:
(local)$ ansible-playbook -i inventory/hosts setup.yml --tags=stop
Fix the permissions on the server:
(server)# chown -Rv "{{ matrix_user_username }}:{{ matrix_user_username }}" "{{ matrix_dimension_base_path }}"
which would typically look like:
(server)# chown -Rv matrix:matrix /matrix/dimension/
Reconfigure Dimension and start the services:
(local)$ ansible-playbook -i inventory/hosts setup.yml --tags=setup-dimension,start
* add permalinkPrefix to riot-web config
* add feature to change default theme of riot-web via its config file
* remove matrix_riot_web_change_default_theme and provide sane default
· 😅 How to keep this in sync with the matrix-synapse documentation?
· regex location matching is expensive
· nginx syntax limit: one location only per block / statement
· thus, lots of duplicate statements in this file
Well, actually 8cd9cde won't work, unless we put the
`|to_nice_yaml` thing on a new line.
We can, but that takes more lines and makes things look uglier.
Using `|to_json` seems good enough.
The whole file is parsed as YAML later on and merged with the
`_extension` variable before being dumped as YAML again in the end.
Hopefully fixes an error like this (which I haven't been able to
reproduce, but..):
> [modules/xmpp/strophe.util.js] <Object.i.Strophe.log>: Strophe: Error: Failed to construct 'RTCPeerConnection': 'matrix.DOMAIN' is not one of the supported URL schemes 'stun', 'turn' or 'turns'.
We define this password in the `sip-communicator.properties`
configuration file, so this is not needed for actually running JVB.
However, it does a (useless) safety check during container startup,
and we need to make that check happy.
We do this for 2 reasons:
- so we can control things which are not controllable using environment
variables (for example `stunServers` in jitsi/web, since we don't wish
to use the hardcoded Google STUN servers if our own Coturn is enabled)
- so playbook variable changes will properly rebuild the configuration.
When using Jitsi environment variables, the configuration is only built
once (the first time) and never rebuilt again. This is not the
consistent with the rest of the playbook and with how Ansible operates.
We're not perfect at it (yet), because we still let the Jitsi containers
generate some files on their own, but we are closer and it should be
good enough for most things.
Related to #415 (Github Pull Request).
This keeps the roles cleaner and more independent of matrix-base,
which may be important for people building their own playbook
out of the individual roles and not using the matrix-base role.
This adds into the Riot config.json the field
'default_server_config.m.homeserver.server_name'
with, by default, the value of the playbook's 'matrix_domain' variable.
Riot displays this string in its login page and will now say 'Sign in to
your Matrix account on example.org' (the server name) instead of 'Sign
in ... on matrix.example.org' (the server domain-name).
This string can be configured by setting the playbook variable
'matrix_riot_web_default_server_name'
to any string, so we can make Riot say for example 'Sign in ... on Our
Server'.
This fixes an incorrect indentation in the database specification for
appservice-irc which caused matrix-appservice-irc to refuse to start
with the remarkably unhelpful error message:
```
ERROR:CLI Failed to run bridge.
```
This also updates doc links to the new matrixdotorg repo because the
tedomum repo contains out-of-date documentation.
Synapse v1.9.0 changed some things which made the REST Auth Password
Provider break.
The ma1uta/matrix-synapse-rest-password-provider implements some
workarounds for now and will likely deliver a proper fix in the future.
Not much has changed between the 2 projects, so this should be a
painless transition.
This change allows us to work with both our existing Docker image
(`tedomum/matrix-appservice-irc:latest`) and with the
official Docker image (`matrixdotorg/matrix-appservice-irc`).
The actual change to the official Docker image requires more testing
and will be done separately.
Can you double check that the way I have this set only exposes it locally? It is important that the manhole is not available to the outside world since it is quite powerful and the password is hard coded.
Switching to the official image (vectorim/riot-web) should ensure:
- there's less breakage, as it's maintained by the same team as riot-web
- there's fewer actors we need to trust
- we can upgrade riot-web faster, as newer versions should be released
on Docker hub at the same time riot-web releases are made
Riot used to be fine with it being blank but now it complains. This creates an ugly looking comma when there is an identity server configured but I guess that's fine.
Prompted by: https://matrix.org/blog/2019/11/09/avoiding-unwelcome-visitors-on-private-matrix-servers
This is a bit controversial, because.. the Synapse default remains open,
while the general advice (as per the blog post) is to make it more private.
I'm not sure exactly what kind of server people set up and whether they
want to make the room directory public. Our general goal is to favor
privacy and security when running personal (family & friends) and corporate
homeservers, both of which likely benefit from having a more secure default.
Don't mention systemd-journald adjustment anymore, because
we've changed log levels to WARNING and Synapse is not chatty by default
anymore.
The "excessive log messages may get dropped on CentOS" issue no longer
applies to most users and we shouldn't bother them with it.
matrix_synapse_storage_path is already defined in matrix-synapse/defaults/main.yml (with a default of "{{ matrix_synapse_base_path }}/storage"), but was not being used for its presumed purpose in matrix-synapse.service.j2. As a result, if matrix_synapse_storage_path was overridden (in a vars.yml), the synapse service failed to start.
Discord client IDs are numeric (e.g. 12345).
Passing them as integers however, causes the Discord bridge's YAML parser
to parse them as integers and its config schema validation will fail.
Fixes#240 (Github Issue)
This only gets triggered if:
- the Synapse role is used standalone and the default values are used
- the whole playbook is used, with `matrix_mxisd_enabled: false`
Continuation of #234 (Github Pull Request).
I had unintentionally updated the documentation for the feature,
saying the page is available at `https://matrix.DOMAIN/nginx_status`.
Looks like it wasn't the case, going against my expectations.
I'm correcting this with this patch.
The status page is being made available on both HTTP and HTTPS.
Serving over HTTP is likely necessary for services like
Longview
(https://www.linode.com/docs/platform/longview/longview-app-for-nginx/)
# Auth server config
auth:
# Publicly accessible base URL for the login endpoints.
# The prefix below is not implicitly added. This URL and all subpaths should be proxied
# or otherwise pointed to the appservice's webserver to the path specified below (prefix).
# This path should usually include a trailing slash.
public: http://example.com/login/
# Internal prefix in the appservice web server for the login endpoints.
prefix: /login
Also discussed previously in #213 (Github Pull Request).
shared-secret-auth and rest-auth logging is still at `INFO`
intentionally, as user login events seem more important to keep.
Those modules typically don't spam as much.
We recently had someone in the support room who set it to `false`
and the playbook ran without any issues.
This currently seems to yield the same result as 'none', but it's
better to avoid such behavior.
It adds support for a new `DISABLE_SENDER_VERIFICATION` environment
variable that can be used to disable verification of sender addresses.
It doesn't matter for us, but we upgrade to keep up with latest.
Looks like these client ids are actually integers,
but unless we pass them as a string, the bridge would complain with
an error like:
{"field":"data.auth.clientID","message":"is the wrong type","value":123456789012345678,"type":"string","schemaPath":["properties","auth","properties","clientID"]}
Explicitly-casting to a string should fix the problem.
The Discord bridge should probably be improved to handle both ints and
strings though.
Regression since 174a6fcd1b, #204 (Github Pull Request),
which only affects new servers.
Old servers which had their passkey.pem file relocated were okay.
ef5e4ad061 intentionally makes us conform to
the logging format suggested by the official Docker image.
Reverting this part, because it's uglier.
This likely should be fixed upstream as well though.
Somewhat related to #213 (Github Pull Request).
We've been moving in the opposite direction for quite a long time.
All services should just leave logging to systemd's journald.
Fixes a regression introduced during the upgrade to
Synapse v1.1.0 (in 2b3865ceea).
Since Synapse v1.1.0 upgraded to Python 3.7
(https://github.com/matrix-org/synapse/pull/5546),
we need to use a different modules directory when mounting
password provider modules.
Well, `config.yaml` has been playbook-managed for a long time.
It's now extended to match the default sample config of the Discord
bridge.
With this patch, we also make `registration.yaml` playbook-managed,
which leads us to consistency with all other bridges.
Along with that, we introduce `./config` and `./data` separation,
like we do for the other bridges.
I've been thinking of doing before, but haven't.
Now that the Whatsapp bridge does it (since 4797469383),
it makes sense to do it for all other bridges as well.
(Except for the IRC bridge - that one manages most of registration.yaml by itself)
appservice-irc doesn't have permission to create files in its project
directory and the intention is to log to the console, anyway. By
commenting out the file names, appservice-irc won't attempt to open the
files.
This means we need to explicitly specify a `media_url` now,
because without it, `url` would be used for building public URLs to
files/images. That doesn't work when `url` is not a public URL.
Until now, if `--tags=setup-synapse` was used, bridge tasks would not
run and bridges would fail to register with the `matrix-synapse` role.
This means that Synapse's configuration would be generated with an empty
list of appservices (`app_service_config_files: []`).
.. and then bridges would fail, because Synapse would not be aware of
there being any bridges.
From now on, bridges always run their init tasks and always register
with Synapse.
For the Telegram bridge, the same applies to registering with
matrix-nginx-proxy. Previously, running `--tags=setup-nginx-proxy` would
get rid of the Telegram endpoint configuration for the same reason.
Not anymore.
With most people on Synapse v0.99+ and Synapse v1.0 now available,
we should no longer try to be backward compatible with Synapse 0.34,
because this just complicates the instructions for no good reason.
Using `|to_json` on a string is expected to correctly wrap it in quotes (e.g. `"4"`).
Wrapping it explicitly in double-quotes results in undesirable double-quoting (`""4""`).
We do use some `:latest` images by default for the following services:
- matrix-dimension
- Goofys (in the matrix-synapse role)
- matrix-bridge-appservice-irc
- matrix-bridge-appservice-discord
- matrix-bridge-mautrix-facebook
- matrix-bridge-mautrix-whatsapp
It's terribly unfortunate that those software projects don't release
anything other than `:latest`, but that's how it is for now.
Updating that software requires that users manually do `docker pull`
on the server. The playbook didn't force-repull images that it already
had.
With this patch, it starts doing so. Any image tagged `:latest` will be
force re-pulled by the playbook every time it's executed.
It should be noted that even though we ask the `docker_image` module to
force-pull, it only reports "changed" when it actually pulls something
new. This is nice, because it lets people know exactly when something
gets updated, as opposed to giving the indication that it's always
updating the images (even though it isn't).
Previously, it only mentioned exposing for psql-usage purposes.
Realistically, it can be used for much more. Especially given that
psql can be easily accessed via our matrix-postgres-cli script,
without exposing the container port.
We log to journald anyway. There's no need for double-logging.
It should not that matrix-synapse logs to journald and to files,
but that's likely to change in the future as well.
Because Synapse's logs are insanely verbose right now (and may get
dropped by journald), it's more reliable to have file-logging too.
As Synapse matures and gets more stable, logging should hopefully
get less, we should be able to only use journald and stop writing to
files for it as well.
Using a separate directory allows easier backups
(only need to back up the Ansible playbook configuration and the
bridge's `./data` directory).
The playbook takes care of migrating an existing database file
from the base directory into the `./data` directory.
In the future, we can also mount the configuration read-only,
to ensure the bridge won't touch it.
For now, mautrix-facebook is keen on rebuilding the `config.yaml`
file on startup though, so this will have to wait.
Related to #193, but for the Facebook bridge.
(other bridges can be changed to do the same later).
This patch makes the bridge configuration entirely managed by the
Ansible playbook. The bridge's `config.yaml` and `registration.yaml`
configuration files are regenerated every time the playbook runs.
This allows us to apply updates to those files and to avoid
people having to manage the configuration files manually on the server.
-------------------------------------------------------------
A deficiency of the current approach to dumping YAML configuration in
`config.yaml` is that we strip all comments from it.
Later on, when the bridge actually starts, it will load and redump
(this time with comments), which will make the `config.yaml` file
change.
Subsequent playbook runs will report "changed" for the
"Ensure mautrix-facebook config.yaml installed" task, which is a little
strange.
We might wish to improve this in the future, if possible.
Still, it's better to have a (usually) somewhat meaningless "changed"
task than to what we had -- never rebuilding the configuration.
Bridges start matrix-synapse.service as a dependency, but
Synapse is sometimes slow to start, while bridges are quick to
hit it and die (if unavailable).
They'll auto-restart later, but .. this still breaks `--tags=start`,
which doesn't wait long enough for such a restart to happen.
This attempts to slow down bridge startup enough to ensure Synapse
is up and no failures happen at all.
Attempt to fix#192 (Github Issue), potential regression since
70487061f4.
Serializing as JSON/YAML explicitly is much better than relying on
magic (well, Python serialization being valid YAML..).
It seems like Python may prefix strings with `u` sometimes (Python 3?),
which causes Python serialization to not be compatible with YAML.
This doesn't replace all usage of `-v`, but it's a start.
People sometimes troubleshoot by deleting files (especially bridge
config files). Restarting Synapse with a missing registration.yaml file
for a given bridge, causes the `-v
/something/registration.yaml:/something/registration.yaml:ro` option
to force-create `/something/registration.yaml` as a directory.
When a path that's provided to the `-v` option is missing, Docker
auto-creates that path as a directory.
This causes more breakage and confusion later on.
We'd rather fail, instead of magically creating directories.
Using `--mount`, instead of `-v` is the solution to this.
From Docker's documentation:
> When you use --mount with type=bind, the host-path must refer to an existing path on the host.
> The path will not be created for you and the service will fail with an error if the path does not exist.
Related to #189 (Github Issue).
People had proxying problems if:
- they used the whole playbook (including the `matrix-nginx-proxy` role)
- and they were disabling the proxy (`matrix_nginx_proxy_enabled: false`)
- and they were proxying with their own nginx server
For them,
`matrix_nginx_proxy_proxy_matrix_additional_server_configuration_blocks`
would not be modified to inject the necessary proxying configuration.
While using certbot means we'll have both files retrieved,
it's actually the fullchain.pem file that we use in nginx configuration.
Using that one for the check makes more sense.
Reasoning is the same as for matrix-org/synapse#5023.
For us, the journal used to contain `docker` for all services, which
is not very helpful when looking at them all together (`journalctl -f`).
The goal is to move each bridge into its own separate role.
This commit starts off the work on this with 2 bridges:
- mautrix-telegram
- mautrix-whatsapp
Each bridge's role (including these 2) is meant to:
- depend only on the matrix-base role
- integrate nicely with the matrix-synapse role (if available)
- integrate nicely with the matrix-nginx-proxy role (if available and if
required). mautrix-telegram bridge benefits from integrating with
it.
- not break if matrix-synapse or matrix-nginx-proxy are not used at all
This has been provoked by #174 (Github Issue).
As discussed in #151 (Github Pull Request), it's
a good idea to not selectively apply casting, but to do it in all
cases involving arithmetic operations.
Previously, we'd show an error like this:
{"changed": false, "item": null, "msg": "Detected an undefined required variable"}
.. which didn't mention the variable name
(`matrix_ssl_lets_encrypt_support_email`).
Looks like we may not have to do this,
since 1.4.2 fixes edge cases for people who used the broken
1.4.0 release.
We jumped straight to 1.4.1, so maybe we're okay.
Still, upgrading anyway, just in case.
Use an int conversion in the computation of the value of
matrix_nginx_proxy_tmp_directory_size_mb, to have the integer value
multiplied by 50 instead of having the string repeated 50 times.
It doesn't hurt to attempt renewal more frequently, as it only does
real work if it's actually necessary.
Reloading, we postpone some more, because certbot adds some random delay
(between 1 and 8 * 60 seconds) when renewing. We want to ensure
we reload at least 8 minutes later, which wasn't the case.
To make it even safer (in case future certbot versions use a longer
delay), we reload a whole hour later. We're in no rush to start using
the new certificates anyway, especially given that we attempt renewal
often.
Somewhat fixes#146 (Github Issue)
The code used to check for a `homeserver.yaml` file and generate
a configuration (+ key) only if such a configuration file didn't exist.
Certain rare cases (setting up with one server name and then
changing to another) lead to `homeserver.yaml` being there,
but a `matrix.DOMAIN.signing.key` file missing (because the domain
changed).
A new signing key file would never get generated, because `homeserver.yaml`'s
existence used to be (incorrectly) satisfactory for us.
From now on, we don't mix things up like that.
We don't care about `homeserver.yaml` anymore, but rather
about the actual signing key.
The rest of the configuration (`homeserver.yaml` and
`matrix.DOMAIN.log.config`) is rebuilt by us in any case, so whether
it exists or not is irrelevant and doesn't need checking.
- matrix_enable_room_list_search - Controls whether searching the public room list is enabled.
- matrix_alias_creation_rules - Controls who's allowed to create aliases on this server.
- matrix_room_list_publication_rules - Controls who can publish and which rooms can be published in the public room list.
`{% matrix_s3_media_store_custom_endpoint_enabled %}` should have
been `{% if matrix_s3_media_store_custom_endpoint_enabled %}` instead.
Related to #132 (Github Pull Request).
In most cases, there's not really a need to touch the system
firewall, as Docker manages iptables by itself
(see https://docs.docker.com/network/iptables/).
All ports exposed by Docker containers are automatically whitelisted
in iptables and wired to the correct container.
This made installing firewalld and whitelisting ports pointless,
as far as this playbook's services are concerned.
People that wish to install firewalld (for other reasons), can do so
manually from now on.
This is inspired by and fixes#97 (Github Issue).
Fixes#129 (Github Issue).
Unfortunately, we rely on `service_facts`, which is only available
in Ansible >= 2.5.
There's little reason to stick to an old version such as Ansible 2.4:
- some time has passed since we've raised version requirements - it's
time to move into the future (a little bit)
- we've recently (in 82b4640072) improved the way one can run
Ansible in a Docker container
From now on, Ansible >= 2.5 is required.
By default, `--tags=self-check` no longer validates certificates
when `matrix_ssl_retrieval_method` is set to `self-signed`.
Besides this default, people can also enable/disable validation using the
individual role variables manually.
Fixes#124 (Github Issue)
Most (all?) of our Matrix services are running in the `matrix` network,
so they were safe -- not accessible from Coturn to begin with.
Isolating Coturn into its own network is a security improvement
for people who were starting other services in the default
Docker network. Those services were potentially reachable over the
private Docker network from Coturn.
Discussed in #120 (Github Pull Request)
This is more explicit than hiding it in the role defaults.
People who reuse the roles in their own playbook (and not only) may
incorrectly define `ansible_host` to be a hostname or some local address.
Making it more explicit is more likely to prevent such mistakes.
Currently the nginx reload cron fails on Debian 9 because the path to
systemctl is /bin/systemctl rather than /usr/bin/systemctl.
CentOS 7 places systemctl in both /bin and /usr/bin, so we can just use
/bin/systemctl as the full path.
This allows overriding the default value for `include_content`. Setting
this to false allows homeserver admins to ensure that message content
isn't sent in the clear through third party servers.
`matrix_nginx_proxy_data_path` has always served as a base path,
so we're renaming it to reflect that.
Along with this, we're also introducing a new "data path" variable
(`matrix_nginx_proxy_data_path`), which is really a data path this time.
It's used for storing additional, non-configuration, files related to
matrix-nginx-proxy.
It's been reported that YAML parsing errors
would occur on certain Ansible/Python combinations for some reason.
It appears that a bare `{{ matrix_dimension_admins }}` would sometimes
yield things like `[u'@user:domain.com', ..]` (note the `u` string prefix).
To prevent such problems, we now explicitly serialize with `|to_json`.
The Server spec says that redirects should be followed for
`/.well-known/matrix/server`. So we follow them.
The Client-Server specs doesn't mention redirects, so we don't
follow redirects there.
Using `docker_container` with a `cap_drop` argument requires
Ansible >=2.7.
We want to support older versions too (2.4), so we either need to
stop invoking it with `cap_drop` (insecure), or just stop using
the module altogether.
Since it was suffering from other bugs too (not deleting containers
on failure), we've decided to remove `docker_container` usage completely.
Some resources shouldn't be cached right now,
as per https://github.com/vector-im/riot-web/pull/8702
(note all of the suggestions from that pull request were applied,
because some of them do not seem relevant - no such files)
Fixes#98 (Github Issue)
`matrix_synapse_no_tls` is now implicit, so we've gotten rid of it.
The `homeserver.yaml.j2` template has been synchronized with the
configuration generated by Synapse v0.99.1 (some new options
are present, etc.)
For consistency with all our other listeners,
we make this one bind on the `::` address too
(both IPv4 and IPv6).
Additional details are in #91 (Github Pull Request).
People who wish to rely on SRV records can prevent
the `/.well-known/matrix/server` file from being generated
(and thus, served.. which causes trouble).
If someone decides to not use `/.well-known/matrix/server` and only
relies on SRV records, then they would need to serve tcp/8448 using
a certificate for the base domain (not for the matrix) domain.
Until now, they could do that by giving the certificate to Synapse
and setting it terminate TLS. That makes swapping certificates
more annoying (Synapse requires a restart to re-read certificates),
so it's better if we can support it via matrix-nginx-proxy.
Mounting certificates (or any other file) into the matrix-nginx-proxy container
can be done with `matrix_nginx_proxy_container_additional_volumes`,
introduced in 96afbbb5a.
Certain use-cases may require that people mount additional files
into the matrix-nginx-proxy container. Similarly to how we do it
for Synapse, we are introducing a new variable that makes this
possible (`matrix_nginx_proxy_container_additional_volumes`).
This makes the htpasswd file for Synapse Metrics (introduced in #86,
Github Pull Request) to also perform mounting using this new mechanism.
Hopefully, for such an "extension", keeping htpasswd file-creation and
volume definition in the same place (the tasks file) is better.
All other major volumes' mounting mechanism remains the same (explicit
mounting).
Continuation of 1f0cc92b33.
As an explanation for the problem:
when saying `localhost` on the host, it sometimes gets resolved to `::1`
and sometimes to `127.0.0.1`. On the unfortunate occassions that
it gets resolved to `::1`, the container won't be able to serve the
request, because Docker containers don't have IPv6 enabled by default.
To avoid this problem, we simply prevent any lookups from happening
and explicitly use `127.0.0.1`.
This reverts commit 0dac5ea508.
Relying on pyOpenSSL is the Ansible way of doing things, but is
impractical and annoying for users.
`openssl` is easily available on most servers, even by default.
We'd better use that.
Seems like we unintentionally removed the mounting of certificates
(the `/matrix-config` mount) as part of splitting the playbook into
roles in 51312b8250.
It appears that those certificates weren't necessary for coturn to
funciton though, so we might just get rid of the configuration as well.
We run containers as a non-root user (no effective capabilities).
Still, if a setuid binary is available in a container image, it could
potentially be used to give the user the default capabilities that the
container was started with. For Docker, the default set currently is:
- "CAP_CHOWN"
- "CAP_DAC_OVERRIDE"
- "CAP_FSETID"
- "CAP_FOWNER"
- "CAP_MKNOD"
- "CAP_NET_RAW"
- "CAP_SETGID"
- "CAP_SETUID"
- "CAP_SETFCAP"
- "CAP_SETPCAP"
- "CAP_NET_BIND_SERVICE"
- "CAP_SYS_CHROOT"
- "CAP_KILL"
- "CAP_AUDIT_WRITE"
We'd rather prevent such a potential escalation by dropping ALL
capabilities.
The problem is nicely explained here: https://github.com/projectatomic/atomic-site/issues/203
This is a known/intentional regression since f92c4d5a27.
The new stance on this is that most people would not have
dnspython, but may have the `dig` tool. There's no good
reason for not increasing our chances of success by trying both
methods (Ansible dig lookup and using the `dig` CLI tool).
Fixes#85 (Github issue).
This makes all containers (except mautrix-telegram and
mautrix-whatsapp), start as a non-root user.
We do this, because we don't trust some of the images.
In any case, we'd rather not trust ALL images and avoid giving
`root` access at all. We can't be sure they would drop privileges
or what they might do before they do it.
Because Postfix doesn't support running as non-root,
it had to be replaced by an Exim mail server.
The matrix-nginx-proxy nginx container image is patched up
(by replacing its main configuration) so that it can work as non-root.
It seems like there's no other good image that we can use and that is up-to-date
(https://hub.docker.com/r/nginxinc/nginx-unprivileged is outdated).
Likewise for riot-web (https://hub.docker.com/r/bubuntux/riot-web/),
we patch it up ourselves when starting (replacing the main nginx
configuration).
Ideally, it would be fixed upstream so we can simplify.
We do match the defaults anyway (by default that is),
but people can customize `matrix_user_uid` and `matrix_user_uid`
and it wouldn't be correct then.
In any case, it's better to be explicit about such an important thing.
If this is a brand new server and Postgres had never started,
detecting it before we even start it is not possible.
This moves the logic, so that it happens later on, when Postgres
would have had the chance to start and possibly initialize
a new empty database.
Fixes#82 (Github issue)
The matrix-nginx-proxy role can now be used independently.
This makes it consistent with all other roles, with
the `matrix-base` role remaining as their only dependency.
Separating matrix-nginx-proxy was relatively straightforward, with
the exception of the Mautrix Telegram reverse-proxying configuration.
Mautrix Telegram, being an extension/bridge, does not feel important enough
to justify its own special handling in matrix-nginx-proxy.
Thus, we've introduced the concept of "additional configuration blocks"
(`matrix_nginx_proxy_proxy_matrix_additional_server_configuration_blocks`),
where any module can register its own custom nginx server blocks.
For such dynamic registration to work, the order of role execution
becomes important. To make it possible for each module participating
in dynamic registration to verify that the order of execution is
correct, we've also introduced a `matrix_nginx_proxy_role_executed`
variable.
It should be noted that this doesn't make the matrix-synapse role
dependent on matrix-nginx-proxy. It's optional runtime detection
and registration, and it only happens in the matrix-synapse role
when `matrix_mautrix_telegram_enabled: true`.
With this change, the following roles are now only dependent
on the minimal `matrix-base` role:
- `matrix-corporal`
- `matrix-coturn`
- `matrix-mailer`
- `matrix-mxisd`
- `matrix-postgres`
- `matrix-riot-web`
- `matrix-synapse`
The `matrix-nginx-proxy` role still does too much and remains
dependent on the others.
Wiring up the various (now-independent) roles happens
via a glue variables file (`group_vars/matrix-servers`).
It's triggered for all hosts in the `matrix-servers` group.
According to Ansible's rules of priority, we have the following
chain of inclusion/overriding now:
- role defaults (mostly empty or good for independent usage)
- playbook glue variables (`group_vars/matrix-servers`)
- inventory host variables (`inventory/host_vars/matrix.<your-domain>`)
All roles default to enabling their main component
(e.g. `matrix_mxisd_enabled: true`, `matrix_riot_web_enabled: true`).
Reasoning: if a role is included in a playbook (especially separately,
in another playbook), it should "work" by default.
Our playbook disables some of those if they are not generally useful
(e.g. `matrix_corporal_enabled: false`).
We've previously changed a bunch of lists in `homeserver.yaml.j2`
to be serialized using `|to_nice_yaml`, as that generates a more
readable list in YAML.
`matrix_synapse_federation_domain_whitelist`, however, couldn't have
been changed to that, as it can potentially be an empty list.
We may be able to differentiate between empty and non-empty now
and serialize it accordingly (favoring `|to_nice_yaml` if non-empty),
but it's not important enough to be justified. Thus, always
serializing with `|to_json`.
Fixes#78 (Github issue)
Riot-web parses integrations_widgets_urls as a list, thus causing it to incorrectly think Scalar widgets are non-Scalar and not passing the scalar token
As suggested in #63 (Github issue), splitting the
playbook's logic into multiple roles will be beneficial for
maintainability.
This patch realizes this split. Still, some components
affect others, so the roles are not really independent of one
another. For example:
- disabling mxisd (`matrix_mxisd_enabled: false`), causes Synapse
and riot-web to reconfigure themselves with other (public)
Identity servers.
- enabling matrix-corporal (`matrix_corporal_enabled: true`) affects
how reverse-proxying (by `matrix-nginx-proxy`) is done, in order to
put matrix-corporal's gateway server in front of Synapse
We may be able to move away from such dependencies in the future,
at the expense of a more complicated manual configuration, but
it's probably not worth sacrificing the convenience we have now.
As part of this work, the way we do "start components" has been
redone now to use a loop, as suggested in #65 (Github issue).
This should make restarting faster and more reliable.
This change is provoked by a few different things:
- #54 (Github Pull Request), which rightfully says that we need a
way to support ALL mxisd configuration options easily
- the upcoming mxisd 1.3.0 release, which drops support for
property-style configuration (dot-notation), forcing us to
redo the way we generate the configuration file
With this, mxisd is much more easily configurable now
and much more easily maintaneable by us in the future
(no need to introduce additional playbook variables and logic).
As suggested in #65 (Github issue), this patch switches
cronjob management from using templates to using Ansible's `cron` module.
It also moves the management of the nginx-reload cronjob to `setup_ssl_lets_encrypt.yml`,
which is a more fitting place for it (given that this cronjob is only required when
Let's Encrypt is used).
Pros:
- using a module is more Ansible-ish than templating our own files in
special directories
- more reliable: will fail early (during playbook execution) if `/usr/bin/crontab`
is not available, which is more of a guarantee that cron is working fine
(idea: we should probably install some cron package using the playbook)
Cons:
- invocation schedule is no longer configurable, unless we define individual
variables for everything or do something smart (splitting on ' ', etc.).
Likely not necessary, however.
- requires us to deprecate and clean-up after the old way of managing cronjobs,
because it's not compatible (using the same file as before means appending
additional jobs to it)
This means we no longer have a dependency on the `dig` program,
but we do have a dependency on `dnspython`.
Improves things as suggested in #65 (Github issue).
After having multiple people report issues with retrieving
SSL certificates, we've finally discovered the culprit to be
Ansible 2.5.1 (default and latest version on Ubuntu 18.04 LTS).
As silly as it is, certain distributions ("LTS" even) are 13 bugfix
versions of Ansible behind.
From now on, we try to auto-detect buggy Ansible versions and tell the
user. We also provide some tips for how to upgrade Ansible or
run it from inside a Docker container.
My testing shows that Ansible 2.4.0 and 2.4.6 are OK.
All other intermediate 2.4.x versions haven't been tested, but we
trust they're OK too.
From the 2.5.x releases, only 2.5.0 and 2.5.1 seem to be affected.
Ansible 2.5.2 corrects the problem with `include_tasks` + `with_items`.
This is a simplification and a way to make it consistent with
how we do Postgres imports (see 6d89319822), using
files coming from the server, not from the local machine.
By encouraging people NOT to use local files,
we potentially avoid problems such as #34 (Github issue),
where people would download `media_store` to their Mac's filesystem
and case-sensitivity issues will actually corrupt it.
By not encouraging local files usage, it's less likely that
people would copy (huge) directories to their local machine like that.
This is a simplification and a way to make it consistent with
how we do Postgres imports (see 6d89319822), using
files coming from the server, not from the local machine.
Until now, if the .sql file contained invalid data, psql would
choke on it, but still return an exit code of 0.
This is very misleading.
We need to pass `-v ON_ERROR_STOP=1` to make it exit
with a proper error exit code when failures happen.
We've had that logic in 2 places so far, leading to duplication
and a maintenance burden.
In the future, we'll also have an import-postgres feature,
which will also need Postgres version detection,
leading to more benefit from that logic being reusable.
Fixes#18 (Github issue).
It would probably be better if we serve our own page,
as the Matrix one says:
"To use this server you'll need a Matrix client", which
is true, but we install Riot by default and it'd be better if we mention
that instead.
If uppercase is used, certain tools (like certbot) would cause trouble.
They would retrieve a certificate for the lowercased domain name,
but we'd try to use it from an uppercase-named directory, which will
fail.
Besides certbot, we may experience other trouble too.
(it hasn't been investigated how far the breakage goes).
To fix it all, we lowercase `host_specific_hostname_identity` by default,
which takes care of the general use-case (people only setting that
and relying on us to build the other domain names - `hostname_matrix`
and `hostname_riot`).
For others, who decide to override these other variables directly
(and who may work around us and introduce uppercase there directly),
we also have the sanity-check tool warn if uppercase is detected
in any of the final domains.
Adds support for managing certificates manually and for
having the playbook generate self-signed certificates for you.
With this, Let's Encrypt usage is no longer required.
Fixes Github issue #50.
This is in line with what the recommendation is for matrix-corporal.
A value higher than 30 seconds is required to satisfy Riot
(and other clients') default long-polling behavior.