gitlab-foss/zero_downtime.md at master

VladPolskiy/gitlab-foss

mirror of https://gitlab.com/gitlab-org/gitlab-foss.git synced 2025-08-15 21:39:00 +00:00

GitLab Bot 2569eb3ce0 Add latest changes from gitlab-org/gitlab@master

2025-08-11 12:23:06 +00:00

stage: GitLab Delivery
group: Self Managed
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments
title: Upgrade a multi-node instance with zero downtime

Tier: Free, Premium, Ultimate
Offering: GitLab Self-Managed

This section covers the core process of upgrading a multi-node GitLab environment: upgrade each node sequentially, following the upgrade order, while the load balancers and HA mechanisms handle each node going down.

As an example, this guide upgrades a 200 RPS or 10,000 user Reference Architecture built with the Linux package.

Consul, PostgreSQL, PgBouncer, and Redis

The Consul, PostgreSQL, PgBouncer, and Redis components all follow the same underlying process for upgrading without downtime.

Run through the following steps sequentially on each component's node to perform the upgrade:

1. Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
```
sudo touch /etc/gitlab/skip-auto-reconfigure
```
2. Upgrade the GitLab package.
3. Reconfigure and restart to get the latest code in place:
```
sudo gitlab-ctl reconfigure
```
{{< tabs >}}

{{< tab title="For PostgreSQL nodes only" >}}

Restart the Consul client first, then restart all other services to ensure PostgreSQL failover occurs gracefully:
```
sudo gitlab-ctl restart consul
sudo gitlab-ctl restart-except consul
```
{{< /tab >}}

{{< tab title="For all other component nodes" >}}
```
sudo gitlab-ctl restart
```
{{< /tab >}}

{{< /tabs >}}
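
The steps above can be summarized as a small review helper. This is a minimal sketch, not part of the official procedure: `print_upgrade_steps` and the role labels are names invented for this example, and the function only prints the commands so you can check the order before running them by hand. The package upgrade is left as a comment because the exact command depends on your distribution.

```shell
# Print the per-node command sequence for review; nothing is executed.
# The role labels ("postgresql", "redis", ...) are chosen for this sketch only.
print_upgrade_steps() {
  role="$1"
  echo "sudo touch /etc/gitlab/skip-auto-reconfigure"
  echo "# upgrade the GitLab package (command depends on your distribution)"
  echo "sudo gitlab-ctl reconfigure"
  if [ "$role" = "postgresql" ]; then
    # PostgreSQL nodes: restart Consul first so that failover occurs gracefully.
    echo "sudo gitlab-ctl restart consul"
    echo "sudo gitlab-ctl restart-except consul"
  else
    # All other component nodes: a plain restart.
    echo "sudo gitlab-ctl restart"
  fi
}

print_upgrade_steps postgresql
```

Calling the function with any other label, for example `print_upgrade_steps redis`, prints the sequence for the remaining component nodes.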

Gitaly

Gitaly follows the same core upgrade process, with one key difference: the Gitaly process itself is not restarted, because it has a built-in mechanism to gracefully reload at the earliest opportunity. Other components on the node must still be restarted.

The upgrade process attempts a graceful handover to a new Gitaly process. Existing long-running Git requests that were started before the upgrade may eventually be dropped as this handover occurs. This behavior may change in the future. Refer to this Epic for more information.

This process applies to both Gitaly Sharded and Cluster setups. Run through the following steps sequentially on each Gitaly node to perform the upgrade:

1. Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
```
sudo touch /etc/gitlab/skip-auto-reconfigure
```
2. Upgrade the GitLab package.
3. Run the reconfigure command to get the latest code in place and to instruct Gitaly to gracefully reload at the next opportunity:
```
sudo gitlab-ctl reconfigure
```

Finally, while Gitaly will gracefully reload, any other components that have been deployed will still need a restart:

```
# Get a list of what other components have been deployed beside Gitaly
sudo gitlab-ctl status

# Restart each component except Gitaly. Example given for Consul, Node Exporter and Logrotate
sudo gitlab-ctl restart consul node-exporter logrotate
```
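
If a node runs many services, the list of components to restart can be derived from the status output instead of typed by hand. The following is a rough sketch under one assumption: that `gitlab-ctl status` prints runit-style lines of the form `run: <service>: (pid N) Ns; run: log: ...`. The helper name `services_except` and the sample text (PIDs and uptimes) are made up for illustration.

```shell
# Read status output on stdin and print every service name except the one given.
# Assumes runit-style lines: "run: <service>: (pid N) Ns; run: log: (pid N) Ns".
services_except() {
  skip="$1"
  awk -F': ' -v skip="$skip" 'NF >= 2 && $2 != skip { print $2 }'
}

# Sample text standing in for real `sudo gitlab-ctl status` output.
sample_status='run: consul: (pid 1201) 3600s; run: log: (pid 1200) 3600s
run: gitaly: (pid 1234) 3600s; run: log: (pid 1233) 3600s
run: logrotate: (pid 1301) 120s; run: log: (pid 1300) 3600s
run: node-exporter: (pid 1401) 3600s; run: log: (pid 1400) 3600s'

# Print the resulting restart command rather than running it.
printf '%s\n' "$sample_status" | services_except gitaly | xargs echo sudo gitlab-ctl restart
```

On a real node you would pipe `sudo gitlab-ctl status` into `services_except gitaly` and review the printed command before running it.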

Gitaly Cluster (Praefect)

For Gitaly Cluster (Praefect) setups, you must deploy and upgrade Praefect in a similar way by using a graceful reload.

The upgrade process attempts a graceful handover to a new Praefect process. Existing long-running Git requests that were started before the upgrade may eventually be dropped as this handover occurs. This behavior may change in the future. Refer to this Epic for more information.

This section focuses exclusively on the Praefect component, not its required PostgreSQL database. The GitLab Linux package does not offer HA, and consequently zero-downtime support, for the Praefect database. A third-party database solution is required to avoid downtime.

Praefect must also perform database migrations to upgrade any existing data. To avoid clashes, migrations should run on only one Praefect node. To do this, designate a specific node as a deploy node that runs the migrations. This is referred to as the Praefect deploy node in the following steps:

On the Praefect deploy node:
1. Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
```
sudo touch /etc/gitlab/skip-auto-reconfigure
```
2. Upgrade the GitLab package.
3. Ensure that praefect['auto_migrate'] = true is set in /etc/gitlab/gitlab.rb so that database migrations run.
4. Run the reconfigure command to get the latest code in place, apply the Praefect database migrations and restart gracefully:
```
sudo gitlab-ctl reconfigure
```
On all remaining Praefect nodes:
1. Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
```
sudo touch /etc/gitlab/skip-auto-reconfigure
```
2. Upgrade the GitLab package.
3. Ensure that praefect['auto_migrate'] = false is set in /etc/gitlab/gitlab.rb to prevent reconfigure from automatically running database migrations.
4. Run the reconfigure command to get the latest code in place as well as restart gracefully:
```
sudo gitlab-ctl reconfigure
```

Finally, while Praefect will gracefully reload, any other components that have been deployed will still need a restart. On all Praefect nodes:

```
# Get a list of what other components have been deployed beside Praefect
sudo gitlab-ctl status

# Restart each component except Praefect. Example given for Consul, Node Exporter and Logrotate
sudo gitlab-ctl restart consul node-exporter logrotate
```
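
Because the deploy node and the remaining nodes differ only in the `auto_migrate` setting, it can be worth confirming the value before each reconfigure. The sketch below is one possible check, not an official tool: `auto_migrate_value` is a hypothetical helper that reads a gitlab.rb-style file and prints the last uncommented value assigned to the given section's `auto_migrate` key. It prints nothing when the key is not set explicitly.

```shell
# Print the last uncommented value of <section>['auto_migrate'] in a gitlab.rb-style file.
# Usage: auto_migrate_value <section> <file>
auto_migrate_value() {
  section="$1"
  file="$2"
  sed -n -E "s/^[[:space:]]*${section}\['auto_migrate'\][[:space:]]*=[[:space:]]*(true|false).*/\1/p" "$file" | tail -n 1
}

# Demonstration against a throwaway file instead of /etc/gitlab/gitlab.rb.
demo_rb="$(mktemp)"
printf '%s\n' "# praefect['auto_migrate'] = true" "praefect['auto_migrate'] = false" > "$demo_rb"
auto_migrate_value praefect "$demo_rb"
rm -f "$demo_rb"
```

On a node, the equivalent call would be `auto_migrate_value praefect /etc/gitlab/gitlab.rb`. The same helper works for the Rails setting by passing `gitlab_rails` as the section.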

Rails

Rails as a webserver consists primarily of Puma, Workhorse, and NGINX.

Each of these components behaves differently during a live upgrade. While Puma allows for a graceful reload, Workhorse does not. The best approach is to drain the node gracefully through other means, such as your load balancer. You can also drain the node by using the graceful shutdown functionality of NGINX on the node. This section explains the NGINX approach.

In addition, Rails is where the main database migrations are run. As with Praefect, the best approach is to use a deploy node. If PgBouncer is in use, it must be bypassed, because Rails takes an advisory lock when running a migration to prevent concurrent migrations on the same database. These locks are not shared across transactions, which results in ActiveRecord::ConcurrentMigrationError and other issues when database migrations run through PgBouncer in transaction pooling mode.

On the Rails deploy node:
1. Drain the node of traffic gracefully. You can do this in various ways, but one approach is to use NGINX by sending it a QUIT signal and then stopping the service. As an example, you can do this by using the following shell script:
```
# Send QUIT to NGINX master process to drain and exit
NGINX_PID=$(cat /var/opt/gitlab/nginx/nginx.pid)
kill -QUIT $NGINX_PID

# Wait for drain to complete
while kill -0 $NGINX_PID 2>/dev/null; do sleep 1; done

# Stop NGINX service to prevent automatic restarts
gitlab-ctl stop nginx
```
2. Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
```
sudo touch /etc/gitlab/skip-auto-reconfigure
```
3. Upgrade the GitLab package.
4. Configure regular migrations to run by setting gitlab_rails['auto_migrate'] = true in the /etc/gitlab/gitlab.rb configuration file.
  - If the deploy node is currently going through PgBouncer to reach the database then you must bypass it and connect directly to the database leader before running migrations.
  - To find the database leader, run the following command on any database node: sudo gitlab-ctl patroni members.
5. Run the regular migrations and get the latest code in place:
```
sudo SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-ctl reconfigure
```
6. Leave this node as-is for now as you'll come back to run post-deployment migrations later.
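
To bypass PgBouncer you need the address of the current database leader. As a rough illustration, the helper below extracts the leader's host from the member table. It assumes the table layout shown in the sample (columns Member, Host, Role, State, and so on), which may differ between Patroni versions. The function name `patroni_leader_host`, the host names, and the addresses are placeholders.

```shell
# Read a Patroni member table on stdin and print the Host column of the Leader row.
# Assumes pipe-separated columns in the order: Member | Host | Role | State | ...
patroni_leader_host() {
  awk -F'|' '$4 ~ /Leader/ { gsub(/ /, "", $3); print $3 }'
}

# Sample text standing in for `sudo gitlab-ctl patroni members` output.
sample_members='+ Cluster: postgresql-ha (6970678148837286213) ------+---------+---------+----+-----------+
| Member                        | Host         | Role    | State   | TL | Lag in MB |
+-------------------------------+--------------+---------+---------+----+-----------+
| gitlab-database-1.example.com | 172.18.0.111 | Replica | running |  5 |         0 |
| gitlab-database-2.example.com | 172.18.0.112 | Replica | running |  5 |       100 |
| gitlab-database-3.example.com | 172.18.0.113 | Leader  | running |  5 |           |'

printf '%s\n' "$sample_members" | patroni_leader_host
```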
On every other Rails node sequentially:
1. Drain the node of traffic gracefully. You can do this in various ways, but one approach is to use NGINX by sending it a QUIT signal and then stopping the service. As an example, you can do this by using the following shell script:
```
# Send QUIT to NGINX master process to drain and exit
NGINX_PID=$(cat /var/opt/gitlab/nginx/nginx.pid)
kill -QUIT $NGINX_PID

# Wait for drain to complete
while kill -0 $NGINX_PID 2>/dev/null; do sleep 1; done

# Stop NGINX service to prevent automatic restarts
gitlab-ctl stop nginx
```
2. Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
```
sudo touch /etc/gitlab/skip-auto-reconfigure
```
3. Upgrade the GitLab package.
4. Ensure that gitlab_rails['auto_migrate'] = false is set in /etc/gitlab/gitlab.rb to prevent reconfigure from automatically running database migrations.
5. Run the reconfigure command to get the latest code in place as well as restart:
```
sudo gitlab-ctl reconfigure
sudo gitlab-ctl restart
```
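
The drain loop in the script above waits indefinitely if a connection never closes. One possible variation, sketched here as an assumption rather than the documented procedure, is to bound the wait. `wait_for_exit` is a hypothetical helper, and the 300 second limit in the usage comment is an arbitrary value chosen for illustration.

```shell
# Wait until process $1 exits or $2 seconds have passed.
# Returns 0 if the process exited, 1 if the timeout was reached first.
wait_for_exit() {
  pid="$1"
  timeout="$2"
  waited=0
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage on a Rails node (PID file path taken from the drain script above):
#   NGINX_PID=$(cat /var/opt/gitlab/nginx/nginx.pid)
#   kill -QUIT "$NGINX_PID"
#   wait_for_exit "$NGINX_PID" 300 || echo "NGINX still draining after 300s" >&2
```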
On the Rails deploy node run the post-deployment migrations:
1. Ensure the deploy node is still pointing at the database leader directly. If the node is currently going through PgBouncer to reach the database then you must bypass it and connect directly to the database leader before running migrations.
  - To find the database leader, run the following command on any database node: sudo gitlab-ctl patroni members.
2. Run the post-deployment migrations:
```
sudo gitlab-rake db:migrate
```
3. Return the configuration to normal by setting gitlab_rails['auto_migrate'] = false in the /etc/gitlab/gitlab.rb configuration file.
  - If PgBouncer is being used, make sure to set the database configuration to point to it once again.
4. Run through reconfigure once again to reapply the normal config as well as restart:
```
sudo gitlab-ctl reconfigure
sudo gitlab-ctl restart
```
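
Between the regular and post-deployment phases, it can help to see how many migrations are still pending. The sketch below counts them from `db:migrate:status`-style text. It assumes the usual Rails layout of a status column (`up` or `down`) followed by the migration ID and name. The helper name `count_pending_migrations` and the sample migrations are invented for this example.

```shell
# Count migrations reported as "down" (not yet applied) in status text read from stdin.
# Assumes each migration line starts with a status column containing "up" or "down".
count_pending_migrations() {
  awk '$1 == "down" { n++ } END { print n + 0 }'
}

# Sample text standing in for `sudo gitlab-rake db:migrate:status` output.
sample='database: gitlabhq_production

 Status   Migration ID    Migration Name
--------------------------------------------------
   up     20250101000000  Add example column
  down    20250102000000  Backfill example column'

printf '%s\n' "$sample" | count_pending_migrations
```

On the deploy node, the equivalent call would be `sudo gitlab-rake db:migrate:status | count_pending_migrations`.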

Sidekiq

Sidekiq follows the same underlying process as the other components for upgrading without downtime.

Run through the following steps sequentially on each component node to perform the upgrade:

1. Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
```
sudo touch /etc/gitlab/skip-auto-reconfigure
```
2. Upgrade the GitLab package.
3. Run the reconfigure command to get the latest code in place as well as restart:
```
sudo gitlab-ctl reconfigure
sudo gitlab-ctl restart
```

Multi-node / HA deployment with Geo

Tier: Premium, Ultimate
Offering: GitLab Self-Managed

This section describes the steps required to upgrade a live GitLab environment deployed with Geo.

Overall, the approach is largely the same as the normal process with some additional steps required for each secondary site. The required order is upgrading the primary first, then the secondaries. You must also run any post-deployment migrations on the primary after all secondaries have been updated.

The same requirements and considerations apply when upgrading a live GitLab environment with Geo.
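
The site-level ordering described above can be written out as a checklist. This is only an illustrative sketch: `geo_upgrade_order` is a hypothetical helper, and the secondary site names passed to it are placeholders. It prints the order and executes nothing.

```shell
# Print the site-level upgrade order for a Geo deployment.
# Arguments are the names of the secondary sites (placeholders in this sketch).
geo_upgrade_order() {
  step=1
  echo "$step. primary: upgrade all nodes, stop before post-deployment migrations"
  for site in "$@"; do
    step=$((step + 1))
    echo "$step. $site: upgrade all nodes, with the additional Rails steps"
  done
  step=$((step + 1))
  echo "$step. primary: run post-deployment migrations on the Rails deploy node"
  for site in "$@"; do
    step=$((step + 1))
    echo "$step. $site: run post-deployment Geo Tracking migrations"
  done
}

geo_upgrade_order secondary-eu secondary-us
```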

Primary site

The upgrade process for the Primary site is the same as the normal process, with one exception: do not run the post-deployment migrations until all the secondaries have been updated.

Run through the same steps for the Primary site as described, but stop at the Rails node step of running the post-deployment migrations.

Secondary sites

The upgrade process for any Secondary sites follows the same steps as the normal process, except for the Rails nodes. On the Rails nodes of secondary sites, you must perform the following additional steps.

Rails

On the Rails deploy node:
1. Drain the node of traffic gracefully. You can do this in various ways, but one approach is to use NGINX by sending it a QUIT signal and then stopping the service. As an example, you can do this by using the following shell script:
```
# Send QUIT to NGINX master process to drain and exit
NGINX_PID=$(cat /var/opt/gitlab/nginx/nginx.pid)
kill -QUIT $NGINX_PID

# Wait for drain to complete
while kill -0 $NGINX_PID 2>/dev/null; do sleep 1; done

# Stop NGINX service to prevent automatic restarts
gitlab-ctl stop nginx
```
2. Stop the Geo Log Cursor process to ensure it fails over to another node:
```
gitlab-ctl stop geo-logcursor
```
3. Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
```
sudo touch /etc/gitlab/skip-auto-reconfigure
```
4. Upgrade the GitLab package.
5. Copy the /etc/gitlab/gitlab-secrets.json file from the primary site Rails node to the secondary site Rails node if they're different. The file must be the same on all of a site's nodes.
6. Ensure no migrations are configured to be run automatically by setting gitlab_rails['auto_migrate'] = false and geo_secondary['auto_migrate'] = false in the /etc/gitlab/gitlab.rb configuration file.
7. Run the reconfigure command to get the latest code in place as well as restart:
```
sudo gitlab-ctl reconfigure
sudo gitlab-ctl restart
```
8. Run the regular Geo Tracking migrations and get the latest code in place:
```
sudo SKIP_POST_DEPLOYMENT_MIGRATIONS=true gitlab-rake db:migrate:geo
```
On every other Rails node sequentially:
1. Drain the node of traffic gracefully. You can do this in various ways, but one approach is to use NGINX by sending it a QUIT signal and then stopping the service. As an example, you can do this by using the following shell script:
```
# Send QUIT to NGINX master process to drain and exit
NGINX_PID=$(cat /var/opt/gitlab/nginx/nginx.pid)
kill -QUIT $NGINX_PID

# Wait for drain to complete
while kill -0 $NGINX_PID 2>/dev/null; do sleep 1; done

# Stop NGINX service to prevent automatic restarts
gitlab-ctl stop nginx
```
2. Stop the Geo Log Cursor process to ensure it fails over to another node:
```
gitlab-ctl stop geo-logcursor
```
3. Create an empty file at /etc/gitlab/skip-auto-reconfigure. This prevents upgrades from running gitlab-ctl reconfigure, which by default automatically stops GitLab, runs all database migrations, and restarts GitLab:
```
sudo touch /etc/gitlab/skip-auto-reconfigure
```
4. Upgrade the GitLab package.
5. Ensure no migrations are configured to be run automatically by setting gitlab_rails['auto_migrate'] = false and geo_secondary['auto_migrate'] = false in the /etc/gitlab/gitlab.rb configuration file.
6. Run the reconfigure command to get the latest code in place as well as restart:
```
sudo gitlab-ctl reconfigure
sudo gitlab-ctl restart
```
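
The deploy node steps above ask you to copy /etc/gitlab/gitlab-secrets.json only if the files differ. One way to compare copies that live on different nodes is to print a checksum on each node and compare the values. The following sketch assumes `sha256sum` is available, as it is on most Linux distributions. `secrets_checksum` is a name made up for this example, and the demonstration uses throwaway files rather than real secrets.

```shell
# Print the SHA-256 checksum of a file, without the file name.
secrets_checksum() {
  sha256sum "$1" | awk '{ print $1 }'
}

# Demonstration with two temporary files standing in for the primary and secondary copies.
primary_copy="$(mktemp)"
secondary_copy="$(mktemp)"
echo '{"example": "secret"}' > "$primary_copy"
echo '{"example": "secret"}' > "$secondary_copy"

if [ "$(secrets_checksum "$primary_copy")" = "$(secrets_checksum "$secondary_copy")" ]; then
  echo "secrets match"
else
  echo "secrets differ: copy the file from the primary site"
fi
rm -f "$primary_copy" "$secondary_copy"
```

On real nodes, you would run `sudo sha256sum /etc/gitlab/gitlab-secrets.json` on each node and compare the printed values.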

Sidekiq

With the Rails nodes done, all that's left is to upgrade Sidekiq.

Upgrade Sidekiq in the same manner as described in the main section.

Post-deployment migrations

Finally, head back to the primary site and finish the upgrade by running the post-deployment migrations:

On the Primary site's Rails deploy node run the post-deployment migrations:
1. Ensure the deploy node is still pointing at the database leader directly. If the node is currently going through PgBouncer to reach the database then you must bypass it and connect directly to the database leader before running migrations.
  - To find the database leader, run the following command on any database node: sudo gitlab-ctl patroni members.
2. Run the post-deployment migrations:
```
sudo gitlab-rake db:migrate
```
3. Verify Geo configuration and dependencies:
```
sudo gitlab-rake gitlab:geo:check
```
4. Return the configuration to normal by setting gitlab_rails['auto_migrate'] = false in the /etc/gitlab/gitlab.rb configuration file.
  - If PgBouncer is being used, make sure to set the database configuration to point to it once again.
5. Run through reconfigure once again to reapply the normal config as well as restart:
```
sudo gitlab-ctl reconfigure
sudo gitlab-ctl restart
```
On the Secondary site's Rails deploy node run the post-deployment Geo Tracking migrations:
1. Run the post-deployment Geo Tracking migrations:
```
sudo gitlab-rake db:migrate:geo
```
2. Verify Geo status:
```
sudo gitlab-rake geo:status
```
