Files
gitlab-foss/doc/administration/gitaly/_index.md
2025-07-16 09:13:34 +00:00

8.9 KiB

stage, group, info, title
stage group info title
Data Access Gitaly To determine the technical writer assigned to the Stage/Group associated with this page, see https://handbook.gitlab.com/handbook/product/ux/technical-writing/#assignments Gitaly

{{< details >}}

  • Tier: Free, Premium, Ultimate
  • Offering: GitLab Self-Managed

{{< /details >}}

Gitaly provides high-level remote procedure call (RPC) access to Git repositories. It is used by GitLab to read and write Git data.

Gitaly is present in every GitLab installation and coordinates Git repository storage and retrieval. Gitaly can be:

  • A background service operating on a single instance Linux package installation (all of GitLab on one machine).
  • Separated onto its own instance and configured in a full cluster configuration, depending on scaling and availability requirements.

{{< alert type="note" >}}

Gitaly manages only Git repository access for GitLab. Other types of GitLab data aren't accessed using Gitaly.

{{< /alert >}}

GitLab accesses repositories through the configured repository storages. Each new repository is stored on one of the repository storages based on their configured weights. Each repository storage is either:

  • A Gitaly storage with direct access to repositories using storage paths, where each repository is stored on a single Gitaly node. All requests are routed to this node.
  • A virtual storage provided by Gitaly Cluster (Praefect), where each repository can be stored on multiple Gitaly nodes for fault tolerance. With Gitaly Cluster (Praefect):
    • Read requests are distributed between multiple Gitaly nodes, which can improve performance.
    • Write requests are broadcast to repository replicas.

The following shows GitLab set up to use direct access to Gitaly:

GitLab application interacting with Gitaly storage shards

In this example:

  • Each repository is stored on one of three Gitaly storages: storage-1, storage-2, or storage-3.
  • Each storage is serviced by a Gitaly node.
  • The three Gitaly nodes store data on their file systems.

Disk requirements

Gitaly and Gitaly Cluster (Praefect) require fast local storage to perform effectively because they are heavy I/O-based processes. Therefore, we strongly recommend that all Gitaly nodes use solid-state drives (SSDs). These SSDs should have high read and write throughput as Gitaly operates on many small files concurrently.

As a reference, the following charts show the P99 disk IOPS across the Gitaly production fleet on GitLab.com at a one-minute granularity. The data were queried from a seven-day representative period, starting and ending on a Monday morning. Note the regular spikes in IOPS as traffic becomes more intense during the work week. The raw data shows even larger spikes, with writes peaking at 8000 IOPS. The available disk throughput must handle these spikes to avoid disruptions to Gitaly requests.

  • P99 disk IOPS (reads):

    Chart showing P99 disk IOPS for reads.

  • P99 disk IOPS (writes):

    Chart showing P99 disk IOPS for writes.

We typically see:

  • Between 500 - 1000 reads per second, with peaks of 3500 reads per second.
  • Around 500 writes per second, with peaks of over 3000 writes per second.

The majority of the Gitaly fleet as of the time of writing are t2d-standard-32 instances with pd-ssd disks. The advertised maximum write and read IOPS are 60,000.

GitLab.com also employs stricter concurrency limits on expensive Git operations that aren't enabled by default on GitLab Self-Managed installations. Relaxed concurrency limits, operations against particularly large monorepos, or the use of the pack-objects cache can also significantly increase disk activity.

In practice for your own environment, the disk activity you observe on your Gitaly instances may vary greatly from these published results. If you are running on a cloud environment, choosing larger instances typically increases the available disk IOPS. You may also choose to select a provisioned IOPS disk type with guaranteed throughput. Refer to the documentation of your cloud provider about how to configure IOPS correctly.

For repository data, only local storage is supported for Gitaly and Gitaly Cluster (Praefect) for performance and consistency reasons. Alternatives such as NFS or cloud-based file systems are not supported.

Gitaly architecture

Gitaly implements a client-server architecture:

The following illustrates the Gitaly client-server architecture:

flowchart LR
  subgraph Gitaly clients
    Rails[GitLab Rails]
    Workhorse[GitLab Workhorse]
    Shell[GitLab Shell]
    Zoekt[Zoekt Indexer]
    Elasticsearch[Elasticsearch Indexer]
    KAS["GitLab Agent for Kubernetes (KAS)"]
  end

  subgraph Gitaly
    GitalyServer[Gitaly server]
  end

  FS[Local filesystem]
  ObjectStorage[Object storage]

  Rails -- gRPC --> Gitaly
  Workhorse -- gRPC --> Gitaly
  Shell -- gRPC --> Gitaly
  Zoekt -- gRPC --> Gitaly
  Elasticsearch -- gRPC --> Gitaly
  KAS -- gRPC --> Gitaly

  GitalyServer --> FS
  GitalyServer -- TCP --> Workhorse
  GitalyServer -- TCP --> ObjectStorage

Configuring Gitaly

Gitaly comes pre-configured with a Linux package installation, which is a configuration suitable for up to 20 RPS / 1,000 users. For:

GitLab installations for more than 2000 active users performing daily Git write operation may be best suited by using Gitaly Cluster (Praefect).

Gitaly CLI

{{< history >}}

  • gitaly git subcommand introduced in GitLab 17.4.

{{< /history >}}

The gitaly command is a command-line interface that provides additional subcommands for Gitaly administrators. For example, the Gitaly CLI is used to:

For more information on the other subcommands, run sudo -u git -- /opt/gitlab/embedded/bin/gitaly --help.

Backing up repositories

When backing up or syncing repositories using tools other than GitLab, you must prevent writes while copying repository data.

Bundle URIs

You can use Git bundle URIs with Gitaly. For more information, see the Bundle URIs documentation.

Directly accessing repositories

GitLab doesn't advise directly accessing Gitaly repositories stored on disk with a Git client or any other tool, because Gitaly is being continuously improved and changed. These improvements may invalidate your assumptions, resulting in performance degradation, instability, and even data loss. For example:

{{< alert type="warning" >}}

Accessing Git repositories directly is done at your own risk and is not supported.

{{< /alert >}}