gitlab-foss/doc/development/user_contribution_mapping.md
2025-07-17 00:08:09 +00:00


stage: none
group: Import
info: Any user with at least the Maintainer role can merge updates to this content. For details, see https://docs.gitlab.com/development/development_processes/#development-guidelines-review.
title: User contribution mapping developer documentation

User contribution mapping is a feature that allows imported records to be attributed to a user even when the user on the source instance was not set up with a public email before the import. Instead, dummy User records are created to act as placeholders in imported records until a real user can be assigned to those contributions after the import has completed.

User contribution mapping is implemented within each importer to assign contributions to placeholder users during a migration, but the same process applies across all importers that use this mapping. The process of a group owner reassigning a real user to a placeholder user takes place after the migration has completed and is separate from the migration.

Glossary of terms and relevant models

| Term | Corresponding ActiveRecord Model | Definition |
|------|----------------------------------|------------|
| Source user | Import::SourceUser | Maps placeholder users to real users and tracks reassignment details, import source, and top-level group association. |
| Placeholder user | User with user_type: 'placeholder' | A User record that satisfies foreign key constraints during a migration and is intended to be reassigned to a real user after the import. Placeholder users cannot log in and have no rights in GitLab. |
| Assignee user, real user | User with user_type: 'human' | The human user assigned to a placeholder user. |
| User contribution | Any GitLab ActiveRecord model | Any ActiveRecord model imported during a migration that belongs to a User. For example, merge request assignees, notes, and memberships. |
| Placeholder reference | Import::SourceUserPlaceholderReference | A separate model to track all placeholder user contributions across the database except memberships. |
| Placeholder membership | Import::Placeholders::Membership | A separate model to track imported memberships belonging to placeholder users. Member records are not created for placeholder users during a migration to prevent placeholders from appearing as members. |
| Import user | Import::NamespaceImportUser | A placeholder user used when records can't be assigned to a regular placeholder. For example, when the placeholder user limit has been reached. |
| Placeholder detail | Import::PlaceholderUserDetail | A record that tracks which namespaces have placeholder users so that placeholder users can be deleted when their top-level group is deleted. |
| Placeholder users table | N/A | Table of placeholder users where group owners can pick real users to assign to placeholder users in the UI. Located on the top-level group's members page under the placeholders tab. Only visible to group owners. |

Placeholder user creation during import

Before a placeholder user can be reassigned to a real user, a placeholder user must be created during an import.

User mapping assignment flow for imported contributions

flowchart
    %%% nodes
    Start{{
        Group or project
        migration is started
    }}

    FetchInfo[
        Fetch information 
        about the contribution
    ]
    
    FetchUserInfo[
        Fetch information about 
        the user who is associated 
        with the contribution.
    ] 

    CheckSourceUser{
        Has a user in the destination
        instance already accepted being
        mapped to the source user?
    }

    AssignToUser[
        Assign contribution to the user
    ]

    PlaceholderLimit{
        Namespace reached
        the placeholder limit?
    }

    CreatePlaceholderUser[
        Create a placeholder user and 
        save the details of the source user
    ]

    AssignContributionPlaceholder[
        Assign contribution 
        to placeholder user
    ]

    AssignImportUser[
        Assign contributions to the
        ImportUser and save source user 
        details
    ]

    ImportContribution[
        Save contribution 
        into the database
    ]

    PushPlaceholderReference[
        Push instance of placeholder
        reference to Redis
    ]

    LoadPlaceholderReference(((
        Load placeholder references
        into the database
    )))

    %%% connections 
    Start --> FetchInfo
    FetchInfo --> FetchUserInfo
    FetchUserInfo --> CheckSourceUser
    CheckSourceUser -- Yes --> AssignToUser
    CheckSourceUser -- No --> PlaceholderLimit
    PlaceholderLimit -- No --> CreatePlaceholderUser
    CreatePlaceholderUser --> AssignContributionPlaceholder
    PlaceholderLimit -- Yes --> AssignImportUser
    AssignToUser-->ImportContribution
    AssignContributionPlaceholder-->ImportContribution
    AssignImportUser-->ImportContribution
    ImportContribution-->PushPlaceholderReference
    PushPlaceholderReference-->LoadPlaceholderReference
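
The decision points in the flowchart above can be sketched in plain Ruby. This is a minimal illustration, not the GitLab implementation: `UserResolutionSketch`, `accept_mapping`, `resolve`, and the `placeholder-N` names are all hypothetical, and in-memory hashes stand in for the database.

```ruby
# Hypothetical sketch of the flowchart's decisions; none of these names exist in GitLab.
class UserResolutionSketch
  IMPORT_USER = :import_user

  def initialize(placeholder_limit:)
    @placeholder_limit = placeholder_limit
    @accepted = {} # source identifier => real user who accepted the mapping
    @assigned = {} # source identifier => placeholder user or the import user
  end

  # Records that a destination user has accepted being mapped to a source user.
  def accept_mapping(identifier, real_user)
    @accepted[identifier] = real_user
  end

  # Returns the user that a contribution from this source user is attributed to.
  def resolve(identifier)
    return @accepted[identifier] if @accepted.key?(identifier)
    return @assigned[identifier] if @assigned.key?(identifier)

    placeholders = @assigned.values.count { |user| user != IMPORT_USER }
    @assigned[identifier] =
      if placeholders >= @placeholder_limit
        IMPORT_USER # limit reached: fall back to the shared import user
      else
        "placeholder-#{placeholders + 1}"
      end
  end
end
```

In this sketch a source user is resolved once and the result is remembered, so every later contribution from the same source user lands on the same placeholder.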

How to implement user mapping in an importer

  1. Implement a feature flag for user mapping within the importer.

  2. Ensure that the feature flag state is stored at the beginning of the importer so that changing the flag state during an import does not affect the ongoing import.

  3. Before saving a user contribution to the database, use Gitlab::Import::SourceUserMapper#find_or_create_source_user to find the correct User to attribute the contribution to.

  4. Save the contribution with the mapped user to the database.

  5. Once the record is persisted, initialize Import::PlaceholderReferences::PushService and execute it to push an initialized Import::SourceUserPlaceholderReference to Redis. Initializing the service using from_record is often most convenient.

  6. Persist the cached Import::SourceUserPlaceholderReferences asynchronously using the LoadPlaceholderReferencesWorker. This worker uses Import::PlaceholderReferences::LoadService to persist the placeholder references. It's best to periodically call this worker throughout the import, e.g., at the end of a stage, as well as at the end of the import.

    • Important: Placeholder user references are cached before loading to avoid too many concurrent writes on the import_source_user_placeholder_references table. If a database record references a placeholder user's ID but a placeholder reference is not persisted for some reason, the contribution cannot be reassigned and the placeholder user may not be deleted.
  7. Delay finishing the import until all cached placeholder references have been loaded.

    • Parallel third-party importers accomplish this by re-enqueueing the FinishImportWorker if any placeholder references are remaining in Redis for the project. See Gitlab::GithubImport::Stage::FinishImportWorker as an example.
    • Synchronous third-party importers must wait for placeholder references to finish loading in some other manner. The Gitea importer, for example, uses Kernel.sleep to delay the import until placeholder user references have been loaded.
    • Direct Transfer functions similarly to parallel importers, using BulkImport::ProcessService to re-enqueue BulkImportWorker, which delays a migration from finishing if placeholder references have not finished loading.
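
Steps 5 to 7 above (push to a cache, load in batches, hold the import open until the cache is empty) can be sketched as follows. This is a simplified model under stated assumptions: `ReferenceCacheSketch`, `PlaceholderRefSketch`, and `finish_import` are hypothetical names, an array stands in for Redis, and another array stands in for the import_source_user_placeholder_references table.

```ruby
# Hypothetical in-memory model; not the GitLab services or workers.
PlaceholderRefSketch = Struct.new(:model, :record_id, :user_column, :source_user_id)

class ReferenceCacheSketch
  attr_reader :persisted

  def initialize
    @cached = []    # stands in for the Redis cache
    @persisted = [] # stands in for the placeholder references table
  end

  # Step 5: push a reference after the contribution itself has been saved.
  def push(reference)
    @cached << reference
  end

  # Step 6: move a small batch of cached references into the "table".
  def load_batch(batch_size: 2)
    @persisted.concat(@cached.shift(batch_size))
  end

  def remaining?
    !@cached.empty?
  end
end

# Step 7: do not finish while references remain in the cache.
# A real importer would re-enqueue a worker instead of looping in-process.
def finish_import(cache, max_attempts: 10)
  max_attempts.times do
    return :finished unless cache.remaining?

    cache.load_batch
  end
  :still_waiting
end
```

Batching the writes is the point of the cache: many concurrent import jobs push cheaply, and the table only sees periodic bulk loads.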

Placeholder user reassignment after import

Placeholder user reassignment takes place after an import has completed. The reassignment process is the same for all placeholder users, regardless of the import type that created the placeholder.

Reassignment flow

flowchart TD
%% Nodes
    OwnerAssigns{{
        Group owner requests a 
        source user to be assigned 
        to a user in the destination
    }}
    Notification[
        Notification is sent
        to the user
    ]
    ReceivesNotification[
        User receives the notification and 
        clicks the button to see more details
    ]
    ClickMoreDetails[
        The user is redirected to the more details page
    ]
    CheckRequestStatus{
        Group owner has cancelled 
        the request?
    }
    RequestCancelledPage(((
        Shows request cancelled 
        by the owner message
    )))
    OwnerCancel(
        Group owner chooses to cancel 
        the assignment request
    )
    ReassigmentOptions{
        Show reassignment options
    }
    ReassigmentStarts(((
        Start the reassignment 
        of the contributions
    )))
    ReassigmentRejected(((
        Shows request rejected 
        by the user message
    )))

%% Edge connections between nodes
    OwnerAssigns --> Notification --> ReceivesNotification --> ClickMoreDetails
    OwnerAssigns --> OwnerCancel
    ClickMoreDetails --> CheckRequestStatus
    CheckRequestStatus -- Yes --> RequestCancelledPage
    CheckRequestStatus -- No --> ReassigmentOptions
    ReassigmentOptions -- User accepts --> ReassigmentStarts
    ReassigmentOptions -- User rejects --> ReassigmentRejected
    OwnerCancel-.->CheckRequestStatus
    

Each step of the reassignment flow corresponds to a source user state:

stateDiagram-v2

[*] --> pending_reassignment
pending_reassignment --> reassignment_in_progress: Reassign user and bypass assignee confirmation
awaiting_approval --> reassignment_in_progress: Accept reassignment
reassignment_in_progress --> completed: Contribution reassignment completed successfully
reassignment_in_progress --> failed: Error reassigning contributions
pending_reassignment --> awaiting_approval: Reassign user
awaiting_approval --> pending_reassignment: Cancel reassignment 
awaiting_approval --> rejected: Reject reassignment 
rejected --> pending_reassignment: Cancel reassignment 
rejected --> keep_as_placeholder: Keep as placeholder
pending_reassignment --> keep_as_placeholder: Keep as placeholder

Instead of calling a state_machines-activerecord method directly on a source user, services have been implemented for each state transition to consistently handle validations and behavior:

  • Import::SourceUsers::ReassignService: Initiates reassignment to a real user. It may begin user contribution reassignment if placeholder confirmation bypass is enabled.
  • Import::SourceUsers::AcceptReassignmentService: Processes user acceptance of reassignment and begins user contribution reassignment.
  • Import::SourceUsers::RejectReassignmentService: Processes user rejection of reassignment.
  • Import::SourceUsers::CancelReassignmentService: Cancels pending reassignment requests before a user has accepted it or after they have rejected it.
  • Import::SourceUsers::KeepAsPlaceholderService: Marks a single user to remain as a placeholder.

Some additional services exist to handle bulk requests and other related behaviors:

  • Import::SourceUsers::GenerateCsvService: Generates a CSV file to reassign placeholder users in bulk.
  • Import::SourceUsers::BulkReassignFromCsvService: Handles bulk reassignments from CSV files, ultimately calling Import::SourceUsers::ReassignService for each placeholder user in the CSV.
  • Import::SourceUsers::KeepAllAsPlaceholderService: Marks all unassigned placeholder users to stay as placeholder users indefinitely.
  • Import::SourceUsers::ResendNotificationService: Resends reassignment notification email to the reassigned user.
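
The source user states in the state diagram above can be read as a lookup table. The sketch below copies the transitions from the diagram into a plain Ruby hash. It does not use state_machines-activerecord, and the event names (`:reassign`, `:accept`, and so on) and `next_state` are hypothetical labels chosen for illustration.

```ruby
# Transitions copied from the state diagram; event names are illustrative only.
SOURCE_USER_TRANSITIONS = {
  pending_reassignment: {
    reassign: :awaiting_approval,
    reassign_without_confirmation: :reassignment_in_progress,
    keep_as_placeholder: :keep_as_placeholder
  },
  awaiting_approval: {
    accept: :reassignment_in_progress,
    cancel: :pending_reassignment,
    reject: :rejected
  },
  rejected: {
    cancel: :pending_reassignment,
    keep_as_placeholder: :keep_as_placeholder
  },
  reassignment_in_progress: {
    complete: :completed,
    fail: :failed
  }
}.freeze

# Returns the next state, or raises if the diagram has no such transition.
def next_state(current, event)
  SOURCE_USER_TRANSITIONS.fetch(current, {}).fetch(event) do
    raise ArgumentError, "cannot #{event} from #{current}"
  end
end
```

States without an entry in the table (for example `completed`) have no outgoing transitions in the diagram, so any event raised on them is rejected.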

Reassignment flow with placeholder user bypass enabled

When placeholder confirmation bypass is enabled, user contribution reassignment begins immediately on reassignment without the real user's confirmation.

flowchart LR
%% Nodes
    OwnerAssigns{{
        Administrator or enterprise group owner
        requests a source user to be assigned
        to a user in the destination
    }}
    ConfirmAssignmentWithBypass[
      Group owner confirms assignment
      without assignee confirmation
    ]
    ReassigmentStarts((
        Start the reassignment
        of the contributions
    ))
    NotifyAssignee(((
      Reassigned real user
      notified that contributions
      have been reassigned
    )))
%% Edge connections between nodes
    OwnerAssigns  --> ConfirmAssignmentWithBypass --> ReassigmentStarts --> NotifyAssignee

Bypassing an assignee's confirmation can only be done in the following situations:

  • An administrator reassigns a placeholder user on a self-managed instance with the application setting allow_bypass_placeholder_confirmation enabled. The Import::UserMapping::AdminBypassAuthorizer module determines if all criteria are met.
  • An enterprise group owner with GitLab Premium or Ultimate reassigns a placeholder user to a real user within their enterprise on GitLab.com with the group setting allow_enterprise_bypass_placeholder_confirmation enabled. The Import::UserMapping::EnterpriseBypassAuthorizer module determines if all criteria are met.
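
The two situations above amount to two independent sets of conditions, either of which permits the bypass. The sketch below restates them as plain Ruby predicates. `BypassContextSketch` and its fields are hypothetical, and the real authorizer modules may check more than what is listed here.

```ruby
# Hypothetical restatement of the two bypass conditions; not the GitLab authorizers.
BypassContextSketch = Struct.new(
  :actor_is_admin, :self_managed, :admin_setting_enabled,
  :actor_is_enterprise_owner, :paid_plan, :assignee_in_enterprise, :group_setting_enabled,
  keyword_init: true
)

# Administrator on a self-managed instance with the application setting enabled.
def admin_bypass?(ctx)
  ctx.actor_is_admin && ctx.self_managed && ctx.admin_setting_enabled
end

# Enterprise group owner on a paid plan, assignee inside the enterprise, group setting enabled.
def enterprise_bypass?(ctx)
  ctx.actor_is_enterprise_owner && ctx.paid_plan &&
    ctx.assignee_in_enterprise && ctx.group_setting_enabled
end

def bypass_confirmation?(ctx)
  !!(admin_bypass?(ctx) || enterprise_bypass?(ctx))
end
```

Fields that are not set default to nil, so a context that satisfies only part of either condition does not permit the bypass.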

User contribution reassignment

When a real user accepts their reassignment, the process of replacing all foreign key references to the placeholder user's ID with the reassigned user's ID begins:

  1. Import::SourceUsers::AcceptReassignmentService asynchronously calls Import::ReassignPlaceholderUserRecordsWorker.

  2. The worker executes Import::ReassignPlaceholderUserRecordsService to replace records with foreign key references to the placeholder user ID with the reassignee's user ID by querying for placeholder references and placeholder memberships belonging to the source user.

  3. The service deletes placeholder references and placeholder memberships after successful reassignment.

  4. The service sets the source user's state to completed.

    • Note: There are valid scenarios where a placeholder reference cannot be reassigned. For example, a real user is added as a reviewer to a merge request that already has a placeholder user as a reviewer, and the real user then accepts reassignment of that placeholder. Contribution reassignment raises an ActiveRecord::RecordNotUnique error in this case, but the scenario is valid.
    • Note: Reassignment can also fail due to unhandled errors. Because reassignment is expected to always succeed, such failures should be investigated.
  5. Once the service finishes, the worker calls Import::DeletePlaceholderUserWorker asynchronously to delete the placeholder user. If the placeholder user ID is still referenced in any imported table, it will not be deleted. Check columns_ignored_on_deletion in the AliasResolver for exceptions.

  6. If the placeholder user was reassigned without confirmation from the reassigned user, an email is sent to the reassigned user notifying them that they have been reassigned.
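
The error handling described in the notes above can be sketched as a loop that tolerates uniqueness conflicts but treats any other error as a failure. This is an illustration only: `reassign_contributions` and `DuplicateRecordSketchError` are hypothetical (the latter stands in for ActiveRecord::RecordNotUnique), and the caller supplies the actual update as a block.

```ruby
# Stands in for ActiveRecord::RecordNotUnique in this self-contained sketch.
class DuplicateRecordSketchError < StandardError; end

# Yields each reference so the caller can rewrite the user ID.
# Uniqueness conflicts are skipped; any other error marks the run as failed.
def reassign_contributions(references, real_user_id)
  skipped = []
  references.each do |reference|
    begin
      yield reference, real_user_id
    rescue DuplicateRecordSketchError
      skipped << reference # valid scenario: the real user is already on the record
    end
  end
  { state: :completed, skipped: skipped }
rescue StandardError
  { state: :failed, skipped: skipped }
end
```

The returned state mirrors the `completed` and `failed` source user states from the state diagram.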

Placeholder reference aliasing

Saving model names and column names to the import_source_user_placeholder_references table is brittle. Actual model and column names can change, and nothing updates the placeholder records that still store the old names. Instead of treating a placeholder reference's model, numeric_key, and composite_key as real names, they should be treated as aliases. The Import::PlaceholderReferences::AliasResolver is used to map values stored in these attributes to real model and column names.
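
As a rough illustration of the aliasing idea, the sketch below resolves a stored (model alias, version, column alias) triple to a current model and column name. It is a simplified stand-in, not the real resolver: `ALIASES_SKETCH`, `resolve_alias`, and `MissingAliasSketchError` are hypothetical, and model names are plain strings here so the example runs without Rails.

```ruby
class MissingAliasSketchError < StandardError; end

# Hypothetical alias table: version 1 was stored when the column was still author_id.
ALIASES_SKETCH = {
  "Snippet" => {
    1 => { model: "Snippet", columns: { "author_id" => "user_id" } },
    2 => { model: "Snippet", columns: { "user_id" => "user_id" } }
  }
}.freeze

# Maps a stored alias triple to the current [model, column] pair.
def resolve_alias(model_alias, version, column_alias)
  entry = ALIASES_SKETCH.dig(model_alias, version)
  column = entry && entry[:columns][column_alias]
  raise MissingAliasSketchError, "no alias for #{model_alias} v#{version} #{column_alias}" unless column

  [entry[:model], column]
end
```

A reference stored under version 1 with column alias `author_id` resolves to the current `user_id` column, which is why old rows never need to be rewritten when a column is renamed.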

When does Import::PlaceholderReferences::AliasResolver need to be updated?

If you were directed here because of a MissingAlias error, Import::PlaceholderReferences::AliasResolver must be updated with the missing alias.

If you're making this change preemptively, Import::PlaceholderReferences::AliasResolver only needs to be updated if both of the following apply:

  • The model you're working with is imported by at least one importer that implements user contribution mapping.
  • The column on your model references a User ID. That is, the column has a foreign key constraint that references users(id), or the column establishes an association between the model and User.

How to update the Import::PlaceholderReferences::AliasResolver on a schema change

There are several scenarios where the Import::PlaceholderReferences::AliasResolver must be updated if an imported model changes.

A new user reference column was added and is imported

Add the new column key to the latest version of the model's alias. The new column does not need to be added to previous versions unless the new column has been populated with a user ID that could belong to a placeholder user.

Example

last_updated_by_id, a reference to a User ID, is added to the snippets table. author_id is still present and unchanged.

"Snippet" => {
  1 => {
    model: Snippet,
-   columns: { "author_id" => "author_id" }
+   columns: { "author_id" => "author_id", "last_updated_by_id" => "last_updated_by_id" }
  }
},

A user reference column was renamed

Add a new version to the model's alias with the updated column name. Update all previous versions' column values with the updated column name as well to ensure placeholder references that have yet to be reassigned will update the correct, most current column name.

Examples

author_id was renamed to user_id on the snippets table. Note that the keys in columns stay the same:

 "Snippet" => {
   1 => {
     model: Snippet,
-    columns: { "author_id" => "author_id" }
+    columns: { "author_id" => "user_id" }
+  },
+  2 => {
+    model: Snippet,
+    columns: { "user_id" => "user_id" }
   }
 },

After some time, user_id on the snippets table was changed again to created_by_id:

 "Snippet" => {
   1 => {
     model: Snippet,
-    columns: { "author_id" => "user_id" }
+    columns: { "author_id" => "created_by_id" }
   },
   2 => {
     model: Snippet,
-    columns: { "user_id" => "user_id" }
+    columns: { "user_id" => "created_by_id" }
+  },
+  3 => {
+    model: Snippet,
+    columns: { "created_by_id" => "created_by_id" }
   }
 },

A new model with user contributions is imported

Add an alias to the ALIASES hash for the newly imported model. The model doesn't need to be new in GitLab. It only needs to be newly imported by at least one importer that implements user contribution mapping.

Example

Direct transfer was updated to import Todos. Todo has two user reference columns, user_id and author_id:

},
+"Todo" => {
+  1 => {
+    model: Todo,
+    columns: { "user_id" => "user_id", "author_id" => "author_id"}
+  }
+},
"Vulnerability" => {

A model with user contributions was renamed

Add an alias to the ALIASES hash as if the renamed model were a newly imported model. Also update all previous versions' model value to the new model name. Do not remove the old alias, even if the model under its old name no longer exists. Unused placeholder references referencing the old model name may still exist.

Example

Snippet is renamed to Sample:

+"Sample" => {
+  1 => {
+    model: Sample,
+    columns: { "author_id" => "author_id" }
+  }
+},
 "Snippet" => {
   1 => {
-    model: Snippet,
+    model: Sample,
     columns: { "author_id" => "author_id" }
   }
 },

Edge case example

Some time after Snippet is renamed to Sample, the Snippet model is reintroduced and imported, but with an entirely different use. The new Snippet model belongs to a User using user_id. In this case, do not update previous versions of the "Snippet" alias, as any placeholder reference with alias_version 1 is actually a reference to a Sample:

 "Sample" => {
   1 => {
     model: Sample,
     columns: { "author_id" => "author_id" }
   }
 },
 "Snippet" => {
   1 => {
     model: Sample,
     columns: { "author_id" => "author_id" }
+  },
+  2 => {
+    model: Snippet,
+    columns: { "user_id" => "user_id" }
   }
 },

Specs

Changes to alias configurations in Import::PlaceholderReferences::AliasResolver generally do not need to be individually tested in its spec file. Its specs are set up to verify that each alias version points to an existing column, and that unindexed columns are listed in columns_ignored_on_deletion to prevent performance issues.
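
The first of those checks can be approximated in plain Ruby. The sketch below is not the actual spec: `dangling_alias_columns` is a hypothetical helper, and `schema` is a hand-written hash standing in for the real database column list.

```ruby
# Returns [model_alias, version, column] for every alias value that does not
# name a column present in the given schema (model name => column names).
def dangling_alias_columns(aliases, schema)
  aliases.flat_map do |model_alias, versions|
    versions.flat_map do |version, entry|
      known = schema.fetch(entry[:model], [])
      entry[:columns].values
                     .reject { |column| known.include?(column) }
                     .map { |column| [model_alias, version, column] }
    end
  end
end
```

A check of this kind catches the case where a column was renamed but an older alias version still points to the old name.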