22 KiB
stage, group, info, title
stage | group | info | title |
---|---|---|---|
none | Import | Any user with at least the Maintainer role can merge updates to this content. For details, see https://docs.gitlab.com/development/development_processes/#development-guidelines-review. | User contribution mapping developer documentation |
User contribution mapping is a feature that allows imported records to be attributed to a user without needing the user on the source to have been provisioned with a public email in advance. Instead, dummy User records are created to act as a placeholder in imported records until a real user can be assigned to those contributions after the import has completed.
User contribution mapping is implemented within each importer to assign contributions to placeholder users during a migration, but the same process applies across all importers that use this mapping. The process of a group owner reassigning a real user to a placeholder user takes place after the migration has completed and is separate from the migration.
Glossary of terms and relevant models
Term | Corresponding ActiveRecord Model |
Definition |
---|---|---|
Source user | Import::SourceUser |
Maps placeholder users to real users and tracks reassignment details, import source, and top-level group association. |
Placeholder user | User with user_type: 'placeholder' |
A User record that satisfies foreign key constraints during migration intended to be reassigned to a real user after import. Placeholder users cannot log in and have no rights in GitLab. |
Assignee user, real user | User with user_type: 'human' |
The human user assigned to a placeholder user. |
User contribution | Any GitLab ActiveRecord model | Any ActiveRecord model imported during a migration that belongs to a User . E.g. merge request assignees, notes, memberships, etc. |
Placeholder reference | Import::SourceUserPlaceholderReference |
A separate model to track all placeholder user contributions across the database except memberships. |
Placeholder membership | Import::Placeholders::Membership |
A separate model to track imported memberships belonging to placeholder users. Member records are not created for placeholder users during a migration to prevent placeholders from appearing as members. |
Import user | Import::NamespaceImportUser |
A placeholder user used when records can't be assigned to a regular placeholder. E.g. when the placeholder user limit has been reached. |
Placeholder detail | Import::PlaceholderUserDetail |
A record that tracks which namespaces have placeholder users so that placeholder users can be deleted when their top-level group is deleted. |
Placeholder users table | N/A | Table of placeholder users where group owners can pick real users to assign to placeholder users on the UI. Located on the top-level group's members page under the placeholders tab. Only visible to group owners. |
Placeholder user creation during import
Before a placeholder user can be reassigned to a real user, a placeholder user must be created during an import.
User mapping assignment flow for imported contributions
flowchart
%%% nodes
Start{{
Group or project
migration is started
}}
FetchInfo[
Fetch information
about the contribution
]
FetchUserInfo[
Fetch information about
the user who is associated
with the contribution.
]
CheckSourceUser{
Has a user in the destination
instance already accepted being
mapped to the source user?
}
AssignToUser[
Assign contribution to the user
]
PlaceholderLimit{
Namespace reached
the placeholder limit?
}
CreatePlaceholderUser[
Create a placeholder user and
save the details of the source user
]
AssignContributionPlaceholder[
Assign contribution
to placeholder user
]
AssignImportUser[
Assign contributions to the
ImportUser and save source user
details
]
ImportContribution[
Save contribution
into the database
]
PushPlaceholderReference[
Push instance of placeholder
reference to Redis
]
LoadPlaceholderReference(((
Load placeholder references
into the database
)))
%%% connections
Start --> FetchInfo
FetchInfo --> FetchUserInfo
FetchUserInfo --> CheckSourceUser
CheckSourceUser -- Yes --> AssignToUser
CheckSourceUser -- No --> PlaceholderLimit
PlaceholderLimit -- No --> CreatePlaceholderUser
CreatePlaceholderUser --> AssignContributionPlaceholder
PlaceholderLimit -- Yes --> AssignImportUser
AssignToUser-->ImportContribution
AssignContributionPlaceholder-->ImportContribution
AssignImportUser-->ImportContribution
ImportContribution-->PushPlaceholderReference
PushPlaceholderReference-->LoadPlaceholderReference
How to implement user mapping in an importer
-
Implement a feature flag for user mapping within the importer.
-
Ensure that the feature flag state is stored at the beginning of the importer so that changing the flag state during an import does not affect the ongoing import.
- Third-party importers accomplish this by setting
project.import_data[:data][:user_contribution_mapping_enabled]
at the start of the import. SeeGitlab::GithubImport::Settings
orGitlab::BitbucketServerImport::ProjectCreator
as examples. - direct transfer uses
EphemeralData
inBulkImports::CreateService
to cache the feature flag states.
- Third-party importers accomplish this by setting
-
Before saving a user contribution to the database, use
Gitlab::Import::SourceUserMapper#find_or_create_source_user
to find the correctUser
to attribute the contribution to. -
Save the contribution with the mapped user to the database.
-
Once the record is persisted, initialize
Import::PlaceholderReferences::PushService
and execute it to push an initializedImport::SourceUserPlaceholderReference
to Redis. Initializing the service usingfrom_record
is often most convenient. -
Persist the cached
Import::SourceUserPlaceholderReference
s asynchronously using theLoadPlaceholderReferencesWorker
. This worker usesImport::PlaceholderReferences::LoadService
to persist the placeholder references. It's best to periodically call this worker throughout the import, e.g., at the end of a stage, as well as at the end of the import.- Important: Placeholder user references are cached before loading to avoid too many concurrent writes on the
import_source_user_placeholder_references
table. If a database record references a placeholder user's ID but a placeholder reference is not persisted for some reason, the contribution cannot be reassigned and the placeholder user may not be deleted.
- Important: Placeholder user references are cached before loading to avoid too many concurrent writes on the
-
Delay finishing the import until all cached placeholder references have been loaded.
- Parallel third-party importers accomplish this by re-enqueueing the
FinishImportWorker
if any placeholder references are remaining in Redis for the project. SeeGitlab::GithubImport::Stage::FinishImportWorker
as an example. - Synchronous third-party importers must wait for placeholder references to finish loading in some other manner. The Gitea importer, for example, uses
Kernel.sleep
to delay the import until placeholder user references have been loaded. - Direct Transfer functions similar to parallel importers using
BulkImport::ProcessService
to re-enqueueBulkImportWorker
, delaying a migration from finishing if placeholder users have not finished loading.
- Parallel third-party importers accomplish this by re-enqueueing the
Placeholder user reassignment after import
Placeholder user reassignment takes place after an import has completed. The reassignment process is the same for all placeholder users, regardless of the import type that created the placeholder.
Reassignment flow
flowchart TD
%% Nodes
OwnerAssigns{{
Group owner requests a
source user to be assigned
to a user in the destination
}}
Notification[
Notification is sent
to the user
]
ReceivesNotification[
User receives the notification and
clicks the button to see more details
]
ClickMoreDetails[
The user is redirected to the more details page
]
CheckRequestStatus{
Group owner has cancelled
the request?
}
RequestCancelledPage(((
Shows request cancelled
by the owner message
)))
OwnerCancel(
Group owner chooses to cancel
the assignment request
)
ReassigmentOptions{
Show reassignment options
}
ReassigmentStarts(((
Start the reassignment
of the contributions
)))
ReassigmentRejected(((
Shows request rejected
by the user message
)))
%% Edge connections between nodes
OwnerAssigns --> Notification --> ReceivesNotification --> ClickMoreDetails
OwnerAssigns --> OwnerCancel
ClickMoreDetails --> CheckRequestStatus
CheckRequestStatus -- Yes --> RequestCancelledPage
CheckRequestStatus -- No --> ReassigmentOptions
ReassigmentOptions -- User accepts --> ReassigmentStarts
ReassigmentOptions -- User rejects --> ReassigmentRejected
OwnerCancel-.->CheckRequestStatus
Each step of the reassignment flow corresponds to a source user state:
stateDiagram-v2
[*] --> pending_reassignment
pending_reassignment --> reassignment_in_progress: Reassign user and bypass assignee confirmation
awaiting_approval --> reassignment_in_progress: Accept reassignment
reassignment_in_progress --> completed: Contribution reassignment completed successfully
reassignment_in_progress --> failed: Error reassigning contributions
pending_reassignment --> awaiting_approval: Reassign user
awaiting_approval --> pending_reassignment: Cancel reassignment
awaiting_approval --> rejected: Reject reassignment
rejected --> pending_reassignment: Cancel reassignment
rejected --> keep_as_placeholder: Keep as placeholder
pending_reassignment --> keep_as_placeholder: Keeps as placeholder
Instead of calling a state_machines-activerecord
method directly on a source user, services have been implemented for each state transition to consistently handle validations and behavior:
Import::SourceUsers::ReassignService
: Initiates reassignment to a real user. It may begin user contribution reassignment if placeholder confirmation bypass is enabled.Import::SourceUsers::AcceptReassignmentService
: Processes user acceptance of reassignment and begins user contribution reassignment.Import::SourceUsers::RejectReassignmentService
: Processes user rejection of reassignment.Import::SourceUsers::CancelReassignmentService
: Cancels pending reassignment requests before a user has accepted it or after they have rejected it.Import::SourceUsers::KeepAsPlaceholderService
: Marks a single user to remain as placeholder
Some additional services exist to handle bulk requests and other related behaviors:
Import::SourceUsers::GenerateCsvService
: Generates a CSV file to reassign placeholder users in bulk.Import::SourceUsers::BulkReassignFromCsvService
: Handles bulk reassignments from CSV files, ultimately callingImport::SourceUsers::ReassignService
for each placeholder user in the CSV.Import::SourceUsers::KeepAllAsPlaceholderService
: Marks all unassigned placeholder users to stay as placeholder users indefinitely.Import::SourceUsers::ResendNotificationService
: Resends reassignment notification email to the reassigned user.
Reassignment flow with placeholder user bypass enabled
When placeholder confirmation bypass is enabled, user contribution reassignment begins immediately on reassignment without the real user's confirmation.
flowchart LR
%% Nodes
OwnerAssigns{{
Administrator or enterprise group owner
requests a source user to be assigned
to a user in the destination
}}
ConfirmAssignmentWithBypass[
Group owner confirms assignment
without assignee confirmation
]
ReassigmentStarts((
Start the reassignment
of the contributions
))
NotifyAssignee(((
Reassigned real user
notified that contributions
have been reassigned
)))
%% Edge connections between nodes
OwnerAssigns --> ConfirmAssignmentWithBypass --> ReassigmentStarts --> NotifyAssignee
Bypassing an assignee's confirmation can only be done in the following situations:
- An administrator reassigns a placeholder user on a self-managed instance with the application setting
allow_bypass_placeholder_confirmation
enabled. TheImport::UserMapping::AdminBypassAuthorizer
module determines if all criteria are met. - An enterprise group owner with GitLab Premium or Ultimate reassigns a placeholder user to a real user within their enterprise on
.com
with the group settingallow_enterprise_bypass_placeholder_confirmation
enabled. TheImport::UserMappingEnterpriseBypassAuthorizer
module determines if all criteria are met.
User contribution reassignment
When a real user accepts their reassignment, the process of replacing all foreign key references to the placeholder user's ID with the reassigned user's ID begins:
-
Import::SourceUsers::AcceptReassignmentService
asynchronously callsImport::ReassignPlaceholderUserRecordsWorker
. -
The worker executes
Import::ReassignPlaceholderUserRecordsService
to replace records with foreign key references to the placeholder user ID with the reassignee's user ID by querying for placeholder references and placeholder memberships belonging to the source user. -
The service deletes placeholder references and placeholder memberships after successful reassignment.
-
The service sets the source user's state to
complete
.- Note: there are valid scenarios where a placeholder reference may not be able to be reassigned. For example, if a user is added as a reviewer to a merge request with a placeholder user reviewer, then the user accepts reassignments to the placeholder who was already a reviewer. This will raise an
ActiveRecord::RecordNotUnique
error during contribution reassignment, but it's a valid scenario. - Note: there is a possibility that the reassignment may fail due to unhandled errors. We need to investigate the issue since the reassignment is supposed to always succeed.
- Note: there are valid scenarios where a placeholder reference may not be able to be reassigned. For example, if a user is added as a reviewer to a merge request with a placeholder user reviewer, then the user accepts reassignments to the placeholder who was already a reviewer. This will raise an
-
Once the service finishes, the worker calls
Import::DeletePlaceholderUserWorker
asynchronously to delete the placeholder user. If the placeholder user ID is still referenced in any imported table, it will not be deleted. Checkcolumns_ignored_on_deletion
in the AliasResolver for exceptions. -
If the placeholder user was reassigned without confirmation from the reassigned user, an email is sent to the reassigned user notifying them that they have been reassigned.
Placeholder reference aliasing
Saving model names and column names to the import_source_user_placeholder_references
table is brittle. Actual model and column names can change and there's nothing to update the placeholder records that are currently storing the old names. Instead of treating a placeholder references' model
, numeric_key
and composite_key
as real names, they should be treated as aliases. The Import::PlaceholderReferences::AliasResolver
is used to map values stored in these attributes to real model and column names.
When does Import::PlaceholderReferences::AliasResolver
need to be updated?
If you were directed here because of a MissingAlias
error, Import::PlaceholderReferences::AliasResolver
must be updated with the missing alias.
If you're making this change preemptively, Import::PlaceholderReferences::AliasResolver
only needs to be updated if
- The model you're working with is imported by at least one importer that implements user contribution mapping.
- The column on your model references a
User
ID. I.e. the column is a foreign key constraint that referencesusers(id)
or the column establishes an association between the model andUser
.
How to update the Import::PlaceholderReferences::AliasResolver
on a schema change
There are several scenarios where the Import::PlaceholderReferences::AliasResolver
must be updated if an imported model changes.
A new user reference column was added and is imported
Add the new column key to the latest version of the model's alias. The new column does not need to be added to previous versions unless the new column has been populated with a user ID that could belong to a placeholder user.
Example
last_updated_by_id
, a reference to a User
ID, is added to the snippets
table. author_id
is still present and unchanged.
"Snippet" => {
1 => {
model: Snippet,
- columns: { "author_id" => "author_id" }
+ columns: { "author_id" => "author_id", "last_updated_by_id" => "last_updated_by_id" }
}
},
A user reference column was renamed
Add a new version to the model's alias with the updated column name. Update all previous versions' column values with the updated column name as well to ensure placeholder references that have yet to be reassigned will update the correct, most current column name.
Examples
author_id
was renamed to user_id
on the snippets
table. Note that keys in columns
stays the same:
"Snippet" => {
1 => {
model: Snippet,
- columns: { "author_id" => "author_id" }
+ columns: { "author_id" => "user_id" }
+ },
+ 2 => {
+ model: Snippet,
+ columns: { "user_id" => "user_id" }
}
},
After some time, user_id
on the snippets
table was changed again to created_by_id
:
"Snippet" => {
1 => {
model: Snippet,
- columns: { "author_id" => "user_id" }
+ columns: { "author_id" => "created_by_id" }
},
2 => {
model: Snippet,
- columns: { "user_id" => "user_id" }
+ columns: { "user_id" => "created_by_id" }
+ },
+ 3 => {
+ model: Snippet,
+ columns: { "created_by_id" => "created_by_id" }
}
},
A new model with user contributions is imported
Add an alias to the ALIASES
hash for the newly imported model. The model doesn't need to be new in GitLab, it just needs to be newly imported by at least one importer that implements user contribution mapping.
Example
Direct transfer was updated to import Todo
s. Todo
has two user reference columns, user_id
and author_id
:
},
+"Todo" => {
+ 1 => {
+ model: Todo,
+ columns: { "user_id" => "user_id", "author_id" => "author_id"}
+ }
+},
"Vulnerability" => {
A model with user contributions was renamed
Add an alias to the ALIASES
hash as if the renamed model were a newly imported model. Also update all previous versions' model value to the new model name. Do not remove the old alias, even if the model under its old name no longer exists. Unused placeholder references referencing the old model name may still exist.
Example
Snippet
is renamed to Sample
:
+"Sample" => {
+ 1 => {
+ model: Sample,
+ columns: { "author_id" => "author_id" }
+ }
+},
"Snippet" => {
1 => {
- model: Snippet,
+ model: Sample,
columns: { "author_id" => "author_id" }
}
},
Edge case example
Some time after Snippet
is renamed to Sample
, the Snippet
model is reintroduced and imported, but with an entirely different use. The new Snippet
model belongs to a User
using user_id
. In this case, do not update previous versions of the "Snippet"
alias, as any placeholder reference with alias_version
1 is actually a reference to a Sample
:
"Sample" => {
1 => {
model: Sample,
columns: { "author_id" => "author_id" }
}
},
"Snippet" => {
1 => {
model: Sample,
columns: { "author_id" => "author_id" }
},
+ 2 => {
+ model: Snippet,
+ columns: { "user_id" => "user_id" }
}
},
Specs
Changes to alias configurations in Import::PlaceholderReferences::AliasResolver
generally do not need to be individually tested in its spec file. Its specs are set up to verify that each alias version point to an existing column, and that unindexed columns are listed in columns_ignored_on_deletion
to prevent performance issues.