---
stage: AI-powered
group: AI Framework
info: Any user with at least the Maintainer role can merge updates to this content. For details, see https://docs.gitlab.com/development/development_processes/#development-guidelines-review.
title: Vertex AI Model Enablement Process
---
## Production Environment Setup
1. Request Initiation
   - Create an issue in the GitLab project
   - Use the Model Enablement Request template (see below)
   - Specify the model(s) to be enabled (for example, Codestral)
   - Share the issue link in the `#ai-infrastructure` channel for visibility
2. Request Processing
   - The request is handled by either:
     - The Infrastructure team (Infra)
     - The AI Framework team (AIF)
3. Model Enablement
   - For Vertex AI managed models:
     - The team enables the model via the Vertex AI console ("click on enable")
   - For custom configurations:
     - The AIF team opens a ticket with Google for customization needs
4. Quota Management
   - Monitoring for existing quota is available in the AI gateway dashboard. Use the arrow in the top left to drill down and see quota usage per model.
   - Not all quotas are exposed in our monitoring; all quotas are visible in the GCP console for the `gitlab-ai-framework-prod` project (for a programmatic check, see the sketch after this list).
   - Quota capacity forecasting is available in Tamland.
   - Quota increases for shared resources must be requested from Google.
   - Provisioned throughput can be purchased from Google if justifiable.
   - Even when quota is available, requests may be throttled during high-demand periods because of Anthropic's resource provisioning model. Unlike direct Google services, which over-provision resources, Anthropic provisions based on actual demand. To ensure consistent throughput without throttling, dedicated provisioned throughput can be purchased through Anthropic.
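For checking quotas outside the console, the Cloud Quotas API can list quota information per service. The following is a minimal sketch in Python using the `google-cloud-cloudquotas` client; the project ID and service name come from this page, but the client package, field names, and required permissions are assumptions to verify against the current Google Cloud documentation.

```python
# Minimal sketch: list Vertex AI quota information for the production project.
# Assumes the google-cloud-cloudquotas package is installed and Application
# Default Credentials with permission to view quotas are configured.
from google.cloud import cloudquotas_v1

PROJECT_ID = "gitlab-ai-framework-prod"
SERVICE = "aiplatform.googleapis.com"  # Vertex AI


def list_vertex_quotas(project_id: str, service: str) -> None:
    client = cloudquotas_v1.CloudQuotasClient()
    parent = f"projects/{project_id}/locations/global/services/{service}"
    for quota in client.list_quota_infos(parent=parent):
        # Each QuotaInfo carries an identifier and the underlying metric;
        # per-model dimensions are listed on the entry's dimensions_infos.
        print(quota.quota_id, quota.metric)


if __name__ == "__main__":
    list_vertex_quotas(PROJECT_ID, SERVICE)
```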
## Load Testing Environment Setup
1. Environment Selection
   - Options include:
     - `ai-framework-dev`
     - `ai-framework-stage`
     - A dedicated load test environment (for example, a sandbox project)
2. Access Request
   - Create an access request using the template
   - Request the `roles/writer` role for the project
3. Environment Configuration
   - Replicate the production model configuration exactly
   - Ensure isolation from production to prevent:
     - Load tests interrupting production traffic
     - External traffic skewing load test results
4. Model Verification
   - Verify that model specs match the production environment (a smoke-test sketch follows this list)
   - Validate quotas and capacity before running tests
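Before starting a full load test, a one-request smoke test helps confirm the replicated model actually serves traffic. The sketch below assumes the Vertex AI Python SDK (`google-cloud-aiplatform`) and a Gemini-family model; the region and model name are placeholders, so substitute the values for the model being enabled.

```python
# Minimal smoke test: send one request to the replicated model and print the
# response before running heavier load-test traffic.
import vertexai
from vertexai.generative_models import GenerativeModel

PROJECT_ID = "ai-framework-stage"  # or ai-framework-dev / a sandbox project
REGION = "us-central1"             # placeholder; match the production region
MODEL_NAME = "gemini-1.5-pro"      # placeholder; use the model under test


def smoke_test() -> None:
    vertexai.init(project=PROJECT_ID, location=REGION)
    model = GenerativeModel(MODEL_NAME)
    response = model.generate_content("Reply with the single word: pong")
    print(response.text)


if __name__ == "__main__":
    smoke_test()
```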
## Best Practices
- Test new models or model versions before deploying to production
- Use isolated environments for load testing to prevent impacting users
- Monitor for GPU capacity issues and rate limits during testing (see the backoff sketch after this list)
- Document configuration changes for future reference
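When throttling does occur (surfaced by the Python clients as `ResourceExhausted`, the HTTP 429 case described in the quota section above), load-test and production clients should retry with exponential backoff rather than fail immediately. A minimal sketch, reusing the Vertex AI SDK setup from the smoke test above:

```python
# Minimal sketch: retry a model call with exponential backoff and jitter when
# the service throttles requests (ResourceExhausted / HTTP 429). Assumes
# vertexai.init() has already been called, as in the smoke test above.
import random
import time

from google.api_core import exceptions
from vertexai.generative_models import GenerativeModel


def generate_with_backoff(model: GenerativeModel, prompt: str,
                          max_retries: int = 5) -> str:
    for attempt in range(max_retries):
        try:
            return model.generate_content(prompt).text
        except exceptions.ResourceExhausted:
            # Exponential backoff with jitter (1s, 2s, 4s, ... plus noise) so
            # concurrent clients do not retry in lockstep.
            time.sleep(2 ** attempt + random.uniform(0, 1))
    raise RuntimeError(f"Still throttled after {max_retries} retries")
```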
## Model Enablement Request Template
### Model Details
- **Model Name**: [e.g., Codestral, Claude 3 Opus, etc.]
- **Provider**: [e.g., Google Vertex AI, Anthropic, etc.]
- **Model Version/Edition**: [e.g., v1, Sonnet, Haiku, etc.]
### Business Justification
- **Purpose**: [Brief description of how this model will be used]
- **Features/Capabilities Required**: [Specific capabilities needed from this model]
- **Expected Impact**: [How this model will improve GitLab features/services]
### Technical Requirements
- **Environment(s)**: [Production, Staging, Dev, etc.]
- **Expected Traffic/Usage**: [Estimated QPS, daily usage, etc.]
- **Required Quotas**: [TPU/GPU hours, tokens per minute, etc. if known]
- **Integration Point**: [Which GitLab service(s) will use this model]
### Timeline
- **Requested By Date**: [When you need this model to be available]
- **Testing Period**: [Planned testing dates before full deployment]
### Additional Information
- **Special Configuration Needs**: [Any custom settings needed]
- **Similar Models Already Enabled**: [For reference/comparison]
- **Links to Relevant Documentation**: [Model documentation, internal specs, etc.]
/label ~"group::ai framework"