---
stage: AI-powered
group: AI Framework
info: Any user with at least the Maintainer role can merge updates to this content. For details, see https://docs.gitlab.com/development/development_processes/#development-guidelines-review.
title: Vertex AI Model Enablement Process
---

## Production Environment Setup

### 1. Request Initiation

- Create an issue in the GitLab project.
  - Use the Model Enablement Request template (see below).
  - Specify the model(s) to be enabled (for example, Codestral).
- Share the issue link in the `#ai-infrastructure` channel for visibility.

### 2. Request Processing

- The request is handled by either:
  - the Infrastructure team (Infra)
  - the AI Framework team (AIF)

### 3. Model Enablement

- For Vertex AI managed models:
  - The team enables the model in the Vertex AI console ("click to enable").
- For custom configurations:
  - The AIF team opens a ticket with Google for customization needs.
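Enabling the Vertex AI service API for a project can also be done from the command line with `gcloud services enable`. A minimal sketch, assuming an authenticated `gcloud` CLI; the helper below only assembles the command so it can be reviewed before running. Note that this enables the service API for a project; partner models in the Model Garden may still require per-model enablement in the console.

```python
# Sketch: assemble the gcloud invocation that enables the Vertex AI API
# (aiplatform.googleapis.com) for a given project. This helper only builds
# the argument list; executing it requires an authenticated gcloud CLI.

def enable_vertex_ai_command(project_id: str) -> list[str]:
    """Return the gcloud command that enables the Vertex AI service API."""
    return [
        "gcloud", "services", "enable",
        "aiplatform.googleapis.com",   # Vertex AI service name
        f"--project={project_id}",
    ]

if __name__ == "__main__":
    # Example: print the command for the production project.
    print(" ".join(enable_vertex_ai_command("gitlab-ai-framework-prod")))
```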

### 4. Quota Management

- Monitoring for existing quota is available on the AI gateway dashboard. Use the arrow in the top left to drill down and view quota usage per model.
- Not all quotas appear in our monitoring. All quotas are visible in the GCP console for the `gitlab-ai-framework-prod` project.
- Quota capacity forecasting is available in Tamland.
- Quota increases for shared resources must be requested from Google.
- Provisioned throughput can be purchased from Google if justifiable.
- Even when quota is available, requests may be throttled during periods of high demand because of Anthropic's resource provisioning model. Unlike direct Google services, which over-provision resources, Anthropic provisions based on actual demand. To ensure consistent throughput without throttling, dedicated provisioned throughput can be purchased through Anthropic.
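Because requests can be throttled even when quota is available, callers should retry throttled responses with exponential backoff rather than failing immediately. A minimal sketch; the `send_request` callable and the use of HTTP status 429 for throttling are illustrative assumptions, not part of the documented process:

```python
import time

def call_with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Retry a request, backing off exponentially on throttled (429) responses.

    send_request: zero-argument callable returning (status_code, body).
    Retries up to max_retries times, sleeping base_delay * 2**attempt
    between attempts (1s, 2s, 4s, ... with the defaults).
    """
    for attempt in range(max_retries + 1):
        status, body = send_request()
        if status != 429:               # not throttled: return the result
            return status, body
        if attempt < max_retries:
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("request still throttled after all retries")
```

In practice you would wrap the actual model invocation in `send_request` and tune `max_retries` and `base_delay` against the observed throttling window.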

## Load Testing Environment Setup

### 1. Environment Selection

- Options include:
  - `ai-framework-dev`
  - `ai-framework-stage`
  - A dedicated load test environment (for example, a sandbox project)

### 2. Access Request

- Create an access request using the template.
- Request the `roles/writer` role for the project.

### 3. Environment Configuration

- Replicate the exact model configuration from production.
- Ensure isolation from production to prevent:
  - the load test interrupting production traffic
  - external traffic skewing load test results

### 4. Model Verification

- Verify that model specifications match the production environment.
- Validate quotas and capacity before running tests.
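Checking that the load-test model specs match production amounts to a configuration diff. A minimal sketch, assuming each environment's model configuration is available as a flat dictionary; the field names in the usage example are illustrative, not actual configuration keys:

```python
def config_drift(prod: dict, test: dict) -> dict:
    """Return the fields whose values differ between two model configurations.

    A key missing from one environment is reported with the value None on
    the side where it is absent, so missing settings surface as drift too.
    """
    drift = {}
    for key in prod.keys() | test.keys():      # union of all config keys
        if prod.get(key) != test.get(key):
            drift[key] = {"prod": prod.get(key), "test": test.get(key)}
    return drift

if __name__ == "__main__":
    prod = {"model": "codestral", "region": "us-central1", "max_tokens": 4096}
    test = {"model": "codestral", "region": "europe-west4"}
    for field, values in config_drift(prod, test).items():
        print(f"{field}: prod={values['prod']!r} test={values['test']!r}")
```

An empty result means the environments agree on every field; anything reported should be reconciled before the load test runs.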

## Best Practices

- Test new models or model versions before deploying to production.
- Use isolated environments for load testing to avoid impacting users.
- Monitor for GPU capacity issues and rate limits during testing.
- Document configuration changes for future reference.

## Model Enablement Request Template

### Model Details

- **Model Name**: [e.g., Codestral, Claude 3 Opus, etc.]
- **Provider**: [e.g., Google Vertex AI, Anthropic, etc.]
- **Model Version/Edition**: [e.g., v1, Sonnet, Haiku, etc.]

### Business Justification

- **Purpose**: [Brief description of how this model will be used]
- **Features/Capabilities Required**: [Specific capabilities needed from this model]
- **Expected Impact**: [How this model will improve GitLab features/services]

### Technical Requirements

- **Environment(s)**: [Production, Staging, Dev, etc.]
- **Expected Traffic/Usage**: [Estimated QPS, daily usage, etc.]
- **Required Quotas**: [TPU/GPU hours, tokens per minute, etc. if known]
- **Integration Point**: [Which GitLab service(s) will use this model]

### Timeline

- **Requested By Date**: [When you need this model to be available]
- **Testing Period**: [Planned testing dates before full deployment]

### Additional Information

- **Special Configuration Needs**: [Any custom settings needed]
- **Similar Models Already Enabled**: [For reference/comparison]
- **Links to Relevant Documentation**: [Model documentation, internal specs, etc.]

/label ~"group::ai framework"