---
stage: AI-powered
group: AI Framework
info: Any user with at least the Maintainer role can merge updates to this content. For details, see https://docs.gitlab.com/development/development_processes/#development-guidelines-review.
title: Vertex AI Model Enablement Process
---
## Production Environment Setup
1. Request Initiation
   - Create an issue in the GitLab project
   - Use the Model Enablement Request template (see below)
   - Specify the model(s) to be enabled (for example, Codestral)
   - Share the issue link in the `#ai-infrastructure` channel for visibility
2. Request Processing
   - The request is handled by either:
     - The Infrastructure team (Infra)
     - The AI Framework team (AIF)
3. Model Enablement
   - For Vertex AI managed models:
     - The team enables the model via the Vertex AI console ("click on enable")
   - For custom configurations:
     - The AIF team opens a ticket with Google for customization needs
4. Quota Management
   - Monitoring for existing quota is available in the AI gateway dashboard. Use the arrow in the top left to drill down and see quota usage per model.
   - Not all quotas are exposed in our monitoring; all quotas are visible in the GCP console for the `gitlab-ai-framework-prod` project (for a programmatic check, see the sketch after this list).
   - Quota capacity forecasting is available in Tamland.
   - Quota increases for shared resources must be requested from Google.
   - Provisioned throughput can be purchased from Google if justifiable.
   - Even when quota is available, requests may be throttled during high-demand periods because of Anthropic's resource provisioning model. Unlike direct Google services, which over-provision resources, Anthropic provisions based on actual demand. To ensure consistent throughput without throttling, dedicated provisioned throughput can be purchased through Anthropic.
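For checking quotas outside the console, the Cloud Quotas API can list quota information per service. The following is a minimal sketch in Python using the `google-cloud-cloudquotas` client; the project ID and service name come from this page, but the client package, field names, and required permissions are assumptions to verify against the current Google Cloud documentation.

```python
# Minimal sketch: list Vertex AI quota information for the production project.
# Assumes the google-cloud-cloudquotas package is installed and Application
# Default Credentials with permission to view quotas are configured.
from google.cloud import cloudquotas_v1

PROJECT_ID = "gitlab-ai-framework-prod"
SERVICE = "aiplatform.googleapis.com"  # Vertex AI


def list_vertex_quotas(project_id: str, service: str) -> None:
    client = cloudquotas_v1.CloudQuotasClient()
    parent = f"projects/{project_id}/locations/global/services/{service}"
    for quota in client.list_quota_infos(parent=parent):
        # Each QuotaInfo carries an identifier and the underlying metric;
        # per-model dimensions are listed on the entry's dimensions_infos.
        print(quota.quota_id, quota.metric)


if __name__ == "__main__":
    list_vertex_quotas(PROJECT_ID, SERVICE)
```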
## Load Testing Environment Setup
1. Environment Selection
   - Options include:
     - `ai-framework-dev`
     - `ai-framework-stage`
     - A dedicated load test environment (for example, a sandbox project)
2. Access Request
   - Create an access request using the template
   - Request the `roles/writer` role for the project
3. Environment Configuration
   - Replicate the production model configuration exactly
   - Ensure isolation from production to prevent:
     - Load tests interrupting production traffic
     - External traffic skewing load test results
4. Model Verification
   - Verify that model specs match the production environment (a smoke-test sketch follows this list)
   - Validate quotas and capacity before running tests
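Before starting a full load test, a one-request smoke test helps confirm the replicated model actually serves traffic. The sketch below assumes the Vertex AI Python SDK (`google-cloud-aiplatform`) and a Gemini-family model; the region and model name are placeholders, so substitute the values for the model being enabled.

```python
# Minimal smoke test: send one request to the replicated model and print the
# response before running heavier load-test traffic.
import vertexai
from vertexai.generative_models import GenerativeModel

PROJECT_ID = "ai-framework-stage"  # or ai-framework-dev / a sandbox project
REGION = "us-central1"             # placeholder; match the production region
MODEL_NAME = "gemini-1.5-pro"      # placeholder; use the model under test


def smoke_test() -> None:
    vertexai.init(project=PROJECT_ID, location=REGION)
    model = GenerativeModel(MODEL_NAME)
    response = model.generate_content("Reply with the single word: pong")
    print(response.text)


if __name__ == "__main__":
    smoke_test()
```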
## Best Practices
- Test new models or model versions before deploying to production
- Use isolated environments for load testing to prevent impacting users
- Monitor for GPU capacity issues and rate limits during testing (see the backoff sketch after this list)
- Document configuration changes for future reference
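When throttling does occur (surfaced by the Python clients as `ResourceExhausted`, the HTTP 429 case described in the quota section above), load-test and production clients should retry with exponential backoff rather than fail immediately. A minimal sketch, reusing the Vertex AI SDK setup from the smoke test above:

```python
# Minimal sketch: retry a model call with exponential backoff and jitter when
# the service throttles requests (ResourceExhausted / HTTP 429). Assumes
# vertexai.init() has already been called, as in the smoke test above.
import random
import time

from google.api_core import exceptions
from vertexai.generative_models import GenerativeModel


def generate_with_backoff(model: GenerativeModel, prompt: str,
                          max_retries: int = 5) -> str:
    for attempt in range(max_retries):
        try:
            return model.generate_content(prompt).text
        except exceptions.ResourceExhausted:
            # Exponential backoff with jitter (1s, 2s, 4s, ... plus noise) so
            # concurrent clients do not retry in lockstep.
            time.sleep(2 ** attempt + random.uniform(0, 1))
    raise RuntimeError(f"Still throttled after {max_retries} retries")
```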
## Model Enablement Request Template
### Model Details
- **Model Name**: [e.g., Codestral, Claude 3 Opus, etc.]
- **Provider**: [e.g., Google Vertex AI, Anthropic, etc.]
- **Model Version/Edition**: [e.g., v1, Sonnet, Haiku, etc.]
### Business Justification
- **Purpose**: [Brief description of how this model will be used]
- **Features/Capabilities Required**: [Specific capabilities needed from this model]
- **Expected Impact**: [How this model will improve GitLab features/services]
### Technical Requirements
- **Environment(s)**: [Production, Staging, Dev, etc.]
- **Expected Traffic/Usage**: [Estimated QPS, daily usage, etc.]
- **Required Quotas**: [TPU/GPU hours, tokens per minute, etc. if known]
- **Integration Point**: [Which GitLab service(s) will use this model]
### Timeline
- **Requested By Date**: [When you need this model to be available]
- **Testing Period**: [Planned testing dates before full deployment]
### Additional Information
- **Special Configuration Needs**: [Any custom settings needed]
- **Similar Models Already Enabled**: [For reference/comparison]
- **Links to Relevant Documentation**: [Model documentation, internal specs, etc.]
/label ~"group::ai framework"