---
stage: AI-powered
group: AI Framework
info: Any user with at least the Maintainer role can merge updates to this content. For details, see https://docs.gitlab.com/development/development_processes/#development-guidelines-review.
title: Vertex AI Model Enablement Process
---

## Production Environment Setup

### 1. Request Initiation

- Create an issue in the [GitLab project](https://gitlab.com/gitlab-org/gitlab/-/issues)
- Use the Model Enablement Request template (see below)
- Specify the model(s) to be enabled (e.g., Codestral)
- Share the issue link in the `#ai-infrastructure` channel for visibility

### 2. Request Processing

- The request is handled by either:
  - Infrastructure team (Infra)
  - AI Framework team (AIF)

### 3. Model Enablement

- For Vertex AI managed models:
  - The team enables the model in the Vertex AI console (click **Enable**)
- For custom configurations:
  - The AIF team opens a ticket with Google for customization needs

### 4. Quota Management

- Monitoring for existing quota is available from the [AI-gateway dashboard](https://dashboards.gitlab.net/d/ai-gateway-main/ai-gateway3a-overview?from=now-6h%2Fm&orgId=1&timezone=utc&to=now%2Fm&var-PROMETHEUS_DS=mimir-runway&var-environment=gprd&viewPanel=panel-1217942947). Use the arrow in the top-left corner to drill down and see quota usage per model.
- Not all quotas appear in our monitoring. All visible quotas are available in the [GCP console for the `gitlab-ai-framework-prod` project](https://console.cloud.google.com/iam-admin/quotas?referrer=search&inv=1&invt=Abs5YQ&project=gitlab-ai-framework-prod)
- Quota capacity forecasting is available in [Tamland](https://gitlab-com.gitlab.io/gl-infra/capacity-planning-trackers/gitlab-com/service_groups/ai-gateway/)
- Quota increases to shared resources need to be requested from Google
- Provisioned throughput can be purchased from Google if justified.
- Even when quota is available, requests may be throttled during high-demand periods because of Anthropic's resource provisioning model. Unlike direct Google services, which over-provision resources, Anthropic provisions based on actual demand. To ensure consistent throughput without throttling, dedicated provisioned throughput can be purchased through Anthropic.

## Load Testing Environment Setup

### 1. Environment Selection

- Options include:
  - `ai-framework-dev`
  - `ai-framework-stage`
  - Dedicated load test environment (e.g., sandbox project)

### 2. Access Request

- Create an access request using the [template](https://gitlab.com/gitlab-com/team-member-epics/access-requests/-/issues/new?description_template=Individual_Bulk_Access_Request)
- Request the `roles/writer` role for the project

### 3. Environment Configuration

- Replicate the exact model configuration from production
- Ensure isolation from production to prevent:
  - Load tests interrupting production traffic
  - External traffic skewing load test results

### 4. Model Verification

- Verify that model specs match the production environment
- Validate quotas and capacity before running tests

## Best Practices

- Test new models or model versions before deploying to production
- Use isolated environments for load testing to prevent impacting users
- Monitor for GPU capacity issues and rate limits during testing
- Document configuration changes for future reference

## Model Enablement Request Template

```markdown
### Model Details

- **Model Name**: [e.g., Codestral, Claude 3 Opus, etc.]
- **Provider**: [e.g., Google Vertex AI, Anthropic, etc.]
- **Model Version/Edition**: [e.g., v1, Sonnet, Haiku, etc.]

### Business Justification

- **Purpose**: [Brief description of how this model will be used]
- **Features/Capabilities Required**: [Specific capabilities needed from this model]
- **Expected Impact**: [How this model will improve GitLab features/services]

### Technical Requirements

- **Environment(s)**: [Production, Staging, Dev, etc.]
- **Expected Traffic/Usage**: [Estimated QPS, daily usage, etc.]
- **Required Quotas**: [TPU/GPU hours, tokens per minute, etc. if known]
- **Integration Point**: [Which GitLab service(s) will use this model]

### Timeline

- **Requested By Date**: [When you need this model to be available]
- **Testing Period**: [Planned testing dates before full deployment]

### Additional Information

- **Special Configuration Needs**: [Any custom settings needed]
- **Similar Models Already Enabled**: [For reference/comparison]
- **Links to Relevant Documentation**: [Model documentation, internal specs, etc.]

/label ~"group::ai framework"
```
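
## Example: Pre-Load-Test Checks

The Model Verification step of the load testing setup asks you to confirm that model specs match production and that quota is sufficient before a test run. The sketch below shows one way such a check could be scripted. It is illustrative only: the `ModelConfig` fields, the helper names, the 20% safety margin, and all sample values are assumptions, not part of any GitLab or Google tooling. Real values would come from the GCP console or the dashboards referenced in Quota Management.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical snapshot of one model's settings in one environment."""
    model_name: str
    model_version: str
    region: str
    tokens_per_minute_quota: int


def config_mismatches(prod: ModelConfig, load_test: ModelConfig) -> list[str]:
    """Return the names of fields whose values differ between environments."""
    return [
        f.name
        for f in fields(prod)
        if getattr(prod, f.name) != getattr(load_test, f.name)
    ]


def has_quota_headroom(quota: int, expected_peak: int,
                       safety_margin_percent: int = 20) -> bool:
    """True if the expected peak leaves at least the safety margin unused.

    Integer arithmetic avoids floating-point surprises at the boundary.
    """
    return expected_peak * 100 <= quota * (100 - safety_margin_percent)


# Placeholder values for illustration; not real GitLab settings.
prod = ModelConfig("codestral", "v1", "us-central1", 100_000)
load_test = ModelConfig("codestral", "v1", "us-central1", 50_000)

print(config_mismatches(prod, load_test))  # ['tokens_per_minute_quota']
print(has_quota_headroom(load_test.tokens_per_minute_quota,
                         expected_peak=45_000))  # False
```

In this sample, the load test environment has a lower quota than production, and a 45,000 tokens-per-minute peak would leave less than the assumed 20% margin, so both checks flag a problem to resolve before testing.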