Agent Platform GenAI Inference Skill

This skill provides instructions for authenticating and connecting to Google Cloud Agent Platform to use Generative AI models. It covers both First-Party (Gemini) and Third-Party (OpenMaaS) models.

Safety & Confirmation Tiers (CRITICAL)

Before executing any command or script on behalf of the user, you must follow the safety tier that matches the requested action. This skill is read-only, so only Tier R applies; other safety tiers are omitted.

Tier R: Read-only / Inference (client.models.generate_content, client.chat.completions.create, client.completions.create, client.embeddings.create)
- Requires interactive confirmation with 'Yes' / 'No' options before executing model inference on behalf of the user, to prevent unexpected cost or quota consumption. The confirmation prompt must clearly state the proposed inference execution and its key parameters (e.g., target model ID, SDK choice, input prompt). Natural-language paraphrases that do not specify the parameters are NOT sufficient.
- Same-turn restriction: Do not execute the inference scripts or commands in the same turn as presenting the confirmation prompt. Stop and wait for the user's reply; only execute after explicit 'Yes' / approval.
- Gold Standard Example:
  I will perform model inference with the following parameters. Please confirm this information before I proceed:
  - Model ID: deepseek-ai/deepseek-v3.2-maas
  - SDK: OpenAI SDK (via Vertex AI Endpoint)
  - Input Prompt: "Explain the concept of quantum computing..."
  Do you confirm? [Yes/No]
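
The confirmation flow above could be assembled programmatically. The following is a minimal sketch, not part of the skill's scripts: the function names `build_confirmation_prompt` and `is_approved` are hypothetical, and treating only a literal "yes" as approval is an assumption that is stricter than the rule requires.

```python
def build_confirmation_prompt(model_id: str, sdk: str, input_prompt: str) -> str:
    """Return a Tier R confirmation message listing the key inference parameters."""
    return "\n".join([
        "I will perform model inference with the following parameters. "
        "Please confirm this information before I proceed:",
        f"- Model ID: {model_id}",
        f"- SDK: {sdk}",
        f'- Input Prompt: "{input_prompt}"',
        "Do you confirm? [Yes/No]",
    ])


def is_approved(user_reply: str) -> bool:
    """Assumption: only an explicit 'yes' (any case, spaces ignored) counts as approval."""
    return user_reply.strip().lower() == "yes"
```

The prompt is built in one turn and `is_approved` is applied to the user's reply in a later turn, which keeps the same-turn restriction intact.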

Phase 0: Environment Setup

CRITICAL: Before running any of the Python sample scripts in the scripts/ directory (e.g., scripts/openmaas_openai_sdk.py), you MUST ensure the environment is correctly initialized by following these steps:

Google Cloud Authentication: Authenticate with your Google Cloud credentials and configure active Application Default Credentials (ADC) for Agent Platform access:
```
gcloud auth login
gcloud auth application-default login
```

Enable API (if not already enabled):

gcloud services enable aiplatform.googleapis.com

Virtual Environment: Create and activate a dedicated local virtual environment:
```
python3 -m venv .venv
source .venv/bin/activate
```
Install Dependencies: Install the required SDKs:
```
pip install -r scripts/requirements.txt
```
Verify Setup (Optional): Run all sample scripts at once to verify the environment is working end-to-end:
```
./scripts/verify_all.sh
```
Execution: Advise the user that every time they execute a Python snippet from this skill, they must ensure this virtual environment is activated first.
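
One way to check that a virtual environment is active before running a snippet is sketched below. The helper names `in_virtualenv` and `venv_warning` are hypothetical; the check itself relies on the standard behavior that `sys.prefix` differs from `sys.base_prefix` inside an environment created with `python3 -m venv`.

```python
import sys
from typing import Optional


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def venv_warning() -> Optional[str]:
    """Return a reminder message when no virtual environment is active."""
    if in_virtualenv():
        return None
    return "Virtual environment not active. Run: source .venv/bin/activate"
```

A script could print `venv_warning()` at startup when it is not `None`, before importing any SDK.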

[!IMPORTANT] CRITICAL: Model IDs & Availability

Gemini Models: See Gemini Models for valid Model IDs and Regions.

OpenMaaS Models: See [Use Open Models on Agent Platform](https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/maas/use-open-models) for Llama, DeepSeek, Qwen, etc.

Incomplete Lists: The Model IDs listed in this skill are examples only and may be incomplete or outdated.

Action: Always verify the Model ID and Region using the links above before generating code.

Workflow Decision Tree

1. Model Family Identification: Has the user specified whether they want to call a Gemini (First-Party) model or an OpenMaaS (Third-Party, e.g. Llama, DeepSeek, Qwen) model?
  - No -> Ask the user which model family they want to use. If they provide a specific model name, infer the family from the name.
  - Yes -> Proceed to Step 2.
2. SDK Choice: Which SDK does the user want to use?
  - Gemini + GenAI SDK (preferred for Gemini) -> Proceed to [1. Gemini Models].
  - Gemini + legacy Vertex AI SDK -> Proceed to [1. Gemini Models].
  - OpenMaaS + OpenAI SDK (preferred for OpenMaaS) -> Proceed to [2. OpenMaaS Models].
  - OpenMaaS + GenAI SDK -> Proceed to [2. OpenMaaS Models].
  - Unsure -> Default to the preferred SDK for the chosen family.
3. Troubleshooting: Is the user reporting an error (429 Resource Exhausted, 400 User Validation, 404 Not Found, etc.)?
  - Yes -> Proceed to [3. Troubleshooting & Common Error Codes].
  - No -> Proceed with the SDK choice from Step 2.
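
The family and SDK selection above can be expressed as a small lookup. This is a sketch under stated assumptions: the names `infer_family` and `choose_sdk` are hypothetical, and the rule "a name starting with `gemini` is a Gemini model, anything else is OpenMaaS" is a heuristic that is not defined by the skill.

```python
from typing import Optional

# Preferred SDK per model family, as stated in the decision tree.
PREFERRED_SDK = {"gemini": "GenAI SDK", "openmaas": "OpenAI SDK"}


def infer_family(model_name: str) -> str:
    """Guess the model family from a model name (heuristic, an assumption)."""
    # Assumption: Gemini model IDs start with "gemini"; everything else
    # (e.g. "meta/...", "deepseek-ai/...") is treated as OpenMaaS.
    is_gemini = model_name.strip().lower().startswith("gemini")
    return "gemini" if is_gemini else "openmaas"


def choose_sdk(family: str, requested_sdk: Optional[str] = None) -> str:
    """Return the user's SDK choice, or the family's preferred SDK when unsure."""
    return requested_sdk or PREFERRED_SDK[family]
```

For example, `choose_sdk(infer_family("deepseek-ai/deepseek-v3.2-maas"))` resolves to the OpenAI SDK, matching the "Unsure" branch.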

1. Gemini Models

For Gemini models (e.g., gemini-2.5-pro, gemini-3-flash-preview), the GenAI SDK (google-genai) is the PREFERRED method. The legacy vertexai SDK is still supported but GenAI SDK is recommended for new projects.

[!IMPORTANT] Preview Models (including Gemini 3.1) are often ONLY available in the global region. Stable models are available in us-central1 and other regions.

Choosing the Right SDK

Gemini Models: GenAI SDK (google-genai) is PREFERRED. Use OpenAI SDK for compatibility, or Legacy SDK (vertexai) if needed.
OpenMaaS Models: OpenAI SDK is HIGHLY RECOMMENDED. Use GenAI SDK or Legacy SDK if you have specific infrastructure requirements.

Installation

pip install google-genai

Python Example (GenAI SDK - Preferred)

See scripts/gemini_genai_sdk.py for the complete code.

Alternative: OpenAI SDK (Chat Completions)

Use the standard OpenAI SDK with the Agent Platform endpoint. This is great for cross-compatibility.

See scripts/gemini_openai_sdk.py for the complete code.

Legacy: Agent Platform SDK

The legacy vertexai SDK is still widely used but google-genai is preferred for new Gemini projects.

See scripts/gemini_vertexai_sdk.py for the complete code.

Documentation: Google GenAI SDK

Documentation: Agent Platform Gemini Models

2. OpenMaaS Models (Llama, DeepSeek, Qwen, etc.)

For OpenMaaS (Model-as-a-Service) models, the HIGHLY RECOMMENDED approach is to use the standard OpenAI SDK with a specific Vertex AI endpoint.

[!WARNING] While GenerativeModel can support some OpenMaaS models, it is discouraged. Use the OpenAI SDK for best compatibility (especially for Chat Completions).

Installation

pip install openai google-auth

Authentication for OpenAI SDK

You MUST use a Google Cloud OAuth access token as the API key for the OpenAI SDK.

import google.auth
from google.auth.transport.requests import Request

def get_gcp_access_token():
    creds, _ = google.auth.default()
    creds.refresh(Request())
    return creds.token

[!NOTE] Google Cloud access tokens typically expire after 1 hour. The get_gcp_access_token() function above retrieves a fresh token at the time it is called.

For long-running applications, you should implement a refresh mechanism so that requests never use an expired token. See Refresh the access token for details.
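
A refresh mechanism could look like the following sketch. The `CachedToken` class is hypothetical, the one-hour lifetime is taken from the note above as a typical value rather than a guarantee, and the five-minute safety margin is an arbitrary assumption. The `fetch` argument is meant to be a function such as `get_gcp_access_token`.

```python
import time
from typing import Callable, Optional


class CachedToken:
    """Cache an access token and re-fetch it shortly before it expires."""

    def __init__(self, fetch: Callable[[], str],
                 ttl_seconds: float = 3600.0,
                 margin_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch            # e.g. get_gcp_access_token
        self._ttl = ttl_seconds        # assumed lifetime (typically 1 hour)
        self._margin = margin_seconds  # refresh this long before expiry
        self._clock = clock            # injectable for testing
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a cached token, refreshing when it is missing or near expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token = self._fetch()
            self._expires_at = now + self._ttl
        return self._token
```

Because the OpenAI client receives its API key at construction time, one option for a long-running process is to build a new client with `cache.get()` whenever the cached token changes.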

Configuration (Base URL)

Global Endpoint (Recommended for most models requiring global availability): https://aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/global/endpoints/openapi
Regional Endpoint: https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{REGION}/endpoints/openapi
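
The two URL patterns above can be produced by one helper. This is a minimal sketch: `openai_base_url` and `make_client` are hypothetical names, and `make_client` assumes the `openai` package is installed (it is imported lazily so the URL helper works without it).

```python
def openai_base_url(project_id: str, location: str = "global") -> str:
    """Build the OpenAI-compatible endpoint URL for a project and location."""
    # The global endpoint has no region prefix on the hostname.
    if location == "global":
        host = "aiplatform.googleapis.com"
    else:
        host = f"{location}-aiplatform.googleapis.com"
    return (f"https://{host}/v1/projects/{project_id}"
            f"/locations/{location}/endpoints/openapi")


def make_client(project_id: str, location: str, access_token: str):
    """Create an OpenAI SDK client for the endpoint (needs `pip install openai`)."""
    from openai import OpenAI  # lazy import: only required when a client is built
    return OpenAI(base_url=openai_base_url(project_id, location),
                  api_key=access_token)
```

The access token passed to `make_client` would come from a function such as `get_gcp_access_token()`.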

Python Example (OpenMaaS - Chat Completions)

See scripts/openmaas_openai_sdk.py for the complete code.

[!TIP] Alternative: Environment Variables You can set environment variables in your shell instead of updating the code.
export OPENAI_BASE_URL="https://aiplatform.googleapis.com/v1/projects/YOUR_PROJECT_ID/locations/global/endpoints/openapi"
export OPENAI_API_KEY="$(gcloud auth print-access-token)"
Then initialize the client without arguments: client = OpenAI()

Python Example (OpenMaaS - Completions API)

The following models support the legacy Completions API: zai-org/glm-5-maas, moonshotai/kimi-k2-thinking-maas, minimaxai/minimax-m2-maas, deepseek-ai/deepseek-v3.1-maas, and deepseek-ai/deepseek-v3.2-maas.

response = client.completions.create(
    model="deepseek-ai/deepseek-v3.2-maas",
    prompt="Once upon a time",
    max_tokens=100
)
print(response.choices[0].text)

Python Example (OpenMaaS - Embeddings)

# Verify specific Embedding Model ID on Model Garden (e.g., intfloat/multilingual-e5-small)
response = client.embeddings.create(
    model="intfloat/multilingual-e5-large-maas",
    input="The quick brown fox jumps over the lazy dog",
)
print(response.data[0].embedding)

Alternative: GenAI SDK

The google-genai SDK can also access OpenMaaS models via the vertexai backend.

See scripts/openmaas_genai_sdk.py for the complete code.

[!IMPORTANT] Model ID Format: For GenAI SDK with OpenMaaS, you MUST use the full path: publishers/PUBLISHER/models/MODEL (e.g., publishers/zai-org/models/glm-5-maas).
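
Converting the short `publisher/model` form used with the OpenAI SDK into the full path can be done with a small helper. The function name `to_publisher_path` is hypothetical; this is a sketch of the string format described above, not an SDK function.

```python
def to_publisher_path(model_id: str) -> str:
    """Convert 'PUBLISHER/MODEL' into 'publishers/PUBLISHER/models/MODEL'."""
    if model_id.startswith("publishers/"):
        return model_id  # already a full path, leave unchanged
    publisher, sep, model = model_id.partition("/")
    if not sep or not publisher or not model:
        raise ValueError(f"expected 'publisher/model', got {model_id!r}")
    return f"publishers/{publisher}/models/{model}"
```

For example, `to_publisher_path("zai-org/glm-5-maas")` yields `publishers/zai-org/models/glm-5-maas`.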

Legacy: Agent Platform SDK (OpenMaaS)

For OpenMaaS, you can also use GenerativeModel (if supported).

See scripts/openmaas_vertexai_sdk.py for the complete code.

[!IMPORTANT] Model ID Format: For Agent Platform SDK with OpenMaaS, you MUST use the full path: publishers/PUBLISHER/models/MODEL.

Model Reference & Availability

Documentation: Use Open Models on Agent Platform

[!TIP] Self-Deployment for Control: If you need dedicated hardware (GPUs/TPUs), guaranteed capacity, or specific regional placement not offered by MaaS, you can Self-Deploy these models to Agent Platform Endpoints. Search for the model in Model Garden and click "Deploy" to select your machine type.

[!IMPORTANT] Finding Inference Examples: The table below is a starting point. For the definitive inference snippets (especially for Chat Completions payload structure):

1. Consult the Use Open Models on Agent Platform list.
2. Click the link for your specific model (e.g., "DeepSeek-V3") to visit its Model Garden page.
3. Look for the "Sample Code" or "Use this model" button on the Model Garden page to get the exact curl or Python code for that specific model version.

[!NOTE] This list is INCOMPLETE. See [Use Open Models on Agent Platform](https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/maas/use-open-models) for the full list of supported models.

| Model Family | Model ID Examples | Location | Notes |
| --- | --- | --- | --- |
| Llama 4 | `meta/llama-4-maverick-17b-128e-instruct-maas` | `us-east5` | |
| Llama 4 | `meta/llama-4-scout-17b-16e-instruct-maas` | `us-east5` | |
| Llama 3.3 | `meta/llama-3.3-70b-instruct-maas` | `us-central1` | |
| DeepSeek | `deepseek-ai/deepseek-v3.2-maas` | `global` | Global ONLY |
| DeepSeek | `deepseek-ai/deepseek-v3.1-maas` | `us-west2` | US-West2 ONLY |
| DeepSeek | `deepseek-ai/deepseek-r1-0528-maas` | `us-central1` | |
| Qwen 3 | `qwen/qwen3-coder-480b-a35b-instruct-maas` | `global` | |
| Qwen 3 | `qwen/qwen3-next-80b-a3b-instruct-maas` | `global` | |
| Kimi | `moonshotai/kimi-k2-thinking-maas` | `global` | |
| MiniMax | `minimaxai/minimax-m2-maas` | `global` | |
| GLM | `zai-org/glm-4.7-maas`, `zai-org/glm-5-maas` | `global` | |

3. Troubleshooting & Common Error Codes

429: Resource Exhausted

Cause: OpenMaaS and Gemini models use Dynamic Shared Quota (DSQ). Resources are pooled and allocated dynamically based on availability. A 429 error indicates the shared pool is temporarily exhausted, not necessarily that your specific project quota is hit (though it can be).
Solution: Implement strict exponential backoff and retry strategies.
High Throughput: For production workloads requiring high throughput or guaranteed capacity, consider Provisioned Throughput (PT).
Important: Quota increases through normal cloud processes (Cloud Console) are NOT applicable for DSQ constraints.
Documentation: Quotas and limits (DSQ)
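
Exponential backoff for 429 errors could be sketched as follows. The helper `retry_with_backoff` is hypothetical, the default delays and attempt count are arbitrary assumptions, and the caller supplies `is_retryable` to decide which exceptions count as a 429. Randomizing each wait ("full jitter") is a common variant chosen here so that many clients do not retry at the same instant.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(call: Callable[[], T],
                       is_retryable: Callable[[Exception], bool],
                       max_attempts: int = 5,
                       base_delay: float = 1.0,
                       max_delay: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying retryable failures with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on non-retryable errors or once attempts are exhausted.
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # The cap doubles each attempt (1s, 2s, 4s, ...) up to max_delay;
            # the actual wait is a random value in [0, cap].
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise RuntimeError("max_attempts must be at least 1")
```

With the OpenAI SDK, a plausible predicate is `lambda e: isinstance(e, openai.RateLimitError)`. Each retry is another inference request, so retries still consume quota.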

400: User Validation Error

Cause: Invalid request format, unsupported parameter, or incorrect Model ID.
Action: Double-check your request payload and parameters. Verify the Model ID and Region are correct.

404: Not Found / Model Not Available

Cause: The model is not enabled, or not available in the specified project or region.
Action:
1. Check Location Availability:
  - OpenMaaS: Verify the model is available in your region. See Model Availability by Location.
  - Gemini:
    - Source of Truth: Always check Gemini Model Locations for the authoritative list.
    - Preview Models: Preview models (e.g., Gemini 3.1, experimental versions) are often available ONLY in the us-central1 or global regions.
    - Stable Models: (e.g., Gemini 2.5 Pro) Available in us-central1, europe-west4, and many other regions.
    - Important: If you get a 404/400 error, try switching your client location to us-central1 or global.
2. Enable Llama Models: For Llama 3.3 and Llama 4, you MUST enable the model in Model Garden before use. Go to the [Model Garden](https://console.cloud.google.com/agent-platform/model-garden), search for the model card (e.g., "Llama 3.3 API Service"), and click Enable. Only then can you make inference requests.
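
The "try switching your client location" advice above could be automated along these lines. This is a sketch only: `first_available_location` is a hypothetical helper, `call` stands for any function that performs one request against a given location, and `is_not_found` is a caller-supplied predicate for 404-style errors. Because `call` runs inference, the Tier R confirmation still applies before using it.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def first_available_location(
        call: Callable[[str], T],
        is_not_found: Callable[[Exception], bool],
        locations: Iterable[str] = ("us-central1", "global"),
) -> Tuple[str, T]:
    """Try `call(location)` for each location; return the first that succeeds."""
    last_error: Optional[Exception] = None
    for location in locations:
        try:
            return location, call(location)
        except Exception as exc:
            if not is_not_found(exc):
                raise  # e.g. auth or quota errors are not location problems
            last_error = exc
    if last_error is None:
        raise ValueError("no locations given")
    # Every location reported "not found"; surface the last such error.
    raise last_error
```

If every location fails with a not-found error, the model may still need to be enabled in Model Garden, as described in step 2.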

Grade adjusted by static analysis guardrails

AI scored this skill as grade A, but static analysis findings capped it to B:

• Recursive deletion pattern (rm -rf) (max: B)

Overall Score: 88/100
Grade: B (Good)
Safety: 85
Quality: 92
Clarity: 87
Completeness: 83

Summary

This skill teaches agents how to authenticate with Google Cloud Agent Platform and perform inference with Gemini and OpenMaaS models using multiple SDKs (GenAI, OpenAI, legacy Vertex AI). It includes environment setup instructions, code examples for Chat Completions and Embeddings APIs, SDK selection guidance, and troubleshooting for common errors (429 DSQ exhaustion, 400 validation, 404 availability). The skill enforces critical safety tiers requiring explicit user confirmation before executing any inference to prevent unexpected cost/quota consumption.

Static Analysis Findings

1 finding

Patterns detected by deterministic static analysis before AI scoring.

Destructive Operation

SEC-001: Recursive Deletion (Max: B)

Recursive deletion pattern (rm -rf)

scripts/verify_all.sh: rm -rf

100% confidence. CWE-379: Unrestricted File Deletion

Detected Capabilities

file read, virtual environment creation, python dependency installation, shell script execution, google cloud authentication, api credential retrieval, http request to cloud endpoints

Trigger Keywords

Phrases that MCP clients use to match this skill to user intent.

call gemini models, openmaas inference, deepseek api access, llama model deployment, cloud agent platform, vertex ai authentication, model quota exhaustion

Risk Signals

INFO: Recursive deletion (rm -rf) in verify_all.sh cleanup function (scripts/verify_all.sh:15)
INFO: Google Cloud ADC (Application Default Credentials) authentication required (SKILL.md Phase 0 Step 1)
INFO: Google Cloud API access token retrieval via get_gcp_access_token() (scripts/openmaas_openai_sdk.py, scripts/gemini_openai_sdk.py)
INFO: HTTP requests to aiplatform.googleapis.com and regional endpoints (SKILL.md section 2, scripts/openmaas_openai_sdk.py:20-23)
INFO: Virtual environment temporary directory creation and cleanup (scripts/verify_all.sh:5-16)

Referenced Domains

External domains referenced in skill content, detected by static analysis.

aiplatform.googleapis.com, console.cloud.google.com, docs.cloud.google.com, github.com, www.apache.org, {region}-aiplatform.googleapis.com

Use Cases

Call Gemini models via GenAI SDK
Call OpenMaaS models (Llama, DeepSeek, Qwen) via OpenAI SDK
Authenticate with Google Cloud credentials for Agent Platform
Configure regional and global endpoints for inference
Debug 429 Resource Exhausted quota errors
Resolve 404 model unavailable errors
Compare SDK choices (GenAI vs OpenAI vs legacy Vertex AI)

Quality Notes

Excellent: Safety Tier R explicitly requires interactive confirmation before inference execution, with gold-standard example showing clear parameter specification
Excellent: Comprehensive Phase 0 setup walkthrough covers authentication, API enablement, virtual environment, dependencies, and verification
Excellent: Decision tree workflow guides agent through model family selection, SDK choice, and error handling paths
Excellent: Multiple SDK examples (GenAI, OpenAI, legacy Vertex AI) for both Gemini and OpenMaaS with clear installation and usage patterns
Excellent: Detailed troubleshooting section for 429, 400, 404 errors with root causes and actionable solutions
Excellent: Critical callouts for model availability by region (preview vs stable) and location-specific deployment
Good: Model reference table with current examples (DeepSeek, Llama 4, Qwen 3, etc.) but acknowledged as incomplete with links to authoritative sources
Good: Token refresh mechanism documented for long-running applications with link to official docs
Minor: verify_all.sh uses temporary directory for venv which is good practice, cleanup is guarded with trap
Minor: scripts reference requirements.txt correctly and all Python examples follow consistent patterns

Model: claude-haiku-4-5-20251001
Analyzed: Jun 28, 2026

Version History

v1.1 (Latest, 2026-06-28): Content updated
v1.0 (2026-06-05): No changelog

agent-platform-inference
