Agent Platform RAG Engine Management
This skill provides instructions on how to interact with Agent Platform RAG
Engine using the Agent Platform Python SDK. You
MUST use the vertexai Python SDK to perform RAG Engine operations, rather than
raw REST calls or MCP tools, because this code is intended to be run by external
clients.
Phase 0: Environment Setup
CRITICAL: Before running any of the Python snippets below, you MUST ensure the environment is correctly initialized by following these steps:
- Google Cloud Authentication: Authenticate with your Google Cloud
  credentials and configure active Application Default Credentials (ADC) for
  Agent Platform access:
    gcloud auth login
    gcloud auth application-default login
- Virtual Environment: Create and activate a dedicated virtual
  environment:
    python3 -m venv ~/rag_agent_venv
    source ~/rag_agent_venv/bin/activate
- Install Dependencies: Install the required Agent Platform SDKs:
    pip install google-cloud-aiplatform google-genai
- Execution: Advise the user that every time they execute a Python snippet, they must ensure this virtual environment is activated first.
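As a quick sanity check for the activation step, the following minimal sketch reports whether the current interpreter is running inside a virtual environment. It uses only the standard library; the helper name `in_virtualenv` is hypothetical and not part of any SDK.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; activate ~/rag_agent_venv first.")
```

This check does not verify credentials or installed packages; it only confirms that a virtual environment is active.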
Workflow Decision Tree
- Information Gathering: Has the user provided the Project ID, Region, and Corpus ID?
  - No -> Proceed to [1. Listing Corpora and Files] to discover the necessary Resource Names and IDs. Only ask the user if discovery fails.
  - Yes -> Proceed.
- Task Type: What does the user want to do?
  - List Corpora and Files -> Proceed to [1. Listing Corpora and Files].
  - Inspect a Corpus -> Proceed to [2. Getting / Inspecting a RAG Engine Corpus].
  - Search for Contexts -> Proceed to [3. Retrieving Contexts].
  - Answer questions using RAG Engine -> Proceed to [4. Answering the User with Retrieved Context].
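The decision tree above can be sketched as a small routing function. This is only an illustration: the task keys (`"list"`, `"inspect"`, `"retrieve"`, `"answer"`) and the function name `next_section` are hypothetical labels, not part of the SDK.

```python
from typing import Optional

# Hypothetical task keys mapped to the section titles used in this document.
SECTIONS = {
    "list": "1. Listing Corpora and Files",
    "inspect": "2. Getting / Inspecting a RAG Engine Corpus",
    "retrieve": "3. Retrieving Contexts",
    "answer": "4. Answering the User with Retrieved Context",
}


def next_section(
    task: str,
    project_id: Optional[str] = None,
    region: Optional[str] = None,
    corpus_id: Optional[str] = None,
) -> str:
    """Pick the section to follow, running discovery first if IDs are missing."""
    if task not in SECTIONS:
        raise ValueError(f"Unknown task: {task!r}")
    # Information Gathering: without all three identifiers, discover them first.
    if not (project_id and region and corpus_id):
        return SECTIONS["list"]
    # Task Type: route to the matching section.
    return SECTIONS[task]
```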
[!TIP] Placeholder Parameter Replacement: The Python scripts below use curly-brace string placeholders (like "{project_id}", "{region}", and "{corpus_id}"). You MUST dynamically replace these placeholders with the actual Project ID, Region, and Corpus ID values provided in the user's prompt (or active context) before generating, providing, or executing the scripts.
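One way to perform this replacement safely is sketched below. The helper `fill_placeholders` is hypothetical, and the values `my-project` and `us-central1` are illustrative only. It substitutes each placeholder literally and fails loudly if any of the three known placeholders is left behind.

```python
import re


def fill_placeholders(script: str, values: dict) -> str:
    """Replace "{key}" placeholders in a script and reject leftovers."""
    for key, value in values.items():
        # Literal replacement, so other braces in the script stay untouched.
        script = script.replace("{" + key + "}", value)
    leftover = re.findall(r"\{(project_id|region|corpus_id)\}", script)
    if leftover:
        raise ValueError(f"Unreplaced placeholders: {sorted(set(leftover))}")
    return script


template = 'vertexai.init(project="{project_id}", location="{region}")'
print(fill_placeholders(template, {"project_id": "my-project", "region": "us-central1"}))
```

Literal `str.replace` is used here instead of `str.format` because the scripts also contain f-string expressions such as `{c.name}`, which `str.format` would try to interpret as fields.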
1. Listing Corpora and Files (Discovery)
If you do not know the Resource Name of the corpus or file, you MUST list them first to discover them. The pager returned by the SDK fetches additional pages automatically when it is iterated or converted to a list, but you can also paginate manually for large sets.
1.1 Listing and Discovering Corpora
import vertexai
from vertexai.preview import rag
vertexai.init(project="{project_id}", location="{region}")
# Approach A: List ALL (Automatic Pagination)
# The SDK's Pager iterates through all pages for you.
all_corpora = list(rag.list_corpora())
print(f"Found {len(all_corpora)} corpora in total.")
for c in all_corpora:
    print(f"Corpus Name: {c.name} | Display Name: {c.display_name}")
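# Approach B: Manual Pagination (for very large projects)
pager = rag.list_corpora(page_size=10)
# Process only the first page. Iterating the pager directly would keep
# fetching pages until the listing is exhausted, so read one page via the
# pager's `pages` iterator instead (assumes the standard API pager interface).
first_page = next(iter(pager.pages))
for c in first_page.rag_corpora:
    print(f"Corpus: {c.display_name}")
# Get next page if needed
if first_page.next_page_token:
    second_page = rag.list_corpora(
        page_size=10, page_token=first_page.next_page_token
    )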
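The page-token pattern behind manual pagination can be sketched independently of the SDK. In the example below, `fetch_page` is a hypothetical callable standing in for any "list" call that accepts a page token and returns a batch of items plus the next token; `fake_fetch` and `DATA` are in-memory stand-ins, not real corpora.

```python
from typing import Callable, Iterator, List, Optional, Tuple


def iterate_all(
    fetch_page: Callable[[Optional[str]], Tuple[List[str], str]]
) -> Iterator[str]:
    """Yield every item by following page tokens until none is returned."""
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        yield from items
        if not token:  # An empty token means there are no more pages.
            break


# In-memory stand-in for a paginated listing (illustrative data only).
DATA = ["corpus-a", "corpus-b", "corpus-c", "corpus-d", "corpus-e"]


def fake_fetch(token: Optional[str], page_size: int = 2) -> Tuple[List[str], str]:
    start = int(token) if token else 0
    end = start + page_size
    next_token = str(end) if end < len(DATA) else ""
    return DATA[start:end], next_token


print(list(iterate_all(fake_fetch)))
```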
1.2 Listing and Discovering Files
To understand what files (and types) are in a corpus, list them and inspect the
display_name (usually includes the extension).
import vertexai
from vertexai.preview import rag
vertexai.init(project="{project_id}", location="{region}")
corpus_name = (
    "projects/{project_id}/locations/{region}/ragCorpora/{corpus_id}"
)
# List files with automatic pagination
files = list(rag.list_files(corpus_name=corpus_name))
print(f"Found {len(files)} files.")
for f in files:
    # High-level SDK RagFile objects usually have name, display_name,
    # description
    print(f"File: {f.display_name} | Resource: {f.name}")
    # Tip: Check extension to understand file type (PDF, TXT, etc.)
    if f.display_name.lower().endswith(".pdf"):
        print(" Type: PDF")
    elif f.display_name.lower().endswith(".txt"):
        print(" Type: Plain Text")
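For a corpus with many files, a per-extension summary can be easier to read than one line per file. The sketch below works on plain display-name strings, so it can be fed `[f.display_name for f in files]`; the helper name `count_extensions` and the sample names are hypothetical.

```python
import os
from collections import Counter
from typing import Iterable


def count_extensions(display_names: Iterable[str]) -> Counter:
    """Count files per lowercase extension; names without one go under "(none)"."""
    counts: Counter = Counter()
    for name in display_names:
        ext = os.path.splitext(name)[1].lower() or "(none)"
        counts[ext] += 1
    return counts


# Illustrative display names only.
print(count_extensions(["report.PDF", "notes.txt", "slides.pdf", "README"]))
```

Because a display name only usually includes the extension, treat the result as a hint rather than an authoritative file type.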
2. Getting / Inspecting an Agent Platform RAG Engine Corpus
To retrieve details about an existing Agent Platform RAG Engine corpus:
import vertexai
from vertexai.preview import rag
vertexai.init(project="{project_id}", location="{region}")
# To get details of a specific corpus
corpus_name = (
    "projects/{project_id}/locations/{region}/ragCorpora/{corpus_id}"
)
corpus = rag.get_corpus(name=corpus_name)
print(f"Corpus Name: {corpus.name}")
print(f"Display Name: {corpus.display_name}")
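Since every snippet builds the same resource name, a pair of helpers for composing and parsing it may be convenient. This is a minimal sketch based only on the `projects/.../locations/.../ragCorpora/...` pattern used in this document; the function names are hypothetical, and the sample values are illustrative.

```python
import re

_CORPUS_RE = re.compile(
    r"^projects/(?P<project_id>[^/]+)"
    r"/locations/(?P<region>[^/]+)"
    r"/ragCorpora/(?P<corpus_id>[^/]+)$"
)


def build_corpus_name(project_id: str, region: str, corpus_id: str) -> str:
    """Compose the full corpus resource name from its three parts."""
    return f"projects/{project_id}/locations/{region}/ragCorpora/{corpus_id}"


def parse_corpus_name(name: str) -> dict:
    """Split a corpus resource name into project_id, region, and corpus_id."""
    match = _CORPUS_RE.match(name)
    if match is None:
        raise ValueError(f"Not a corpus resource name: {name!r}")
    return match.groupdict()


name = build_corpus_name("my-project", "us-central1", "1234567890")
print(name)
print(parse_corpus_name(name))
```

Parsing is useful after discovery, because the `name` field printed in the listing step is already a full resource name.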
3. Retrieving Contexts
To retrieve relevant contexts from a RAG Engine corpus based on a query:
import vertexai
from vertexai.preview import rag
vertexai.init(project="{project_id}", location="{region}")
corpus_name = (
    "projects/{project_id}/locations/{region}/ragCorpora/{corpus_id}"
)
query = "What is the speed of light?"
# Retrieve contexts
response = rag.retrieval_query(
    rag_corpora=[corpus_name],
    text=query,
    similarity_top_k=3
)
for context in response.contexts.contexts:
    print(f"Context text: {context.text}")
    print(f"Source: {context.source_uri}")
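When the retrieved contexts need to be shown to a user or pasted into a prompt, a numbered layout keeps each passage tied to its source. The sketch below relies only on the `text` and `source_uri` attributes used above; `format_contexts` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real results.

```python
from types import SimpleNamespace
from typing import Iterable


def format_contexts(contexts: Iterable) -> str:
    """Render contexts as numbered passages, each followed by its source URI."""
    blocks = []
    for index, ctx in enumerate(contexts, start=1):
        blocks.append(f"[{index}] {ctx.text.strip()}\n    Source: {ctx.source_uri}")
    return "\n".join(blocks)


# Stand-in objects with placeholder content (not real retrieval results).
sample = [
    SimpleNamespace(text=" first passage ", source_uri="gs://example-bucket/a.pdf"),
    SimpleNamespace(text="second passage", source_uri="gs://example-bucket/b.txt"),
]
print(format_contexts(sample))
```

With a real response, the call would be `format_contexts(response.contexts.contexts)`.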
4. Answering the User with Retrieved Context
To use the retrieved context alongside an Agent Platform model to generate a grounded response:
from google import genai
from google.genai import types
client = genai.Client(enterprise=True, project="{project_id}", location="{region}")
corpus_name = (
    "projects/{project_id}/locations/{region}/ragCorpora/{corpus_id}"
)
# Define the Agent Platform RAG Engine tool pointing to the corpus
rag_tool = types.Tool(
    retrieval=types.Retrieval(
        vertex_rag_store=types.VertexRagStore(
            rag_resources=[types.VertexRagStoreRagResource(rag_corpus=corpus_name)],
            rag_retrieval_config=types.RagRetrievalConfig(
                top_k=3,
                filter=types.RagRetrievalConfigFilter(
                    vector_similarity_threshold=0.5,
                ),
            ),
        )
    )
)
# Generate content using the RAG Engine tool
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is the speed of light?",
    config=types.GenerateContentConfig(
        tools=[rag_tool]
    )
)
print(response.text)
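To show the user which sources grounded the answer, the response's grounding metadata can be inspected. The sketch below is written under an assumption that should be verified against the installed SDK version: that each candidate exposes `grounding_metadata.grounding_chunks`, and that each chunk carries a `retrieved_context` with a `uri`. The helper `collect_source_uris` is hypothetical, it reads attributes defensively, and it is demonstrated on stand-in objects only.

```python
from types import SimpleNamespace
from typing import List


def collect_source_uris(response) -> List[str]:
    """Collect unique source URIs from grounding chunks, in first-seen order.

    ASSUMPTION: the attribute path candidates -> grounding_metadata ->
    grounding_chunks -> retrieved_context.uri matches the SDK response.
    """
    uris: List[str] = []
    for candidate in getattr(response, "candidates", None) or []:
        metadata = getattr(candidate, "grounding_metadata", None)
        for chunk in getattr(metadata, "grounding_chunks", None) or []:
            retrieved = getattr(chunk, "retrieved_context", None)
            uri = getattr(retrieved, "uri", None)
            if uri and uri not in uris:
                uris.append(uri)
    return uris


# Stand-in response object (not produced by a real model call).
def _chunk(uri):
    return SimpleNamespace(retrieved_context=SimpleNamespace(uri=uri))

fake_response = SimpleNamespace(candidates=[SimpleNamespace(
    grounding_metadata=SimpleNamespace(grounding_chunks=[
        _chunk("gs://example-bucket/a.pdf"),
        _chunk("gs://example-bucket/a.pdf"),
        _chunk("gs://example-bucket/b.txt"),
    ])
)])
print(collect_source_uris(fake_response))
```

An empty list means no grounding chunks were found under the assumed attribute path, which can also happen when the model answered without using the corpus.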