Bigtable Basics
This skill provides core workflows and guidance for administering and developing with Google Bigtable.
Core Principles
- Control Plane vs. Data Plane:
  - Use gcloud for Control Plane operations: Manage Instances, Clusters, App Profiles, Backups, and IAM. Create Tables, Logical Views, Materialized Views, and Authorized Views.
  - Use cbt for Data Plane operations: Update Tables and Column Families, and read/write data.
- Performance First: Bigtable is a NoSQL database. Efficiency is tied to Row Key design. Always warn about Full Table Scans.
- Client Selection: For production use cases, prefer Java or Go for their superior performance and feature coverage compared to other languages.
- Observability: When diagnosing performance or hotspotting, always
mention Key Visualizer (via Cloud Console) as the primary diagnostic
tool because it provides the most granular view of access patterns across
row keys. Follow it with the hot-tablets tool and table stats
in the gcloud CLI, and the include-stats=full option under cbt read, to
diagnose slow queries.
[!IMPORTANT] Safety Rule: You MUST obtain explicit user confirmation before making non-emulator database changes. You MUST mention this safety requirement when providing commands or instructions that modify the database structure or data.
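The safety rule above can be sketched as a small guard in Python. This is a minimal illustration, not part of any Bigtable tooling: the function name may_apply_change is hypothetical, and it assumes that a non-empty BIGTABLE_EMULATOR_HOST environment variable means the session targets the emulator.

```python
import os
from typing import Mapping, Optional


def may_apply_change(confirmed: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only if the target is the emulator or the user confirmed.

    Assumption: a non-empty BIGTABLE_EMULATOR_HOST means the emulator is in use.
    """
    env = os.environ if env is None else env
    using_emulator = bool(env.get("BIGTABLE_EMULATOR_HOST"))
    # Non-emulator changes require explicit confirmation from the user.
    return using_emulator or confirmed
```

A caller would ask the user for confirmation, pass the answer as confirmed, and refuse to run any mutating command when the function returns False.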
Quick Recipes
1. Querying Data
Use SQL for complex transforms or aggregations and key-value APIs for simpler
query patterns. Note: Use exact match, prefix (_key LIKE 'myprefix%'), or
range predicates on _key to avoid expensive unbounded scans. Recommend
explicit row ranges (_key BETWEEN 'start' AND 'end') as a more performant
alternative to prefix matches where possible.
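To illustrate turning a prefix match into an explicit row range, here is a minimal Python sketch. The function name prefix_to_row_range is hypothetical and it calls no Bigtable API; it only computes a half-open range [start, end) that covers every key beginning with the prefix. Because BETWEEN is inclusive on both ends, the end bound may need adjusting before it is used in that form.

```python
from typing import Optional, Tuple


def prefix_to_row_range(prefix: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Return (start_inclusive, end_exclusive) for all keys with this prefix.

    end_exclusive is None when no finite upper bound exists (an empty prefix
    or one made only of 0xff bytes), meaning the range is open-ended.
    """
    # Trailing 0xff bytes cannot be incremented, so drop them first.
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return prefix, None
    # Incrementing the last remaining byte gives the smallest key that
    # sorts after every key starting with the prefix.
    end = trimmed[:-1] + bytes([trimmed[-1] + 1])
    return prefix, end
```

For the prefix myprefix, the sketch yields the range from myprefix (inclusive) to myprefiy (exclusive).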
If expensive scans (either unbounded or prefix or range queries scanning a large range) are unavoidable due to multiple access patterns that can’t all be accommodated in a single schema, consider one of these two options:
- If the query serves user-facing or latency-sensitive applications, use continuous materialized views with keys optimized for the additional access patterns.
- If the secondary access patterns are infrequent batch workloads such as ETL, ML model training, or read-only analytical tasks, use Bigtable Data Boost instead.
2. Manipulating Data
Use key-value APIs for insert, update, increment, and delete operations. The SQL API is read-only.
3. Data Model Definition (DDL)
The SQL API doesn't support DDL operations. Create, delete, and update tables with the gcloud CLI. Logical Views and Continuous Materialized Views are defined as SQL queries, but they must also be created with the gcloud CLI.
Reference Guides
- CLI Operations:
  - infrastructure_management.md: Provisioning instances, clusters, and table schemas.
  - cli_data_access.md: Reading and writing data via the cbt CLI.
- Design & Discovery:
  - schema_design.md: Best practices for row keys and performance with tables and continuous materialized views.
  - dataplex.md: Data catalog search for Bigtable assets.
- Querying & Code:
  - sql_guide.md: Querying structured row keys via SQL and CLI.
  - client_libraries.md: Patterns for high-performance Go/Java/Python code.
Common Workflows
Schema Evolution (DevOps)
- Prefer Terraform for production schema changes to prevent accidental data loss.
- For manual cbt changes, first check the existing state by listing the table's column families and GC policies before proposing any modifications:

      cbt ls {table}

  If modifications are needed, create the family or update the GC policy:

      cbt createfamily {table} {family}
      cbt setgcpolicy {table} {family} "maxversions=5 AND maxage=30d"

- Reference infrastructure_management.md for full syntax.
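The manual workflow above (inspect first, then propose changes) could be sketched in Python as follows. This is an illustrative helper, not part of cbt: the function name plan_family_change and its parameters are hypothetical. It only renders the cbt commands as strings for the user to review and confirm; it executes nothing. The existing_families argument is assumed to be filled in from the output of cbt ls {table}.

```python
import shlex
from typing import Iterable, List


def plan_family_change(
    table: str,
    family: str,
    gc_policy: str,
    existing_families: Iterable[str],
) -> List[str]:
    """Render the cbt commands needed for one column family, without running them."""
    commands = []
    # Only create the family if the inspection step did not already list it.
    if family not in set(existing_families):
        commands.append(["cbt", "createfamily", table, family])
    # Always propose the GC policy so the desired state is explicit.
    commands.append(["cbt", "setgcpolicy", table, family, gc_policy])
    # shlex.join quotes arguments containing spaces, such as the GC policy.
    return [shlex.join(cmd) for cmd in commands]
```

Showing the rendered commands to the user before running any of them keeps this workflow consistent with the confirmation requirement for non-emulator changes.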