Generates Python code using BigQuery DataFrames (BigFrames), the pandas/scikit-learn-style API over BigQuery. Use when writing BigFrames code or doing pandas-style dataframe/ML work against BigQuery (e.g. in a notebook). Don't use for SQL-first workflows or the google-cloud-bigquery client library — use bigquery-basics.
category:BigDataAndAnalytics
v1.0 · Saved Jun 30, 2026
BigFrames Development Standards
Avoid .to_pandas(): You MUST NOT use .to_pandas() to download an entire
dataset. Doing so pulls all the data into the client's memory, bypassing
BigQuery's distributed computation and risking Out of Memory (OOM)
errors. There are two exceptions:
An error message explicitly tells you to use to_pandas().
You are going to visualize the data, and the visualization library does not accept BigFrames DataFrame/Series instances. In this case, reduce the amount of data before calling .to_pandas().
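The second exception can be sketched as below: aggregate and cap the data remotely, then download only the small result. This is a minimal sketch, assuming `df` is a BigFrames DataFrame; the helper name, the column arguments, and the `max_rows` cap are hypothetical and not part of the skill.

```python
def download_small_summary(df, group_col, value_col, max_rows=1_000):
    """Aggregate remotely, cap the row count, then download the small result.

    `df` is assumed to be a BigFrames DataFrame. The column names and the
    `max_rows` default are hypothetical placeholders.
    """
    # The groupby/mean is expressed with DataFrame methods, so the work
    # stays in BigQuery rather than in client memory.
    summary = df.groupby(group_col)[value_col].mean()
    # head() bounds how many rows can reach the client before to_pandas().
    return summary.head(max_rows).to_pandas()
```

The returned object is a regular pandas Series, which plotting libraries that reject BigFrames objects should accept.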
Avoid read_gbq() for SQL: Do not write SQL queries and execute them
with read_gbq(). Use BigFrames DataFrame/Series methods instead, which
keeps the pandas-like DataFrame abstraction and allows lazy execution.
Use the BigFrames ML package for Machine Learning Tasks: Do not use
scikit-learn or other ML libraries with BigFrames DataFrames. Standard
scikit-learn models require bringing data into local client memory,
whereas bigframes.ml delegates training directly to BigQuery's scalable ML
engine. Import your tools/classes from bigframes.ml.
Stay in the Cloud: Perform data cleaning, transformation, and analysis via BigFrames methods to leverage BigQuery's scale.
Accessors over UDFs/Lambdas:
Prefer built-in accessors (e.g., df.col.str.*, df.col.dt.*) over remote UDFs.
Do not use lambdas with Series.map() or DataFrame.apply().
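As an illustration of the accessor rule, the sketch below cleans a text column and extracts a year using only the `.str` and `.dt` accessors. It assumes `df` is a BigFrames DataFrame with a string column and a timestamp column; the function name and the output column names `name_clean` and `year` are hypothetical.

```python
def add_clean_columns(df, name_col, ts_col):
    """Add derived columns with built-in accessors instead of lambdas.

    Avoided pattern: df[name_col].map(lambda s: s.strip().lower())
    `name_clean` and `year` are hypothetical output column names.
    """
    return df.assign(
        # String cleanup through the .str accessor.
        name_clean=df[name_col].str.strip().str.lower(),
        # Date part extraction through the .dt accessor.
        year=df[ts_col].dt.year,
    )
```

Because every step is a built-in accessor call, no remote UDF needs to be deployed for this transformation.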
Schema Verification: Do not assume the schema of intermediate outputs. Check .dtypes after loading, and inspect a few rows by calling display() on .head() or .peek().
Visualization: BigFrames DataFrames mostly work directly with
Matplotlib, Seaborn, and other plotting libraries. If a direct attempt
fails, try the plot accessor. If that also fails, you MUST sample or
aggregate your data to make it small enough before calling
to_pandas().
Model Development
Unlike scikit-learn: BigFrames' predict() method always returns a DataFrame containing both predictions and features (not just a Series of predictions).
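Since predict() returns a whole DataFrame, the prediction column has to be selected by name. The sketch below assumes the BigQuery ML naming convention `predicted_<label>` for supervised models; treat that name as an assumption and confirm it with .dtypes on the actual output. Both helper names are hypothetical.

```python
def prediction_column(label_col):
    """Return the assumed name of the prediction column for a label."""
    # Assumption: output follows the `predicted_<label>` convention.
    return f"predicted_{label_col}"


def predictions_only(predicted_df, label_col):
    """Select just the prediction column from a predict() result."""
    return predicted_df[prediction_column(label_col)]
```

For example, with a label column called `price`, the helper would look for `predicted_price` in the DataFrame returned by predict().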
No random_state: Do not pass a random_state argument when instantiating BigFrames ML models, because this parameter is not supported in the BigFrames ML package.
Automatic Scaling: Do not use OneHotEncoder or StandardScaler unless explicitly requested; this preprocessing is handled automatically.
Hyperparameter Tuning: You must write custom loops (BigFrames lacks GridSearchCV or RandomizedSearchCV).
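A custom tuning loop can be as simple as the sketch below, which tries every combination in a parameter grid and keeps the best score. It is a generic illustration, not a BigFrames API: `make_model` would be a bigframes.ml model class, and `metric` is a hypothetical caller-supplied function that returns a single number where higher is better.

```python
from itertools import product


def grid_search(make_model, param_grid, X_train, y_train, X_test, y_test, metric):
    """Exhaustively try parameter combinations; return (best_params, best_score).

    `make_model` is any callable that builds a model from keyword arguments.
    `metric(model, X_test, y_test)` is a hypothetical scoring callback that
    must return a plain number (higher is better).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        model = make_model(**params)
        model.fit(X_train, y_train)
        score = metric(model, X_test, y_test)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Each combination triggers a separate training run, so keeping the grid small limits both run time and cost. Note that a model's score() result may be a table of metrics rather than a single value, in which case `metric` would need to pick one number out of it.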
ARIMA Plus (Forecasting):
Import from bigframes.ml.forecasting.
Sort data chronologically and split around a timepoint before training.
The horizon requested at prediction time must be less than or equal to the horizon the model was trained with.
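The split and horizon rules above can be sketched with two small helpers. This is a minimal sketch, assuming `df` supports `sort_values()` and boolean-mask indexing as pandas-style DataFrames do; the function names and the `cutoff` value are hypothetical.

```python
def split_at_timepoint(df, time_col, cutoff):
    """Sort chronologically, then split into rows before and from `cutoff`."""
    ordered = df.sort_values(time_col)
    # Everything strictly before the cutoff is used for training.
    train = ordered[ordered[time_col] < cutoff]
    # The cutoff itself and everything after it is held out.
    test = ordered[ordered[time_col] >= cutoff]
    return train, test


def check_horizon(predict_horizon, training_horizon):
    """Reject a prediction horizon that exceeds the training horizon."""
    if predict_horizon > training_horizon:
        raise ValueError(
            f"prediction horizon {predict_horizon} exceeds "
            f"training horizon {training_horizon}"
        )
    return predict_horizon
```

Splitting around a timepoint, rather than sampling rows at random, keeps future observations out of the training set. The validated horizon would then be the value passed on when forecasting.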
PCA: BigFrames' PCA class lacks a simple transform() method. Use predict() instead.
Model Persistence: To persist a model, use model.to_gbq(). To load a persisted model, use bpd.read_gbq_model().
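A save-and-reload round trip might look like the sketch below. It assumes `model` is a fitted bigframes.ml model and `bpd` is the `bigframes.pandas` module (or a session object exposing `read_gbq_model`); the helper name and the model path are hypothetical.

```python
def persist_and_reload(model, model_name, bpd):
    """Save a fitted model to BigQuery, then load it back by name.

    `model_name` is a hypothetical fully qualified name such as
    "my-project.my_dataset.my_model".
    """
    # replace=True overwrites an existing model with the same name.
    model.to_gbq(model_name, replace=True)
    # The reloaded object can be used for prediction without retraining.
    return bpd.read_gbq_model(model_name)
```

Passing `bpd` in as an argument keeps the helper free of a hard import, which also makes it easy to exercise with a stand-in object.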
Files: 1 file · 11.1 KB
Overall Score: 76/100
Grade: B (Good)
Safety: 88
Quality: 78
Clarity: 82
Completeness: 62
Summary
This skill provides development standards and best practices for writing Python code using BigQuery DataFrames (BigFrames), the pandas/scikit-learn-style API for BigQuery. It guides agents on when to use BigFrames (pandas-style dataframe/ML workflows) versus alternatives, and documents key patterns for data processing, ML model development, and visualization while maintaining efficiency in distributed BigQuery computation.
Detected Capabilities
code generation, python library guidance, bigquery api knowledge, ml model documentation
Trigger Keywords
Phrases that MCP clients use to match this skill to user intent.
bigframes development, bigquery dataframe api, bigframes ml models, pandas style bigquery, bigframes forecasting
Referenced Domains
External domains referenced in skill content, detected by static analysis.
www.apache.org
Use Cases
Write BigFrames code for pandas-style data analysis on BigQuery
Develop ML models using BigFrames.ml instead of scikit-learn
Transform and clean large datasets without pulling them into client memory
Visualize BigQuery data efficiently using BigFrames with plotting libraries
Implement forecasting models with ARIMA Plus via BigFrames
Persist and load ML models using BigFrames to_gbq() and read_gbq_model()
Quality Notes
Clear scope: skill explicitly distinguishes BigFrames from SQL-first workflows (bigquery-basics) and scikit-learn
Well-structured with logical sections: general standards, model development specifics
Practical constraints documented: avoid .to_pandas() with exceptions clearly stated, no random_state for BigFrames models, PCA lacks simple transform()
Strong anti-patterns guidance: prevents common mistakes like using read_gbq() for SQL instead of DataFrame methods, using lambdas with Series.map()
Comprehensive ML coverage: explains BigFrames.ml differences from scikit-learn, ARIMA Plus specifics, hyperparameter tuning limitations
Schema verification best practice documented (.dtypes, .head(), .peek())
Visualization guidance includes fallback approach (use plot accessor, then sample/aggregate before to_pandas())
Accessor preferences documented (str.*, dt.*) with rationale
Model: claude-haiku-4-5-20251001 · Analyzed: Jun 30, 2026
Add google/bigquery-bigframes to your library and activate it in your dev environment via the SkillRepo CLI.