google/bigquery-bigframes

Generates Python code using BigQuery DataFrames (BigFrames), the pandas/scikit-learn-style API over BigQuery. Use when writing BigFrames code or doing pandas-style dataframe/ML work against BigQuery (e.g. in a notebook). Don't use for SQL-first workflows or the google-cloud-bigquery client library — use bigquery-basics.

global
category:BigDataAndAnalytics
v1.0 · Saved Jun 30, 2026

BigFrames Development Standards

  • Avoid .to_pandas(): You MUST NOT use .to_pandas() to download an entire dataset. Doing so pulls all of the data into the client's memory, bypasses BigQuery's distributed computation, and risks Out of Memory (OOM) errors. Exceptions:
    • An error message explicitly requests you to use to_pandas()
    • You are going to visualize the data, and the visualization library does not accept BigFrames DataFrame/Series instances. In this case, reduce the amount of data you are going to download before calling .to_pandas().
  • Avoid read_gbq() for SQL: Do not write SQL queries and execute them with read_gbq(). Use BigFrames DataFrame/Series methods instead, which preserves the pandas-like DataFrame abstraction and allows lazy execution.
  • Use the BigFrames ML package for machine learning tasks: Do not use scikit-learn or other ML libraries with BigFrames DataFrames. Standard scikit-learn models require bringing the data into local client memory, whereas bigframes.ml delegates training directly to BigQuery's scalable ML engine. Import your tools and classes from bigframes.ml.
  • Stay in the Cloud: Perform data cleaning, transformation, and analysis via BigFrames methods to leverage BigQuery's scale.
  • Accessors over UDFs/Lambdas:
    • Prefer built-in accessors (e.g., df.col.str.*, df.col.dt.*) over remote UDFs.
    • Do not use lambdas with Series.map() or DataFrame.apply().
  • Schema Verification: Do not assume schema of intermediate outputs. Check .dtypes after loading, and use display() with .head() or .peek().
  • Visualization: BigFrames DataFrames mostly work directly with Matplotlib, Seaborn, and other plotting libraries. If a direct call does not work, try the plot accessor. If that does not work either, you MUST sample or aggregate the data until it is small enough before calling to_pandas().
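
As an illustration of the accessor and visualization rules above, here is a minimal sketch of reducing data before it reaches the client. The helper name small_frame_for_plot, the column name, and the table ID in the usage comment are hypothetical; the sketch only assumes that the input behaves like a BigFrames DataFrame (column selection, the .str accessor, value_counts(), head(), and to_pandas()).

```python
def small_frame_for_plot(df, column: str, top_n: int = 20):
    """Return at most top_n aggregated rows, ready for a plotting library.

    `df` is assumed to be a BigFrames DataFrame; the helper name is
    illustrative and not part of the BigFrames API.
    """
    # Built-in string accessor instead of Series.map(lambda ...).
    normalized = df[column].str.lower()
    # Aggregate first, so the heavy lifting stays in BigQuery.
    counts = normalized.value_counts()
    # Only the top_n aggregated rows are downloaded to the client.
    return counts.head(top_n).to_pandas()


# Usage (needs BigQuery credentials; the table ID is a placeholder):
#   import bigframes.pandas as bpd
#   df = bpd.read_gbq("my-project.my_dataset.my_table")
#   print(df.dtypes)  # verify the schema before relying on column types
#   pdf = small_frame_for_plot(df, "category")
```

The call to to_pandas() here falls under the visualization exception: it runs only after the data has been aggregated and truncated to top_n rows.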

Model Development

  • Unlike Scikit-learn: BigFrames' predict() method always returns a DataFrame containing both predictions and features (not just a series of predictions).
  • No random_state: Do not pass a random_state argument when instantiating BigFrames ML models, because this parameter is not supported in the BigFrames ML package.
  • Automatic Scaling: Do not use OneHotEncoder or StandardScaler unless explicitly requested; encoding and scaling are handled automatically.
  • Hyperparameter Tuning: You must write custom loops (BigFrames lacks GridSearchCV or RandomizedSearchCV).
  • ARIMA Plus (Forecasting):
    • Import from bigframes.ml.forecasting.
    • Sort data chronologically and split around a timepoint before training.
    • Prediction horizon must be less than or equal to training horizon.
  • PCA: BigFrames' PCA class lacks a simple transform() method. Use predict() instead.
  • Model Persistence: To persist a model, use model.to_gbq(). To load a persisted model, use bpd.read_gbq_model().
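
The hyperparameter tuning point above calls for a hand-written loop. One way such a loop might look is sketched below. The names grid_search, make_model, and read_metric are hypothetical and not part of BigFrames; the sketch assumes only that the model exposes fit(X, y) and score(X, y), and it leaves the extraction of a scalar from the score() result to the caller.

```python
from itertools import product


def grid_search(make_model, param_grid, train, val, read_metric,
                higher_is_better=True):
    """Try every combination in param_grid and keep the best-scoring one.

    make_model(**params) -> unfitted model exposing fit(X, y) and score(X, y)
    train, val           -> (X, y) pairs
    read_metric(scores)  -> float taken from whatever score() returns
    """
    X_train, y_train = train
    X_val, y_val = val
    names = list(param_grid)
    best_params, best_value = None, None

    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        model = make_model(**params)  # note: no random_state is passed
        model.fit(X_train, y_train)
        value = read_metric(model.score(X_val, y_val))

        if best_value is None:
            improved = True
        elif higher_is_better:
            improved = value > best_value
        else:
            improved = value < best_value

        if improved:
            best_params, best_value = params, value

    return best_params, best_value
```

With a BigFrames model class passed as make_model, each fit() and score() call would run in BigQuery, so the loop itself only moves parameter dictionaries and one metric value per iteration through the client. The winning configuration can then be refit and persisted with model.to_gbq().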

Overall Score: 76/100
Grade: B (Good)
Safety: 88
Quality: 78
Clarity: 82
Completeness: 62

Summary

This skill provides development standards and best practices for writing Python code using BigQuery DataFrames (BigFrames), the pandas/scikit-learn-style API for BigQuery. It guides agents on when to use BigFrames (pandas-style dataframe/ML workflows) versus alternatives, and documents key patterns for data processing, ML model development, and visualization while maintaining efficiency in distributed BigQuery computation.

Detected Capabilities

code generation, python library guidance, bigquery api knowledge, ml model documentation

Trigger Keywords

Phrases that MCP clients use to match this skill to user intent.

bigframes development, bigquery dataframe api, bigframes ml models, pandas style bigquery, bigframes forecasting

Referenced Domains

External domains referenced in skill content, detected by static analysis.

www.apache.org

Use Cases

  • Write BigFrames code for pandas-style data analysis on BigQuery
  • Develop ML models using bigframes.ml instead of scikit-learn
  • Transform and clean large datasets without pulling them into client memory
  • Visualize BigQuery data efficiently using BigFrames with plotting libraries
  • Implement forecasting models with ARIMA Plus via BigFrames
  • Persist and load ML models using BigFrames to_gbq() and read_gbq_model()

Quality Notes

  • Clear scope: skill explicitly distinguishes BigFrames from SQL-first workflows (bigquery-basics) and scikit-learn
  • Well-structured with logical sections: general standards, model development specifics
  • Practical constraints documented: avoid .to_pandas() with exceptions clearly stated, no random_state for BigFrames models, PCA lacks simple transform()
  • Strong anti-patterns guidance: prevents common mistakes like using read_gbq() for SQL instead of DataFrame methods, using lambdas with Series.map()
  • Comprehensive ML coverage: explains how bigframes.ml differs from scikit-learn, ARIMA Plus specifics, hyperparameter tuning limitations
  • Schema verification best practice documented (.dtypes, .head(), .peek())
  • Visualization guidance includes fallback approach (use plot accessor, then sample/aggregate before to_pandas())
  • Accessor preferences documented (str.*, dt.*) with rationale
Model: claude-haiku-4-5-20251001 · Analyzed: Jun 30, 2026
