BigQuery and ClickHouse: equivalent concepts

The tables below map each BigQuery concept to its ClickHouse equivalent — what to use instead, and where the model differs. For function-by-function SQL syntax mapping, see the BigQuery → ClickHouse SQL translation reference. For the end-to-end migration walkthrough, see Migrating from BigQuery to ClickHouse Cloud.

Resource hierarchy

How the platform organizes accounts, logical containers for data, and where compute is provisioned.

BigQuery	ClickHouse	Notes
Organization	Organization	Root node of the hierarchy in both.
Project	Service (region-scoped); warehouse for grouping services with shared storage and independent compute	A ClickHouse service is one storage + one compute pool. Use a warehouse to group services that share storage but scale compute independently.
Dataset	Database	Logical container that organizes tables and scopes access.
Folder	Warehouse grouping, or separate services per workload	ClickHouse has no folder primitive — grouping is at the service / warehouse level.
IAM permissions	Console roles plus SQL grants	Two-layer access: roles in `console.clickhouse.cloud` plus SQL grants in the database. Console users can also be granted DB roles for SQL Console use.

Compute, capacity, pricing

How processing is allocated to a query, sized, and billed.

BigQuery	ClickHouse	Notes
Slot	Replica (whole node); queries parallelize across replicas	A replica is the unit of compute in a ClickHouse service; queries run across all replicas of the service. See the callout below for the granularity difference with BigQuery slots.
Slot reservation	Vertical and horizontal autoscaling bounds; warehouses for workload isolation	ClickHouse uses bounds-based autoscaling rather than guaranteed-capacity reservations.
Quotas	Workload classes plus per-query limits	Covers memory, CPU, concurrency, and I/O scheduling. The two quota models don't map row-by-row, but concept-level coverage is similar.
On-demand pricing (per TB scanned)	Compute-time (replica-hours) plus storage and transfer — see billing overview	The two pricing models are not directly comparable.
Logical vs physical storage billing	Compressed storage only — see billing overview	ClickHouse Cloud bills compressed storage. The logical-vs-physical distinction does not apply.

Slot vs replica

A BigQuery slot is much finer-grained than a ClickHouse replica — it's closer to a CPU thread within a replica than to a whole replica. Both are the unit of compute being allocated to a query, but with very different sizing.

Storage and tables

How tables are stored: engines, schema, partitioning, snapshots, and access primitives.

In ClickHouse, a table's behavior is set at creation time: the engine (MergeTree family) determines merge and storage semantics, and ORDER BY / PARTITION BY / TTL clauses configure physical layout and retention. Many BigQuery per-feature settings map to a clause in the ClickHouse CREATE TABLE statement.

BigQuery	ClickHouse	Notes
Table	MergeTree-family table	Engine choice determines storage and merge behavior — pick by access pattern (`MergeTree` for append-mostly facts, `ReplacingMergeTree` for upserts, `AggregatingMergeTree` for pre-aggregations).
Column schema modes (`NULLABLE`, `REQUIRED`, `REPEATED`)	`Nullable(T)` for optional; omit for required; `Array(T)` for repeated; `Array(Tuple(...))` or `Nested` for repeated records	In ClickHouse, columns are non-nullable unless wrapped with `Nullable(T)`. Nullability has a small storage and query cost, so use it only when the column actually needs nulls.
Schema evolution (add / drop / modify columns)	`ALTER TABLE ... ADD / DROP / MODIFY COLUMN`	Same DDL surface as BigQuery. Many column changes are metadata-only.
Partitioning	`PARTITION BY` clause on the table	Partitions are defined at table creation; a partition expression determines how rows are grouped into parts on disk.
Clustering	`ORDER BY` columns in the table definition	Defined as part of the table; data is physically sorted on disk by the `ORDER BY` columns.
External tables / BigLake	`s3` / `gcs` / `azureBlobStorage` table functions for direct file access; Iceberg engine for open catalogs	Object storage and open-table formats are read directly through these functions and engines. ClickHouse does not provide a unified-governance layer over external storage.
Object tables (SQL access to unstructured files)	`s3` / `gcs` table functions over binary formats	ClickHouse treats unstructured-object access as a special case of external file reading via table functions, not as a dedicated table type.
Apache Iceberg	Iceberg engine (read-only)	Reads Iceberg tables stored in S3, Azure, HDFS, or local storage; writes are not supported. See the engine page for the current list of supported features.
Default table / partition / dataset expiration	`TTL` clause on the table, column, or partition	Both support automatic deletion of data older than a configured window. `TTL` can be set at table creation or via `ALTER TABLE ... MODIFY TTL`.
Table snapshot	Service-level backup	See callout below — granularity differs significantly.
Time travel	Point-in-time backup restore into a new service	Backups are service-scoped, not table-scoped, so the restore unit is the whole service rather than a single table at a moment in time.
Authorized views	View with `SQL SECURITY DEFINER` (runs with the view-owner's privileges)	See CREATE VIEW for the syntax and the `INVOKER` / `DEFINER` / `NONE` modes.
Row-level security	Row policy — a `WHERE`-style expression evaluated per user	Row policies apply transparently to every query against the table.
Wildcard tables (`_TABLE_SUFFIX`)	`Merge` table engine (persistent grouping) or `merge()` function (inline)	Same idea, different syntax. `Merge` is a persistent table-of-tables; `merge()` is inline without creating one.
Table clone	`CREATE TABLE ... AS SELECT` copy, or backup restore into a new service	ClickHouse has no copy-on-write primitive — every copy reads the source data fully.

Backups

ClickHouse Cloud backups are per-service. Restoring a backup creates a new service — a single table cannot be restored back into the original service. Plan accordingly if your current workflow relies on per-table snapshots.

Query model and performance

How queries run and are accelerated — indexes, materialized views, caches, and streaming inputs.

Query acceleration in ClickHouse comes from three layers: primary-key ordering (a sparse index over the on-disk sort order), secondary indexes on non-key columns, and materialized views — incremental or refreshable. The rows below map BigQuery's acceleration features onto these primitives.

BigQuery	ClickHouse	Notes
Primary key (advisory)	Primary key — drives the on-disk sort order and the sparse primary index	Neither system enforces uniqueness; the optimizer uses the key to prune granules, avoid re-sorts, and short-circuit `LIMIT`.
Foreign key (advisory)	Wide tables or dictionaries for lookups	ClickHouse doesn't accept foreign-key declarations even as advisory hints.
Search index	Full-text index	Token index over string columns.
Vector index	Vector ANN index	Approximate nearest-neighbor lookups over embedding columns.
Materialized view	Incremental MV (updates on every insert) or refreshable MV (runs on a schedule)	ClickHouse supports two MV models — see callout.
Scheduled query	Refreshable MV — runs the query on a schedule and maintains its result table	Refreshable MVs replace the scheduled-query-into-target-table pattern.
Streaming inserts	Native `INSERT` over HTTP or the native protocol for direct ingest; ClickPipes for managed streaming	ClickPipes covers Kafka, Kinesis, Pub/Sub, MySQL, Postgres, and object storage.
Continuous queries	Streaming table engine (Kafka, Pub/Sub, etc.) feeding a materialized view that writes to a destination table	Same end-to-end model: ingest → transform → write.
Dry run	`EXPLAIN ESTIMATE` — reports rows, parts, and marks the query would read	Other `EXPLAIN` variants (`PLAN`, `PIPELINE`, `SYNTAX`) cover deeper plan inspection.
Federated queries (Spanner, Cloud SQL, AlloyDB)	External OLTP attached via database engine (PostgreSQL, MySQL, MongoDB, SQLite)	Distinct from external tables in object storage — these attach a live source so its tables are queryable directly.
Cached results	Query cache	Both transparently reuse results of recently executed queries.
Sessions / multi-statement queries	Per-statement execution; multi-step state managed in the client or an orchestrator	ClickHouse has no per-session variables or shared state.

Secondary indexes

Indexes on non-primary-key columns, used when queries filter by columns outside the sort order:

Bloom-filter — equality lookups (=, IN)
Token-bloom — substring search on tokenized text
Minmax — range pruning by per-part min/max

Materialized view update model

ClickHouse has two MV models: incremental MVs update on every base-table insert (cost proportional to the insert) and refreshable MVs run on a schedule. BigQuery materialized views correspond to the refreshable model. Use incremental for high-throughput aggregations, refreshable for periodic snapshots.

SQL and functions

The query-language surface: SQL coverage, UDFs, and the built-in function library.

ClickHouse SQL covers the standard SELECT / JOIN / GROUP BY / window-function surface. Function-by-function mapping (date, JSON, string, regex, window) lives in the BigQuery → ClickHouse SQL translation reference; the rows below are concept-level only.

BigQuery	ClickHouse	Notes
Standard SQL	ClickHouse SQL — same `SELECT` / `JOIN` / `GROUP BY`, with lambdas and aggregate combinators as additional language features	Compatible at the level of basic SQL. Lambdas and combinators are the two extensions worth getting familiar with.
Aggregate functions	Aggregate functions composable with combinators (`-Array`, `-Map`, `-ForEach`, `-If`, …)	Combinators compose any aggregate with any input shape.
Array functions, `UNNEST`	Array functions and lambdas	Common patterns: `arrayFilter`, `arrayMap`, `arrayZip`, `arrayReduce`.
SQL UDFs	`CREATE FUNCTION` (SQL expression)	Same model — function from a SQL expression.
JavaScript UDFs	Executable UDF shelling out to a Python, shell, or other script	Different language and execution model, similar role.
Stored procedures	Client-side or orchestrator-side procedural logic (dbt, Airflow)	ClickHouse has no procedural SQL.
Multi-statement transactions	Per-insert and per-DDL atomic guarantees; application-layer grouping for multi-write batches	Multi-statement transactions are on the roadmap.
Sketches (HLL, approximate quantiles)	`uniqHLL12`, `quantileTDigest`, `quantileDDSketch`, and others — composable via `-State`/`-Merge` combinators	Approximate aggregates that serialize as state and merge across queries.

Security and governance

Access control, encryption, masking, and network boundaries.

Authorized views and row-level security are listed under Storage and tables.

BigQuery	ClickHouse	Notes
Policy tags / column-level access control	Column-level grants on specific columns of a table	Grants apply at the column level. BigQuery's centralized taxonomy/policy-tag governance has no direct equivalent.
Data masking	Views, row policies, or function-based transforms — see data masking patterns	No column-mask primitive yet; patterns are SQL-level.
Customer-managed encryption keys (CMEK)	CMEK on the service	BYOK in AWS KMS, with rotation and revocation.
AEAD / SQL-level encryption functions	Encryption functions (`encrypt` / `decrypt`)	Covers AES-128/256-CBC/GCM and AEAD modes.
Differential privacy	External noise application, or via a UDF	No built-in differential privacy in ClickHouse.
VPC Service Controls	PrivateLink (AWS / Azure) and IP allowlists for ingress restriction	Boundary semantics are narrower than VPC SC.

Cross-organization data exchange and clean-room patterns.

BigQuery	ClickHouse	Notes
Analytics Hub / data exchanges / listings	Read access to a shared database, or a dedicated service with consumer-specific row policies	ClickHouse has no in-product data marketplace; sharing uses standard access primitives.
Data clean rooms	Row policies and authorized views — assembled per use case	No managed clean-room product.

Operations and ecosystem

Day-2 concerns: ingestion, ML/BI integration, observability, metadata, and disaster recovery.

ClickHouse surfaces operational state through system.* tables (queries, sessions, replication, parts, metrics) and the cloud console; managed ingestion is handled by ClickPipes; ML, BI, and notebook workflows are typically handled in external systems that read from ClickHouse.

BigQuery	ClickHouse	Notes
BigQuery ML	External training and serving (notebooks, Spark, Vertex AI, feature stores) reading from ClickHouse; see AI/ML in Cloud for managed-side features	ClickHouse has no in-database ML — the typical pattern is to use ClickHouse as the analytical store and run training elsewhere.
BI Engine	Direct querying — ClickHouse is a column-oriented analytical engine	ClickHouse has no separate caching layer to provision for BI workloads; queries run against the storage engine directly.
OMNI / cross-cloud federated query	One ClickHouse service per supported region where the data lives, with cross-region replication as needed	Pattern is one service per cloud, not federated queries across clouds.
Data sources / file formats	File-format and connector library	Managed connectors (ClickPipes) for sources like Kafka, Pub/Sub, MySQL, Postgres, and object storage; SQL table functions for ad-hoc reads of files in object storage.
Query jobs (ID, history, cancel)	`system.query_log` and `system.processes` for inspection; `KILL QUERY` to cancel	Same information, exposed through system tables instead of a job API.
`INFORMATION_SCHEMA`	Native `system.*` tables for ClickHouse-specific detail, or the ANSI `information_schema` views for tool compatibility	Both surfaces available.
Data Transfer Service	ClickPipes — scheduled and streaming ingestion from SaaS, storage, and OLTP sources	Covers the same scheduling and source-coverage role.
Audit logs	Cloud audit log and system tables	Both systems log admin and query activity.
Change data capture ingestion	ClickPipes for Postgres, MySQL, or Kafka	Managed CDC from OLTP and streaming sources into ClickHouse tables.
BigQuery Studio notebooks / BigQuery DataFrames	Jupyter with `clickhouse-connect` or another client library	No in-product notebook environment or pandas-compatible in-DB API; notebook-side libraries cover the same workflow.
Data Canvas / managed data preparations	SQL Console and ClickPipes; visual data-prep in an external orchestrator	SQL Console is the UI counterpart; ClickPipes covers managed ingestion.
Gemini in BigQuery (SQL generation, code completion)	Ask-AI button in docs and console	LLM assistance is surfaced through Ask-AI; ClickHouse has no in-query assistant.
Knowledge Catalog / data lineage / data quality	`system.*` tables for metadata; external tools (dbt, DataHub) for lineage and quality	ClickHouse exposes metadata via system tables rather than a managed catalog product.
Cross-region replication / managed disaster recovery	Multi-AZ HA within a region (automatic); cross-region replication via `Replicated*MergeTree` engines or the Enterprise tier's advanced DR features	Multi-AZ HA is on by default within a region. Cross-region replication is configurable; latency between regions affects write performance.

Resource hierarchy​

Compute, capacity, pricing​

Storage and tables​

Query model and performance​

Secondary indexes​

SQL and functions​

Security and governance​

Data sharing​

Operations and ecosystem​