BigQuery and ClickHouse: equivalent concepts
The tables below map each BigQuery concept to its ClickHouse equivalent — what to use instead, and where the model differs. For function-by-function SQL syntax mapping, see the BigQuery → ClickHouse SQL translation reference. For the end-to-end migration walkthrough, see Migrating from BigQuery to ClickHouse Cloud.
Resource hierarchy
How the platform organizes accounts, logical containers for data, and where compute is provisioned.
| BigQuery | ClickHouse | Notes |
|---|---|---|
| Organization | Organization | Root node of the hierarchy in both. |
| Project | Service (region-scoped); warehouse for grouping services with shared storage and independent compute | A ClickHouse service is one storage + one compute pool. Use a warehouse to group services that share storage but scale compute independently. |
| Dataset | Database | Logical container that organizes tables and scopes access. |
| Folder | Warehouse grouping, or separate services per workload | ClickHouse has no folder primitive — grouping is at the service / warehouse level. |
| IAM permissions | Console roles plus SQL grants | Two-layer access: roles in console.clickhouse.cloud plus SQL grants in the database. Console users can also be granted DB roles for SQL Console use. |
Compute, capacity, pricing
How processing is allocated to a query, sized, and billed.
| BigQuery | ClickHouse | Notes |
|---|---|---|
| Slot | Replica (whole node); queries parallelize across replicas | A replica is the unit of compute in a ClickHouse service; queries run across all replicas of the service. See the callout below for the granularity difference with BigQuery slots. |
| Slot reservation | Vertical and horizontal autoscaling bounds; warehouses for workload isolation | ClickHouse uses bounds-based autoscaling rather than guaranteed-capacity reservations. |
| Quotas | Workload classes plus per-query limits | Covers memory, CPU, concurrency, and I/O scheduling. The two quota models don't map row-by-row, but concept-level coverage is similar. |
| On-demand pricing (per TB scanned) | Compute-time (replica-hours) plus storage and transfer — see billing overview | The two pricing models are not directly comparable. |
| Logical vs physical storage billing | Compressed storage only — see billing overview | ClickHouse Cloud bills compressed storage. The logical-vs-physical distinction does not apply. |
A BigQuery slot is much finer-grained than a ClickHouse replica — it's closer to a CPU thread within a replica than to a whole replica. Both are the unit of compute being allocated to a query, but with very different sizing.
Storage and tables
How tables are stored: engines, schema, partitioning, snapshots, and access primitives.
In ClickHouse, a table's behavior is set at creation time: the engine (MergeTree family) determines merge and storage semantics, and ORDER BY / PARTITION BY / TTL clauses configure physical layout and retention. Many BigQuery per-feature settings map to a clause in the ClickHouse CREATE TABLE statement.
| BigQuery | ClickHouse | Notes |
|---|---|---|
| Table | MergeTree-family table | Engine choice determines storage and merge behavior — pick by access pattern (MergeTree for append-mostly facts, ReplacingMergeTree for upserts, AggregatingMergeTree for pre-aggregations). |
Column schema modes (NULLABLE, REQUIRED, REPEATED) | Nullable(T) for optional; omit for required; Array(T) for repeated; Array(Tuple(...)) or Nested for repeated records | In ClickHouse, columns are non-nullable unless wrapped with Nullable(T). Nullability has a small storage and query cost, so use it only when the column actually needs nulls. |
| Schema evolution (add / drop / modify columns) | ALTER TABLE ... ADD / DROP / MODIFY COLUMN | Same DDL surface as BigQuery. Many column changes are metadata-only. |
| Partitioning | PARTITION BY clause on the table | Partitions are defined at table creation; a partition expression determines how rows are grouped into parts on disk. |
| Clustering | ORDER BY columns in the table definition | Defined as part of the table; data is physically sorted on disk by the ORDER BY columns. |
| External tables / BigLake | s3 / gcs / azureBlobStorage table functions for direct file access; Iceberg engine for open catalogs | Object storage and open-table formats are read directly through these functions and engines. ClickHouse does not provide a unified-governance layer over external storage. |
| Object tables (SQL access to unstructured files) | s3 / gcs table functions over binary formats | ClickHouse treats unstructured-object access as a special case of external file reading via table functions, not as a dedicated table type. |
| Apache Iceberg | Iceberg engine (read-only) | Reads Iceberg tables stored in S3, Azure, HDFS, or local storage; writes are not supported. See the engine page for the current list of supported features. |
| Default table / partition / dataset expiration | TTL clause on the table, column, or partition | Both support automatic deletion of data older than a configured window. TTL can be set at table creation or via ALTER TABLE ... MODIFY TTL. |
| Table snapshot | Service-level backup | See callout below — granularity differs significantly. |
| Time travel | Point-in-time backup restore into a new service | Backups are service-scoped, not table-scoped, so the restore unit is the whole service rather than a single table at a moment in time. |
| Authorized views | View with SQL SECURITY DEFINER (runs with the view-owner's privileges) | See CREATE VIEW for the syntax and the INVOKER / DEFINER / NONE modes. |
| Row-level security | Row policy — a WHERE-style expression evaluated per user | Row policies apply transparently to every query against the table. |
Wildcard tables (_TABLE_SUFFIX) | Merge table engine (persistent grouping) or merge() function (inline) | Same idea, different syntax. Merge is a persistent table-of-tables; merge() is inline without creating one. |
| Table clone | CREATE TABLE ... AS SELECT copy, or backup restore into a new service | ClickHouse has no copy-on-write primitive — every copy reads the source data fully. |
ClickHouse Cloud backups are per-service. Restoring a backup creates a new service — a single table cannot be restored back into the original service. Plan accordingly if your current workflow relies on per-table snapshots.
Query model and performance
How queries run and are accelerated — indexes, materialized views, caches, and streaming inputs.
Query acceleration in ClickHouse comes from three layers: primary-key ordering (a sparse index over the on-disk sort order), secondary indexes on non-key columns, and materialized views — incremental or refreshable. The rows below map BigQuery's acceleration features onto these primitives.
| BigQuery | ClickHouse | Notes |
|---|---|---|
| Primary key (advisory) | Primary key — drives the on-disk sort order and the sparse primary index | Neither system enforces uniqueness; the optimizer uses the key to prune granules, avoid re-sorts, and short-circuit LIMIT. |
| Foreign key (advisory) | Wide tables or dictionaries for lookups | ClickHouse doesn't accept foreign-key declarations even as advisory hints. |
| Search index | Full-text index | Token index over string columns. |
| Vector index | Vector ANN index | Approximate nearest-neighbor lookups over embedding columns. |
| Materialized view | Incremental MV (updates on every insert) or refreshable MV (runs on a schedule) | ClickHouse supports two MV models — see callout. |
| Scheduled query | Refreshable MV — runs the query on a schedule and maintains its result table | Refreshable MVs replace the scheduled-query-into-target-table pattern. |
| Streaming inserts | Native INSERT over HTTP or the native protocol for direct ingest; ClickPipes for managed streaming | ClickPipes covers Kafka, Kinesis, Pub/Sub, MySQL, Postgres, and object storage. |
| Continuous queries | Streaming table engine (Kafka, Pub/Sub, etc.) feeding a materialized view that writes to a destination table | Same end-to-end model: ingest → transform → write. |
| Dry run | EXPLAIN ESTIMATE — reports rows, parts, and marks the query would read | Other EXPLAIN variants (PLAN, PIPELINE, SYNTAX) cover deeper plan inspection. |
| Federated queries (Spanner, Cloud SQL, AlloyDB) | External OLTP attached via database engine (PostgreSQL, MySQL, MongoDB, SQLite) | Distinct from external tables in object storage — these attach a live source so its tables are queryable directly. |
| Cached results | Query cache | Both transparently reuse results of recently executed queries. |
| Sessions / multi-statement queries | Per-statement execution; multi-step state managed in the client or an orchestrator | ClickHouse has no per-session variables or shared state. |
Secondary indexes
Indexes on non-primary-key columns, used when queries filter by columns outside the sort order:
- Bloom-filter — equality lookups (
=,IN) - Token-bloom — substring search on tokenized text
- Minmax — range pruning by per-part min/max
ClickHouse has two MV models: incremental MVs update on every base-table insert (cost proportional to the insert) and refreshable MVs run on a schedule. BigQuery materialized views correspond to the refreshable model. Use incremental for high-throughput aggregations, refreshable for periodic snapshots.
SQL and functions
The query-language surface: SQL coverage, UDFs, and the built-in function library.
ClickHouse SQL covers the standard SELECT / JOIN / GROUP BY / window-function surface. Function-by-function mapping (date, JSON, string, regex, window) lives in the BigQuery → ClickHouse SQL translation reference; the rows below are concept-level only.
| BigQuery | ClickHouse | Notes |
|---|---|---|
| Standard SQL | ClickHouse SQL — same SELECT / JOIN / GROUP BY, with lambdas and aggregate combinators as additional language features | Compatible at the level of basic SQL. Lambdas and combinators are the two extensions worth getting familiar with. |
| Aggregate functions | Aggregate functions composable with combinators (-Array, -Map, -ForEach, -If, …) | Combinators compose any aggregate with any input shape. |
Array functions, UNNEST | Array functions and lambdas | Common patterns: arrayFilter, arrayMap, arrayZip, arrayReduce. |
| SQL UDFs | CREATE FUNCTION (SQL expression) | Same model — function from a SQL expression. |
| JavaScript UDFs | Executable UDF shelling out to a Python, shell, or other script | Different language and execution model, similar role. |
| Stored procedures | Client-side or orchestrator-side procedural logic (dbt, Airflow) | ClickHouse has no procedural SQL. |
| Multi-statement transactions | Per-insert and per-DDL atomic guarantees; application-layer grouping for multi-write batches | Multi-statement transactions are on the roadmap. |
| Sketches (HLL, approximate quantiles) | uniqHLL12, quantileTDigest, quantileDDSketch, and others — composable via -State/-Merge combinators | Approximate aggregates that serialize as state and merge across queries. |
Security and governance
Access control, encryption, masking, and network boundaries.
Authorized views and row-level security are listed under Storage and tables.
| BigQuery | ClickHouse | Notes |
|---|---|---|
| Policy tags / column-level access control | Column-level grants on specific columns of a table | Grants apply at the column level. BigQuery's centralized taxonomy/policy-tag governance has no direct equivalent. |
| Data masking | Views, row policies, or function-based transforms — see data masking patterns | No column-mask primitive yet; patterns are SQL-level. |
| Customer-managed encryption keys (CMEK) | CMEK on the service | BYOK in AWS KMS, with rotation and revocation. |
| AEAD / SQL-level encryption functions | Encryption functions (encrypt / decrypt) | Covers AES-128/256-CBC/GCM and AEAD modes. |
| Differential privacy | External noise application, or via a UDF | No built-in differential privacy in ClickHouse. |
| VPC Service Controls | PrivateLink (AWS / Azure) and IP allowlists for ingress restriction | Boundary semantics are narrower than VPC SC. |
Data sharing
Cross-organization data exchange and clean-room patterns.
| BigQuery | ClickHouse | Notes |
|---|---|---|
| Analytics Hub / data exchanges / listings | Read access to a shared database, or a dedicated service with consumer-specific row policies | ClickHouse has no in-product data marketplace; sharing uses standard access primitives. |
| Data clean rooms | Row policies and authorized views — assembled per use case | No managed clean-room product. |
Operations and ecosystem
Day-2 concerns: ingestion, ML/BI integration, observability, metadata, and disaster recovery.
ClickHouse surfaces operational state through system.* tables (queries, sessions, replication, parts, metrics) and the cloud console; managed ingestion is handled by ClickPipes; ML, BI, and notebook workflows are typically handled in external systems that read from ClickHouse.
| BigQuery | ClickHouse | Notes |
|---|---|---|
| BigQuery ML | External training and serving (notebooks, Spark, Vertex AI, feature stores) reading from ClickHouse; see AI/ML in Cloud for managed-side features | ClickHouse has no in-database ML — the typical pattern is to use ClickHouse as the analytical store and run training elsewhere. |
| BI Engine | Direct querying — ClickHouse is a column-oriented analytical engine | ClickHouse has no separate caching layer to provision for BI workloads; queries run against the storage engine directly. |
| OMNI / cross-cloud federated query | One ClickHouse service per supported region where the data lives, with cross-region replication as needed | Pattern is one service per cloud, not federated queries across clouds. |
| Data sources / file formats | File-format and connector library | Managed connectors (ClickPipes) for sources like Kafka, Pub/Sub, MySQL, Postgres, and object storage; SQL table functions for ad-hoc reads of files in object storage. |
| Query jobs (ID, history, cancel) | system.query_log and system.processes for inspection; KILL QUERY to cancel | Same information, exposed through system tables instead of a job API. |
INFORMATION_SCHEMA | Native system.* tables for ClickHouse-specific detail, or the ANSI information_schema views for tool compatibility | Both surfaces available. |
| Data Transfer Service | ClickPipes — scheduled and streaming ingestion from SaaS, storage, and OLTP sources | Covers the same scheduling and source-coverage role. |
| Audit logs | Cloud audit log and system tables | Both systems log admin and query activity. |
| Change data capture ingestion | ClickPipes for Postgres, MySQL, or Kafka | Managed CDC from OLTP and streaming sources into ClickHouse tables. |
| BigQuery Studio notebooks / BigQuery DataFrames | Jupyter with clickhouse-connect or another client library | No in-product notebook environment or pandas-compatible in-DB API; notebook-side libraries cover the same workflow. |
| Data Canvas / managed data preparations | SQL Console and ClickPipes; visual data-prep in an external orchestrator | SQL Console is the UI counterpart; ClickPipes covers managed ingestion. |
| Gemini in BigQuery (SQL generation, code completion) | Ask-AI button in docs and console | LLM assistance is surfaced through Ask-AI; ClickHouse has no in-query assistant. |
| Knowledge Catalog / data lineage / data quality | system.* tables for metadata; external tools (dbt, DataHub) for lineage and quality | ClickHouse exposes metadata via system tables rather than a managed catalog product. |
| Cross-region replication / managed disaster recovery | Multi-AZ HA within a region (automatic); cross-region replication via Replicated*MergeTree engines or the Enterprise tier's advanced DR features | Multi-AZ HA is on by default within a region. Cross-region replication is configurable; latency between regions affects write performance. |