Impala TPC-DS Run Comparison

Interactive comparison of three benchmark suite runs, with per-query and parallel timings. Use the controls to switch modes, filter queries, and inspect each query's SQL and sample result rows.

AWS DataHub (S3-backed)

Cloud environment running Cloudera DataHubs on AWS with data stored in S3.

STACKIT Local Disks (Ozone)

STACKIT environment using local disk-backed storage (Ozone profile).

STACKIT S3

STACKIT environment using STACKIT's S3 object storage profile.

Infrastructure and Methodology

The three benchmark variants are designed to compare storage and platform behavior while keeping Impala execution limits as comparable as possible.

Impala Compute Profile on STACKIT

Impala runs in CDW (Cloudera Data Warehouse) on Kubernetes to decouple executor placement from data locality.
10 executors, each a 6 vCPU pod with a 45 GB memory limit.
Executor pods run on ECS worker nodes.
For Ozone runs, benchmark data is hosted on Ozone running on the base cluster's 4 worker nodes.
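The compute profile above implies a fixed aggregate budget, which is a useful sanity check when comparing platforms. The following is a minimal sketch of that arithmetic, assuming "6 vCPU pods" means 6 vCPUs per executor pod; the `ExecutorProfile` class and its field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorProfile:
    """Hypothetical container for the executor sizing listed above."""
    executors: int      # number of Impala executors
    mem_limit_gb: int   # memory limit per executor, in GB
    vcpus: int          # vCPUs per executor pod (assumed reading of "6 vCPU pods")

    def total_mem_gb(self) -> int:
        # Aggregate memory limit across all executors.
        return self.executors * self.mem_limit_gb

    def total_vcpus(self) -> int:
        # Aggregate vCPU count across all executor pods.
        return self.executors * self.vcpus


# Values taken from the STACKIT compute profile.
stackit = ExecutorProfile(executors=10, mem_limit_gb=45, vcpus=6)

print(f"aggregate memory limit: {stackit.total_mem_gb()} GB")
print(f"aggregate vCPUs: {stackit.total_vcpus()}")
```

Under these assumptions the STACKIT executors share a 450 GB aggregate memory limit and 60 vCPUs.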

Storage Variants

Ozone run: Ozone with 3x replication and a filesystem-optimized bucket layout.
Block storage profile: block devices in the ~100 MB/s throughput class, approximating enterprise HDDs.
STACKIT S3 run: same compute setup as Ozone, data placed on STACKIT S3.

AWS DataHub (Data Mart)

10 Impala executors with 45 GB memory limit per executor.
Data stored in S3.
Used as the cross-platform comparison point for the STACKIT runs.

Common Impala Tuning

NUM_SCANNER_THREADS=4 to normalize scan-side CPU pressure across platforms.
MT_DOP=4 to keep intra-node parallelism aligned.
Table statistics are precomputed before the runs, so the planner has statistics available and catalog/coordinator metadata is already warm when queries execute.
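The tuning above could be applied as a short preamble of Impala statements before each suite run. The sketch below only generates the statement text; it does not connect to a cluster. `SET` and `COMPUTE STATS` are standard Impala statements, but using them this way is an assumption about how the runs were prepared, and the helper names and the two example tables (taken from the TPC-DS schema) are illustrative.

```python
# Query options from the "Common Impala Tuning" list.
QUERY_OPTIONS = {"NUM_SCANNER_THREADS": 4, "MT_DOP": 4}


def session_preamble(options: dict) -> list:
    """Render one Impala SET statement per query option (hypothetical helper)."""
    return [f"SET {name}={value};" for name, value in options.items()]


def compute_stats_statements(tables: list) -> list:
    """Render one COMPUTE STATS statement per table (hypothetical helper)."""
    return [f"COMPUTE STATS {table};" for table in tables]


# Two TPC-DS tables as an example; a real run would list every table in the schema.
example_tables = ["store_sales", "date_dim"]

# Statistics are computed once up front; the SET statements open each query session.
for statement in compute_stats_statements(example_tables) + session_preamble(QUERY_OPTIONS):
    print(statement)
```

Keeping the options in one mapping makes it easy to confirm that all three environments ran with identical settings.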