Impala TPC-DS Run Comparison
Interactive comparison of three TPC-DS suite runs, with per-query and parallel timings.
Use the controls to switch modes, filter queries, and inspect the SQL text and sample result rows.
AWS DataHub (S3-backed)
Cloud environment running Cloudera DataHubs on AWS with data stored in S3.
STACKIT Local Disks (Ozone)
STACKIT environment using local disk-backed storage (Ozone profile).
STACKIT S3
STACKIT environment using STACKIT's S3 object storage profile.
Infrastructure and Methodology
The three benchmark variants are designed to compare storage and platform behavior while keeping Impala execution
limits as comparable as possible.
Impala Compute Profile on STACKIT
- Impala runs in CDW on Kubernetes to decouple executor placement from data locality.
- 10 executors, each with a 45 GB memory limit and running in a 6 vCPU pod.
- Executor pods run on ECS worker nodes.
- For Ozone runs, benchmark data is hosted on Ozone running on the base cluster's 4 worker nodes.
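A quick sanity check of the aggregate limits implied by the profile above. This is only a sketch: summing per-executor limits gives a capacity figure, not a pooled budget, since Impala enforces the memory limit per executor. The constant and function names are illustrative and not part of any tool.

```python
# Back-of-envelope aggregate limits for the STACKIT Impala compute profile.
# Values come from the profile list; names are hypothetical.
EXECUTORS = 10
MEM_LIMIT_GB_PER_EXECUTOR = 45
VCPU_PER_EXECUTOR_POD = 6


def aggregate_limits(executors: int, mem_gb: int, vcpus: int) -> dict:
    """Return cluster-wide sums of the per-executor limits."""
    return {
        "total_mem_limit_gb": executors * mem_gb,
        "total_vcpus": executors * vcpus,
    }


if __name__ == "__main__":
    print(aggregate_limits(EXECUTORS, MEM_LIMIT_GB_PER_EXECUTOR, VCPU_PER_EXECUTOR_POD))
    # {'total_mem_limit_gb': 450, 'total_vcpus': 60}
```

The same 10 × 45 GB arithmetic applies to the AWS DataHub profile; its vCPU count is not stated here, so only the memory total is directly comparable.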
Storage Variants
- Ozone run: Ozone with 3x replication and a filesystem-optimized bucket layout.
- Block storage profile: ~100 MB/s class block devices (enterprise HDD approximation).
- STACKIT S3 run: same compute setup as the Ozone run, with data placed on STACKIT S3.
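With 3x replication, every logical byte is stored three times across the Ozone datanodes. The sketch below shows how raw usage scales, assuming replicas are spread evenly over the base cluster's 4 worker nodes. The dataset size in the example is a hypothetical placeholder, since the TPC-DS scale factor is not given here.

```python
# Raw storage consumed by a replicated Ozone dataset (illustrative sketch).
OZONE_REPLICATION_FACTOR = 3  # from the Ozone run description
OZONE_WORKER_NODES = 4        # base cluster worker nodes hosting Ozone


def raw_footprint_gb(logical_gb: float, replication: int = OZONE_REPLICATION_FACTOR) -> float:
    """Raw capacity consumed when every block is stored `replication` times."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return logical_gb * replication


def per_node_share_gb(logical_gb: float, nodes: int = OZONE_WORKER_NODES,
                      replication: int = OZONE_REPLICATION_FACTOR) -> float:
    """Average raw usage per node, assuming replicas are spread evenly."""
    return raw_footprint_gb(logical_gb, replication) / nodes


if __name__ == "__main__":
    hypothetical_logical_gb = 1000.0  # placeholder, not the benchmark's actual size
    print(raw_footprint_gb(hypothetical_logical_gb))   # 3000.0
    print(per_node_share_gb(hypothetical_logical_gb))  # 750.0
```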
AWS DataHub (Data Mart)
- 10 Impala executors with 45 GB memory limit per executor.
- Data stored in S3.
- Serves as the cross-platform comparison point for the STACKIT runs.
Common Impala Tuning
- NUM_SCANNER_THREADS=4 to normalize scan-side CPU pressure across platforms.
- MT_DOP=4 to keep intra-node parallelism aligned.
- Table statistics precomputed so catalog/coordinator metadata stays warm before query execution.
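The tuning above corresponds to ordinary Impala statements: `SET` for the two query options and `COMPUTE STATS` for the precomputed table statistics. The helper below is a hypothetical sketch that only renders those statements as text (for example, to feed to impala-shell); the `tpcds` database name in the usage example is an assumption. Query options set this way apply per session, so they would need to be issued in every session that runs benchmark queries.

```python
# Hypothetical helper: render the common tuning as Impala SQL statements.
QUERY_OPTIONS = {"NUM_SCANNER_THREADS": 4, "MT_DOP": 4}


def setup_statements(tables: list[str], options: dict[str, int] = QUERY_OPTIONS) -> list[str]:
    """Return one SET statement per query option, then COMPUTE STATS per table."""
    stmts = [f"SET {name}={value};" for name, value in options.items()]
    # COMPUTE STATS is meant to run once ahead of the suite (statistics precomputed).
    stmts += [f"COMPUTE STATS {table};" for table in tables]
    return stmts


if __name__ == "__main__":
    for stmt in setup_statements(["tpcds.store_sales", "tpcds.date_dim"]):
        print(stmt)
```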
Per-query runtime comparison
Hover over a bar to see row counts and sample output. Click a query to load its SQL text and detailed run information below.
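Per-query runtimes are often easier to compare as ratios against one reference environment. The sketch below assumes timings are available as a mapping from query to environment to seconds; the environment keys and the choice of baseline are hypothetical and not taken from the dashboard.

```python
# Per-query runtime ratios relative to a chosen baseline environment (sketch).


def relative_runtimes(per_query: dict[str, dict[str, float]],
                      baseline: str) -> dict[str, dict[str, float]]:
    """Divide each environment's runtime by the baseline runtime for the same query."""
    out: dict[str, dict[str, float]] = {}
    for query, by_env in per_query.items():
        base = by_env.get(baseline)
        if not base:  # skip queries missing (or zero) in the baseline
            continue
        out[query] = {env: secs / base for env, secs in by_env.items()}
    return out


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    timings = {"q1": {"aws": 2.0, "stackit_s3": 3.0}}
    print(relative_runtimes(timings, baseline="aws"))
    # {'q1': {'aws': 1.0, 'stackit_s3': 1.5}}
```

A ratio above 1.0 means the environment was slower than the baseline on that query.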
Selected query details
Click a query bar to inspect SQL text, row counts, and up to 10 sample result rows.
Aggregated runtime by environment
Simple total runtime comparison across environments for the selected mode.
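How the totals are aggregated is not spelled out, and the right definition depends on the mode. A minimal sketch, assuming each query has start and end timestamps: in a sequential run the total is the sum of query durations, while in a parallel run overlapping queries make wall-clock time (first start to last end) the meaningful figure. The `Timing` record and mode names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    env: str        # environment label (hypothetical keys)
    query: str      # TPC-DS query id
    start_s: float  # start time, seconds since suite start
    end_s: float    # end time, seconds since suite start


def total_runtime(timings: list[Timing], env: str, mode: str) -> float:
    """Aggregate runtime for one environment under the given mode."""
    rows = [t for t in timings if t.env == env]
    if not rows:
        raise ValueError(f"no timings for environment {env!r}")
    if mode == "sequential":
        # Queries ran one after another: total is the sum of durations.
        return sum(t.end_s - t.start_s for t in rows)
    if mode == "parallel":
        # Queries overlapped: total is wall-clock from first start to last end.
        return max(t.end_s for t in rows) - min(t.start_s for t in rows)
    raise ValueError(f"unknown mode {mode!r}")


if __name__ == "__main__":
    # Hypothetical overlapping timings, for illustration only.
    runs = [Timing("aws", "q1", 0.0, 10.0), Timing("aws", "q2", 2.0, 6.0)]
    print(total_runtime(runs, "aws", "sequential"))  # 14.0
    print(total_runtime(runs, "aws", "parallel"))    # 10.0
```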