Analyzing data distribution
Maintain schema health and ensure data is distributed efficiently across the cluster by auditing database structures and diagnosing storage bottlenecks. Use the Data Analysis panel in the left sidebar to monitor table growth, identify bloat, and resolve data skew.
Optimizing storage and schema health
Use the Tables tab to monitor the scale of your relational data and verify that your storage settings are optimized for analytical workloads.
- To validate compression efficiency, check the Compression and Level columns. If the compression ratio is low for a large table, consider adjusting the algorithm (e.g., switching to zstandard) to reclaim disk space.
- Check metadata freshness by monitoring the Last Analyze timestamp. If a table hasn't been analyzed recently, the query planner might use stale statistics, leading to inefficient execution plans.
- Keep track of external data with the External Tables tab, which provides an overview of the data residing in S3 or HDFS.
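The compression check can be sketched as a small filter over table metadata. This is a minimal sketch under stated assumptions: the TableStats record, its field names, the 10 GiB cutoff for a "large" table, and the 2.0 cutoff for a "low" ratio are all hypothetical and are not taken from the Tables tab. The ratio is computed as estimated uncompressed size divided by on-disk size, so 1.0 means compression saves nothing.

```python
from dataclasses import dataclass
from typing import List

GIB = 1024 ** 3


@dataclass
class TableStats:
    # Hypothetical fields; the Tables tab may expose different columns.
    name: str
    on_disk_bytes: int       # compressed size actually stored
    uncompressed_bytes: int  # estimated size without compression
    compression: str         # value of the Compression column, e.g. "zlib"
    level: int               # value of the Level column


def compression_ratio(t: TableStats) -> float:
    """Uncompressed size divided by on-disk size; 1.0 means no savings."""
    if t.on_disk_bytes == 0:
        return 1.0
    return t.uncompressed_bytes / t.on_disk_bytes


def low_compression_candidates(
    tables: List[TableStats],
    min_size_bytes: int = 10 * GIB,  # assumed cutoff for a "large" table
    min_ratio: float = 2.0,          # assumed cutoff for a "low" ratio
) -> List[TableStats]:
    """Large tables whose ratio is below min_ratio, worst ratio first."""
    flagged = [
        t for t in tables
        if t.on_disk_bytes >= min_size_bytes and compression_ratio(t) < min_ratio
    ]
    return sorted(flagged, key=compression_ratio)


# Example with made-up numbers.
tables = [
    TableStats("sales_fact", 40 * GIB, 60 * GIB, "zlib", 1),   # ratio 1.5
    TableStats("events", 20 * GIB, 100 * GIB, "zstd", 5),      # ratio 5.0
    TableStats("lookup", 1 * GIB, 1 * GIB, "none", 0),         # too small to flag
]
print([t.name for t in low_compression_candidates(tables)])  # ['sales_fact']
```

Sorting by ratio puts the tables with the least to lose from a change of algorithm or level at the top of the review list.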
Improving performance and storage efficiency
Ensure your data structures support rapid query execution and simplified data lifecycles. Use the Indexes and Partitions tabs.
- Update stale statistics by navigating to the Missing Stats tab to identify tables that haven't been analyzed in over seven days.
Tip
Statistics help the query planner choose efficient execution plans. Run ANALYZE manually after any operation that modifies more than 10% of a table's data so that the planner works from current statistics.
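The seven-day staleness rule and the 10% modification rule reduce to two small checks. A minimal sketch, assuming you already have each table's last analyze time and row counts; the function names are hypothetical, and row_count is assumed to be the table's row count before the operation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(days=7)  # "not analyzed in over seven days"
MODIFIED_FRACTION = 0.10         # "modifies more than 10% of a table's data"


def stats_are_stale(last_analyze: Optional[datetime], now: datetime) -> bool:
    """True if the table was never analyzed or was last analyzed over seven days ago."""
    return last_analyze is None or now - last_analyze > STALE_AFTER


def should_analyze_after(rows_modified: int, row_count: int) -> bool:
    """True if a single operation touched more than 10% of the table's rows.
    An empty table that just received rows counts as fully modified."""
    if row_count <= 0:
        return rows_modified > 0
    return rows_modified / row_count > MODIFIED_FRACTION


# Example with made-up values.
now = datetime(2024, 6, 10, tzinfo=timezone.utc)
print(stats_are_stale(datetime(2024, 6, 1, tzinfo=timezone.utc), now))  # True (9 days)
print(stats_are_stale(None, now))                                       # True
print(should_analyze_after(150_000, 1_000_000))                         # True (15%)
print(should_analyze_after(50_000, 1_000_000))                          # False (5%)
```

Treating a missing timestamp as stale is a deliberate choice: a table with no recorded ANALYZE gives the planner nothing to work from.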
- Reclaim wasted disk space by reviewing the Bloat tab to find tables with a high dead row count. If bloat exceeds 20%, run a manual VACUUM to mark the dead space for reuse, so that new rows fill it instead of growing the table. A plain VACUUM usually does not shrink the table's files on disk; VACUUM FULL does, but it rewrites the table and holds an exclusive lock while it runs.
- Resolve data distribution imbalance by looking at the Data Skew tab to find tables where data is unevenly spread across segments. A high skew percentage indicates that a single segment is doing more work than the others, slowing down the entire cluster. If a large table shows significant skew, investigate the distribution key. Consider using ALTER TABLE ... SET DISTRIBUTED BY to choose a column with higher cardinality (more unique values) or fewer nulls.
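The 20% bloat rule can be sketched as a filter over per-table row counts. This assumes bloat is measured as dead rows as a percentage of all row versions (live plus dead); the Bloat tab may compute it differently, for example from page counts, and the function names and input shape are hypothetical.

```python
from typing import Iterable, List, Tuple


def bloat_percent(live_rows: int, dead_rows: int) -> float:
    """Assumed definition: dead rows as a percentage of all row versions."""
    total = live_rows + dead_rows
    if total == 0:
        return 0.0
    return 100.0 * dead_rows / total


def vacuum_candidates(
    tables: Iterable[Tuple[str, int, int]],  # (name, live_rows, dead_rows)
    threshold_pct: float = 20.0,
) -> List[str]:
    """Names of tables whose bloat exceeds the threshold (strictly greater)."""
    return [
        name for name, live, dead in tables
        if bloat_percent(live, dead) > threshold_pct
    ]


# Example with made-up counts.
stats = [
    ("orders", 800_000, 200_000),      # exactly 20%: not flagged
    ("sessions", 600_000, 400_000),    # 40%: flagged
    ("users", 1_000_000, 10_000),      # about 1%: not flagged
]
print(vacuum_candidates(stats))  # ['sessions']
```

The comparison is strict because the guidance says "exceeds 20%"; each flagged name is a table to VACUUM.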
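To see why key cardinality and nulls matter, the sketch below simulates hash distribution across four segments and scores the result. It is an illustration under loud assumptions: Python's integer hash stands in for the database's own hash function, every NULL key is routed to one segment (all NULLs hash to the same value), and the skew percentage is assumed to be (max − mean) / max × 100 over per-segment row counts, which may not be the formula the Data Skew tab uses. Both function names are hypothetical.

```python
from statistics import mean
from typing import Iterable, List, Optional


def skew_percent(rows_per_segment: List[int]) -> float:
    """Assumed metric: how far the busiest segment sits above the average,
    as a percentage of the busiest segment. 0.0 means perfectly even."""
    busiest = max(rows_per_segment)
    if busiest == 0:
        return 0.0
    return 100.0 * (busiest - mean(rows_per_segment)) / busiest


def rows_per_segment(keys: Iterable[Optional[int]], num_segments: int) -> List[int]:
    """Toy hash distribution: Python's int hash stands in for the database's
    hash function, and every NULL (None) key lands on the same segment."""
    counts = [0] * num_segments
    for key in keys:
        segment = 0 if key is None else hash(key) % num_segments
        counts[segment] += 1
    return counts


# High-cardinality key: one distinct value per row.
even = rows_per_segment(range(1000), 4)
# Low-cardinality key: only two distinct values for four segments.
two_values = rows_per_segment([i % 2 for i in range(1000)], 4)
# Nullable key: 600 of 1000 rows have no key value.
mostly_null = rows_per_segment([None] * 600 + list(range(400)), 4)

print(even, skew_percent(even))                           # [250, 250, 250, 250] 0.0
print(two_values, skew_percent(two_values))               # [500, 500, 0, 0] 50.0
print(mostly_null, round(skew_percent(mostly_null), 1))   # [700, 100, 100, 100] 64.3
```

The low-cardinality key leaves two segments idle, and the nullable key piles most rows onto one segment; a column with many distinct, non-null values spreads rows evenly.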
Planning long-term storage and capacity
Analyze the physical composition of your database and identify long-term storage trends with the Charts tab.
- Prioritize archival candidates by reviewing the Top 50 Tables by Size bar chart to see which objects are candidates for partitioning or data archiving.
- Audit storage formats by looking at the Storage Format Distribution pie chart. If a majority of your data is in Heap format, plan a migration to append-only storage to optimize for high-volume analytical reads.
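Both charts can be approximated from a plain list of tables. A minimal sketch, assuming each table is a hypothetical (name, size in bytes, format label) tuple, with made-up labels such as "heap", "ao_row", and "ao_column", and assuming "a majority of your data" means more than half of all bytes rather than more than half of all tables.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row shape: (table name, size in bytes, storage format label).
Table = Tuple[str, int, str]


def top_tables_by_size(tables: List[Table], n: int = 50) -> List[Table]:
    """Largest n tables, biggest first, like the Top 50 Tables by Size chart."""
    return heapq.nlargest(n, tables, key=lambda t: t[1])


def storage_format_share(tables: List[Table]) -> Dict[str, float]:
    """Fraction of total bytes stored in each format."""
    totals: Dict[str, int] = defaultdict(int)
    for _name, size, fmt in tables:
        totals[fmt] += size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {fmt: size / grand_total for fmt, size in totals.items()}


def heap_is_majority(tables: List[Table]) -> bool:
    """Assumption: 'majority' means more than half of all bytes."""
    return storage_format_share(tables).get("heap", 0.0) > 0.5


# Example with made-up sizes (total 1000 bytes, 510 of them heap).
tables = [
    ("clicks", 500, "heap"),
    ("orders", 300, "ao_column"),
    ("dim_date", 10, "heap"),
    ("logs", 190, "ao_row"),
]
print([t[0] for t in top_tables_by_size(tables, n=2)])  # ['clicks', 'orders']
print(heap_is_majority(tables))                         # True
```

heapq.nlargest avoids sorting the full catalog when only the top 50 are needed.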