Storage
Data Storage Internals, Checkpoints and Renaming Streaming Tables and Views
Here’s how Delta Live Tables (DLT) stores streaming tables, how that storage depends on the pipeline, and what happens when you rename a streaming table:
1. Where is data for streaming tables stored?
- Streaming tables in DLT are stored as Delta tables in cloud object storage (S3, ADLS Gen2, or GCS): under the pipeline’s storage location for Hive metastore pipelines (DBFS by default), or in the managed storage location of the target catalog and schema for pipelines that publish to Unity Catalog.
- Every streaming table has a physical Delta table location, even though you define it declaratively in DLT.
- By default the pipeline chooses this location; for Hive metastore pipelines you can set it explicitly with the `storage` field in the pipeline settings.
Key points:
- Incremental state (such as processed source offsets) is stored in system-managed checkpoints within the pipeline’s storage path.
- Upserts and merges are persisted in the Delta table itself.
- Data retention and compaction follow normal Delta table rules.
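To make the storage setting concrete, here is a minimal sketch of Hive metastore pipeline settings written as a Python dict. The `name`, `storage`, `target`, `libraries`, and `continuous` keys mirror fields of the pipeline settings JSON; every value (pipeline name, bucket, schema, notebook path) is a placeholder. The hypothetical `table_dir` helper encodes an assumed `<storage>/tables/<table>` layout, which you should verify in your own workspace rather than rely on.

```python
import json
import posixpath

# Placeholder settings for a Hive metastore pipeline. Keys mirror the
# pipeline settings JSON; all values are hypothetical.
settings = {
    "name": "orders_pipeline",
    "storage": "s3://my-bucket/dlt/orders_pipeline",  # custom storage root
    "target": "sales",  # schema the pipeline publishes tables to
    "libraries": [{"notebook": {"path": "/Repos/etl/orders_dlt"}}],
    "continuous": False,
}


def table_dir(pipeline_settings: dict, table: str) -> str:
    """Hypothetical helper: assumed directory of a table's Delta files.

    Assumes tables live under <storage>/tables/<table>; this layout is an
    assumption, not a guaranteed contract.
    """
    return posixpath.join(pipeline_settings["storage"], "tables", table)


if __name__ == "__main__":
    print(json.dumps(settings, indent=2))
    print(table_dir(settings, "orders_bronze"))
```

Omitting the `storage` key lets the pipeline fall back to its default location, which matches the "Storage" recommendation further down.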
2. Is storage dependent on the pipeline?
Yes, partially:
- Pipeline-specific storage: Each DLT pipeline manages its own metadata and checkpoints for the streaming tables it owns.
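- Shared tables: A table is defined by exactly one pipeline, but other pipelines can read it by its fully qualified name (`LIVE.<table_name>` only resolves datasets defined in the same pipeline). The physical Delta table is then shared, but each pipeline maintains its own lineage and state metadata; for example, a downstream pipeline that streams from the table keeps its own checkpoint.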
Implication:
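- Deleting a pipeline can also drop the streaming tables and materialized views it manages, so back up or migrate any table that other workloads depend on before you delete its pipeline.
- Moving a table to a different pipeline requires careful handling to avoid breaking downstream dependencies: `LIVE.<table_name>` references in the original pipeline stop resolving and must switch to the table’s fully qualified name.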
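The moving-a-table case can be checked mechanically before the move. The sketch below is a hypothetical inventory check, not a Databricks API: `owner` maps each table name to the pipeline that defines it after the move, and any `LIVE.<name>` reference from a pipeline that no longer defines `<name>` is flagged, because `LIVE.` only resolves within the defining pipeline. Fully qualified references are left alone since they resolve across pipelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    consumer_pipeline: str  # pipeline whose code contains the reference
    target: str  # e.g. "LIVE.orders_silver" or "sales.orders_silver"


def broken_live_refs(refs: list[Ref], owner: dict[str, str]) -> list[Ref]:
    """Return LIVE.<name> references that point outside the consumer's pipeline.

    `owner` is a hypothetical inventory: table name -> pipeline that defines it.
    """
    broken = []
    for ref in refs:
        if ref.target.upper().startswith("LIVE."):
            name = ref.target[len("LIVE."):]
            if owner.get(name) != ref.consumer_pipeline:
                broken.append(ref)
    return broken


# Hypothetical move: orders_silver moves from pipeline "p1" to pipeline "p2".
owner_after_move = {"orders_silver": "p2"}
refs = [
    Ref("p1", "LIVE.orders_silver"),   # breaks: p1 no longer defines it
    Ref("p1", "sales.orders_silver"),  # fine: fully qualified name
    Ref("p2", "LIVE.orders_silver"),   # fine: same pipeline as the table
]
print(broken_live_refs(refs, owner_after_move))
```

Keeping the inventory separate from the check means the same function can be rerun against the "after" ownership map for every planned move.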
3. What happens when we rename streaming tables?
- DLT does not support renaming a streaming table in place.
- If you change a table’s name in the pipeline definition:
  - The new name refers to a new table in the pipeline.
  - The existing Delta data is not renamed or moved: the new table does not inherit the old table’s data or checkpoint, so it is repopulated from its source, which cannot recover records the source no longer retains.
  - Any downstream references (`LIVE.<old_name>`) break unless you update them to the new name.
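Why a renamed table does not simply inherit the old data can be shown with a toy model. This is not DLT’s implementation; it only assumes that streaming progress is tracked per table name. The hypothetical `ToyPipeline` class keeps one checkpoint per table, so a new name starts with no checkpoint and re-reads whatever the source still retains.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPipeline:
    """Toy model, not DLT's implementation: progress is keyed by table name."""

    source: list[int]  # offsets the source still retains, in ascending order
    checkpoints: dict[str, int] = field(default_factory=dict)  # table -> last offset processed
    tables: dict[str, list[int]] = field(default_factory=dict)  # table -> offsets written

    def update(self, table: str) -> None:
        # A table with no checkpoint reads from the earliest retained offset.
        last = self.checkpoints.get(table, -1)
        new = [offset for offset in self.source if offset > last]
        self.tables.setdefault(table, []).extend(new)
        if new:
            self.checkpoints[table] = new[-1]


p = ToyPipeline(source=[0, 1, 2, 3])
p.update("orders")     # orders holds offsets 0-3
p.source = [2, 3, 4]   # retention drops offsets 0 and 1; offset 4 arrives
p.update("orders")     # orders picks up only offset 4
p.update("orders_v2")  # "renamed" table: no checkpoint, re-reads 2, 3, 4
print(p.tables)
```

Offsets 0 and 1 survive only in the old `orders` table, which is the data-loss risk behind backing up a table before renaming it.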
Best practice:
- Avoid renaming streaming tables in active pipelines.
- If renaming is needed, create a new table with the desired name and point downstream materialized views or pipelines to it.
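Pointing downstream definitions at the new name is usually a text change in the pipeline source. A minimal sketch in Python, assuming references appear literally as `LIVE.<name>` in SQL strings; `rename_live_refs` is a hypothetical helper, and word boundaries keep it from touching longer names that merely share a prefix.

```python
import re


def rename_live_refs(sql: str, old: str, new: str) -> str:
    """Hypothetical helper: rewrite LIVE.<old> references to LIVE.<new>.

    Word boundaries stop it from matching longer names such as LIVE.<old>_daily.
    """
    pattern = re.compile(rf"\bLIVE\.{re.escape(old)}\b", flags=re.IGNORECASE)
    # A function replacement avoids re.sub treating backslashes in `new` specially.
    return pattern.sub(lambda _match: f"LIVE.{new}", sql)


query = "SELECT * FROM STREAM(LIVE.orders) o JOIN LIVE.orders_daily d ON o.id = d.id"
print(rename_live_refs(query, "orders", "orders_v2"))
```

Case-insensitive matching also rewrites `live.orders` and normalizes it to `LIVE.`; drop the flag if your sources always use one spelling.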
4. Practical Notes / Recommendations
| Aspect | Recommendation |
|---|---|
| Storage | Let DLT manage default Delta table locations unless you need a custom path. |
| Pipeline dependency | Be aware that streaming tables are tied to pipeline metadata (checkpoints, lineage). |
| Renaming | Prefer creating a new table and updating downstream references; avoid in-place renames in live pipelines. |
| Backup | Before renaming or moving a table, back up its Delta data to avoid data loss. |
✅ Summary:
- Streaming tables always persist data as Delta tables, in the pipeline’s storage location or the Unity Catalog managed location.
- Storage and checkpoints are pipeline-dependent, but the data itself can be shared.
- Renaming a streaming table breaks dependencies; the best approach is to create a new table instead of renaming.