Deletion Vectors
Deletion Vectors in Delta Lake
🔹 What are Deletion Vectors?
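Normally, when you delete rows in a Delta table, Delta rewrites every Parquet file that contains a deleted row, leaving those rows out of the new file.
This is called copy-on-write → expensive for big tables, because deleting a single row can mean rewriting a whole file.
Deletion Vectors (DVs) are a storage optimization that avoids this rewrite: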
Instead of rewriting files, Delta just marks the deleted rows with a bitmap (a lightweight “mask”). The data is still physically there, but readers skip the “deleted” rows.
Think of it like putting a red X mark ❌ on rows instead of erasing them immediately.
🔹 Why are they useful?
🚀 Much faster deletes/updates/merges (because files aren’t rewritten).
⚡ Less I/O → good for big data tables.
✅ Efficient for streaming + time travel.
Example without deletion vectors
- Create a sales table
CREATE TABLE dev.bronze.sales AS
SELECT *
FROM read_files(
  'dbfs:/databricks-datasets/online_retail/data-001/data.csv',
  header => true,
  format => 'csv'
);
- Set Deletion Vectors false
- Delete some rows
- Describe history
Observe in the history that the whole data file is removed and rewritten: all 65,000+ rows are rewritten, even though only some of them were deleted.
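The three steps above could look like this in Databricks SQL. This is a sketch, not the original notebook code: the `Country = 'France'` filter is an assumed predicate (it presumes the CSV header produces a `Country` column), so substitute any condition that matches a few rows.

```sql
-- Turn deletion vectors off so deletes fall back to copy-on-write
ALTER TABLE dev.bronze.sales
SET TBLPROPERTIES ('delta.enableDeletionVectors' = false);

-- Delete some rows (assumed predicate, for illustration only)
DELETE FROM dev.bronze.sales
WHERE Country = 'France';

-- Inspect the DELETE entry; its operationMetrics column reports
-- values such as numRemovedFiles, numAddedFiles and numCopiedRows
DESCRIBE HISTORY dev.bronze.sales;
```

With deletion vectors off, the DELETE entry should show files being removed and added, with the surviving rows counted as copied.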
Example with deletion vectors
We can see that one deletion vector is added and no files are rewritten.
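A matching sketch for this case, again in Databricks SQL. The `Country = 'Germany'` filter is an assumed predicate chosen only for illustration; it presumes the CSV header produces a `Country` column.

```sql
-- Turn deletion vectors on; note that this upgrades the table protocol
ALTER TABLE dev.bronze.sales
SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);

-- Delete some rows (assumed predicate, for illustration only)
DELETE FROM dev.bronze.sales
WHERE Country = 'Germany';

-- Inspect the DELETE entry; operationMetrics should now include
-- numDeletionVectorsAdded, with no data files added
DESCRIBE HISTORY dev.bronze.sales;
```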
Running OPTIMIZE rewrites the files it compacts without the soft-deleted records, so those records are physically removed from the new files.
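A minimal sketch of the cleanup commands, assuming the same `dev.bronze.sales` table. OPTIMIZE only rewrites the files it chooses to compact, so `REORG TABLE ... APPLY (PURGE)` is shown as the way to force a rewrite of every file that still carries a deletion vector. The old files stay on storage until VACUUM removes them after the retention period.

```sql
-- Compact files; any file that gets rewritten drops its soft-deleted rows
OPTIMIZE dev.bronze.sales;

-- Force a rewrite of all files that still have a deletion vector
REORG TABLE dev.bronze.sales APPLY (PURGE);

-- Delete unreferenced old files once they are past the retention period
VACUUM dev.bronze.sales;
```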