Overview of Azure Storage¶
🔹 What is Azure Storage?¶
Azure Storage is Microsoft’s cloud-based storage platform, providing highly available, durable, and secure storage for blobs, files, queues, and tables.
Think of it like a giant hard drive in the cloud, but with specialized “drawers” for different types of data.
🔹 Core Types of Azure Storage¶
-
Blob Storage (Data Lake Gen2)
-
Stores unstructured data: text, images, video, JSON, Parquet, CSV.
- Supports HDFS-like hierarchical namespace (HNS) when enabled → required for Data Lake Gen2.
-
Used for:
- Data lakes (ETL, big data, analytics).
- Storing files for machine learning.
- Backups, archives.
- Example:
abfss://container@account.dfs.core.windows.net/
-
File Storage (Azure Files)
-
Fully managed file shares accessible via SMB/NFS.
- Used when apps expect a network file share.
-
Example use cases:
- Lift-and-shift legacy apps that require shared drives.
- Store config files for apps running in Azure VMs or Kubernetes.
-
Queue Storage
-
Stores messages that applications can send and receive asynchronously.
- Each message up to 64 KB.
- Used for decoupling applications (producer/consumer).
- Example: A web app puts a message in a queue, and a background worker picks it up for processing.
-
Table Storage (or Cosmos DB Table API)
-
NoSQL key-value store.
- Stores structured, non-relational data.
- Example: IoT telemetry, user profiles, metadata.
🔹 Storage Account Types¶
When you create a Storage Account in Azure, you’re creating a top-level container for these services.
Types:
- General-purpose v2 (GPv2) → recommended, supports blobs, files, queues, tables, Data Lake Gen2.
- Blob storage account → specialized for blobs.
- Premium storage → optimized for low-latency scenarios (SSD-backed).
🔹 Key Features¶
-
Redundancy (Replication):
-
LRS (Locally redundant).
- ZRS (Zone redundant).
- GRS (Geo-redundant).
- RA-GRS (Geo + Read access).
-
Security:
-
RBAC + IAM.
- SAS tokens (time-limited access).
- Integration with Azure AD (Managed Identities).
-
Lifecycle Management:
-
Move data between hot → cool → archive tiers automatically.
-
Encryption:
-
All data encrypted at rest (Microsoft-managed or customer-managed keys).
🔹 Example: Data Engineer Workflow¶
Let’s say you’re building a pipeline with Databricks + Delta Lake:
- Create a Storage Account with HNS enabled (for ADLS Gen2).
- Create a container (
bronze
,silver
,gold
). - Upload raw CSVs to
bronze
. - Mount storage in Databricks (via Managed Identity + UC external location).
- Write cleaned Delta tables to
silver
. - Curated aggregates go to
gold
.
👉 This setup is the lakehouse pattern.
🔹 Quick Analogy¶
- Storage Account = the house.
- Container (Blob) = big room in the house.
- Folder = cabinet inside the room.
- File (Blob object) = actual book/document.
- RBAC/SAS = the keys to the house/room.