
Overview of Azure Storage


🔹 What is Azure Storage?

Azure Storage is Microsoft’s cloud-based storage platform, providing highly available, durable, and secure storage for blobs, files, queues, and tables.

Think of it like a giant hard drive in the cloud, but with specialized “drawers” for different types of data.


🔹 Core Types of Azure Storage

  1. Blob Storage (Data Lake Gen2)

    • Stores unstructured data: text, images, video, JSON, Parquet, CSV.
    • Supports an HDFS-like hierarchical namespace (HNS) when enabled → required for Data Lake Gen2.
    • Used for:
      • Data lakes (ETL, big data, analytics).
      • Storing files for machine learning.
      • Backups and archives.
    • Example path: abfss://container@account.dfs.core.windows.net/ (see the PySpark read sketch after this list).

  2. File Storage (Azure Files)

    • Fully managed file shares accessible via SMB/NFS.
    • Used when apps expect a network file share.
    • Example use cases:
      • Lift-and-shift legacy apps that require shared drives.
      • Storing config files for apps running in Azure VMs or Kubernetes.

  3. Queue Storage

    • Stores messages that applications can send and receive asynchronously.
    • Each message can be up to 64 KB.
    • Used for decoupling applications (producer/consumer).
    • Example: a web app puts a message in a queue, and a background worker picks it up for processing (see the queue sketch after this list).

  4. Table Storage (or Cosmos DB Table API)

    • NoSQL key-value store.
    • Stores structured, non-relational data.
    • Example: IoT telemetry, user profiles, metadata (see the table sketch after this list).
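
To make the blob example concrete, here's a minimal PySpark sketch of reading a raw CSV from an ADLS Gen2 path. The account, container, and file names are hypothetical, and it assumes the Databricks cluster is already authorized against the storage account (for example via a Managed Identity and a Unity Catalog external location); `spark` is the SparkSession that Databricks notebooks provide:

```python
# Hypothetical path: "bronze" container in the "mydatalake" ADLS Gen2 account.
raw_path = "abfss://bronze@mydatalake.dfs.core.windows.net/sales/2024/sales.csv"

df = (
    spark.read
    .option("header", "true")        # first row holds column names
    .option("inferSchema", "true")   # let Spark guess the column types
    .csv(raw_path)
)

df.show(5)  # peek at the first rows
```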
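
And a sketch of the producer/consumer pattern with the azure-storage-queue SDK. The connection string and queue name are placeholders, and the queue is assumed to already exist (QueueClient also offers create_queue):

```python
from azure.storage.queue import QueueClient

# Placeholder credentials; prefer Managed Identity or Key Vault in real pipelines.
conn_str = "<storage-account-connection-string>"
queue = QueueClient.from_connection_string(conn_str, queue_name="orders")

# Producer: the web app enqueues a message (must fit within 64 KB).
queue.send_message('{"order_id": 42, "action": "process"}')

# Consumer: a background worker reads and deletes messages.
for msg in queue.receive_messages(max_messages=10):
    print("processing:", msg.content)
    queue.delete_message(msg)  # remove the message once it has been handled
```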
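
Finally, a sketch of the key-value model with the azure-data-tables SDK. Again the connection string and table name are placeholders, and the table is assumed to exist (TableClient also offers create_table):

```python
from azure.data.tables import TableClient

conn_str = "<storage-account-connection-string>"  # placeholder
table = TableClient.from_connection_string(conn_str, table_name="DeviceTelemetry")

# Entities are addressed by PartitionKey + RowKey; other properties are schemaless.
entity = {
    "PartitionKey": "device-001",
    "RowKey": "2024-06-01T12:00:00Z",
    "temperature": 21.5,
    "humidity": 0.43,
}
table.create_entity(entity)

# Point lookup by the two keys.
row = table.get_entity(partition_key="device-001", row_key="2024-06-01T12:00:00Z")
print(row["temperature"])
```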

🔹 Storage Account Types

When you create a Storage Account in Azure, you’re creating a top-level container for these services.

Types:

  1. General-purpose v2 (GPv2) → recommended, supports blobs, files, queues, tables, Data Lake Gen2.
  2. Blob storage account → legacy type specialized for blobs; GPv2 is generally preferred.
  3. Premium storage → optimized for low-latency scenarios (SSD-backed).
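
For reference, a minimal sketch of creating a GPv2 account with the hierarchical namespace enabled, using the azure-mgmt-storage Python SDK. The subscription, resource group, account name, and region are placeholders, and it assumes you're authenticated via azure-identity's DefaultAzureCredential:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import StorageAccountCreateParameters, Sku

subscription_id = "<subscription-id>"  # placeholder
client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

# General-purpose v2 account with HNS enabled → usable as ADLS Gen2.
poller = client.storage_accounts.begin_create(
    resource_group_name="rg-datalake",   # placeholder resource group
    account_name="mydatalake",           # globally unique, 3-24 lowercase letters/digits
    parameters=StorageAccountCreateParameters(
        location="westeurope",
        kind="StorageV2",                # GPv2
        sku=Sku(name="Standard_LRS"),    # locally redundant; see redundancy options below
        is_hns_enabled=True,             # required for Data Lake Gen2
    ),
)
account = poller.result()                # waits for the long-running operation
print(account.primary_endpoints.dfs)
```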

🔹 Key Features

  • Redundancy (Replication):

    • LRS (locally redundant storage).
    • ZRS (zone-redundant storage).
    • GRS (geo-redundant storage).
    • RA-GRS (geo-redundant storage with read access to the secondary region).

  • Security:

    • RBAC + IAM.
    • SAS tokens (time-limited access); see the SAS sketch after this list.
    • Integration with Azure AD (Managed Identities).

  • Lifecycle Management:

    • Moves data between hot → cool → archive tiers automatically, based on rules you define.

  • Encryption:

    • All data is encrypted at rest (with Microsoft-managed or customer-managed keys).
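
As an example of time-limited access, a sketch of generating a read-only SAS URL for a single blob with the azure-storage-blob SDK; account, key, container, and blob names are placeholders:

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_blob_sas, BlobSasPermissions

account_name = "mydatalake"
account_key = "<storage-account-key>"  # placeholder; keep real keys in Key Vault

# Read-only token for one blob, valid for one hour.
sas_token = generate_blob_sas(
    account_name=account_name,
    container_name="bronze",
    blob_name="sales/2024/sales.csv",
    account_key=account_key,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)

url = (f"https://{account_name}.blob.core.windows.net/"
       f"bronze/sales/2024/sales.csv?{sas_token}")
print(url)  # anyone holding this URL can read the blob until it expires
```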


🔹 Example: Data Engineer Workflow

Let’s say you’re building a pipeline with Databricks + Delta Lake:

  1. Create a Storage Account with HNS enabled (for ADLS Gen2).
  2. Create containers (bronze, silver, gold).
  3. Upload raw CSVs to bronze.
  4. Make the storage reachable from Databricks (via a Managed Identity and a Unity Catalog external location).
  5. Write cleaned Delta tables to silver.
  6. Write curated aggregates to gold.

👉 This setup is the lakehouse pattern.
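
A minimal PySpark sketch of steps 4 through 6 (reading from bronze, writing silver and gold), run on a Databricks cluster where `spark` and Delta Lake are available by default. The account, container layout, and column names (order_id, amount, sale_date) are hypothetical:

```python
from pyspark.sql import functions as F

base = "abfss://{layer}@mydatalake.dfs.core.windows.net"  # hypothetical account

# Bronze: raw CSVs exactly as landed.
bronze_df = (
    spark.read
    .option("header", "true")
    .csv(base.format(layer="bronze") + "/sales/2024/")
)

# Clean: cast types and drop rows missing key fields.
silver_df = (
    bronze_df
    .withColumn("amount", F.col("amount").cast("double"))
    .withColumn("sale_date", F.to_date("sale_date"))
    .dropna(subset=["order_id", "amount"])
)

# Silver: cleaned Delta table.
silver_df.write.format("delta").mode("overwrite").save(
    base.format(layer="silver") + "/sales"
)

# Gold: curated aggregate for reporting.
gold_df = silver_df.groupBy("sale_date").agg(F.sum("amount").alias("daily_revenue"))
gold_df.write.format("delta").mode("overwrite").save(
    base.format(layer="gold") + "/daily_revenue"
)
```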


🔹 Quick Analogy

  • Storage Account = the house.
  • Container (Blob) = big room in the house.
  • Folder = cabinet inside the room.
  • File (Blob object) = actual book/document.
  • RBAC/SAS = the keys to the house/room.