Cloud Storage
Setup
We will be using a virtual machine in the faculty's cloud.
When creating a virtual machine in the Launch Instance window:
- Name your VM using the following convention:
cc_lab<no>_<username>, where <no> is the lab number and <username> is your institutional account.
- Select Boot from image in the Instance Boot Source section
- Select CC Template in Image Name section
- Select the g.medium flavor.
In the base virtual machine:
- Download the laboratory archive. Use:
wget https://repository.grid.pub.ro/cs/cc/laboratoare/lab-storage.zip
to download the archive.
- Extract the archive.
student@lab-storage:~$ # download the archive
student@lab-storage:~$ wget https://repository.grid.pub.ro/cs/cc/laboratoare/lab-storage.zip
student@lab-storage:~$ unzip lab-storage.zip
Creating a Kubernetes cluster
As in the previous laboratories, we will create a cluster on the lab machine, using the kind create cluster command:
student@lab-storage:~$ kind create cluster --config kind-config.yaml
Creating cluster "cc-storage" ...
 ✓ Ensuring node image (kindest/node:v1.34.0) 🖼
 ✓ Preparing nodes 📦 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
 ✓ Joining worker nodes 🚜
Set kubectl context to "kind-cc-storage"
You can now use your cluster with:
kubectl cluster-info --context kind-cc-storage
Have a nice day! 👋
It is recommended that you use port-forwarding instead of X11 forwarding to interact with the UI.
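A minimal sketch of local port-forwarding, either over SSH or directly through kubectl; the host, service name, and port numbers below are placeholders for illustration, not values defined by this lab:

```shell
# Forward local port 8080 to port 8080 on the lab VM (hypothetical host).
# Afterwards, browse to http://localhost:8080 on your own machine.
ssh -L 8080:localhost:8080 student@<vm-ip>

# Alternatively, forward a local port to a service inside the cluster
# (hypothetical service name; 8080 locally maps to port 80 of the service).
kubectl port-forward service/<service-name> 8080:80
```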
Storage in the Cloud
Storage is a critical part of any cloud application. The data it holds can be anything from user-generated content and application logs to backups or machine learning models. Because the application runs in the cloud, it needs a way to access storage that is not tied to a specific machine or location. This is where cloud storage comes in.
Requirements for cloud storage include:
- Accessibility: Data should be easily accessible from anywhere, through APIs or other interfaces.
- Performance: Cloud storage should provide low latency and high throughput for data access.
- Scalability: The ability to handle increasing amounts of data without performance degradation.
- Durability: Ensuring that data is not lost and can be retrieved reliably (e.g., through replication).
On-Premises vs Cloud Storage
The need for a storage solution for a cloud application is obvious, but that leaves the question of why not deploy it on-premises.
On-premises storage refers to storage solutions that are physically located within an organization's premises, such as local hard drives or network-attached storage (NAS). In contrast, cloud storage is provided by third-party providers and accessed over the internet.
| | On-Premises Storage | Cloud Storage |
|---|---|---|
| Cost | High upfront costs, ongoing maintenance | Pay-as-you-go for storage and usage |
| Performance | Limited by local hardware and network | High performance with optimized infrastructure |
| Scalability | Limited, requires manual intervention | Grows with demand |
| Durability | Prone to failure, requires backups | High durability, often with replication |
As a rough baseline, standard object storage costs approximately $0.02β0.025 per GB/month across providers (AWS, Azure, GCP), making them broadly comparable for storage alone. The real cost differences emerge from read/write operations and how tightly a workload is coupled to provider-specific features.
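As a back-of-envelope sketch using the baseline quoted above (the $0.023/GB-month rate is an assumption for illustration, not a current quote from any provider), storing 1 TB of objects for a month costs roughly:

```shell
# Rough monthly cost for 1024 GB at an assumed $0.023 per GB-month.
# Real bills also include request, retrieval, and egress charges.
awk 'BEGIN { printf "%.2f\n", 1024 * 0.023 }'
# prints 23.55
```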
Providers
- AWS S3 - The most widely adopted object storage service, with the richest ecosystem of integrations and tooling
- GCP Cloud Storage - Tight integration with Google's data and ML services (BigQuery, Dataflow, Vertex AI)
- Azure Blob Storage - Best fit for organizations already in the Microsoft ecosystem (Active Directory, Office 365)
Storage in Kubernetes
Kubernetes provides integration with various storage backends, abstracting them with the following concepts:
- Persistent Volume Claim (PVC): a request for storage by a user. This request is fulfilled by finding a suitable Persistent Volume and binding it to the claim.
- Persistent Volume (PV): a piece of storage in the cluster that can be mounted by pods. It can be provisioned manually by an administrator or dynamically, using a Storage Class.
- Storage Class: configured to create Persistent Volumes on demand. It defines the provisioner (e.g., AWS EBS, GCE PD, Azure Disk) and parameters (e.g., type of disk, IOPS) for the PVs it creates.
This abstraction enables applications to use storage as they would with a local disk, while Kubernetes manages the underlying storage resources and their lifecycle. Changing the storage backend (e.g., switching from AWS EBS to Azure Disk) does not require changes to the application code, as long as the PVCs and PVs are properly configured.
Persistent Volume Claim (PVC)
The Persistent Volume Claim is a request for storage by a user. It will be fulfilled by Kubernetes and bound to a suitable Persistent Volume. A typical PVC definition looks like this:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  # How the volume will be mounted by the pod. Available options are:
  # - ReadWriteOnce: the volume can be mounted as read-write by a single node
  # - ReadOnlyMany: the volume can be mounted as read-only by many nodes
  # - ReadWriteMany: the volume can be mounted as read-write by many nodes
  accessModes:
    - ReadWriteOnce
  # The minimum amount of storage that the volume should have.
  resources:
    requests:
      storage: 8Gi
  # Note: the reclaim policy (Retain/Delete) is not part of the PVC spec;
  # it is configured on the Persistent Volume or on the Storage Class.
  # Optional: the name of the Storage Class to use for dynamic provisioning.
  storageClassName: nvme-ssd
  # Alternatively, you can specify a specific PV to bind to by using the `volumeName` field.
  # This will block the claim until the specified PV is available and matches the claim's requirements.
  # volumeName: my-pv
The accessModes field in the PVC and PV definitions refers to nodes, not pods. This means that if a PV is created with ReadWriteOnce, it can only be mounted by one node at a time, but multiple pods on that node can access it simultaneously.
To use a PVC you have to mount it in a pod. This is done in two steps:
- First, you specify the PVC in the volumes section of the pod spec; this makes the volume available to the containers in the pod.
- Then, you specify the volumeMounts in the container spec to mount the volume to a specific path inside the container.
A typical pod definition that uses a PVC looks like this:
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  # The list of volumes that can be mounted by containers in this pod. Each volume must have a unique name.
  volumes:
    - name: my-volume
      # The source of the volume. In this case we are using a PVC, but there are other options like ConfigMap, Secret, etc.
      persistentVolumeClaim:
        claimName: my-pvc
  containers:
    - name: my-container
      image: nginx
      # The list of volumes mounted into the container. Each volumeMount must reference a volume defined in .spec.volumes.
      volumeMounts:
        - name: my-volume
          mountPath: /usr/share/nginx/html
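Assuming the claim and pod definitions above are saved as my-pvc.yaml and my-pod.yaml (hypothetical file names), applying them and watching the claim bind looks roughly like this:

```shell
student@lab-storage:~$ kubectl apply -f my-pvc.yaml -f my-pod.yaml
student@lab-storage:~$ kubectl get pvc my-pvc   # STATUS should eventually show Bound
student@lab-storage:~$ kubectl get pod my-pod   # the pod starts once the volume is mounted
```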
Persistent Volume (PV)
A Persistent Volume is an extension of the concept of a Volume in Docker. Both are used to persist data beyond the lifecycle of a container. In addition:
- the Persistent Volume is not tied to a specific node, meaning that a pod can be rescheduled to another node without losing data
- the Persistent Volume is not tied to a specific pod, meaning that multiple pods can mount it to share data
The Persistent Volume is a piece of storage in the cluster that can be mounted by pods. Unless there is a specific need to create PVs manually, it is recommended to use dynamic provisioning with Storage Classes, which simplifies the management of storage resources. A typical PV definition looks like this:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  # How the volume will be mounted by the pod. Available options are:
  # - ReadWriteOnce: the volume can be mounted as read-write by a single node
  # - ReadOnlyMany: the volume can be mounted as read-only by many nodes
  # - ReadWriteMany: the volume can be mounted as read-write by many nodes
  accessModes:
    - ReadWriteOnce
  # The capacity of the volume. This is the total amount of storage that the PV provides.
  capacity:
    storage: 8Gi
  # The policy for reclaiming the volume when it is released. Available options are:
  # - Retain: the volume will be retained when the claim is deleted
  # - Delete: the volume will be deleted when the claim is deleted
  persistentVolumeReclaimPolicy: Retain
  # A PV also needs a volume source describing the actual storage backend
  # (e.g., hostPath, nfs, or a CSI driver). For example, a directory on the node:
  hostPath:
    path: /mnt/data
Exercise: Manual Provisioning
Storage Classes are the recommended way to manage storage in Kubernetes, but it is also possible to create Persistent Volumes manually. This exercise will help you understand how manual provisioning works and how to troubleshoot common issues.
An app was deployed but its pod is stuck in Pending. Figure out what is missing and fix it.
- Run the setup script to create the broken resources:
student@lab-storage:~$ bash setup-manual-pvc.sh
- Investigate the status of the pod and the PVC:
student@lab-storage:~$ kubectl describe pod manual-pv-pod
student@lab-storage:~$ kubectl describe pvc manual-pvc
- Create the missing resource so the pod reaches Running.
Tip: When creating the Persistent Volume you have to set up its storage backend. For this exercise you can use the .spec.hostPath field with path: /tmp/manual-pv-data, which links the PV to a directory on the node. This is not recommended for production use, but it is useful for learning purposes.
Storage Class
A Storage Class is a way to define how storage is provisioned in the cluster. A cluster might have multiple Storage Classes, each representing a different type of storage (e.g., SSD, HDD, network storage) with different performance characteristics and costs. A typical Storage Class definition looks like this:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nvme-ssd
# The provisioner that will create the underlying storage resource.
# Each cloud provider has its own provisioner (e.g., AWS EBS, GCE PD, Azure Disk).
# Use "rancher.io/local-path" for local storage.
provisioner: rancher.io/local-path
# The policy for reclaiming the volume when the PVC is deleted. Available options are:
# - Retain: the volume will be retained when the claim is deleted
# - Delete: the volume will be deleted when the claim is deleted
reclaimPolicy: Delete
# Allow PVCs to expand the volume after creation.
allowVolumeExpansion: true
# When to bind a Persistent Volume to a Persistent Volume Claim. Available options are:
# - Immediate: the PV will be bound to the PVC as soon as it is created
# - WaitForFirstConsumer: the PV will be bound to the PVC only when a Pod that uses the PVC is scheduled.
volumeBindingMode: WaitForFirstConsumer
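To tie the pieces together, a claim that uses this class only needs to name it; the provisioner creates a matching PV on demand. A minimal sketch (the claim name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # Refers to the Storage Class defined above; a PV is provisioned dynamically.
  storageClassName: nvme-ssd
```

Note that with volumeBindingMode: WaitForFirstConsumer, this claim stays Pending until a pod that uses it is scheduled; only then is the volume provisioned and bound.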