Virtualization and cloud technologies

Jakub Klinkovský

:: Czech Technical University in Prague
:: Faculty of Nuclear Sciences and Physical Engineering
:: Department of Software Engineering

Academic year 2024-2025

What is a "cloud"?

Let's try an abstract definition:

In the context of computing, the cloud is a network of remote resources that can be accessed over the internet and used to store, manage, and process data.

The cloud is about technologies, but in the modern sense the term is closely tied to various business models. Note that cloud users may be companies rather than individual people.

Origins of the word

  • The ☁️ symbol was often used in diagrams and flowcharts to depict a network or the Internet. It symbolizes a vast, amorphous, and interconnected system that can be accessed from anywhere.
  • The term cloud computing as we know it today started to gain prominence around 2006, when Amazon launched the Amazon Web Services (AWS), which included the Elastic Compute Cloud (EC2). This was one of the earliest known uses of the term in a commercial context.
  • Since then, the term cloud has become widely adopted and is now a fundamental concept in the technology industry, representing a wide range of services.
  • Note that a technology usually becomes widespread only several years after its invention.

Example: what is needed to deploy a website

Let's say you have some money and want to make more money.
You have a product – code of a super cool new website, text content, multimedia, etc.
What do you need to deliver it to your users and how?

  • hardware – data processing resources and storage capacity
  • internet connectivity – public IP address and domain
  • software environment – operating system, database system, etc.
  • security – authentication system, payment gate, etc.
  • applications – components of your website
  • content – text and multimedia that users want

Cloud service models

Comparison of on-premise and typical cloud service models (NIST 2011):

Details of cloud service models

  • On-premise: you (or your company) have to manage everything
  • Infrastructure as a service (IaaS):
    • hardware, electricity, storage, and networking are managed by a provider
    • the customer (you or your company) manages the full software stack (including the operating system)
  • Platform as a service (PaaS):
    • the provider manages everything needed to run a specific type of application
      (i.e., the platform)
    • for example, Google App Engine supports applications written in Go, PHP, Java, Python, Node.js, .NET, and Ruby
  • Software as a service (SaaS): the customer cannot run arbitrary software, but is responsible only for the management of content and users
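The responsibility split above can be sketched as a tiny lookup table. The following Python toy model is only a sketch: the layer names are an assumption based on the classic NIST-style stack diagram, not part of any real cloud API.

```python
# Toy model of who manages each layer of the stack under each service model.
# Layer names are an assumption based on the classic NIST-style diagram.
STACK = [  # bottom to top
    "networking", "storage", "servers", "virtualization",
    "operating system", "middleware", "runtime", "data", "applications",
]

# Number of layers (counted from the bottom) managed by the provider.
# Note: even under SaaS, the customer still manages their own content
# and user accounts *inside* the application.
PROVIDER_LAYERS = {"on-premise": 0, "IaaS": 4, "PaaS": 7, "SaaS": 9}

def managed_by(model, layer):
    """Return 'provider' or 'customer' for the given service model and layer."""
    return "provider" if STACK.index(layer) < PROVIDER_LAYERS[model] else "customer"

print(managed_by("IaaS", "operating system"))  # customer
print(managed_by("PaaS", "runtime"))           # provider
```

Reading the table column-wise reproduces the slide: under IaaS the customer still manages the whole software stack, while under SaaS only content and users remain on the customer's side.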

Types of cloud computing

There are four types of cloud computing:

  1. Public cloud – infrastructure is not owned by the end user, but provided by a public provider (the largest ones are AWS, Google Cloud, IBM Cloud, and Microsoft Azure)
  2. Private cloud – a cloud environment solely dedicated to a single customer, typically running behind that customer's firewall.
  3. Hybrid cloud – a seemingly single environment created from multiple environments connected through various networks (LAN, WAN, VPN) and/or APIs.
    • E.g. private + public, private + private, public + public
  4. Multicloud – the use of multiple separate cloud environments, e.g. to improve security and/or performance.

Essential cloud technologies

Every cloud is built using two fundamental technologies:

  • Internet
  • Virtualization

Moreover, the cloud enables the creation of more modern technologies, particularly in software.

Virtualization

Virtualization is the process of simulating some real effect, condition, or object with some imitated alternative.

Virtualization in computing

In the context of computing, virtualization means creating a virtual version of common computing hardware at the same abstraction level.

Virtual machine components

  • Virtual CPU: an imitation of a physical CPU, allowing multiple virtual machines to share the same physical CPU resources.
    • The number of cores and speed of the virtual CPU can be configured.
  • Virtual RAM: an allocation of physical RAM that is dedicated to a virtual machine.
    • The size of the virtual RAM can be configured.
  • Virtual disk: a file or set of files that represents a physical disk drive to the guest operating system.
    • The size, file format and storage location in the host OS can be configured.
  • Virtual network interface: a software emulation of a physical network interface card (NIC).
    • The interface type, communication protocols and topology can be configured.
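The configurable parameters listed above can be collected into a plain data structure. A minimal Python sketch follows; the field names and defaults are illustrative and not tied to any particular hypervisor's API.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualDisk:
    size_gb: int
    file_format: str = "qcow2"      # e.g. qcow2, vmdk, raw
    host_path: str = "disk0.qcow2"  # storage location in the host OS

@dataclass
class VirtualNic:
    interface_type: str = "virtio"  # device model exposed to the guest
    network: str = "default"        # host-side network / topology

@dataclass
class VmConfig:
    name: str
    vcpus: int = 2                  # number of virtual CPU cores
    ram_mb: int = 2048              # virtual RAM size
    disks: list = field(default_factory=list)
    nics: list = field(default_factory=list)

vm = VmConfig(name="test-vm", vcpus=4, ram_mb=4096,
              disks=[VirtualDisk(size_gb=20)], nics=[VirtualNic()])
print(vm.name, vm.vcpus, vm.ram_mb, vm.disks[0].file_format)
```

Real hypervisors expose essentially this structure, whether through XML domain definitions (libvirt) or GUI dialogs (VirtualBox).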

Why virtualization?

  • Division of real resources – increased flexibility and efficiency
  • Isolation of virtual resources – increased security

Hardware virtualization

Hardware virtualization means creating a virtualized hardware environment for running multiple virtual machines (VMs) on a single physical machine.

There are two approaches, emulation and hardware-assisted virtualization:

  • Emulation: Software-only approach, where the virtualization software (hypervisor) emulates the hardware environment for each VM, translating guest OS instructions into host OS instructions.
    • Any hardware can be emulated (e.g. old or exotic platforms), but it is slow.
  • Hardware-assisted virtualization: The hypervisor can use specialized hardware features, such as processor instructions (Intel VT-x or AMD-V), to assist the virtualization process, providing a more efficient and secure way to run VMs.
    • The virtual hardware has the same architecture as the physical hardware.
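On Linux, support for hardware-assisted virtualization shows up as CPU flags in /proc/cpuinfo: vmx indicates Intel VT-x and svm indicates AMD-V. A small Python sketch of parsing them (the sample string stands in for real /proc/cpuinfo contents):

```python
def virt_flags(cpuinfo_text):
    """Return the hardware virtualization flags found in /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return flags & {"vmx", "svm"}  # Intel VT-x / AMD-V

# On a real Linux host: virt_flags(open("/proc/cpuinfo").read())
sample = "processor\t: 0\nflags\t\t: fpu vmx sse2\n"
print(virt_flags(sample))  # {'vmx'}
```

An empty result means the hypervisor must fall back to much slower software emulation (or the feature is disabled in the firmware).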

Hardware virtualization levels

The following modes are used to categorize virtualization technologies based on their level of hardware virtualization and guest OS modification:

  • Full virtualization
  • Para-virtualization
  • Hybrid virtualization

Note: These modes are not mutually exclusive; in practice, many solutions are hybrid to achieve optimal performance and compatibility.

Full virtualization

  • The hypervisor provides a complete, virtualized hardware environment for each guest OS
  • Guest OS is not aware that it is running on a virtual machine and requires no modifications
  • Examples: VMware, VirtualBox, KVM

Para-virtualization

  • The guest OS is modified to be aware of the virtualized environment and communicates directly with the hypervisor
  • Guest OS requires modifications to run on the virtual machine
  • Examples: Xen (with modified Linux and Windows guests)

Hybrid virtualization

  • Combines elements of full virtualization and para-virtualization
  • The hypervisor provides a virtualized hardware environment, but the guest OS can also communicate with the hypervisor for improved performance
  • Examples: Xen (with unmodified Windows guests, using a special driver), KVM with para-virtualized drivers (e.g. graphics, block devices, or network)

Hardware virtualization types

Virtualization can be categorized into two main types based on the location and architecture of the hypervisor:

Characteristics

  • Type-1 Virtualization (Bare-Metal):
    • Hypervisor runs directly on the host machine's hardware
    • No underlying operating system
    • Direct access to hardware resources
    • Examples: VMware ESXi, Microsoft Hyper-V, KVM
  • Type-2 Virtualization (Hosted):
    • Hypervisor runs on top of an existing host operating system
    • Host OS manages hardware resources
    • Hypervisor runs as an application on top of the host OS
    • Examples: VMware Workstation, VirtualBox, Parallels Desktop

Key differences

  • Performance: Type-1 generally offers better performance due to direct access to the underlying hardware
  • Complexity: Type-2 is often easier to install and manage, as it runs on top of an existing OS
  • Security: Type-1 can provide better security, as the hypervisor has direct control over hardware resources

Note: The choice of virtualization type depends on the specific use case, such as server virtualization, desktop virtualization, or development environments.

Note: emulation vs hardware-assisted virtualization, full vs para-virtualization, and Type-1 vs Type-2, are three independent concepts.

Ask your AI: Is Xen Type-1 or Type-2?

Xen is a bit of a special case, and an AI may tell you that it can be both Type-1 and Type-2, depending on the configuration:

  • Xen (Type-1): Xen is installed directly on the bare metal, without an underlying operating system, and acts as a Type-1 hypervisor. This is the standard configuration.
  • Xen ("Type-2"?): Xen is usually installed from within an existing Linux distribution, such as Ubuntu or CentOS, using the distribution's Xen packages. This looks like a hosted setup, but after a reboot the Xen hypervisor boots directly on the hardware and the original Linux system becomes the privileged control domain (dom0), not a host OS.

So, Xen is a Type-1 hypervisor, even though its installation workflow and the special role of dom0 can resemble a Type-2 setup.

Ask your AI: Is KVM Type-1 or Type-2?

KVM (Kernel-based Virtual Machine) is a bit of a special case as well, because it runs as a module within the Linux kernel.

  • Arguments for KVM being Type-1:
    • Runs directly on the host machine's hardware
    • Linux kernel serves as a platform for KVM, rather than a traditional host OS
  • Arguments for KVM being Type-2:
    • Relies on the Linux kernel to manage hardware resources
    • Linux kernel acts as a layer between KVM and the hardware, providing its device drivers and interfaces to interact with the hardware

The distinction between Type-1 and Type-2 hypervisors can be blurry, and different interpretations are possible depending on the context and criteria used.

Limitations of hardware virtualization

While virtualization offers many benefits, it also has some limitations:

  • Performance overhead due to emulation and/or context switching
  • Complexity in managing and configuring virtualized environments
    (need to manage complete operating systems)
  • Limited support for certain hardware devices or legacy systems

These limitations have led to the development of alternative technologies that are lightweight and efficient, but limited in terms of isolation and security. They are known as operating system virtualization or containerization.

Operating system virtualization

Operating system virtualization or containerization is a lightweight and efficient way to deploy and manage application-level software:

  • Containers are isolated environments that run on a single host operating system.
  • All containers share the same kernel with the host operating system, but have their own user space (application-level software).
  • Containers are portable and can be easily moved between hosts, without requiring a specific environment or dependencies.

Traditionally, operating systems have a fixed combination of kernel and user space. Containers turn the user space into a swappable component: the entire user-space portion of an operating system, including programs and custom configurations, can be independent of the host operating system.

Hardware virtualization vs OS-virtualization

| | Hardware virtualization | Operating system virtualization |
| --- | --- | --- |
| Virtualization level (resource allocation) | Allocates hardware resources (CPU, RAM, disk, etc.) to virtual machines | Allocates OS resources (processes, memory, file systems, etc.) to isolated environments |
| Guest OS | Can run multiple, independent operating systems | Runs isolated environments in a single operating system with a shared kernel and hardware |
| Performance | Higher overhead due to separate operating systems | Lower overhead (no emulation, shared operating system) |
| Security | Strong isolation and security between virtual machines | Process-level isolation, which may not be as secure as hardware virtualization |
| Examples | VMware, KVM, Xen, etc. | Docker, Linux Containers, OpenVZ, etc. |

Containerization architecture

Example: Web application deployment

Consider a web application that requires:

  • A web server (e.g. Apache or Nginx)
  • A database (e.g. MySQL or PostgreSQL)
  • A caching layer (e.g. Redis or Memcached)

Native approach

  • Install and run all components (web server, database, caching layer, and the web application itself) directly on the host operating system
  • Each component would run as a separate process, sharing the same kernel and system resources
  • What are the advantages and disadvantages?
    • Pro: no overhead from extra management of resources
    • Con: limited isolation and security between applications
    • Con: potential for resource conflicts and dependencies between applications
    • Con: complex installation, configuration and management

Virtualization approach

  • Create 3 separate virtual machines, each with its own operating system and application stack
  • Each VM would require its own allocation of resources (CPU, RAM, disk space)
  • It would result in significant overhead and resource waste, especially if each VM is not fully utilized

Containerization approach

  • Create 3 separate containers, each running a single application (web server, database, caching layer)
  • All containers share the same hardware resources and host operating system
    • no need for exclusive allocation of resources
    • all containers can use up to e.g. 90% of CPUs and memory when available
  • Containers can be restarted quickly
  • Results in significant reduction in overhead and resource waste, with improved efficiency and scalability
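A back-of-the-envelope comparison of the memory footprints of the two approaches; all numbers below are illustrative assumptions, not measurements:

```python
# VM approach: each VM gets an exclusive RAM allocation plus a full guest OS.
vm_count = 3
vm_reserved_mb = 2048        # per-VM allocation (illustrative)
guest_os_overhead_mb = 512   # kernel + system services in each guest
vm_total_mb = vm_count * (vm_reserved_mb + guest_os_overhead_mb)

# Container approach: processes share the host kernel; only the applications
# themselves consume memory, up to a configurable ceiling (e.g. 90% of RAM).
app_usage_mb = {"web server": 300, "database": 1000, "cache": 200}
container_total_mb = sum(app_usage_mb.values())

print(f"VMs reserve {vm_total_mb} MB, containers use about {container_total_mb} MB")
```

Even with generous assumptions in favor of the VMs, the exclusive allocations dominate: memory reserved by an idle VM is unavailable to the others, while idle containers simply consume almost nothing.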

Comparison

In this example, containerization is a better choice than virtualization because:

  • The components are relatively lightweight and do not require full virtualization
  • The components can share the same kernel with the host operating system, reducing overhead and improving efficiency
  • Containerization provides the necessary isolation and flexibility for each component, without the overhead of virtualization

What is inside containers?

In principle, containers can contain anything, even an entire operating system.

However, containers can also be designed to contain only the files required to run a single application, or even a single component of an application.

  • System containers – based on a snapshot of a whole operating system, similar to virtual machines
    Example technologies: systemd-nspawn, LXC, OpenVZ
  • Application containers – based on a layered image providing only a specific application and its dependencies
    Example technologies: Docker, Podman, Apptainer, Kubernetes

Use cases for system containers

  • Development environments: creating isolated development environments, allowing developers to work on different projects without affecting each other.
  • Legacy system support: system containers can be used to run legacy systems or applications that require a specific version of an operating system.
  • Exploring or testing various Linux distributions or various versions of some software
  • Running an application in a secure, isolated environment, when the benefits of application containers are not relevant

Use cases for application containers

Application containers are single-purpose environments, typically used for:

  • Web application deployment: Application containers provide a lightweight and efficient way to deploy web applications and services.
  • Microservices architecture: Application containers enable the deployment of multiple, isolated services that communicate with each other via some API.
  • DevOps and CI/CD: Application containers facilitate agile development, testing, and deployment of applications.
  • Cloud-native applications: Application containers are used to deploy cloud-native applications, which are designed to take advantage of cloud computing principles, such as scalability and on-demand resources.

Reproducibility with containers

Developers define the intended environment in a special definition file and, when satisfied, push the result to a container registry.

Users can then pull the image from the registry, run a single command, even years later, and get exactly the same working environment.
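For illustration, such a definition file might be a Dockerfile like the following sketch; the base image, file names, and start command are assumptions for a hypothetical Python web application:

```dockerfile
# Hypothetical definition of a reproducible environment for a web app
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]
```

Once the image is built and pushed, a single `docker run` (or Podman equivalent) of the pushed image recreates the same environment anywhere.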

Limitations of containerization

While containerization offers many benefits, it also has some limitations:

  • Security: containers share the same kernel as the host operating system, which can pose security risks if not properly managed
  • Isolation: containers may not provide the same level of isolation as traditional virtual machines
  • Resource management: containers require careful resource management to ensure efficient utilization and prevent resource starvation

Open Container Initiative (OCI)

The Open Container Initiative (OCI) is a Linux Foundation project that aims to create a set of common, open standards for container formats and runtimes.

  • Standardize container formats: Ensure portability across various platforms and environments
  • Ensure interoperability: Allow containers to be used with different runtimes and tools
  • Promote innovation: Encourage development of new container-related technologies and tools
  • Specifications: OCI Image Format, OCI Runtime, and OCI Registry
  • Adoption: Widespread adoption by companies and organizations, growing ecosystem with new tools and technologies being developed
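To make the "OCI Image Format" item concrete: OCI images are content-addressed, i.e. each blob (config, filesystem layer) is identified by the SHA-256 digest of its bytes, and the image manifest references blobs by digest. A simplified Python sketch of that idea (the manifest is reduced to a minimum, and the layer bytes are a stand-in):

```python
import hashlib
import json

def digest(blob):
    """Content address of a blob, as used in OCI manifests."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()

layer = b"(tar archive of the layer filesystem would go here)"
config = json.dumps({"architecture": "amd64", "os": "linux"}).encode()

manifest = {
    "schemaVersion": 2,
    "config": {"digest": digest(config), "size": len(config)},
    "layers": [{"digest": digest(layer), "size": len(layer)}],
}
print(manifest["config"]["digest"][:19])
```

Content addressing is what makes layers shareable and images verifiable: two registries holding the same bytes necessarily agree on the digest.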

Architecture of application containers

The containerization architecture consists of:

  • Container runtime: the software that manages the creation, execution, and termination of containers
    • Examples: runc, crun (native), gVisor (sandbox), Kata Containers (VMs)
  • Networking backend: the software responsible for creating network interfaces, configuring IP addresses, and providing network isolation
  • Container engine: the software that enables the management of containers, images, volumes, networks
    • Examples: Docker, Podman
  • Image registry: a repository that stores and manages container images
    • Examples: Docker Hub, GitHub Container Registry, Amazon ECR

Essential characteristics of cloud services

  • Broad network access: accessible over the internet (or a private network), from anywhere and any device
  • On-demand self-service: automated provisioning and management, without the need for human intervention on the side of the provider.
  • Scalability: Cloud services are designed to quickly scale up or down to match changing business needs, without the need for significant upfront investment.
  • High availability: Cloud services are designed to provide high levels of uptime and availability, with built-in redundancy and failover capabilities.

Orchestration and Infrastructure-as-Code

Orchestration and Infrastructure-as-Code (IaC) are two related concepts that help organizations manage and automate their cloud infrastructure.

  • Orchestration: Automating the deployment, scaling, and management of cloud resources, such as virtual machines or containers.
    Example tools: OpenStack (IaaS, virtual machines), Kubernetes (PaaS/SaaS, containers)
  • Infrastructure-as-Code (IaC): Managing and provisioning cloud infrastructure through code rather than manual configuration, using tools such as Ansible, Terraform, CloudFormation, or Azure Resource Manager.
  • Benefits: Improved efficiency, optimized cost, reduced errors, improved reliability.
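The declarative idea behind both orchestration and IaC can be boiled down to: describe the desired state as data, and let a tool compute the actions that reconcile reality with it. A toy Python sketch of such a reconciliation step (not the API of any real tool):

```python
# Toy "reconciler": compare desired vs. running state and emit actions.
def plan(desired, running):
    """Return the actions needed to move `running` to `desired`."""
    actions = []
    for svc, count in desired.items():
        diff = count - running.get(svc, 0)
        if diff > 0:
            actions.append(f"start {diff} x {svc}")
        elif diff < 0:
            actions.append(f"stop {-diff} x {svc}")
    for svc, count in running.items():
        if svc not in desired:
            actions.append(f"stop {count} x {svc}")
    return actions

desired = {"web": 3, "db": 1}      # service -> number of instances
running = {"web": 1, "cache": 2}   # currently running instances
print(plan(desired, running))      # ['start 2 x web', 'start 1 x db', 'stop 2 x cache']
```

Kubernetes controllers and Terraform's plan/apply cycle both follow this pattern, just over much richer state models.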

Orchestration examples

There are hundreds of cloud-native applications, but most container orchestration tools are built on or derived from Kubernetes.

See CNCF cloud native landscape.

Operating systems

Before we delve into the details of containerization, we need to talk about operating systems...

  • What are the main functions of an operating system?
  • What is your favorite operating system?
  • Which other operating systems do you know or use?

Desktop/laptop operating systems

Web server operating systems

See also Wikipedia: Usage share of operating systems

Summary

Let's recap the most important concepts from the presentation:

  • What are IaaS, PaaS, SaaS?
  • What is the difference between cloud and virtualization?
  • What is the difference between virtualization and containerization?
  • What are kernel space and user space in the context of Linux?
  • See Cloud Native Glossary for details.

Instead of using local servers or personal computing resources, cloud computing allows users and organizations to leverage a vast network of servers and services provided by third-party companies. These services can include storage, databases, software, and computing power, which can be scaled up or down as needed, often on a pay-as-you-go basis. This model offers flexibility, cost savings, and the ability to quickly adapt to changing business needs.

The term "cloud" in the context of computing is derived from the way networks and the internet were often depicted in diagrams and flowcharts. In these diagrams, the internet or a network was represented by a cloud-shaped symbol, symbolizing a vast, amorphous, and interconnected system that could be accessed from various points.

The term "cloud computing" itself began to gain prominence in the early 2000s, but the concept has roots that go back to the 1960s. The idea of delivering computing resources over a network was first proposed by J.C.R. Licklider in the 1960s, who was one of the pioneers of the ARPANET, the precursor to the modern internet. Licklider's vision was for everyone to be interconnected and accessing programs and data at any site, from anywhere.

However, the term "cloud computing" as we know it today started to be used more frequently in the early 2000s. One of the earliest known uses of the term in a commercial context was by Amazon in 2006, when they launched Amazon Web Services (AWS), which included the Elastic Compute Cloud (EC2). This service allowed users to rent virtual computers on which to run their own applications, marking a significant milestone in the development and popularization of cloud computing. Since then, the term "cloud" has become widely adopted and is now a fundamental concept in the technology industry, representing a wide range of services and solutions that leverage the internet to provide scalable and flexible computing resources.

https://www.redhat.com/en/topics/cloud-computing/what-are-cloud-services

Note: Operating System Virtualization is also known as OS-level virtualization or containerization. In summary, hardware virtualization provides a complete, virtualized hardware environment for each guest operating system, while operating system virtualization provides a lightweight, isolated environment for applications and services within a single operating system. The choice between hardware virtualization and operating system virtualization depends on the specific use case, performance requirements, and security needs.

Members:

  • Founding members: Docker, CoreOS, Google, Microsoft, and Amazon.
  • Other members: over 50 companies and organizations, including Red Hat, IBM, and VMware.

The Open Container Initiative is an important step towards creating a standardized and interoperable container ecosystem, and its specifications and tools are widely adopted in the industry.

Cloud is a methodology, but virtualization is a technology.

Advantages and criticism of cloud computing

Con: common people have lost the notion of data ownership
Con: danger of vendor lock-in (especially for end-users of various content delivery platforms)
TODO: web 3.0? (if relevant) https://en.wikipedia.org/wiki/History_of_the_World_Wide_Web#Web_3.0_and_Web3