Virtualization and cloud technologies
Jakub Klinkovský
:: Czech Technical University in Prague
:: Faculty of Nuclear Sciences and Physical Engineering
:: Department of Software Engineering
What is the cloud?
Let's try an abstract definition:
In the context of computing, cloud is a specific network of resources that can be accessed over the internet and used to store, manage, and process data.
Cloud is about technologies, but in the modern sense, the term is closely related to various business models. Note that the users of the cloud may not be individual people, but companies.
Origins of the word
The cloud symbol was often used in diagrams and flowcharts to depict a network or the Internet. It symbolizes a vast, amorphous, and interconnected system that can be accessed from anywhere.
The term cloud computing as we know it today started to gain prominence around 2006, when Amazon launched Amazon Web Services (AWS), which included the Elastic Compute Cloud (EC2). This was one of the earliest known uses of the term in a commercial context.
Since then, the term cloud has become widely adopted and is now a fundamental concept in the technology industry, representing a wide range of services.
Note that any technology usually becomes widespread only several years after its invention.
Example: what is needed to deploy a website
Let's say you have some money and want to make more money.
You have a product – code of a super cool new website, text content, multimedia, etc.
What do you need to deliver it to your users and how?
hardware – data processing resources and storage capacity
internet connectivity – public IP address and domain
software environment – operating system, database system, etc.
security – authentication system, payment gate, etc.
applications – components of your website
content – text and multimedia that users want
Cloud service models
Comparison of on-premise and typical cloud service models (NIST 2011):
Details of cloud service models
On-premise: you (or your company) have to manage everything
Infrastructure as a service (IaaS):
hardware, electricity, storage, and networking are managed by a provider
the customer (you or your company) manages the full software stack (including the operating system)
Platform as a service (PaaS):
the provider manages everything needed to run a specific type of application
(i.e., the platform)
for example, Google App Engine supports applications written in Go, PHP, Java, Python, Node.js, .NET, and Ruby
Software as a service (SaaS): the customer cannot run arbitrary software, but is responsible only for the management of content and users
Types of cloud computing
There are four types of cloud computing:
Public cloud – infrastructure is not owned by the end user, but provided by a public provider (the largest ones are AWS, Google Cloud, IBM Cloud, and Microsoft Azure)
Private cloud – a cloud environment solely dedicated to a single customer, typically running behind that customer's firewall.
Hybrid cloud – a seemingly single environment created from multiple environments connected through various networks (LAN, WAN, VPN) and/or APIs.
E.g. private + public, private + private, public + public
Multicloud – the use of multiple separate cloud environments, e.g. to improve security and/or performance.
Essential cloud technologies
Every cloud is built using two fundamental technologies:
Moreover, the cloud enables the creation of more modern technologies, particularly software:
Virtualization
Virtualization is the process of simulating some real effect, condition, or object with some imitated alternative.
Virtualization in computing
In the context of computing, virtualization means creating a virtual version of common computing hardware at the same abstraction level.
Virtual machine components
Virtual CPU: an imitation of a physical CPU, allowing multiple virtual machines to share the same physical CPU resources.
The number of cores and speed of the virtual CPU can be configured.
Virtual RAM: an allocation of physical RAM that is dedicated to a virtual machine.
The size of the virtual RAM can be configured.
Virtual disk: a file or set of files that represents a physical disk drive to the guest operating system.
The size, file format and storage location in the host OS can be configured.
Virtual network interface: a software emulation of a physical network interface card (NIC).
The interface type, communication protocols and topology can be configured.
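To make these components concrete, here is a minimal sketch that defines and starts a small VM through the libvirt Python bindings, assuming a local QEMU/KVM host; the VM name, resource sizes, and the disk image path are illustrative only.

```python
# Sketch: define and start a small VM via the libvirt Python bindings.
# Assumes a local QEMU/KVM host (qemu:///system) and an existing disk image
# at the hypothetical path /var/lib/libvirt/images/demo.qcow2.
import libvirt

DOMAIN_XML = """
<domain type='kvm'>
  <name>demo-vm</name>
  <memory unit='MiB'>2048</memory>          <!-- virtual RAM size -->
  <vcpu>2</vcpu>                            <!-- number of virtual CPU cores -->
  <os><type arch='x86_64'>hvm</type></os>
  <devices>
    <disk type='file' device='disk'>        <!-- virtual disk backed by a file -->
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/demo.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='network'>              <!-- virtual network interface -->
      <source network='default'/>
      <model type='virtio'/>
    </interface>
  </devices>
</domain>
"""

conn = libvirt.open("qemu:///system")   # connect to the local hypervisor
dom = conn.defineXML(DOMAIN_XML)        # register the persistent domain
dom.create()                            # start the virtual machine
print("Started VM:", dom.name())
conn.close()
```

The XML elements (vcpu, memory, disk, interface) correspond directly to the virtual components listed above; a real domain definition usually contains more detail (boot order, graphics, etc.), which tools like virt-install or virt-manager generate for you.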
Why virtualization?
Division of real resources – increased flexibility and efficiency
Isolation of virtual resources – increased security
Hardware virtualization
Hardware virtualization means creating a virtualized hardware environment for running multiple virtual machines (VMs) on a single physical machine.
There are two approaches: emulation and hardware-assisted virtualization:
Emulation: A software-only approach, where the virtualization software (hypervisor) emulates the hardware environment for each VM, translating guest machine instructions into host machine instructions.
Any hardware can be emulated (e.g. old or exotic platforms), but it is slow.
Hardware-assisted virtualization: The hypervisor can use specialized hardware features, such as processor instructions (Intel VT-x or AMD-V), to assist the virtualization process, providing a more efficient and secure way to run VMs.
The virtual hardware has the same architecture as the physical hardware.
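On Linux, a quick way to check whether the host CPU provides these extensions is to look for the vmx (Intel VT-x) or svm (AMD-V) flags in /proc/cpuinfo, as in this small sketch:

```python
# Check /proc/cpuinfo for the CPU flags that indicate hardware-assisted
# virtualization support on Linux: "vmx" (Intel VT-x) or "svm" (AMD-V).
def hardware_virtualization_support() -> str:
    flags = set()
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags.update(line.split(":", 1)[1].split())
    if "vmx" in flags:
        return "Intel VT-x"
    if "svm" in flags:
        return "AMD-V"
    return "none (emulation only)"

if __name__ == "__main__":
    print("Hardware-assisted virtualization:", hardware_virtualization_support())
```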
Hardware virtualization levels
The following modes are used to categorize virtualization technologies based on their level of hardware virtualization and guest OS modification:
Full virtualization
Para-virtualization
Hybrid virtualization
Note: These modes are not mutually exclusive; in practice, many solutions are hybrid to achieve optimal performance and compatibility.
Full virtualization
The hypervisor provides a complete, virtualized hardware environment for each guest OS
Guest OS is not aware that it is running on a virtual machine and requires no modifications
Examples: VMware, VirtualBox, KVM
Para-virtualization
The guest OS is modified to be aware of the virtualized environment and communicates directly with the hypervisor
Guest OS requires modifications to run on the virtual machine
Examples: Xen (with modified Linux and Windows guests)
Hybrid virtualization
Combines elements of full virtualization and para-virtualization
The hypervisor provides a virtualized hardware environment, but the guest OS can also communicate with the hypervisor for improved performance
Examples: Xen (with unmodified Windows guests, using a special driver), KVM with para-virtualized drivers (e.g. graphics, block devices, or network)
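Inside a Linux guest, one way to see whether para-virtualized (virtio) drivers are in use is to list the loaded virtio kernel modules; a minimal sketch (drivers built directly into the kernel will not show up here):

```python
# List loaded kernel modules whose name starts with "virtio" inside a Linux
# guest; their presence indicates that para-virtualized (virtio) drivers are
# in use for devices such as disks, network interfaces, or graphics.
def loaded_virtio_modules() -> list[str]:
    with open("/proc/modules") as f:
        return [line.split()[0] for line in f if line.startswith("virtio")]

if __name__ == "__main__":
    modules = loaded_virtio_modules()
    if modules:
        print("Para-virtualized drivers loaded:", ", ".join(modules))
    else:
        print("No virtio modules loaded (bare metal, fully emulated devices, or built-in drivers)")
```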
Hardware virtualization types
Virtualization can be categorized into two main types based on the location and architecture of the hypervisor:
Characteristics
Type-1 Virtualization (Bare-Metal):
Hypervisor runs directly on the host machine's hardware
No underlying operating system
Direct access to hardware resources
Examples: VMware ESXi, Microsoft Hyper-V, KVM
Type-2 Virtualization (Hosted):
Hypervisor runs on top of an existing host operating system
Host OS manages hardware resources
Hypervisor runs as an application on top of the host OS
Examples: VMware Workstation, VirtualBox, Parallels Desktop
Key differences
Performance: Type-1 generally offers better performance due to direct access to the underlying hardware
Complexity: Type-2 is often easier to install and manage, as it runs on top of an existing OS
Security: Type-1 can provide better security, as the hypervisor has direct control over hardware resources
Note: The choice of virtualization type depends on the specific use case, such as server virtualization, desktop virtualization, or development environments.
Note: emulation vs hardware-assisted virtualization, full vs para-virtualization, and Type-1 vs Type-2 are three independent concepts.
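A practical aside: from inside a guest, the systemd-detect-virt tool shipped with systemd reports which hypervisor or container technology the system runs under (though not whether it is Type-1 or Type-2), as in this sketch:

```python
# Ask systemd-detect-virt which virtualization technology (if any) the current
# Linux system is running under. It prints e.g. "kvm", "vmware", "oracle",
# "xen", or "none" on bare metal; --container does the same for containers.
import subprocess

def detect_virtualization(kind: str) -> str:
    result = subprocess.run(
        ["systemd-detect-virt", kind],
        capture_output=True, text=True,
    )
    # systemd-detect-virt exits non-zero (and prints "none") when nothing is detected
    return result.stdout.strip() or "none"

if __name__ == "__main__":
    print("Hypervisor:", detect_virtualization("--vm"))
    print("Container: ", detect_virtualization("--container"))
```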
Ask your AI: Is Xen Type-1 or Type-2?
Xen is a bit of a special case. Xen can be both Type-1 and Type-2, depending on the configuration:
Xen (Type-1): When Xen is installed directly on the bare metal, without an underlying operating system, it acts as a Type-1 hypervisor. This is the most common configuration for Xen.
Xen (Type-2): However, Xen can also be installed on top of a Linux operating system, such as Ubuntu or CentOS, using a package called "Xen Tools" or "Xen Server". In this case, Xen runs as a Type-2 hypervisor, on top of the host Linux OS.
So, Xen is primarily a Type-1 hypervisor, but it can also be used as a Type-2 hypervisor in certain configurations.
Ask your AI: Is KVM Type-1 or Type-2?
KVM (Kernel-based Virtual Machine) is a bit of a special case as well, because it runs as a module within the Linux kernel.
Arguments for KVM being Type-1:
Runs directly on the host machine's hardware
Linux kernel serves as a platform for KVM, rather than a traditional host OS
Arguments for KVM being Type-2:
Relies on the Linux kernel to manage hardware resources
Linux kernel acts as a layer between KVM and the hardware, providing its device drivers and interfaces to interact with the hardware
The distinction between Type-1 and Type-2 hypervisors can be blurry, and different interpretations are possible depending on the context and criteria used.
Limitations of hardware virtualization
While virtualization offers many benefits, it also has some limitations:
Performance overhead due to emulation and/or context switching
Complexity in managing and configuring virtualized environments
(need to manage complete operating systems)
Limited support for certain hardware devices or legacy systems
These limitations have led to the development of alternative technologies that are lightweight and efficient, but limited in terms of isolation and security. They are known as operating system virtualization or containerization.
Operating system virtualization
Operating system virtualization or containerization is a lightweight and efficient way to deploy and manage application-level software:
Containers are isolated environments that run on a single host operating system.
All containers share the same kernel with the host operating system, but have their own user space (application-level software).
Containers are portable and can be easily moved between hosts, without requiring a specific environment or dependencies.
Traditionally operating systems have a fixed combination of kernel and user space. Containers change the user space into a swappable component. This means that the entire user space portion of an operating system, including programs and custom configurations, can be independent of the host operating system.
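To see the shared kernel directly, compare the kernel release reported by the host with the one reported inside a container; the sketch below uses the Docker SDK for Python and the alpine image, assuming a local Docker daemon:

```python
# Demonstrate that containers share the host kernel: the kernel release
# reported inside an Alpine container is identical to the host's, even though
# the Alpine user space is completely different from the host distribution.
# Assumes a local Docker daemon and the Docker SDK for Python ("pip install docker").
import platform
import docker

client = docker.from_env()
container_kernel = client.containers.run(
    "alpine:latest", "uname -r", remove=True
).decode().strip()

print("Host kernel:     ", platform.release())
print("Container kernel:", container_kernel)
```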
Hardware virtualization vs OS-virtualization
| | Hardware virtualization | Operating system virtualization |
| --- | --- | --- |
| Virtualization level (resource allocation) | Allocates hardware resources (CPU, RAM, disk, etc.) to virtual machines | Allocates OS resources (processes, memory, file systems, etc.) to isolated environments |
| Guest OS | Can run multiple, independent operating systems | Runs isolated environments in a single operating system with a shared kernel and hardware |
| Performance | Higher overhead due to separate operating systems | Lower overhead (no emulation, shared operating system) |
| Security | Provides strong isolation and security between virtual machines | Provides process-level isolation, but may not be as secure as hardware virtualization |
| Examples | VMware, KVM, Xen, etc. | Docker, Linux Containers, OpenVZ, etc. |
Containerization architecture
Example: Web application deployment
Consider a web application that requires:
A web server (e.g. Apache or Nginx)
A database (e.g. MySQL or PostgreSQL)
A caching layer (e.g. Redis or Memcached)
Native approach
Install and run all components (web server, database, caching layer, and the web application itself) directly on the host operating system
Each component would run as a separate process, sharing the same kernel and system resources
What are the advantages and disadvantages?
No overhead from extra management of resources
Limited isolation and security between applications
Potential for resource conflicts and dependencies between applications
Complex installation, configuration and management
Virtualization approach
Create 3 separate virtual machines, each with its own operating system and application stack
Each VM would require its own allocation of resources (CPU, RAM, disk space)
It would result in significant overhead and resource waste, especially if each VM is not fully utilized
Containerization approach
Create 3 separate containers, each running a single application (web server, database, caching layer)
All containers share the same hardware resources and host operating system
no need for exclusive allocation of resources
all containers can use up to e.g. 90% of CPUs and memory when available
Containers can be restarted quickly
Results in significant reduction in overhead and resource waste, with improved efficiency and scalability
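A minimal sketch of this approach using the Docker SDK for Python, assuming a local Docker daemon; the image tags, container names, network name, and password are illustrative only:

```python
# Sketch: start the three components of the example web application as
# containers on a shared bridge network, without dedicating resources to any
# of them. Assumes a local Docker daemon and the Docker SDK for Python.
import docker

client = docker.from_env()

# One isolated network shared by all components of the application
client.networks.create("webapp-net", driver="bridge")

services = {
    "web": {"image": "nginx:alpine", "ports": {"80/tcp": 8080}},
    "db": {"image": "postgres:16", "environment": {"POSTGRES_PASSWORD": "example"}},
    "cache": {"image": "redis:7"},
}

for name, opts in services.items():
    container = client.containers.run(
        opts["image"],
        name=name,
        network="webapp-net",
        detach=True,                       # keep running in the background
        ports=opts.get("ports"),
        environment=opts.get("environment"),
    )
    print(f"Started {name}: {container.short_id}")
```

In practice, tools such as Docker Compose or Kubernetes automate exactly this kind of multi-container setup.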
Comparison
In this example, containerization is a better choice than virtualization because:
The components are relatively lightweight and do not require full virtualization
The components can share the same kernel with the host operating system, reducing overhead and improving efficiency
Containerization provides the necessary isolation and flexibility for each component, without the overhead of virtualization
What is inside containers?
In principle, containers can contain anything, even a whole operating system.
However, containers can be designed to contain only the files required to run just one application or even just one component of an application.
System containers – based on a snapshot of a whole operating system, similar to virtual machines
Example technologies: systemd-nspawn, LXC, OpenVZ
Application containers – based on a layered image providing only a specific application and its dependencies
Example technologies: Docker, Podman, Apptainer, Kubernetes
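To make the distinction concrete, the sketch below shows typical invocations of both flavors wrapped in Python; the root filesystem path and image tag are hypothetical examples:

```python
# Illustrative invocations of a system container vs. an application container.
# The rootfs path and image tag are hypothetical examples.
import subprocess

# System container: boot a full user space (init system and all) from a
# directory containing an entire OS root filesystem.
system_container = ["systemd-nspawn", "-D", "/var/lib/machines/debian", "--boot"]

# Application container: run a single command from a layered application image.
application_container = ["docker", "run", "--rm", "alpine:latest", "echo", "hello"]

for cmd in (system_container, application_container):
    print("Would run:", " ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually run (needs root / Docker)
```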
Use cases for system containers
Development environments: creating isolated development environments, allowing developers to work on different projects without affecting each other.
Legacy system support: system containers can be used to run legacy systems or applications that require a specific version of an operating system.
Exploring or testing various Linux distributions or various versions of some software
Running an application in a secure, isolated environment, when the benefits of application containers are not relevant
Use cases for application containers
Application containers are single-purpose environments, typically used for:
Web application deployment: Application containers provide a lightweight and efficient way to deploy web applications and services.
Microservices architecture: Application containers enable the deployment of multiple, isolated services that communicate with each other via some API.
DevOps and CI/CD: Application containers facilitate agile development, testing, and deployment of applications.
Cloud-native applications: Application containers are used to deploy cloud-native applications, which are designed to take advantage of cloud computing principles, such as scalability and on-demand resources.
Reproducibility with containers
Developers define the intended environment in a special definition file and, when satisfied, push the result to a container registry.
Users can pull the image from the registry, run just one line of code, even years later, and get exactly the same working environment.
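A sketch of the consumer side with the Docker SDK for Python: pulling a pinned image and running a command in it yields the same environment on any host with a container runtime (the image tag is an example):

```python
# Sketch: pull a pinned image from a registry and run a command in it.
# Because the image bundles the whole user space, the result is the same on
# any host with a container runtime, even years later. The tag is an example.
import docker

client = docker.from_env()
client.images.pull("python", tag="3.12-slim")        # fetch from the registry
output = client.containers.run(
    "python:3.12-slim", ["python", "--version"], remove=True
)
print(output.decode().strip())   # e.g. "Python 3.12.x", independent of the host
```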
Limitations of containerization
While containerization offers many benefits, it also has some limitations:
Security: containers share the same kernel as the host operating system, which can pose security risks if not properly managed
Isolation: containers may not provide the same level of isolation as traditional virtual machines
Resource management: containers require careful resource management to ensure efficient utilization and prevent resource starvation
Open Container Initiative (OCI)
The Open Container Initiative (OCI) is a Linux Foundation project that aims to create a set of common, open standards for container formats and runtimes.
Standardize container formats: Ensure portability across various platforms and environments
Ensure interoperability: Allow containers to be used with different runtimes and tools
Promote innovation: Encourage development of new container-related technologies and tools
Specifications: OCI Image Format, OCI Runtime, and OCI Distribution (registry API)
Adoption: Widespread adoption by companies and organizations, growing ecosystem with new tools and technologies being developed
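For example, because images follow the OCI format, one engine's images can be inspected or consumed by other tools; the sketch below calls the skopeo CLI (assumed to be installed) from Python to read an image manifest straight from a registry:

```python
# Inspect an image in a remote registry without pulling it, using the
# OCI/Docker-compatible tool skopeo (assumed to be installed). Any
# OCI-compliant engine (Docker, Podman, ...) can consume the same image.
import json
import subprocess

result = subprocess.run(
    ["skopeo", "inspect", "docker://docker.io/library/alpine:latest"],
    capture_output=True, text=True, check=True,
)
manifest = json.loads(result.stdout)
print("Digest:      ", manifest["Digest"])
print("Architecture:", manifest["Architecture"])
print("Layers:      ", len(manifest["Layers"]))
```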
Architecture of application containers
The containerization architecture consists of:
Container runtime: the software that manages the creation, execution, and termination of containers
Examples: runc, crun (native), gVisor (sandbox), Kata Containers (VMs)
Networking backend: the software responsible for creating network interfaces, configuring IP addresses, and providing network isolation
Container engine: the software that enables the management of containers, images, volumes, networks
Image registry: a repository that stores and manages container images
Examples: Docker Hub, GitHub Container Registry, Amazon ECR
Essential characteristics of cloud services
Broad network access: accessible over the internet (or a private network), from anywhere and any device
On-demand self-service: automated provisioning and management, without the need for human intervention on the side of the provider.
Scalability: Cloud services are designed to quickly scale up or down to match changing business needs, without the need for significant upfront investment.
High availability: Cloud services are designed to provide high levels of uptime and availability, with built-in redundancy and failover capabilities.
Orchestration and Infrastructure-as-Code
Orchestration and Infrastructure-as-Code (IaC) are two related concepts that help organizations manage and automate their cloud infrastructure.
Orchestration: Automating the deployment, scaling, and management of cloud resources, such as virtual machines or containers.
Example tools: OpenStack (IaaS, virtual machines), Kubernetes (PaaS/SaaS, containers)
Infrastructure-as-Code (IaC): Managing and provisioning cloud infrastructure through code rather than manual configuration, using tools such as Ansible, Terraform, CloudFormation, or Azure Resource Manager.
Benefits: Improved efficiency, optimized cost, reduced errors, improved reliability.
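As a tiny illustration of orchestration, the sketch below wraps kubectl (assumed to be installed and configured) to scale a hypothetical Deployment named web; real IaC setups would declare the desired replica count in version-controlled manifests instead:

```python
# Sketch: scale a Kubernetes Deployment by shelling out to kubectl.
# Assumes kubectl is installed and configured; the Deployment name "web"
# and namespace "demo" are hypothetical examples.
import subprocess

def scale_deployment(name: str, replicas: int, namespace: str = "demo") -> None:
    subprocess.run(
        ["kubectl", "scale", f"deployment/{name}",
         f"--replicas={replicas}", "--namespace", namespace],
        check=True,
    )

if __name__ == "__main__":
    scale_deployment("web", 5)   # e.g. scale up before an expected traffic peak
```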
Orchestration examples
There are hundreds of cloud-native applications, but most container orchestration tools are derived from Kubernetes.
See the CNCF Cloud Native Landscape.
Operating systems
Before we delve into the details of containerization, we need to talk about operating systems...
What are the main functions of an operating system?
What is your favorite operating system?
Which other operating systems do you know or use?
Desktop/laptop operating systems
Summary
Let's recap the most important concepts from the presentation:
What are IaaS, PaaS, SaaS?
What is the difference between cloud and virtualization?
What is the difference between virtualization and containerization?
What are kernel space and user space in the context of Linux?
See Cloud Native Glossary for details.
Instead of using local servers or personal computing resources, cloud computing allows users and organizations to leverage a vast network of servers and services provided by third-party companies. These services can include storage, databases, software, and computing power, which can be scaled up or down as needed, often on a pay-as-you-go basis. This model offers flexibility, cost savings, and the ability to quickly adapt to changing business needs.
The term "cloud" in the context of computing is derived from the way networks and the internet were often depicted in diagrams and flowcharts. In these diagrams, the internet or a network was represented by a cloud-shaped symbol, symbolizing a vast, amorphous, and interconnected system that could be accessed from various points.
The term "cloud computing" itself began to gain prominence in the early 2000s, but the concept has roots that go back to the 1960s. The idea of delivering computing resources over a network was first proposed by J.C.R. Licklider in the 1960s, who was one of the pioneers of the ARPANET, the precursor to the modern internet. Licklider's vision was for everyone to be interconnected and accessing programs and data at any site, from anywhere.
However, the term "cloud computing" as we know it today started to be used more frequently in the early 2000s. One of the earliest known uses of the term in a commercial context was by Amazon in 2006, when they launched Amazon Web Services (AWS), which included the Elastic Compute Cloud (EC2). This service allowed users to rent virtual computers on which to run their own applications, marking a significant milestone in the development and popularization of cloud computing.
Since then, the term "cloud" has become widely adopted and is now a fundamental concept in the technology industry, representing a wide range of services and solutions that leverage the internet to provide scalable and flexible computing resources.
https://www.redhat.com/en/topics/cloud-computing/what-are-cloud-services
Note: Operating System Virtualization is also known as OS-level virtualization or containerization.
In summary, hardware virtualization provides a complete, virtualized hardware environment for each guest operating system, while operating system virtualization provides a lightweight, isolated environment for applications and services within a single operating system. The choice between hardware virtualization and operating system virtualization depends on the specific use case, performance requirements, and security needs.
Members:
- Founding members: Docker, CoreOS, Google, Microsoft, and Amazon.
- Other members: Over 50 companies and organizations, including Red Hat, IBM, and VMware.
The Open Container Initiative is an important step towards creating a standardized and interoperable container ecosystem, and its specifications and tools are widely adopted in the industry.
cloud is a methodology, but virtualization is a technology
# Advantages and criticism of cloud computing
Con: ordinary users have lost the notion of data ownership
Con: danger of vendor lock-in (especially for end-users of various content delivery platforms)
TODO: web 3.0? (if relevant) https://en.wikipedia.org/wiki/History_of_the_World_Wide_Web#Web_3.0_and_Web3
---