Virtualization and cloud technologies

Jakub Klinkovský

:: Czech Technical University in Prague
:: Faculty of Nuclear Sciences and Physical Engineering
:: Department of Software Engineering

Academic year 2024-2025

Linux

  • From Wikipedia:

    Linux is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released in 1991 by Linus Torvalds.

  • A Linux distribution ("distro") is a complete operating system, which includes the Linux kernel and supporting system software and libraries, most of which are provided by third parties.

Kernel space and user space

From Wikipedia:

Linux features

We will not talk about:

  • boot process and operating system initialization
  • most of the system components and their interfaces

But some features are very important to understand how containers work and how to use them efficiently. We will talk about:

  • differences between Linux and other operating systems
  • Linux kernel features related to operating system virtualization
  • package management
  • common programs and commands

Let's start...

  • Who started Windows on their computer? 😱

  • How to log in to Alma Linux? (Created in 2021, so not in the previous timeline.)

  • We will need only 3 things: web browser, text editor, and terminal.

  • How to install VSCode/VSCodium?

  • How to use the terminal?

  • Where are the files stored? How to access them from home?

Files in Linux

Linux follows the Unix philosophy: everything is a file, including special file types:

  • Regular files
  • Directories: files that contain other files
  • Symbolic links: pointers to other files (of any type)
  • Block devices: files that represent storage devices, such as hard drives
  • Character devices: files that represent serial ports, keyboards, and other character-based input devices
  • FIFOs (Named Pipes): inter-process communication channels
  • Sockets: endpoints for inter-process communication (Unix domain sockets)

Details: File Types in Linux Explained
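
The file types listed above can be told apart programmatically. The following is a minimal sketch using Python's standard os and stat modules; the helper names file_type and demo and the file names inside the temporary directory are made up for this illustration.

```python
import os
import stat
import tempfile


def file_type(path: str) -> str:
    """Classify a path by its file type without following symbolic links."""
    mode = os.lstat(path).st_mode
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISFIFO(mode):
        return "FIFO"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "unknown"


def demo() -> dict:
    """Create a few file types in a temporary directory and classify them."""
    with tempfile.TemporaryDirectory() as tmp:
        regular = os.path.join(tmp, "regular.txt")  # hypothetical file names
        link = os.path.join(tmp, "link")
        fifo = os.path.join(tmp, "fifo")
        with open(regular, "w") as f:
            f.write("hello\n")
        os.symlink(regular, link)
        os.mkfifo(fifo)
        return {
            "tmp": file_type(tmp),
            "regular": file_type(regular),
            "link": file_type(link),
            "fifo": file_type(fifo),
        }


if __name__ == "__main__":
    for name, kind in demo().items():
        print(f"{name}: {kind}")
    # /dev/null is a character device on Linux systems
    print("/dev/null:", file_type("/dev/null"))
```

The sketch uses os.lstat rather than os.stat so that a symbolic link is reported as a link instead of as the file it points to.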

Linux file system hierarchy

  • Primarily designed as a tree: / is the root directory, /usr/ is one of its subdirectories, /usr/bin/ and /usr/lib/ are subdirectories of /usr/, etc.
    See file-hierarchy(7) for details and conventions.
  • But it is not exactly a tree:
    • Mount points – file systems can be mounted at arbitrary points
    • Symbolic links can create cycles and shortcuts between directories
    • Hard links allow multiple names for the same (regular) file, creating multiple paths to the same data
    • Bind mounts allow mounting an arbitrary subtree (directory) at a different point, creating multiple entrances to the same subtree (directory)
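
The difference between hard links and symbolic links can be observed without root privileges. The following is a small sketch that assumes a temporary directory on a file system that supports hard links; the function name link_demo and the file names are illustrative only.

```python
import os
import tempfile


def link_demo() -> dict:
    """Show that a hard link shares an inode while a symlink stores a path."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "data.txt")  # hypothetical file names
        hard = os.path.join(tmp, "hard.txt")
        soft = os.path.join(tmp, "soft.txt")
        with open(original, "w") as f:
            f.write("payload")

        os.link(original, hard)     # second name for the same inode
        os.symlink(original, soft)  # separate file that stores a path

        same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
        link_count = os.stat(original).st_nlink

        # Removing the original name does not remove the data,
        # because the hard link still refers to the same inode.
        os.remove(original)
        with open(hard) as f:
            hard_content = f.read()

        # The symbolic link still exists but now points to a missing path.
        dangling = os.path.islink(soft) and not os.path.exists(soft)

        return {
            "same_inode": same_inode,
            "link_count": link_count,
            "hard_content": hard_content,
            "dangling_symlink": dangling,
        }


if __name__ == "__main__":
    for key, value in link_demo().items():
        print(f"{key}: {value}")
```

Mount points and bind mounts are not shown here because creating them normally requires elevated privileges.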

What is a file system?

A file system is a component of an operating system that provides a way to store, organize, and manage files on some storage device.

  • It is an implementation detail for the abstraction exposed by the Linux kernel (users and programs can interact with storage devices in terms of files rather than bytes and their addresses)
  • Different file systems have different features and use cases
  • Common features: data integrity and consistency assurance
    (journaling, checksums, error correction)
  • Advanced features: copy-on-write (CoW) (enables efficient snapshotting and versioning), compression, encryption, deduplication

File system overlay

A file system overlay is a technique used to combine two or more file systems into a single, unified view. It allows multiple file systems to be layered on top of each other, with the topmost layer being the one that is visible to the user and applications.

Example from containerization:

  1. Base layer: a read-only layer containing a snapshot of some Linux distribution
  2. Container layer: a writable layer created on top of the base layer when the container is started. This is where changes made by the container are stored.
  3. An overlay mount combines the two layers and exposes them to the container as a single file system. It will be mounted at / inside the container.
  4. When the container is removed, the container layer is deleted together with all changes stored in it; the base layer stays unchanged.
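
The lookup rules of a two-layer overlay can be sketched with a toy model. This is not how OverlayFS is implemented in the kernel; the class ToyOverlay, its methods, and the use of dictionaries as "file systems" are assumptions made purely for illustration.

```python
class ToyOverlay:
    """Toy model of a two-layer overlay: read-only lower, writable upper."""

    def __init__(self, lower: dict):
        self.lower = dict(lower)  # base layer, never modified by this class
        self.upper = {}           # container layer, receives all writes
        self.whiteouts = set()    # base-layer paths hidden by a delete

    def read(self, path: str) -> str:
        """Return the content visible in the merged view."""
        if path in self.upper:
            return self.upper[path]
        if path in self.whiteouts or path not in self.lower:
            raise FileNotFoundError(path)
        return self.lower[path]

    def write(self, path: str, data: str) -> None:
        """Store data in the upper layer; the lower layer is untouched."""
        self.whiteouts.discard(path)
        self.upper[path] = data

    def delete(self, path: str) -> None:
        """Hide a path from the merged view."""
        hidden = path in self.whiteouts or path not in self.lower
        if path not in self.upper and hidden:
            raise FileNotFoundError(path)
        self.upper.pop(path, None)
        if path in self.lower:
            self.whiteouts.add(path)

    def listing(self) -> list:
        """List all paths visible in the merged view."""
        visible = (set(self.lower) - self.whiteouts) | set(self.upper)
        return sorted(visible)

    def reset(self) -> None:
        """Discard the container layer, restoring the base view."""
        self.upper.clear()
        self.whiteouts.clear()


if __name__ == "__main__":
    fs = ToyOverlay({"/etc/os-release": "base", "/bin/sh": "shell"})
    fs.write("/etc/os-release", "modified")
    fs.delete("/bin/sh")
    print(fs.listing(), fs.read("/etc/os-release"))
    fs.reset()
    print(fs.listing(), fs.read("/etc/os-release"))
```

The set of whiteouts plays the role of the markers that a real overlay uses to hide files from lower layers without modifying those layers.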

Users and groups

Users and groups provide the basic means for access control on Linux:

  • Users: Accounts that can log in and access system resources, identified by a unique username and UID (User ID)
    • Common users (people): interactive accounts for humans
    • System users (services): accounts for system services and daemons
  • Groups: Collections of users that share similar permissions and access rights, identified by a unique groupname and GID (Group ID)
  • UID and GID: Numerical identifiers used by the system to manage user and group permissions, with 0 reserved for the root user and group
  • Information about users and groups is stored in /etc/passwd and /etc/group
  • Related commands: sudo (superuser do) and su (substitute user)
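
Each line of /etc/passwd has seven colon-separated fields. The following sketch parses such lines; the sample entries in the usage part are invented, and the threshold of 1000 for separating system users from common users is only a convention on many distributions (an assumption, check UID_MIN in /etc/login.defs on a real system).

```python
import os
import pwd
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    name: str
    uid: int
    gid: int
    gecos: str  # free-form comment field, typically the full name
    home: str
    shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    """Parse one /etc/passwd line: name:password:UID:GID:GECOS:home:shell."""
    fields = line.rstrip("\n").split(":")
    name, _password, uid, gid, gecos, home, shell = fields
    return PasswdEntry(name, int(uid), int(gid), gecos, home, shell)


def is_system_user(entry: PasswdEntry, first_regular_uid: int = 1000) -> bool:
    """Guess whether an account is a system user.

    The default threshold is an assumption based on a common convention,
    not a rule enforced by the kernel.
    """
    return entry.uid < first_regular_uid


if __name__ == "__main__":
    # Invented sample lines in the /etc/passwd format
    for sample in (
        "root:x:0:0:root:/root:/bin/bash",
        "alice:x:1000:1000:Alice Example:/home/alice:/bin/bash",
    ):
        entry = parse_passwd_line(sample)
        print(entry.name, entry.uid, is_system_user(entry))

    # The standard pwd module performs the real lookup for the current user
    try:
        me = pwd.getpwuid(os.getuid())
        print(me.pw_name, me.pw_uid, me.pw_gid)
    except KeyError:
        print("no passwd entry for UID", os.getuid())
```

In practice, the pwd and grp modules are preferable to manual parsing because they also consult other user databases configured on the system.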

Permissions in a file system

File systems use permissions and attributes to regulate the level of interaction that system processes can have with files and directories.

  • File permissions: Read, write, and execute permissions for owner, group, and other (run ls -l and check the output)
  • Setuid, setgid, and sticky bit: Special permission bits that modify the behavior of files and directories
  • Access Control Lists (ACLs): Fine-grained permissions that can be assigned arbitrarily to specific users and groups
  • Attributes and extended attributes: Enable further customization of file operations, such as immutability, compression, or copy-on-write
  • Related commands: chmod, chown, getfacl, setfacl, lsattr, chattr
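
The permission string printed by ls -l can be reproduced with Python's stat.filemode. The following sketch changes the mode of a temporary file and also formats a few hand-built modes with special bits; the function names and the file name report.txt are illustrative.

```python
import os
import stat
import tempfile


def mode_string(path: str) -> str:
    """Return the ls -l style permission string of a path."""
    return stat.filemode(os.lstat(path).st_mode)


def chmod_demo() -> tuple:
    """Apply two modes to a temporary file and report both strings."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "report.txt")  # hypothetical file name
        with open(path, "w") as f:
            f.write("x")
        os.chmod(path, 0o640)  # owner: rw, group: r, other: none
        before = mode_string(path)
        os.chmod(path, 0o755)  # owner: rwx, group: rx, other: rx
        after = mode_string(path)
        return before, after


def special_bits_demo() -> dict:
    """Format modes with setuid, setgid, and sticky bits (no files needed)."""
    return {
        "setuid file": stat.filemode(stat.S_IFREG | stat.S_ISUID | 0o755),
        "setgid directory": stat.filemode(stat.S_IFDIR | stat.S_ISGID | 0o775),
        "sticky directory": stat.filemode(stat.S_IFDIR | stat.S_ISVTX | 0o777),
    }


if __name__ == "__main__":
    print(chmod_demo())
    for name, text in special_bits_demo().items():
        print(f"{name}: {text}")
```

ACLs and file attributes are not covered by the mode bits; they are inspected with getfacl and lsattr.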

Operating system resources

Linux provides various resources to user space programs:

  • processes, memory space, CPU time, I/O time, and privileges

Each process is associated with several key attributes that define its identity, permissions, and behavior. Here are some of the most important ones:

  • PID (Process ID): a unique numerical identifier assigned to each process
  • UID (User ID): the user ID of the process owner
  • GID (Group ID): the group ID of the process owner
  • Supplementary GIDs: a list of IDs of additional groups that the process belongs to
  • Capabilities: a set of fine-grained privileges that the process has been granted, which can provide specific access rights without granting full "root access".
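
A process can inspect its own attributes. The following sketch collects the IDs through the os module and reads the effective capability set from /proc/self/status, which is assumed to be available (it returns None otherwise). The function names are illustrative; the bit number 21 for CAP_SYS_ADMIN comes from the Linux capability numbering.

```python
import os

CAP_SYS_ADMIN = 21  # bit number of CAP_SYS_ADMIN in the capability mask


def process_identity() -> dict:
    """Collect the basic identity attributes of the current process."""
    return {
        "pid": os.getpid(),
        "ppid": os.getppid(),
        "uid": os.getuid(),    # real user ID
        "euid": os.geteuid(),  # effective user ID
        "gid": os.getgid(),    # real group ID
        "egid": os.getegid(),  # effective group ID
        "groups": os.getgroups(),  # supplementary group IDs
    }


def effective_capabilities(status_path: str = "/proc/self/status"):
    """Return the effective capability bitmask, or None if unavailable."""
    try:
        with open(status_path) as f:
            for line in f:
                if line.startswith("CapEff:"):
                    return int(line.split()[1], 16)
    except OSError:
        return None
    return None


def has_capability(mask: int, bit: int) -> bool:
    """Check whether a capability bit is set in a bitmask."""
    return bool(mask & (1 << bit))


if __name__ == "__main__":
    for key, value in process_identity().items():
        print(f"{key}: {value}")
    caps = effective_capabilities()
    if caps is None:
        print("capabilities: not available")
    else:
        print(f"CapEff: {caps:#018x}")
        print("CAP_SYS_ADMIN:", has_capability(caps, CAP_SYS_ADMIN))
```

An unprivileged process typically reports an empty effective set, while a process running as root outside of a container typically has all bits set.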

Control groups (cgroups)

Control groups are a Linux kernel feature that allows for resource management:

  • Resource control: cgroups limit and account for CPU, memory, I/O, and network resources used by a group of processes
  • Process grouping: cgroups organize processes into hierarchical groups, allowing for flexible resource management
  • Features:
    • Resource limiting – e.g. maximum memory limit or CPU quota
    • Prioritization – allowing larger share of CPU utilization or disk I/O throughput
    • Accounting – measuring the group's resource usage
    • Control – e.g. freezing groups of processes, checkpointing and restarting
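
The cgroup membership of a process is listed in /proc/<pid>/cgroup, one line per hierarchy in the form hierarchy-ID:controller-list:path. The following sketch parses that format. The function names are illustrative, the sample lines in the usage part are invented, and the memory limit lookup assumes cgroup v2 mounted at /sys/fs/cgroup, which may not hold on every system.

```python
import os
from typing import NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: tuple  # empty for the unified (v2) hierarchy
    path: str


def parse_cgroup_line(line: str) -> CgroupEntry:
    """Parse one line of /proc/<pid>/cgroup."""
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    names = tuple(c for c in controllers.split(",") if c)
    return CgroupEntry(int(hierarchy_id), names, path)


def current_cgroups(proc_file: str = "/proc/self/cgroup") -> list:
    """Return the cgroup entries of the current process (empty if unknown)."""
    try:
        with open(proc_file) as f:
            return [parse_cgroup_line(line) for line in f if line.strip()]
    except OSError:
        return []


def memory_limit(cgroup_path: str, root: str = "/sys/fs/cgroup"):
    """Read memory.max of a cgroup; assumes cgroup v2 mounted at `root`.

    Returns the raw text ("max" or a number of bytes), or None if the
    file is missing, e.g. for the root cgroup or on cgroup v1 systems.
    """
    limit_file = os.path.join(root, cgroup_path.lstrip("/"), "memory.max")
    try:
        with open(limit_file) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # Invented sample lines: cgroup v2 and cgroup v1 style
    print(parse_cgroup_line("0::/user.slice/user-1000.slice/session-1.scope"))
    print(parse_cgroup_line("4:cpu,cpuacct:/docker/abc"))
    for entry in current_cgroups():
        print(entry, "memory.max =", memory_limit(entry.path))
```

The sketch only reads the accounting and limit files; setting limits means writing to files in the same directory and usually requires appropriate privileges or delegation.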

Linux namespaces

Namespaces are a feature of the Linux kernel that partition kernel resources such that one set of processes sees one set of resources, while another set of processes sees a different set of resources.

  • Namespace kinds: mount (mnt), process ID (pid), network (net), inter-process communication (ipc), user ID (user), control group (cgroup), UTS, time
  • Namespaces facilitate both resource identification and privilege isolation
  • A Linux system begins with a single namespace of each type, used by all processes. Processes can then create additional namespaces and join different namespaces.
  • Namespaces are a required aspect of functioning containers in Linux.
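
The namespaces of a process are exposed as symbolic links in /proc/<pid>/ns, with targets such as mnt:[4026531840]. Two processes are in the same namespace when the numbers match. The following sketch reads these links; the function names are illustrative, the number in the usage part is only an example value, and an empty result is returned when /proc is not available.

```python
import os
import re

# Link targets look like "mnt:[4026531840]"
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target: str) -> tuple:
    """Split a namespace link target into (kind, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid="self") -> dict:
    """Map each entry of /proc/<pid>/ns to its namespace inode number."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result  # no /proc, no such process, or no permission
    for name in names:
        try:
            _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue
        result[name] = inode
    return result


def shared_namespaces(pid_a, pid_b) -> list:
    """List namespace entries that two processes have in common."""
    a, b = namespaces_of(pid_a), namespaces_of(pid_b)
    return sorted(name for name in a if b.get(name) == a[name])


if __name__ == "__main__":
    print(parse_ns_link("mnt:[4026531840]"))  # example value
    for name, inode in sorted(namespaces_of().items()):
        print(f"{name}: {inode}")
    print("shared with parent:", shared_namespaces("self", os.getppid()))
```

Running the script on the host and inside a container should show different numbers for most namespace kinds.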

In a file system overlay, each layer is a separate file system, and changes made to the topmost layer do not affect the underlying layers. This allows for a number of use cases, such as:

  • Live updates: A new version of a file system can be mounted as an overlay on top of the existing file system, allowing for live updates without disrupting the running system.
  • Versioning: Multiple versions of a file system can be maintained, with each version being a separate layer in the overlay.
  • Testing and development: A test or development environment can be created as an overlay on top of a production file system, allowing for testing and development without affecting the production environment.
  • Read-only base image: A read-only base image can be used as the bottom layer, with a writable overlay on top, allowing for customization and modification of the file system without modifying the base image.

File system overlays are commonly used in Linux and other Unix-like operating systems, and are supported by file systems such as OverlayFS, AuFS, and UnionFS.

Some key benefits of file system overlays include:

  • Flexibility: File system overlays provide a flexible way to manage multiple file systems and versions.
  • Efficiency: File system overlays can reduce storage requirements by allowing multiple layers to share the same underlying storage.
  • Isolation: File system overlays provide isolation between layers, allowing for testing and development without affecting the production environment.

However, file system overlays can also introduce additional complexity, and may require additional management and maintenance to ensure data consistency and integrity.

Further process attributes:

  • EUID (Effective User ID): the user ID used to determine the process's permissions and access rights (it can differ from the UID, e.g., when the process runs a setuid program such as sudo)
  • EGID (Effective Group ID): the group ID used to determine the process's group-based access rights

  • Tools: cgroups are managed using tools like cgcreate, cgset, and cgexec, as well as through APIs and libraries like libcgroup and systemd