Before mounting such a filesystem, the kernel module must know about all of its devices, either through a preceding execution of btrfs device scan or through the device mount option. See section MULTIPLE DEVICES for more details.
See DUP PROFILES ON A SINGLE DEVICE for more.
A single-device filesystem defaults to DUP unless an SSD is detected, in which case it defaults to single. The detection is based on the value of /sys/block/DEV/queue/rotational, where DEV is the short name of the device.
Note that the rotational status can be set arbitrarily by the underlying block device driver and may not reflect the true status (for example on network block devices or memory-backed SCSI devices). Set the profiles explicitly with the options --data/--metadata to avoid confusion.
See DUP PROFILES ON A SINGLE DEVICE for more details.
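The detection rule described above can be sketched as a small shell helper. This is only an illustration of the documented rule, not the code mkfs.btrfs itself runs; the function name default_metadata_profile and the SYSFS_BLOCK override are hypothetical and exist only so the lookup can be pointed at a test directory.

```shell
# Hypothetical helper: guess the default single-device metadata profile
# from the rotational flag, following the rule described in the text.
# SYSFS_BLOCK is an assumption added for testing; it defaults to /sys/block.
SYSFS_BLOCK="${SYSFS_BLOCK:-/sys/block}"

default_metadata_profile() {
    dev="$1"
    flag_file="$SYSFS_BLOCK/$dev/queue/rotational"
    if [ ! -r "$flag_file" ]; then
        # The flag cannot be read, so no guess is made.
        echo "unknown"
        return 1
    fi
    # 0 means non-rotational (treated as an SSD); anything else is rotational.
    if [ "$(cat "$flag_file")" = "0" ]; then
        echo "single"
    else
        echo "DUP"
    fi
}
```

Calling default_metadata_profile sda prints DUP for a device that reports itself as rotational. Because the flag may be wrong, passing --metadata explicitly remains the safer choice.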
The mixed mode is recommended for filesystems smaller than 1GiB. As a softer recommendation, it can be used for filesystems smaller than 5GiB. The mixed mode may lead to degraded performance on larger filesystems, but is otherwise usable, even on multiple devices.
The nodesize and sectorsize must be equal, and the block group types must match.
Note: versions up to 4.2.x forced the mixed mode for devices smaller than 1GiB. This has been removed in 4.3+ as it caused some usability issues.
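As a rough illustration of the size guidance above, the following sketch classifies a filesystem size given in MiB. The function name mixed_mode_advice and the wording of its output are hypothetical; only the 1GiB and 5GiB thresholds come from the text.

```shell
# Hypothetical helper: classify a filesystem size (in MiB) against the
# mixed-mode size recommendations quoted in the text.
mixed_mode_advice() {
    size_mib="$1"
    if [ "$size_mib" -lt 1024 ]; then
        # Smaller than 1GiB: the recommended case.
        echo "recommended"
    elif [ "$size_mib" -lt 5120 ]; then
        # Smaller than 5GiB: the soft recommendation.
        echo "soft-recommended"
    else
        echo "usable, may degrade performance"
    fi
}

mixed_mode_advice 512
```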
A smaller node size increases fragmentation but leads to taller b-trees, which in turn leads to lower locking contention. Larger node sizes give better packing and less fragmentation at the cost of more expensive memory operations while updating the metadata blocks.
Note: versions up to 3.11 set the nodesize to 4k.
The default value is the page size and is autodetected. If the sectorsize differs from the page size, the created filesystem may not be mountable by the kernel. Therefore it is not recommended to use this option unless you are going to mount the filesystem on a system with the appropriate page size.
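A quick way to check a candidate sectorsize against the page size of the running system is sketched below. It assumes the getconf utility is available and that the filesystem will be mounted on the same machine; the function name sectorsize_matches_page is hypothetical.

```shell
# Hypothetical helper: succeed only if the given sectorsize (in bytes)
# equals the page size of the running system.
sectorsize_matches_page() {
    [ "$1" -eq "$(getconf PAGESIZE)" ]
}

if sectorsize_matches_page 4096; then
    echo "sectorsize 4096 matches the page size on this system"
else
    echo "sectorsize 4096 differs from the page size ($(getconf PAGESIZE))"
fi
```

The check says nothing about other systems: a filesystem created here may later be moved to a machine with a different page size.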
This option may enlarge the image or file to ensure it’s big enough to contain the files from rootdir. Since version 4.14.1 the filesystem size is not minimized. Please see option --shrink if you need that functionality.
If the destination is a regular file, this option will also truncate the file to the minimal size. Otherwise it will reduce the filesystem available space. Extra space will not be usable unless the filesystem is mounted and resized using btrfs filesystem resize.
Note: prior to version 4.14.1, the shrinking was done automatically.
-O|--features <feature1>[,<feature2>...]
See section FILESYSTEM FEATURES for more details. To see all available features that mkfs.btrfs supports run:
mkfs.btrfs -O list-all
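Since the feature list is passed as a single comma-separated argument, a helper along these lines can assemble it from separate names. The function name join_features is hypothetical, /dev/sdX is a placeholder, and the command is only printed, not executed.

```shell
# Hypothetical helper: join feature names into one comma-separated
# argument for -O, with no spaces between the entries.
join_features() {
    out=""
    for f in "$@"; do
        if [ -z "$out" ]; then
            out="$f"
        else
            out="$out,$f"
        fi
    done
    printf '%s\n' "$out"
}

# Print (do not run) a command line; /dev/sdX is a placeholder device.
echo "mkfs.btrfs -O $(join_features extref skinny-metadata) /dev/sdX"
```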
There is typically no action needed from the user. On a system that utilizes a udev-like daemon, any new block device is automatically registered. The rules call btrfs device scan.
The same command can be used to trigger the device scanning if the btrfs kernel module is reloaded (naturally all previous information about the device registration is lost).
Another possibility is to use the mount option device to specify the list of devices to scan at the time of mount.
# mount -o device=/dev/sdb,device=/dev/sdc /dev/sda /mnt
Note that this means only scanning; if the devices do not exist in the system, mount will fail anyway. This can happen on systems without initramfs/initrd and with a root partition created with RAID1/10/5/6 profiles, where the mount action can happen before all block devices are discovered. The waiting is usually done on the initramfs/initrd systems.
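For filesystems with many member devices, the option string can be assembled from a list, as in the following sketch. The function name device_mount_opts is hypothetical, and the mount command is only printed so that nothing is mounted.

```shell
# Hypothetical helper: turn a list of device paths into the
# "device=...,device=..." string used with mount -o.
device_mount_opts() {
    opts=""
    for dev in "$@"; do
        if [ -z "$opts" ]; then
            opts="device=$dev"
        else
            opts="$opts,device=$dev"
        fi
    done
    printf '%s\n' "$opts"
}

# Print (do not run) the mount command from the example in the text.
echo "mount -o $(device_mount_opts /dev/sdb /dev/sdc) /dev/sda /mnt"
```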
As of kernel 4.14, RAID5/6 is still considered experimental and shouldn’t be employed for production use.
mixed data and metadata block groups, also set by option --mixed
increased hardlink limit per file in a directory to 65536; older kernels supported a varying number of hardlinks depending on the sum of all file name sizes that can be stored into one metadata block
extended format for RAID5/6, also enabled if raid5 or raid6 block groups are selected
reduced-size metadata for extent references, saves a few percent of metadata
improved representation of file extents where holes are not explicitly stored as an extent, saves a few percent of metadata if sparse files are used
Other terms commonly used:
block group, chunk
The typical size of a metadata block group is 256MiB (for filesystems smaller than 50GiB) or 1GiB (for filesystems larger than 50GiB); for data it is 1GiB. The system block group size is a few megabytes.
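The metadata block group sizing rule can be written out as a short sketch. The function name typical_metadata_bg_mib is hypothetical, and treating a filesystem of exactly 50GiB as "larger" is an assumption, since the text does not specify that boundary.

```shell
# Hypothetical helper: print the typical metadata block group size in MiB
# for a filesystem size given in whole GiB.
# Assumption: exactly 50GiB is handled like the larger case.
typical_metadata_bg_mib() {
    fs_gib="$1"
    if [ "$fs_gib" -lt 50 ]; then
        echo 256
    else
        echo 1024
    fi
}

typical_metadata_bg_mib 20
```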
|Profile|Copies|Parity|Striping|Min/max devices|
|---|---|---|---|---|
|DUP|2 / 1 device|||1/any (see note 1)|
|RAID0|||1 to N|2/any|
|RAID10|2||1 to N|4/any|
|RAID5|1|1|2 to N - 1|2/any (see note 2)|
|RAID6|1|2|3 to N - 2|3/any (see note 3)|
It’s not recommended to build btrfs with RAID0/1/10/5/6 profiles on partitions from the same device. Neither redundancy nor performance will be improved.
Note 1: DUP may exist on more than 1 device if it starts on a single device and another one is added. Since version 4.5.1, mkfs.btrfs will let you create DUP on multiple devices.
Note 2: It’s not recommended to use 2 devices with RAID5. In that case, the parity stripe will contain the same data as the data stripe, degrading RAID5 to RAID1 with more overhead.
Note 3: It’s also not recommended to use 3 devices with RAID6, unless you want to get effectively 3 copies in a RAID1-like manner (but not exactly that). N-copies RAID1 is not implemented.
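The minimum device counts from the table can be captured in a lookup such as the one below. The function names min_devices and enough_devices are hypothetical, and the sketch covers only the profiles listed in the table. It checks the hard minimum only, so the discouraged cases in notes 2 and 3 still pass.

```shell
# Hypothetical helper: print the minimum number of devices for a profile,
# using the values from the table in the text.
min_devices() {
    case "$1" in
        DUP)    echo 1 ;;
        RAID0)  echo 2 ;;
        RAID10) echo 4 ;;
        RAID5)  echo 2 ;;
        RAID6)  echo 3 ;;
        *)      echo "unknown profile: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: succeed if COUNT devices meet the minimum for PROFILE.
enough_devices() {
    need=$(min_devices "$1") || return 1
    [ "$2" -ge "$need" ]
}

if enough_devices RAID10 3; then
    echo "RAID10 on 3 devices: ok"
else
    echo "RAID10 on 3 devices: too few devices"
fi
```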
For example, an SSD can remap the blocks internally to a single copy, thus deduplicating them. This negates the purpose of increased redundancy and just wastes filesystem space without providing the expected level of redundancy.
The duplicated data/metadata may still statistically improve the chances of recovery on a device that might perform some internal optimizations. The actual details are not usually disclosed by vendors. For example, we could expect that not all blocks get deduplicated. This provides a non-zero probability of recovery, compared to a zero chance if the single profile is used. The user should make the tradeoff decision. Deduplication in SSDs is thought to be widely available, so the reason behind the mkfs default is to not give a false sense of redundancy.
As another example, the widely used USB flash or SD cards use a translation layer between the logical and physical view of the device. The data lifetime may be affected by frequent plugging. The memory cells could get damaged, hopefully not destroying both copies of particular data in case of DUP.
The wear levelling techniques can also lead to reduced redundancy, even if the device does not do any deduplication. The controllers may put data written in a short timespan into the same physical storage unit (cell, block etc). In case this unit dies, both copies are lost. BTRFS does not add any artificial delay between metadata writes.
The traditional rotational hard drives usually fail at the sector level.
In any case, a device that starts to misbehave and repairs from the DUP copy should be replaced! DUP is not backup.
The combination of small filesystem size and large nodesize is not recommended in general and can lead to various ENOSPC-related issues during mount time or runtime.
Since mixed block group creation is optional, we allow small filesystem instances with differing values for sectorsize and nodesize to be created and could end up in the following situation:
# mkfs.btrfs -f -n 65536 /dev/loop0
btrfs-progs v3.19-rc2-405-g976307c
See http://btrfs.wiki.kernel.org for more information.

Performing full device TRIM (512.00MiB) ...
Label:              (null)
UUID:               49fab72e-0c8b-466b-a3ca-d1bfe56475f0
Node size:          65536
Sector size:        4096
Filesystem size:    512.00MiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         DUP              40.00MiB
  System:           DUP              12.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
  ID        SIZE  PATH
   1   512.00MiB  /dev/loop0

# mount /dev/loop0 /mnt/
mount: mount /dev/loop0 on /mnt failed: No space left on device
The ENOSPC occurs during the creation of the UUID tree. It is caused by the large metadata blocks and a space reservation strategy that allocates more than can fit into the filesystem.