User applications can read the zone information of a zoned block device and manage its zones by using two types of interfaces:
sysfs attribute files, accessible either directly from applications as regular files or from scripting languages (shell scripts, Python, etc.).
ioctl() system calls, suitable for use from C programs or other programming languages that have an equivalent system-call binding.
The sysfs files and ioctl() commands available to applications have evolved since the introduction of zoned block device support in kernel version 4.10. The availability of files and commands per kernel version is detailed in the following sections.
Programs that use scripting languages (e.g. bash scripts) can access zoned device information through sysfs attribute files. The attribute files provided are shown in the following table.
| File | Kernel version | Description |
|------|----------------|-------------|
| /sys/block/dev name/queue/zoned | 4.10.0 | Device zoned model |
| /sys/block/dev name/queue/chunk_sectors | 4.10.0 | Device zone size |
| /sys/block/dev name/queue/nr_zones | 4.20.0 | Total number of zones |
| /sys/block/dev name/queue/zone_append_max_bytes | 5.8.0 | Maximum size in bytes of a zone append write operation |
| /sys/block/dev name/queue/max_open_zones | 5.9.0 | Maximum number of open zones |
| /sys/block/dev name/queue/max_active_zones | 5.9.0 | Maximum number of active zones |
The zone model of a zoned device can be discovered by reading the
zoned queue attribute file. For example, for a zoned block device named sdb,
the shell command cat /sys/block/sdb/queue/zoned displays the device zoned
model.
The possible values of the zoned attribute file are shown in the table below.
| Value | Description |
|-------|-------------|
| none | Regular block device, including drive-managed SMR disks |
| host-aware | Host-aware device model |
| host-managed | Host-managed device model |
The device zone size can be read from the sysfs queue attribute file
chunk_sectors. For the same device sdb as in the previous example, the
command cat /sys/block/sdb/queue/chunk_sectors gives the device zone size.
The value is displayed as a number of 512B sectors, regardless of the actual logical and physical block size of the device. In this example, the device zone size is 524288 x 512 = 256 MiB.
The sysfs queue attribute file nr_zones, introduced in Linux kernel version 4.20.0, gives the total number of zones of a zoned device.
This attribute value is always 0 for a regular block device.
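Because the sysfs attributes are regular files, a C program can read them as well as a script can. The following is a minimal sketch under stated assumptions, not code from the kernel or from any library: the helper names `read_queue_attr`, `model_is_zoned` and `chunk_sectors_to_bytes` are hypothetical and were chosen only for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read one sysfs queue attribute of a block device
 * (for example dev = "sdb", attr = "zoned") into buf and strip the
 * trailing newline. Returns 0 on success and -1 on failure.
 */
int read_queue_attr(const char *dev, const char *attr, char *buf, size_t len)
{
	char path[256];
	FILE *f;

	if (len == 0)
		return -1;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", dev, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* Returns 1 if the value of the "zoned" attribute denotes a zoned device. */
int model_is_zoned(const char *model)
{
	return strcmp(model, "host-aware") == 0 ||
	       strcmp(model, "host-managed") == 0;
}

/* chunk_sectors is expressed in 512-byte sectors: convert it to bytes. */
unsigned long long chunk_sectors_to_bytes(unsigned long long sectors)
{
	return sectors << 9;
}
```

With the zone size of the earlier example, `chunk_sectors_to_bytes(524288)` yields 268435456 bytes, that is, 256 MiB.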
The C header file
/usr/include/linux/blkzoned.h contains macro definitions and
data structure definitions that allow application developers to obtain
information about zoned block devices and to manage the zones of the devices.
The data structure
struct blk_zone defines a zone-descriptor structure that
contains a complete description of a zone: the zone's location on
the device, the zone type, its condition (state), and the position of the zone
write pointer (for sequential zones). For kernels up to version 5.8, this
data structure is as shown below.
As indicated in the comments of this data structure definition, the zone start position, the zone size and the write pointer position are all expressed in units of 512-byte sectors, regardless of the actual logical block size of the device. Even for a device with a 4096-byte logical block size, these zone descriptor fields use the 512-byte sector unit.
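The unit rule can be made concrete with a few conversion helpers. This is only a sketch: the function names are hypothetical, and `sectors_to_lblocks` assumes that the logical block size is a multiple of 512 bytes.

```c
#include <linux/blkzoned.h>

/* Zone descriptor fields always use 512-byte sectors (shift by 9). */
#define ZONE_SECTOR_SHIFT 9

/* Byte offset of the start of a zone on the device. */
unsigned long long zone_start_bytes(const struct blk_zone *z)
{
	return (unsigned long long)z->start << ZONE_SECTOR_SHIFT;
}

/* Size of a zone in bytes. */
unsigned long long zone_len_bytes(const struct blk_zone *z)
{
	return (unsigned long long)z->len << ZONE_SECTOR_SHIFT;
}

/*
 * Convert a count of 512-byte sectors into logical blocks.
 * Assumption: lblock_size is a multiple of 512 (for example 512 or 4096).
 */
unsigned long long sectors_to_lblocks(unsigned long long sectors,
				      unsigned int lblock_size)
{
	return sectors / (lblock_size >> ZONE_SECTOR_SHIFT);
}
```

For a 256 MiB zone (524288 sectors) on a device with 4096-byte logical blocks, `sectors_to_lblocks(524288, 4096)` gives 65536 logical blocks.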
The capacity field was added to struct blk_zone in kernel version 5.9. With this change, the data structure is as follows.
The capacity field indicates the usable zone capacity of a zone in units of 512B sectors. The presence, or validity, of this field within the structure is indicated using a zone report flag. See Obtaining Zone Information below for details.
The type field of a zone descriptor can have only one of the values defined by
the enumeration enum blk_zone_type.
The cond field of the
struct blk_zone data structure defines the current
condition of a zone. The possible condition (state) values of this field are
defined by the enumeration enum blk_zone_cond.
Under a device's normal operation, some of these conditions cannot result
directly from host-initiated operations. These conditions are
BLK_ZONE_COND_OFFLINE and BLK_ZONE_COND_READONLY. They can be set only by
the device itself, to indicate zones with capabilities that have been limited by
a hardware defect.
Transitions to other conditions result from user operations, either write operations or zone management commands. Zone management commands can be issued by an application using the kernel ioctl() interface (see ioctl() Commands).
The SCSI Zoned Block Commands specification (ZBC), the Zoned-device ATA Command Set specification (ZAC) and the NVM Express Zoned Namespace Command Set specification (ZNS) each define a zone condition state machine that governs the possible transitions of a zone from one condition to another, depending on the commands executed.
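When printing or logging zone descriptors, it is convenient to map the type and cond values to readable names. The sketch below relies only on the enumerators declared in linux/blkzoned.h. The function names and the strings they return are hypothetical choices made for this example.

```c
#include <linux/blkzoned.h>

/* Map the type field of a zone descriptor to a readable name. */
const char *zone_type_str(unsigned int type)
{
	switch (type) {
	case BLK_ZONE_TYPE_CONVENTIONAL:	return "conventional";
	case BLK_ZONE_TYPE_SEQWRITE_REQ:	return "sequential-write-required";
	case BLK_ZONE_TYPE_SEQWRITE_PREF:	return "sequential-write-preferred";
	default:				return "unknown";
	}
}

/* Map the cond field of a zone descriptor to a readable name. */
const char *zone_cond_str(unsigned int cond)
{
	switch (cond) {
	case BLK_ZONE_COND_NOT_WP:	return "not-write-pointer";
	case BLK_ZONE_COND_EMPTY:	return "empty";
	case BLK_ZONE_COND_IMP_OPEN:	return "implicit-open";
	case BLK_ZONE_COND_EXP_OPEN:	return "explicit-open";
	case BLK_ZONE_COND_CLOSED:	return "closed";
	case BLK_ZONE_COND_READONLY:	return "read-only";
	case BLK_ZONE_COND_FULL:	return "full";
	case BLK_ZONE_COND_OFFLINE:	return "offline";
	default:			return "unknown";
	}
}

/*
 * Returns 1 for the conditions that only the device itself can set
 * (read-only and offline), and 0 for all other conditions.
 */
int zone_cond_is_device_set(unsigned int cond)
{
	return cond == BLK_ZONE_COND_READONLY ||
	       cond == BLK_ZONE_COND_OFFLINE;
}
```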
Several ioctl() commands are defined to obtain information about, and to manipulate, the zones of a zoned block device. All supported commands are shown below.
Not all commands are available on all kernel versions. The following table shows the kernel version that introduced each command.
| Command | Kernel version | Description |
|---------|----------------|-------------|
| BLKREPORTZONE | 4.10.0 | Get zone information |
| BLKRESETZONE | 4.10.0 | Reset a zone write pointer |
| BLKGETZONESZ | 4.20.0 | Get a device zone size |
| BLKGETNRZONES | 4.20.0 | Get the total number of zones of a device |
| BLKOPENZONE | 5.5.0 | Explicitly open a zone |
| BLKCLOSEZONE | 5.5.0 | Close a zone |
| BLKFINISHZONE | 5.5.0 | Finish a zone |
The BLKREPORTZONE command allows an application to obtain a device's zone
information in the form of an array of zone descriptors. The data argument
that is passed to the
ioctl() must be the address of a memory area large
enough to store one
struct blk_zone_report header structure, followed by an
array of zone descriptors.
The zone report header structure
blk_zone_report is as shown below.
The header indicates the 512-byte sector from which the report should start as
well as the number of zone descriptors in the array following the header. A
typical use of the
BLKREPORTZONE command to obtain information on all the
zones of a device is as shown below.
The number of zone descriptors obtained is returned to the user in the
nr_zones field of the report header structure.
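The buffer layout described above (one header followed by an array of descriptors) can be sketched as follows. This is an illustrative outline, not the listing from the original page: `alloc_zone_report` and `report_zones` are hypothetical helper names, and error handling is reduced to return codes.

```c
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

/*
 * Allocate a zeroed report buffer able to hold the header plus nr_zones
 * zone descriptors, and fill in the request fields of the header.
 * The caller releases the buffer with free().
 */
struct blk_zone_report *alloc_zone_report(unsigned long long start_sector,
					  unsigned int nr_zones)
{
	size_t size = sizeof(struct blk_zone_report) +
		      (size_t)nr_zones * sizeof(struct blk_zone);
	struct blk_zone_report *rep = calloc(1, size);

	if (!rep)
		return NULL;

	rep->sector = start_sector;	/* 512-byte sector to start from */
	rep->nr_zones = nr_zones;	/* capacity of the descriptor array */
	return rep;
}

/*
 * Issue BLKREPORTZONE on an open zoned block device. On success, the
 * kernel updates rep->nr_zones with the number of descriptors filled in,
 * and that number is returned. Returns -1 on error (errno is set).
 */
int report_zones(int fd, struct blk_zone_report *rep)
{
	if (ioctl(fd, BLKREPORTZONE, rep) < 0)
		return -1;
	return (int)rep->nr_zones;
}
```

After a successful call, the descriptors are available as `rep->zones[0]` through `rep->zones[rep->nr_zones - 1]`. To walk the whole device, a caller can repeat the request with `rep->sector` set to the end of the last zone reported.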
With the introduction of zone capacity support for NVMe Zoned Namespaces in
kernel version 5.9, zone descriptors gained the
capacity field. The presence
of this field is indicated by the new
flags field added to
struct blk_zone_report. If the
flags field of
struct blk_zone_report has the flag
BLK_ZONE_REP_CAPACITY set, then the zone descriptors will have a
valid value set in the
capacity field of
struct blk_zone. Otherwise, this
field can be ignored as it will show a value of 0.
The example code below, extracted from the code of the libzbd library, illustrates how applications can implement backward-compatible support for zone capacity information by using the autotools build environment.
With this method, the main code responsible for issuing and parsing zone reports
always has access to the
capacity field of
struct blk_zone regardless of
the kernel version the code is executed on. For kernels before kernel version
5.9, the zone capacity field will always be equal to 0, meaning that the zone
capacity should be ignored and that the zone size should be used in its place.
Different coding techniques can also be used to always return a zone capacity
equal to the zone size for kernels lacking support for this field.
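One such alternative technique is sketched below. It is not the libzbd autotools method. Instead, it compares `LINUX_VERSION_CODE` from linux/version.h at compile time, on the assumption that kernel headers from version 5.9 onward declare the capacity field and the flags field. That assumption can fail on distributions that backport the feature to older header versions. The macro `HAVE_ZONE_CAPACITY` and the function name are hypothetical.

```c
#include <linux/version.h>
#include <linux/blkzoned.h>

/*
 * Assumption: installed kernel headers from 5.9 onward provide
 * struct blk_zone.capacity, struct blk_zone_report.flags and
 * BLK_ZONE_REP_CAPACITY. HAVE_ZONE_CAPACITY is a hypothetical macro.
 */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(5, 9, 0)
#define HAVE_ZONE_CAPACITY 1
#endif

/*
 * Return the usable capacity of a zone in 512-byte sectors. When the
 * report does not carry valid capacity information, fall back to the
 * zone size so that callers never see a capacity of 0.
 */
unsigned long long zone_capacity_sectors(const struct blk_zone_report *rep,
					 const struct blk_zone *z)
{
#ifdef HAVE_ZONE_CAPACITY
	if (rep->flags & BLK_ZONE_REP_CAPACITY)
		return z->capacity;
#else
	(void)rep;	/* no flags field in older headers */
#endif
	return z->len;
}
```

A binary built against recent headers but run on an older kernel still behaves correctly, because that kernel leaves the flag unset and the function returns the zone size.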
The command line utility
blkzone, which is part of the util-linux project, uses the BLKREPORTZONE command to
implement its report function. Its code was modified similarly to the above
method to allow its correct compilation and execution regardless of the version
of the kernel being used.
The write pointer of a single sequential zone or of a range of contiguous
sequential zones can be reset using the
BLKRESETZONE command. Resetting a
sequential zone write pointer position will also transition the zone to the
Empty condition (BLK_ZONE_COND_EMPTY).
The range of zones to reset is defined using the data structure
struct blk_zone_range. The
sector field must specify the start sector of the first zone to reset. The
nr_sectors field specifies the total length of the range of zones to reset.
This length must be at least as large as one zone.
As indicated in the comments describing the
blk_zone_range structure, the commands
BLKOPENZONE, BLKCLOSEZONE and BLKFINISHZONE also use this data structure
to define the range of zones on which the command will operate.
The following code shows an example use of the
BLKRESETZONE command to reset a
single zone starting at sector 274726912 with a zone size of 256 MiB (524288
sectors of 512B).
The device file descriptor
fd must be open for writing in order for this
command to succeed.
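A minimal sketch of such a reset is given here for reference. The wrapper name `reset_zones` is hypothetical, and the sector values in the usage note are the ones from the example above.

```c
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

/*
 * Reset the write pointer of all zones in the range starting at
 * start_sector and spanning nr_sectors (both in 512-byte sectors).
 * fd must refer to a zoned block device opened for writing.
 * Returns 0 on success and -1 on error (errno is set).
 */
int reset_zones(int fd, unsigned long long start_sector,
		unsigned long long nr_sectors)
{
	struct blk_zone_range range = {
		.sector = start_sector,
		.nr_sectors = nr_sectors,
	};

	return ioctl(fd, BLKRESETZONE, &range);
}
```

Resetting the single 256 MiB zone of the example then reads `reset_zones(fd, 274726912ULL, 524288ULL)`.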
The command line utility
blkzone uses the BLKRESETZONE command to implement its reset functionality.
Explicitly opening a zone or a range of zones can be done using the
BLKOPENZONE command. This command uses the same arguments as the
BLKRESETZONE command. It takes a pointer to a
struct blk_zone_range data structure,
which specifies the range of zones to operate on.
Closing a zone is done using the BLKCLOSEZONE command. Finishing a zone, that
is, transitioning the zone to the full condition (BLK_ZONE_COND_FULL), is
done using the BLKFINISHZONE command. Both of these commands also take as
an argument a pointer to the
blk_zone_range data structure to specify the range
of zones to operate on.
The BLKOPENZONE, BLKCLOSEZONE and BLKFINISHZONE commands were introduced in kernel version 5.5.0.
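Since the three commands share the same argument, a single wrapper can serve all of them. The sketch below is an assumption-laden illustration: `enum zone_op` and `zone_mgmt` are hypothetical names, and the `#ifdef BLKOPENZONE` guard is one possible way to keep the code compiling against kernel headers older than 5.5, where the wrapper simply fails with ENOTSUP.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

/* Hypothetical selector for the zone management operation to run. */
enum zone_op { ZONE_OP_OPEN, ZONE_OP_CLOSE, ZONE_OP_FINISH };

/*
 * Open, close or finish the zones in the range starting at sector and
 * spanning nr_sectors (both in 512-byte sectors).
 * Returns 0 on success and -1 on error (errno is set).
 */
int zone_mgmt(int fd, enum zone_op op, unsigned long long sector,
	      unsigned long long nr_sectors)
{
#ifdef BLKOPENZONE
	struct blk_zone_range range = {
		.sector = sector,
		.nr_sectors = nr_sectors,
	};
	unsigned long cmd;

	switch (op) {
	case ZONE_OP_OPEN:	cmd = BLKOPENZONE;	break;
	case ZONE_OP_CLOSE:	cmd = BLKCLOSEZONE;	break;
	case ZONE_OP_FINISH:	cmd = BLKFINISHZONE;	break;
	default:
		errno = EINVAL;
		return -1;
	}

	return ioctl(fd, cmd, &range);
#else
	/* Headers older than 5.5: the commands are not defined. */
	(void)fd; (void)op; (void)sector; (void)nr_sectors;
	errno = ENOTSUP;
	return -1;
#endif
}
```

Note that the guard reflects only the headers used at build time. On a running kernel older than 5.5, the ioctl() call itself is expected to fail.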
Linux® kernel version 4.20 introduced two new commands: one to obtain
a zoned device's zone size (
BLKGETZONESZ), and one to obtain the total number
of zones of the device (
BLKGETNRZONES). Both commands take a pointer to an
unsigned 32-bit integer variable as an argument, and the zone-size value or the
number of zones will be returned. The following sample C code illustrates the
use of these commands.
BLKGETNRZONES is especially useful for allocating an array of zone
descriptors large enough for a zone report on all the zones of a device.
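A minimal sketch of the two commands used together follows. The helper name `get_zone_geometry` is hypothetical, and the zone size is returned as a number of 512-byte sectors.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

/*
 * Query the zone size (in 512-byte sectors) and the total number of
 * zones of the zoned block device open on fd.
 * Returns 0 on success and -1 on error (errno is set); the output
 * variables are written only on success.
 */
int get_zone_geometry(int fd, uint32_t *zone_sectors, uint32_t *nr_zones)
{
	uint32_t zs = 0, nz = 0;

	if (ioctl(fd, BLKGETZONESZ, &zs) < 0)
		return -1;
	if (ioctl(fd, BLKGETNRZONES, &nz) < 0)
		return -1;

	*zone_sectors = zs;
	*nr_zones = nz;
	return 0;
}
```

The value stored in `nr_zones` can then be used to size the buffer for a BLKREPORTZONE request covering the whole device: one `struct blk_zone_report` header followed by `nr_zones` entries of `struct blk_zone`.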