
Zoned Block Device User Interface

User applications can access zone information and manage zones of a zoned block device using two types of interfaces.

  1. ioctl() system calls suitable for use from C programs

  2. sysfs attribute files, accessible either directly from C programs as regular files or from scripting languages (shell scripts, Python, etc.).

ZBD C Application Programming Interface

The C header file /usr/include/linux/blkzoned.h contains the macro and data structure definitions that allow an application developer to retrieve the zone information of a zoned disk (a list of zone descriptors) and to control zone write pointers.

Zone Descriptor

A zone descriptor completely describes a zone: its location on disk, its type, its condition (state), and, for sequential zones, the position of the zone write pointer.

The data structure type struct blk_zone is used to define a zone descriptor.

/**
 * struct blk_zone - Zone descriptor for BLKREPORTZONE ioctl.
 *
 * @start: Zone start in 512 B sector units
 * @len: Zone length in 512 B sector units
 * @wp: Zone write pointer location in 512 B sector units
 * @type: see enum blk_zone_type for possible values
 * @cond: see enum blk_zone_cond for possible values
 * @non_seq: Flag indicating that the zone is using non-sequential resources
 *           (for host-aware zoned block devices only).
 * @reset: Flag indicating that a zone reset is recommended.
 * @reserved: Padding to 64 B to match the ZBC/ZAC defined zone descriptor size.
 *
 * start, len and wp use the regular 512 B sector unit, regardless of the
 * device logical block size. The overall structure size is 64 B to match the
 * ZBC/ZAC defined zone descriptor and allow support for future additional
 * zone information.
 */
struct blk_zone {
        __u64   start;          /* Zone start sector */
        __u64   len;            /* Zone length in number of sectors */
        __u64   wp;             /* Zone write pointer position */
        __u8    type;           /* Zone type */
        __u8    cond;           /* Zone condition */
        __u8    non_seq;        /* Non-sequential write resources active */
        __u8    reset;          /* Reset write pointer recommended */
        __u8    reserved[36];
};

As indicated in the comments of this data structure definition, the zone start position, the zone size and the write pointer position are all expressed in units of 512 B sectors, regardless of the actual logical block size of the device. Even for disks with a 4 KiB logical block size, these fields count 512-byte sectors.
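
The unit rule above can be illustrated with a small conversion sketch. The macro ZBD_SECTOR_SHIFT and both helper functions are hypothetical names introduced here for illustration; they are not part of blkzoned.h.

```c
#include <linux/blkzoned.h>
#include <stdint.h>

/* Hypothetical name: log2 of the 512 B unit used by struct blk_zone. */
#define ZBD_SECTOR_SHIFT 9

/* Bytes between the zone start and the write pointer of a sequential zone. */
static inline uint64_t zone_written_bytes(const struct blk_zone *z)
{
        return (uint64_t)(z->wp - z->start) << ZBD_SECTOR_SHIFT;
}

/*
 * Zone start expressed in logical blocks of lba_size bytes (for example
 * 4096). The descriptor itself always counts 512 B sectors.
 */
static inline uint64_t zone_start_lba(const struct blk_zone *z,
                                      uint32_t lba_size)
{
        return ((uint64_t)z->start << ZBD_SECTOR_SHIFT) / lba_size;
}
```

For a zone starting at sector 524288 with a write pointer 8 sectors further, the first helper yields 4096 bytes and the second, with 4096-byte logical blocks, yields block 65536.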

Zone Type

The zone descriptor type field can have any of the values defined by the enumeration enum blk_zone_type.

/**
 * enum blk_zone_type - Types of zones allowed in a zoned device.
 *
 * @BLK_ZONE_TYPE_CONVENTIONAL: The zone has no write pointer and can be written
 *                              randomly. Zone reset has no effect on the zone.
 * @BLK_ZONE_TYPE_SEQWRITE_REQ: The zone must be written sequentially
 * @BLK_ZONE_TYPE_SEQWRITE_PREF: The zone can be written non-sequentially
 *
 * Any other value not defined is reserved and must be considered as invalid.
 */
enum blk_zone_type {
        BLK_ZONE_TYPE_CONVENTIONAL      = 0x1,
        BLK_ZONE_TYPE_SEQWRITE_REQ      = 0x2,
        BLK_ZONE_TYPE_SEQWRITE_PREF     = 0x3,
};

This enumeration lists all possible zone types defined by the ZBC and ZAC standards. The enumeration comments describe each zone type.
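
As a sketch of how an application might interpret the type field, the following hypothetical helpers (zone_type_name and zone_has_write_pointer are illustrative names, not part of the kernel interface) map a type value to a printable string and tell whether the zone has a write pointer. Any value outside the enumeration is treated as invalid, as the header comment requires.

```c
#include <linux/blkzoned.h>

/* Hypothetical helper: printable name for a struct blk_zone type value. */
static const char *zone_type_name(unsigned int type)
{
        switch (type) {
        case BLK_ZONE_TYPE_CONVENTIONAL:
                return "conventional";
        case BLK_ZONE_TYPE_SEQWRITE_REQ:
                return "sequential-write-required";
        case BLK_ZONE_TYPE_SEQWRITE_PREF:
                return "sequential-write-preferred";
        default:
                /* Reserved values must be considered invalid. */
                return "invalid";
        }
}

/* Hypothetical helper: only sequential zones have a write pointer. */
static int zone_has_write_pointer(unsigned int type)
{
        return type == BLK_ZONE_TYPE_SEQWRITE_REQ ||
               type == BLK_ZONE_TYPE_SEQWRITE_PREF;
}
```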

Zone Condition

The cond field of the struct blk_zone data structure defines the current condition of a zone. All possible condition (state) values of this field are defined by the blk_zone_cond enumeration.

/**
 * enum blk_zone_cond - Condition [state] of a zone in a zoned device.
 *
 * @BLK_ZONE_COND_NOT_WP: The zone has no write pointer, it is conventional.
 * @BLK_ZONE_COND_EMPTY: The zone is empty.
 * @BLK_ZONE_COND_IMP_OPEN: The zone is open, but not explicitly opened.
 * @BLK_ZONE_COND_EXP_OPEN: The zone was explicitly opened by an
 *                          OPEN ZONE command.
 * @BLK_ZONE_COND_CLOSED: The zone was [explicitly] closed after writing.
 * @BLK_ZONE_COND_FULL: The zone is marked as full, possibly by a
 *                      FINISH ZONE command.
 * @BLK_ZONE_COND_READONLY: The zone is read-only.
 * @BLK_ZONE_COND_OFFLINE: The zone is offline (sectors cannot be read/written).
 *
 * The Zone Condition state machine in the ZBC/ZAC standards maps the above
 * definitions as:
 *   - ZC1: Empty         | BLK_ZONE_COND_EMPTY
 *   - ZC2: Implicit Open | BLK_ZONE_COND_IMP_OPEN
 *   - ZC3: Explicit Open | BLK_ZONE_COND_EXP_OPEN
 *   - ZC4: Closed        | BLK_ZONE_COND_CLOSED
 *   - ZC5: Full          | BLK_ZONE_COND_FULL
 *   - ZC6: Read Only     | BLK_ZONE_COND_READONLY
 *   - ZC7: Offline       | BLK_ZONE_COND_OFFLINE
 *
 * Conditions 0x5 to 0xC are reserved by the current ZBC/ZAC spec and should
 * be considered invalid.
 */
enum blk_zone_cond {
        BLK_ZONE_COND_NOT_WP    = 0x0,
        BLK_ZONE_COND_EMPTY     = 0x1,
        BLK_ZONE_COND_IMP_OPEN  = 0x2,
        BLK_ZONE_COND_EXP_OPEN  = 0x3,
        BLK_ZONE_COND_CLOSED    = 0x4,
        BLK_ZONE_COND_READONLY  = 0xD,
        BLK_ZONE_COND_FULL      = 0xE,
        BLK_ZONE_COND_OFFLINE   = 0xF,
};

Some of the conditions that are defined cannot be obtained with a host initiated operation. These conditions are BLK_ZONE_COND_OFFLINE and BLK_ZONE_COND_READONLY which can only be entered by the device itself.

The condition BLK_ZONE_COND_EXP_OPEN, or explicit open, results from the successful execution of an OPEN ZONE command (see Zone Block Commands). Since the OPEN ZONE command is not supported by the kernel ZBD interface, a zone can be transitioned to the explicit open condition only through direct device access, that is, by issuing the SCSI OPEN ZONE command through the SG_IO interface (using libzbc, the libzbc zbc_open_zone utility, or the sg_zone utility).
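
The condition rules described in this section can be sketched as a few predicates. The function names below are hypothetical and only illustrate one way an application might classify the cond field of a zone descriptor.

```c
#include <linux/blkzoned.h>

/* Hypothetical helper: 1 if cond is a value defined by enum blk_zone_cond. */
static int zone_cond_is_valid(unsigned int cond)
{
        switch (cond) {
        case BLK_ZONE_COND_NOT_WP:
        case BLK_ZONE_COND_EMPTY:
        case BLK_ZONE_COND_IMP_OPEN:
        case BLK_ZONE_COND_EXP_OPEN:
        case BLK_ZONE_COND_CLOSED:
        case BLK_ZONE_COND_READONLY:
        case BLK_ZONE_COND_FULL:
        case BLK_ZONE_COND_OFFLINE:
                return 1;
        default:
                /* For example the reserved range 0x5 to 0xC. */
                return 0;
        }
}

/* Conditions that only the device itself can enter. */
static int zone_cond_is_device_only(unsigned int cond)
{
        return cond == BLK_ZONE_COND_READONLY ||
               cond == BLK_ZONE_COND_OFFLINE;
}

/* Implicitly or explicitly open zones. */
static int zone_cond_is_open(unsigned int cond)
{
        return cond == BLK_ZONE_COND_IMP_OPEN ||
               cond == BLK_ZONE_COND_EXP_OPEN;
}
```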

Zone Operations

ioctl() commands are defined to manipulate a device zone and obtain zone information.

/**
 * Zoned block device ioctl's:
 *
 * @BLKREPORTZONE: Get zone information. Takes a zone report as argument.
 *                 The zone report will start from the zone containing the
 *                 sector specified in the report request structure.
 * @BLKRESETZONE: Reset the write pointer of the zones in the specified
 *                sector range. The sector range must be zone aligned.
 * @BLKGETZONESZ: Get the device zone size in number of 512 B sectors.
 * @BLKGETNRZONES: Get the total number of zones of the device.
 */
#define BLKREPORTZONE   _IOWR(0x12, 130, struct blk_zone_report)
#define BLKRESETZONE    _IOW(0x12, 131, struct blk_zone_range)
#define BLKGETZONESZ    _IOR(0x12, 132, __u32)
#define BLKGETNRZONES   _IOR(0x12, 133, __u32)

Getting Zone Descriptors: BLKREPORTZONE ioctl

The BLKREPORTZONE operation allows an application to obtain a device zone information report. The data argument passed to the ioctl() must be the address of an area in memory that is large enough to store one struct blk_zone_report header followed by an array of zone descriptors.

The zone report header structure blk_zone_report is as shown below.

/**
 * struct blk_zone_report - BLKREPORTZONE ioctl request/reply
 *
 * @sector: starting sector of report
 * @nr_zones: IN maximum / OUT actual
 * @reserved: padding to 16 byte alignment
 * @zones: Space to hold @nr_zones @zones entries on reply.
 *
 * The array of at most @nr_zones must follow this structure in memory.
 */
struct blk_zone_report {
        __u64           sector;
        __u32           nr_zones;
        __u8            reserved[4];
        struct blk_zone zones[0];
};

The header indicates the sector from which the report should start and the number of zone descriptors in the array following the header. A typical use of the BLKREPORTZONE command to obtain information on all the zones of the device is as shown below.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

struct blk_zone_report *hdr;
unsigned long long start_sector = 0;
size_t hdr_len;
int nr_zones = 256;
int ret;

hdr_len = sizeof(struct blk_zone_report) + nr_zones * sizeof(struct blk_zone);
hdr = malloc(hdr_len);
if (!hdr)
    return -ENOMEM;

while (1) {

    hdr->sector = start_sector;
    hdr->nr_zones = nr_zones;

    ret = ioctl(fd, BLKREPORTZONE, hdr);
    if (ret)
        goto error;

    if (!hdr->nr_zones) {
        /* Done */
        break;
    }

    printf("Got %u zone descriptors\n", hdr->nr_zones);
    ...

    /* The next report must start after the last zone reported */
    start_sector = hdr->zones[hdr->nr_zones - 1].start +
               hdr->zones[hdr->nr_zones - 1].len;

}

The number of zone descriptors obtained is returned to the user using the nr_zones field of the report header structure blk_zone_report.
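
Once a report has been filled in by the kernel, the descriptors can be processed with ordinary array accesses. The sketch below uses two hypothetical helpers: report_next_sector computes the sector at which a follow-up report should start, and report_count_cond counts the zones of a report that are in a given condition. Both only read the nr_zones field and the zones array, and assume the report was already obtained with BLKREPORTZONE.

```c
#include <linux/blkzoned.h>

/*
 * Hypothetical helper: sector at which the next BLKREPORTZONE call should
 * start, that is, the end of the last zone reported. Returns 0 when the
 * report contains no zone.
 */
static __u64 report_next_sector(const struct blk_zone_report *hdr)
{
        const struct blk_zone *last;

        if (!hdr->nr_zones)
                return 0;

        last = &hdr->zones[hdr->nr_zones - 1];
        return last->start + last->len;
}

/* Hypothetical helper: number of reported zones in condition cond. */
static unsigned int report_count_cond(const struct blk_zone_report *hdr,
                                      unsigned int cond)
{
        unsigned int i, count = 0;

        for (i = 0; i < hdr->nr_zones; i++) {
                if (hdr->zones[i].cond == cond)
                        count++;
        }

        return count;
}
```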

Zone Reset: BLKRESETZONE ioctl

The write pointer of a single sequential zone, or of a range of contiguous sequential zones, can be reset to the start sector of each zone using the BLKRESETZONE command. Resetting the write pointer of a sequential zone also transitions the zone to the Empty condition.

The range of zones to reset is defined using the data structure blk_zone_range shown below.

/**
 * struct blk_zone_range - BLKRESETZONE ioctl request
 * @sector: starting sector of the first zone to issue reset write pointer
 * @nr_sectors: Total number of sectors of 1 or more zones to reset
 */
struct blk_zone_range {
        __u64           sector;
        __u64           nr_sectors;
};

The sector field must specify the start sector of the first zone to reset. The nr_sectors field specifies the total length of the range of zones to reset. Because the sector range must be zone aligned, this length covers at least one whole zone.

The following code shows an example use of the BLKRESETZONE command to reset a single zone starting at sector 274726912 with a zone size of 256 MiB (524288 sectors of 512B).

#include <sys/ioctl.h>
#include <linux/blkzoned.h>

struct blk_zone_range zrange;
int ret;

zrange.sector = 274726912;
zrange.nr_sectors = 524288;

ret = ioctl(fd, BLKRESETZONE, &zrange);
if (ret)
    goto error;
...

The disk file descriptor fd must be open for writing for this command to succeed.
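
Since the reset range must be zone aligned, an application may want to validate it before calling ioctl(). The following is a minimal sketch under the assumption that all zones in the range have the same size zone_size (for example the value returned by BLKGETZONESZ); a smaller last zone is not handled. The names zone_range_is_aligned and reset_zones are hypothetical.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

/*
 * Hypothetical helper: 1 if the range starts on a zone boundary and covers
 * a whole number of zones, assuming a uniform zone size in 512 B sectors.
 */
static int zone_range_is_aligned(__u64 sector, __u64 nr_sectors,
                                 __u32 zone_size)
{
        if (!zone_size || !nr_sectors)
                return 0;

        return (sector % zone_size) == 0 && (nr_sectors % zone_size) == 0;
}

/*
 * Hypothetical wrapper: reject unaligned ranges with EINVAL before issuing
 * BLKRESETZONE. fd must be open for writing.
 */
static int reset_zones(int fd, __u64 sector, __u64 nr_sectors,
                       __u32 zone_size)
{
        struct blk_zone_range zrange = {
                .sector = sector,
                .nr_sectors = nr_sectors,
        };

        if (!zone_range_is_aligned(sector, nr_sectors, zone_size)) {
                errno = EINVAL;
                return -1;
        }

        return ioctl(fd, BLKRESETZONE, &zrange);
}
```

With the values used above, sector 274726912 is exactly 524 zones of 524288 sectors, so the range passes the alignment check.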

Zone Size and Number of Zones

Linux® kernel version 4.20 introduced two additional commands to obtain the zone size of a zoned device (BLKGETZONESZ) and the total number of zones of the disk (BLKGETNRZONES). Both commands take a pointer to an unsigned 32-bit integer variable as argument. The following sample C code illustrates the use of these commands.

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

unsigned int nr_zones, zone_size;
int ret;

ret = ioctl(fd, BLKGETZONESZ, &zone_size);
if (ret)
    goto error;
ret = ioctl(fd, BLKGETNRZONES, &nr_zones);
if (ret)
    goto error;

printf("Disk has %u zones of %u 512-byte sectors\n",
       nr_zones, zone_size);
...

The command BLKGETNRZONES is especially useful to allocate an array of zone descriptors large enough for a zone report over the entire disk.
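
A minimal sketch of this use is shown below. The helper names zone_report_size and report_all_zones are hypothetical. The sketch assumes that one BLKREPORTZONE call is attempted for the whole disk; the kernel may still return fewer descriptors than requested, so callers should check the nr_zones field of the returned header and issue further reports if needed.

```c
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/blkzoned.h>

/* Hypothetical helper: bytes needed for a report header plus nr_zones zones. */
static size_t zone_report_size(unsigned int nr_zones)
{
        return sizeof(struct blk_zone_report) +
               (size_t)nr_zones * sizeof(struct blk_zone);
}

/*
 * Hypothetical helper: allocate a buffer sized for every zone of the disk
 * and request a report starting at sector 0. Returns NULL on error. The
 * caller must free() the result and must check hdr->nr_zones, which may be
 * smaller than the number of zones of the disk.
 */
static struct blk_zone_report *report_all_zones(int fd)
{
        struct blk_zone_report *hdr;
        __u32 nr_zones;

        if (ioctl(fd, BLKGETNRZONES, &nr_zones) || !nr_zones)
                return NULL;

        hdr = calloc(1, zone_report_size(nr_zones));
        if (!hdr)
                return NULL;

        hdr->sector = 0;
        hdr->nr_zones = nr_zones;
        if (ioctl(fd, BLKREPORTZONE, hdr)) {
                free(hdr);
                return NULL;
        }

        return hdr;
}
```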

Sysfs Interface

Programs written in scripting languages (e.g. bash scripts) can also access zoned device information through sysfs attribute files.

For instance, the zone model of a zoned device can be discovered using the zoned queue attribute.

# cat /sys/block/sdb/queue/zoned
host-managed

The possible values of the zoned attribute are shown in the table below.

Value          Description
none           Regular disk or drive managed disk
host-aware     Host Aware disk model
host-managed   Host Managed disk model
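
Because sysfs attributes are regular files, a C program can read the zoned attribute with standard I/O calls. The sketch below is illustrative only: the enum zoned_model type and the functions parse_zoned_model and read_zoned_model are hypothetical names, and the disk name (for example "sdb") is supplied by the caller.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical type mirroring the three values of the zoned attribute. */
enum zoned_model {
        ZONED_NONE,
        ZONED_HOST_AWARE,
        ZONED_HOST_MANAGED,
        ZONED_UNKNOWN,
};

/* Parse the attribute content, ignoring the trailing newline. */
static enum zoned_model parse_zoned_model(const char *s)
{
        size_t n = strcspn(s, "\n");

        if (n == 4 && !strncmp(s, "none", n))
                return ZONED_NONE;
        if (n == 10 && !strncmp(s, "host-aware", n))
                return ZONED_HOST_AWARE;
        if (n == 12 && !strncmp(s, "host-managed", n))
                return ZONED_HOST_MANAGED;

        return ZONED_UNKNOWN;
}

/* Read /sys/block/<disk>/queue/zoned; ZONED_UNKNOWN on any error. */
static enum zoned_model read_zoned_model(const char *disk)
{
        char path[256], buf[32];
        FILE *f;

        snprintf(path, sizeof(path), "/sys/block/%s/queue/zoned", disk);
        f = fopen(path, "r");
        if (!f)
                return ZONED_UNKNOWN;

        if (!fgets(buf, sizeof(buf), f))
                buf[0] = '\0';
        fclose(f);

        return parse_zoned_model(buf);
}
```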

If needed, the disk zone size can be read from the sysfs attribute chunk_sectors.

# cat /sys/block/sdb/queue/chunk_sectors
524288

The value is displayed as a number of 512 B sectors, regardless of the actual logical and physical block size of the disk. In this example, the disk zone size is 524288 x 512 B = 256 MiB.

Starting with Linux kernel version 4.20.0, the sysfs attribute nr_zones is available to obtain the total number of zones on the disk.

# cat /sys/block/sdb/queue/nr_zones
55880
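
The two attribute values can be combined to estimate the disk capacity. The hypothetical helper below takes the text read from chunk_sectors and nr_zones and multiplies the zone size in bytes by the number of zones. The result is only an upper bound, under the assumption that every zone has the full zone size (the last zone of a disk may be smaller).

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: upper bound of the disk capacity in bytes, computed
 * from the text content of the chunk_sectors and nr_zones attributes.
 * strtoull() stops at the trailing newline of each attribute.
 */
static uint64_t zoned_capacity_upper_bound(const char *chunk_sectors,
                                           const char *nr_zones)
{
        uint64_t zone_sectors = strtoull(chunk_sectors, NULL, 10);
        uint64_t zones = strtoull(nr_zones, NULL, 10);

        /* chunk_sectors counts 512 B sectors. */
        return zone_sectors * 512 * zones;
}
```

With the values shown above (524288 and 55880), this gives 15000173281280 bytes, or about 15 TB.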

To obtain detailed information on each zone of the device, scripts can use the command line utilities blkzone report, sg3_utils sg_rep_zones and libzbc zbc_report_zones.

To reset zone write pointers, the command line utilities blkzone reset, sg3_utils sg_reset_wp and libzbc zbc_reset_zone are available.