Common Performance Numbers

2020-03-07

system design

The performance of a computer system is bounded by its physical properties. Knowing some common performance numbers helps us develop an intuition for the hardware. With that intuition, we can evaluate a system design without first building it, and reason more thoroughly about design decisions.

CPU

When talking about the performance of CPUs, we often mention frequencies. For example, a processor running at 2 GHz completes one cycle in 0.5 ns (nanoseconds).

$$ \frac {1} {2 \times 10^9 \text{ Hz}} = 0.5 \times 10^{-9} \text{ s} = 0.5 \text{ ns} $$

The basic execution unit of a CPU is the instruction. The execution of one instruction usually requires multiple cycles. For a 2 GHz processor with a basic four-stage pipeline (fetch, decode, execute and write-back), an instruction ideally takes 4 cycles (2 ns). We thus use CPI (clock cycles per instruction) as one indicator of a processor’s performance.

While a single instruction requires multiple cycles, instruction pipelining lets a processor overlap the stages of consecutive instructions, so that several instructions are in flight at once and, ideally, one completes every cycle. Accordingly, we use IPC (instructions per cycle) as another indicator of a processor’s performance. A similar metric based on seconds is MIPS (million instructions per second). With 1 IPC, a 2 GHz processor reaches 2000 MIPS, though in practice the number is usually lower.

$$ MIPS = \frac {IPC \times frequency} {10^6} $$
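The arithmetic in this section can be checked with a few lines of Python. This is only a sketch: the function names `cycle_time_ns` and `mips` are made up for illustration, and the 2 GHz frequency and IPC of 1 are the example values from the text.

```python
def cycle_time_ns(frequency_hz: float) -> float:
    """Duration of one clock cycle in nanoseconds."""
    return 1e9 / frequency_hz


def mips(ipc: float, frequency_hz: float) -> float:
    """Million instructions per second: IPC * frequency / 10^6."""
    return ipc * frequency_hz / 1e6


FREQUENCY_HZ = 2e9  # the 2 GHz example processor from the text

print(cycle_time_ns(FREQUENCY_HZ))      # 0.5 (ns per cycle)
print(4 * cycle_time_ns(FREQUENCY_HZ))  # 2.0 (ns for a 4-cycle instruction)
print(mips(1.0, FREQUENCY_HZ))          # 2000.0
```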

To accelerate execution, processors often predict the outcome of a branch and speculatively execute the instructions that follow it. The penalty of a misprediction is high, however: the processor has to discard the speculative work and restart from the fetch stage with the correct instruction. Modern processors tend to have long pipelines, so a misprediction can cost at least 10 cycles, which is 5 ns for a 2 GHz processor.
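To get a feel for how much mispredictions matter, the sketch below converts the 10-cycle penalty to nanoseconds and folds it into an average CPI. Only the 10-cycle penalty and the 2 GHz frequency come from the text; the base CPI, the fraction of branch instructions and the misprediction rate are hypothetical values chosen purely for illustration.

```python
def misprediction_penalty_ns(penalty_cycles: int, frequency_hz: float) -> float:
    """Time lost to one branch misprediction, in nanoseconds."""
    return penalty_cycles * 1e9 / frequency_hz


def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: int) -> float:
    """Average cycles per instruction once misprediction stalls are included."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


FREQUENCY_HZ = 2e9      # from the text
PENALTY_CYCLES = 10     # from the text
BASE_CPI = 1.0          # assumption: ideal pipelined throughput
BRANCH_FRACTION = 0.20  # assumption: 1 in 5 instructions is a branch
MISPREDICT_RATE = 0.05  # assumption: 5% of branches are mispredicted

penalty = misprediction_penalty_ns(PENALTY_CYCLES, FREQUENCY_HZ)
cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, PENALTY_CYCLES)
print(f"{penalty:.1f} ns per misprediction")
print(f"{cpi:.2f} cycles per instruction on average")
```

Under these assumed rates, the average cost per instruction rises by about a tenth of a cycle; a different workload would give a different figure.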

Memory hierarchy

Memory units are organized into levels across the computer architecture. A memory hierarchy is designed to balance the trade-off between capacity and speed: a memory unit with a smaller capacity is faster, and vice versa. Below is a summary of a typical memory hierarchy, with latencies based on a 2 GHz processor. While permanent storage such as disks also belongs to the memory hierarchy, we discuss it in a separate section.

Hierarchy	Capacity	Latency	Bandwidth
Registers	8 Bytes per register	1 cycle / 0.5 ns
L1 cache	64 KB	4 cycles / 2 ns	700 GB/s
L2 cache	256 KB	12 cycles / 6 ns	200 GB/s
L3 cache	2 MB	50 cycles / 25 ns	100 GB/s
DRAM (local)	2 GB	200 cycles / 100 ns	10 GB/s
DRAM (remote)	2 GB	400 cycles / 200 ns
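The latency column can be reproduced from the cycle counts alone. The sketch below converts each level's cycles to nanoseconds at 2 GHz and shows how much slower each level is than the L1 cache. The cycle counts are copied from the table; the helper name `cycles_to_ns` is an illustrative choice.

```python
FREQUENCY_HZ = 2e9  # the 2 GHz processor the table is based on

# Latency in cycles, copied from the table above.
LATENCY_CYCLES = {
    "Registers": 1,
    "L1 cache": 4,
    "L2 cache": 12,
    "L3 cache": 50,
    "DRAM (local)": 200,
    "DRAM (remote)": 400,
}


def cycles_to_ns(cycles: int, frequency_hz: float = FREQUENCY_HZ) -> float:
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles * 1e9 / frequency_hz


for level, cycles in LATENCY_CYCLES.items():
    ns = cycles_to_ns(cycles)
    slowdown = cycles / LATENCY_CYCLES["L1 cache"]
    print(f"{level:14s} {cycles:4d} cycles {ns:7.1f} ns {slowdown:7.2f}x L1")
```

By these numbers, a local DRAM access costs as much as 50 L1 hits.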

Operating system operations

Some interesting operating system operations:

Mutex locks–at least 10 ns.
Context switch–thousands of cycles, on the order of microseconds.

Network

Data transfer over fiber-optic cables is bounded by the speed of light. In fiber that speed is typically about 200,000 km/s, resulting in a one-way latency of 5 us (microseconds) per kilometer. Network latency therefore grows with distance. Below are some examples at various distances.

Environment	Distance	RTT (round trip time)
Within a rack	6 feet	0.1 ms
Within a data center	300 feet	0.5 ms
Intra-state	60 miles	1 ms
Inter-state	600 miles	10 ms
Global	6000 miles	100 ms
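The propagation part of each round trip can be estimated from the distance alone. This sketch assumes the 200,000 km/s figure above and a straight fiber run of exactly the listed length; the function name and the unit constants are illustrative, and the result is a lower bound rather than a prediction of measured RTT.

```python
KM_PER_FOOT = 0.0003048
KM_PER_MILE = 1.609344
FIBER_SPEED_KM_PER_S = 200_000  # typical speed of light in fiber, from the text


def propagation_rtt_ms(distance_km: float,
                       speed_km_per_s: float = FIBER_SPEED_KM_PER_S) -> float:
    """Round-trip propagation delay in milliseconds (there and back)."""
    return 2 * distance_km / speed_km_per_s * 1000


# Distances copied from the table above.
DISTANCES_KM = {
    "Within a rack": 6 * KM_PER_FOOT,
    "Within a data center": 300 * KM_PER_FOOT,
    "Intra-state": 60 * KM_PER_MILE,
    "Inter-state": 600 * KM_PER_MILE,
    "Global": 6000 * KM_PER_MILE,
}

for environment, km in DISTANCES_KM.items():
    print(f"{environment:22s} {propagation_rtt_ms(km):10.5f} ms")
```

For the three longest distances the bound comes out at roughly 1 ms, 10 ms and 97 ms, close to the table. For the rack and data center rows it is well under a microsecond, so the table values there evidently reflect overheads other than propagation.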

Storage

The performance of hard disk drives (HDDs) is tied to their mechanical nature:

Seek time–the head has to first move to the target track.
Rotational latency–the head then waits for the target sector to pass underneath it as the platter rotates.

The specific seek time and rotational latency depend on where the head is and on the rotational position of the platter at the start of an operation. Because of that, they are usually reported as averages. Most consumer HDDs spin at 5,400 or 7,200 RPM (revolutions per minute), so a full revolution takes about 11 ms or 8 ms, and the average rotational latency (half a revolution) is about 5.5 ms or 4 ms. HDDs for high-performance servers can reach 15,000 RPM, completing a revolution in 4 ms (2 ms on average).
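The rotational figures follow directly from the spindle speed. A minimal sketch, with illustrative function names, that takes the average latency to be half a revolution:

```python
def revolution_time_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000 / rpm


def average_rotational_latency_ms(rpm: float) -> float:
    """On average the target sector is half a revolution away."""
    return revolution_time_ms(rpm) / 2


for rpm in (5_400, 7_200, 15_000):
    full = revolution_time_ms(rpm)
    average = average_rotational_latency_ms(rpm)
    print(f"{rpm:6d} RPM: {full:5.2f} ms per revolution, {average:4.2f} ms on average")
```

This gives 11.11 / 5.56 ms, 8.33 / 4.17 ms and 4.00 / 2.00 ms, which the text rounds.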

The mechanical nature of HDDs also leads to different performance between sequential and random I/Os. A sequential I/O, where data blocks are placed contiguously on the same track, incurs no additional wait time besides the first seek time and rotational latency, and thus can be faster than a random I/O.

Solid state drives (SSDs) are completely different from HDDs. The basic storage unit of an SSD is the NAND cell. Each cell stores a few bits of data. Cells are grouped into pages (typically 2 KB or 4 KB of data per page). Pages are further grouped into blocks (typically 64 to 128 pages per block). There is no seek time or rotational latency. Every part of an SSD can be accessed in the same amount of time, but reads and writes are asymmetric. Reads are performed in units of pages with a latency of 50 us. Writes are also performed in units of pages, but have a latency of 800 us. Moreover, a page can’t be written unless it has first been erased, and erasure must be performed at the granularity of blocks, which incurs a latency of 1500 us.

Below is a general throughput comparison between HDDs and SSDs:

	Sequential read/write	Random read/write (4 KB)
HDDs	160 MB/s	200 IOPS / 0.78 MB/s
SSDs	500 MB/s	100,000 IOPS / 390 MB/s
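The MB/s figures in the random I/O column are just the IOPS multiplied by the 4 KB request size. The sketch below reproduces them, assuming 1 MB = 1024 KB, which is what the table's values imply; the function name is illustrative.

```python
def iops_to_mb_per_s(iops: float, request_kb: float = 4) -> float:
    """Throughput in MB/s for fixed-size requests (1 MB = 1024 KB)."""
    return iops * request_kb / 1024


print(iops_to_mb_per_s(200))      # 0.78125 (HDD)
print(iops_to_mb_per_s(100_000))  # 390.625 (SSD)
```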
