PCI: past, present and future

Outline:

 - History
   - Talk about ISA/EISA/VESA/MCA
   - 32-bit, 33MHz, 5V card
   - 64-bit, 66MHz, 3.3V
 - Features
   - Configuration space
   - IO space
   - Memory space
   - Topology
   - Ordering rules
   - Busmastering
 - Linux implementation
   - pci_{read,write}_config_{byte,word,dword}()
   - {in,out}[bwl]()
   - {read,write}[bwlq]()
   - pci_dev, pci_bus
 - Related Tech
   - PCI-X
   - AGP
   - CompactPCI
   - MiniPCI
   - CardBus
 - Present state of PCI
   - Hotplug
   - 133MHz
   - MSI
 - Future
   - 266 / 533MHz
   - PCI-E

Talk:


Introduction
~~~~~~~~~~~~

PCI is probably the most successful bus technology in computing history.
The PCI SIG was founded in June 1992 and currently has over 860 members.
It's available for almost every architecture, from your x86 desktop
to embedded designs (ARM, MIPS, SuperH, V850) to workstations and
supercomputers (Alpha, IA64, PA-RISC, PowerPC, SPARC).  You can even
buy an S/390 on a PCI card.

At the time it was introduced, it was viewed in the PC press as being
a competitor to the VESA Local Bus.  It was certainly similar -- both
were 32-bit, 33MHz bus systems, but due to most Pentium chipsets only
supporting PCI, it won in the PC marketplace.

PCI cards could be automatically configured by the BIOS, unlike most ISA
cards.  They didn't require description files on a floppy, unlike EISA and
MCA cards.  And unlike MCA cards, you didn't need to pay royalties to IBM.

Over time, PCI was expanded to 64-bit and 66MHz.  It has been revised
in evolutionary ways over the past 11 years, clarifying details and
providing support for new features.  It has spawned a host of successor,
competitor and complementary technologies such as CardBus, AGP, MiniPCI,
CompactPCI, PCI-X and PCI-Express.

It has proved to be sufficiently flexible to meet the low-margin needs
of the sound card manufacturers as well as the high-performance needs of
Ultra 320 SCSI cards.

Features
~~~~~~~~

In order to keep the pin count down (which reduces costs), the PCI
bus uses the same pins for both addresses and data.  This makes it
necessary to have a PCI bus protocol which all devices must understand.
In addition to the Address/Data pins, there is a group of pins known as
"Interface Control".  These pins are used to communicate what state the
PCI bus is in.

PCI supports three different address spaces -- Configuration, Memory
and I/O.

	I/O space is also commonly known as port space.  Under Linux,
	you can examine how it is assigned by looking at /proc/ioports.
	On x86 it's a 16-bit address space (PCI itself allows 32-bit
	port addresses) which exists to provide compatibility with ISA
	cards.  Linux provides the functions inb(), inw(), inl(), outb(),
	outw() and outl() for reading and writing byte, 2-byte and 4-byte
	quantities.
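
	As a rough sketch of how a program might read the entries in
	/proc/ioports, the helper below splits one line of the form
	"start-end : owner" into its parts.  The struct and function
	names are invented for this example, and the sample line in the
	note after it is illustrative rather than copied from a real
	machine.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed /proc/ioports entry (hypothetical helper type). */
struct port_range {
    unsigned int start;   /* first port in the range */
    unsigned int end;     /* last port in the range, inclusive */
    char owner[64];       /* driver or bus that claimed the range */
};

/* Parse a line such as "  0170-0177 : ide1".  Returns 0 on success. */
static int parse_ioports_line(const char *line, struct port_range *r)
{
    assert(line != NULL && r != NULL);
    /* %x skips the leading indentation; %63[^\n] takes the rest of the line. */
    if (sscanf(line, "%x-%x : %63[^\n]", &r->start, &r->end, r->owner) != 3)
        return -1;
    /* Reject ranges that end before they start. */
    return r->end >= r->start ? 0 : -1;
}
```

	Feeding it "  0170-0177 : ide1" yields start 0x170, end 0x177
	and owner "ide1"; the indentation in the real file reflects
	nesting, which this sketch simply ignores.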

	Memory space is memory-mapped I/O.  It's up to 64 bits in size
	and offers significant performance benefits over port I/O space.
	To access it in a portable manner, first ioremap() it, then call
	readb(), readw(), readl(), readq(), writeb(), writew(), writel()
	or writeq().

	Configuration space is a mere 256 bytes in size.  It is the
	mechanism for plug-and-play configuration and reports much
	useful information about the device.  Linux provides the
	functions pci_read_config_byte(), pci_read_config_word(),
	pci_read_config_dword(), pci_write_config_byte(),
	pci_write_config_word() and pci_write_config_dword() to access
	this space.  It's not normally necessary to do this as Linux
	caches much of the useful information from this space in the
	pci_dev structure.
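
	To make the layout a little more concrete, here is a minimal
	sketch that decodes a few well-known fields from a raw copy
	of configuration space.  It works on a plain byte buffer (on a
	2.6 kernel one could, for instance, read such a buffer from
	the device's "config" file in sysfs); a driver would use the
	pci_read_config_*() functions instead.  The struct and function
	names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A few fields from the start of configuration space (names are ours). */
struct pci_ids {
    uint16_t vendor;        /* offset 0x00 */
    uint16_t device;        /* offset 0x02 */
    uint8_t  base_class;    /* offset 0x0b */
    int      multifunction; /* bit 7 of the header type byte at 0x0e */
};

/* Configuration space is little-endian regardless of the host CPU. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is too short or no device answered. */
static int decode_config_header(const uint8_t *cfg, size_t len,
                                struct pci_ids *out)
{
    assert(cfg != NULL && out != NULL);
    if (len < 64)           /* the standard header is the first 64 bytes */
        return -1;
    out->vendor = le16(cfg + 0x00);
    out->device = le16(cfg + 0x02);
    out->base_class = cfg[0x0b];
    out->multifunction = (cfg[0x0e] & 0x80) != 0;
    /* A vendor ID of 0xffff means there is no device at this address. */
    return out->vendor == 0xffff ? -1 : 0;
}
```

	The multifunction flag is what tells a bus scan whether it
	needs to look beyond function 0 of a device.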

Before a PCI device can be accessed in any other way, it must be
configured.  Platform-dependent code tells the Linux PCI code which root
busses exist.  The PCI code scans each bus for devices then configures
the ones it finds.

Each device on a bus is uniquely identified by its device and
function number.  There can be up to 32 devices on each bus, though
physical constraints normally limit the number of devices to around 5.
Each device has a function 0 and may have up to 7 additional functions,
though it is rare to see more than 2.

Each PCI bus has a number, ranging from 0 to 255.  Some of the devices
may be PCI-to-PCI bridges allowing for expansion to many secondary busses.
To uniquely identify a device, you must know the number of the bus it is
on, its device number and its function number.  This is normally written
out (for example by lspci) as 01:09.0.  That represents bus 1, device 9,
function 0.

Even this is not sufficient for some manufacturers so Linux 2.6 supports
PCI domains (aka PCI segments).  This adds yet another layer of hierarchy
so a configuration address is now written in the form 0003:01:09.0 for
domain 3, bus 1, device 9, function 0.

Once a device has been found, it can be configured.  This involves
assigning ranges of port and memory space to it, setting up interrupts,
configuring DMA, error reporting and so on.


Bandwidth
~~~~~~~~~

The first systems had a 33MHz, 32-bit PCI bus.  This has a raw bandwidth
of 132MB/s.  The PCI protocol restricts that somewhat.  Every PCI
transaction starts with an address phase which is then followed by one
or more data phases.  If the device is doing bulk data transfers, the
effective data bandwidth may be over 100MB/s, but if this is mixed in with
a lot of single register accesses, data bandwidth can be as low as 60MB/s.
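
The arithmetic behind these figures can be sketched as follows.  This
is a deliberately simplified model: it assumes every transaction costs
a fixed number of non-data clocks (the address phase plus whatever
turnaround or wait states apply) and ignores arbitration entirely.
The function names and the cycle counts used in the note below are
illustrative assumptions, not figures from the PCI specification.

```c
#include <assert.h>

/* Raw bandwidth in MB/s: one bus-width transfer on every clock. */
static unsigned int raw_mb_per_s(unsigned int clock_mhz,
                                 unsigned int width_bits)
{
    return clock_mhz * (width_bits / 8);
}

/*
 * Effective data bandwidth in MB/s (rounded down) when each transaction
 * carries `data_phases` data clocks and pays `overhead_cycles` clocks
 * that move no data.
 */
static unsigned int effective_mb_per_s(unsigned int raw,
                                       unsigned int data_phases,
                                       unsigned int overhead_cycles)
{
    assert(data_phases > 0);
    return raw * data_phases / (data_phases + overhead_cycles);
}
```

With 33MHz and 32 bits the raw figure is 132MB/s.  Under this model a
burst of 8 data phases with 2 overhead clocks comes out at 105MB/s,
while a stream of single-word accesses with 1 overhead clock each
manages only 66MB/s.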

Several complementary approaches were taken to increase the effective data
bandwidth available.  One was to increase the width of the bus to 64 bits,
doubling the amount of data that could be transferred in each cycle.
PCI 2.1 allowed the bus speed to double to 66MHz.  Combining both
of these approaches was known as PCI-4x.  Each of these approaches has
its downsides.  Doubling the clock speed to 66MHz means that when a
33MHz card is plugged into the bus, every card on the bus has to go at
the slower speed.  Expanding the bus width to 64 bits doesn't give quite
as much improvement as going to 66MHz: a wasted cycle lasts twice as
long on a 64-bit 33MHz bus as on a 32-bit 66MHz bus, so it costs twice
as much bandwidth.
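
A small worked comparison may help here.  The sketch below estimates
how long one transaction takes on a given bus, assuming a fixed number
of overhead clocks per transaction; both the function name and that
assumption are ours, and real busses add arbitration and wait states
that are not modelled.

```c
#include <assert.h>

/*
 * Nanoseconds needed to move `bytes` in one transaction that also spends
 * `overhead_cycles` clocks on non-data work (address phase and so on).
 */
static double transfer_ns(double clock_mhz, unsigned int width_bits,
                          unsigned int overhead_cycles, unsigned int bytes)
{
    unsigned int bytes_per_cycle = width_bits / 8;
    /* Round up: a partial word still occupies a whole data phase. */
    unsigned int data_cycles;
    double cycle_ns;

    assert(clock_mhz > 0.0 && bytes_per_cycle > 0);
    data_cycles = (bytes + bytes_per_cycle - 1) / bytes_per_cycle;
    cycle_ns = 1000.0 / clock_mhz;
    return (overhead_cycles + data_cycles) * cycle_ns;
}
```

With no overhead the two upgrades tie: 64 bytes take 8 clocks of
roughly 30ns on the wide bus and 16 clocks of roughly 15ns on the fast
one.  Add 2 overhead clocks and the wide bus needs 10 slow clocks
against 18 fast ones, so the 32-bit 66MHz bus finishes first.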

A more subtle approach was to introduce systems with multiple independent
PCI busses.  Since devices on different busses cannot interfere with
each other, bursts tend to be longer, so less bandwidth is lost to
per-transaction overhead.  Several top-end
manufacturers took this approach to the extreme with each slot being on
its own PCI bus.  This approach also combines well with expanding the
bus width and doubling the clock speed.  It allows the bus to approach
450MB/s of data bandwidth.


Related Technologies
~~~~~~~~~~~~~~~~~~~~

CardBus is basically PCI in a different format suitable for the low pin
count 32-bit PC Cards.  Cards are PCI devices in all but shape, having
configuration, memory and port space.

MiniPCI is a fairly new standard form factor for laptop expansion.
Many laptops have their modem & internal network card on a MiniPCI
expansion card.

CompactPCI is an industrial form of PCI, similar to the VME bus.
It's managed by PICMG rather than the PCI SIG.  Some of the blade
architectures are based around a CompactPCI backplane, and the standard
is also popular with telecom companies.

PCI-X is currently deployed on the higher-end workstations and servers.
It's an evolution of PCI, refining the bus protocol in some subtle ways
and introducing faster clock rates.  PCI-X 1.0 goes up to 133MHz and
PCI-X 2.0 standardises 266 and 533MHz.  The committee are currently
working on 1066 and 2133MHz variants.

AGP is related to PCI but is not developed by the PCI SIG.  The protocol
and connector are optimised for graphics.  It can only have one card
on the bus, there is no parity checking and there are special AGP
transactions.  For any given nX rating, AGP has double the raw bandwidth
of PCI -- for example, PCI-4x is 528MB/s and AGP-4x is 1GB/s.


Present
~~~~~~~

Most desktop systems have not evolved beyond the original PCI-1x
specification.  Server and workstation chipsets have support for PCI-2x,
PCI-4x and PCI-X 133, but these are a much smaller market.  The reason
for this is simply a matter of demand.  Except for graphics, there are
no devices that demand anything even close to PCI's bandwidth.  Graphics
cards are almost exclusively handled through the AGP bus.

Giving graphics its own bus is quite an astute decision -- if you place
a 133MHz PCI-X card and a 33MHz PCI card on the same bus, the bus is
configured to the lowest common denominator.  To avoid this, machines
need multiple busses -- one for high performance cards and another for
low performance cards.
But the average desktop has only one high-performance card in it.
By using a different bus, you prevent customers from putting their cards
in the wrong slots and getting abysmal performance.


Future
~~~~~~

The 1GB/s offered by PCI-X 133 isn't enough for the top end cards.  AGP
has already moved beyond it to the 2GB/s AGP-8x.  10Gbps Ethernet cards
require 2GB/s of data bandwidth.  Serial ATA and Serial Attached SCSI
will both require upwards of 2GB/s bandwidth per card in the next few years.

PCI-X has faster speeds specified, taking it to 4GB/s and probably beyond,
but PCI-Express aims to be the technology for the future.  Since Intel has
announced plans to kill future AGP development in favour of PCI-Express
and the graphics cards tend to lead the market in terms of bandwidth
consumption, it seems like a pretty safe bet that PCI Express will
become prevalent.

PCI Express was originally called 3GIO.  It is a serial, point-to-point
protocol, not entirely dissimilar to Serial ATA or USB.  The first
generation of products is intended to achieve 16GB/s -- 8 times as
much as AGP-8x can achieve today and 16 times as much as PCI-X 133.
The design is also supposed to be cheap to manufacture, though as the
PCI SIG wryly note on their website, "market forces will ultimately
determine the cost of PCI Express Architecture systems".

From a software point of view, PCI Express changes very little.
The configuration space is expanded from 256 bytes to 4096.  There will
be more bridges involved.

On a hardware level, the changes are extensive.  A lot of sideband signals
in PCI have been converted into data packets in PCI Express. For example,
interrupts are now sent as data packets rather than being separate lines.
This is not the same thing as MSI -- it cannot carry additional data,
but rather it is a replacement for having an additional set of interrupt
lines per controller.  The PCI Express Root Port is expected to convert
these packets back into standard PCI interrupts.

A new feature in PCI Express is Quality of Service.  The bus is
partitioned into multiple channels and the card can specify whether the
data is low-latency or isochronous. (XXX: more here)


Credits
~~~~~~~

My long-suffering wife for proof-reading.

Ottawa Canada Linux Users Group for listening to an earlier version of
this talk and providing feedback.