8086

Compared to 6502, 8086 instruction-set seemed powerful. 8086 had MUL/DIV which 6502 lacked. REP STOS seemed to be a way to fill memory fast.

Compared to 8080, address space of 8086 was increased from 16 bits to 20 bits (64KB to 1MB) by memory segmentation: a 16-bit segment register, shifted left 4 bits, is added to a 16-bit offset. Using segment registers was a little better than bank-switching 64KB.
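
As a minimal sketch of the real-mode address calculation (function name is this sketch's own, not from any Intel manual), a 16-bit segment is shifted left 4 bits and added to a 16-bit offset, wrapping at 1MB because 8086 has only 20 address lines:

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode 8086 address: (segment << 4) + offset, truncated to 20 bits."""
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    # Only 20 address lines exist, so the sum wraps around at 1MB.
    return ((segment << 4) + offset) & 0xFFFFF


# Two different segment:offset pairs can name the same physical byte.
print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0x1235, 0x0000)))  # 0x12350
# Top of memory wraps back to address 0.
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0
```

Overlapping segments (many segment:offset pairs per physical address) are a direct consequence of the 4-bit shift.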


80286

80286 was a horrible design, border-line insane.

Major feature was "protected mode". Its goal was for hardware to help software be more reliable. A segment register was redefined as an index to a descriptor table. A descriptor defined base address of a 64KB segment within 16MB address space, and also privilege levels and access rights, which meant accessing a memory segment now required several "protection checks". Segment limit checking could prevent out-of-bounds bugs in software, but it proved completely impractical. Problem was that segment descriptor tables were privileged: user-space programs had to call a system function to access a 64KB segment, which ruined OS/2 1.x. 80286 essentially imposed a slow hardware assert() function that could never be disabled.
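
A heavily simplified sketch of those protection checks follows. Class and function names are hypothetical, the privilege rule ignores the selector's RPL bits, and a real 80286 performs more checks than these. It only illustrates the idea that every memory access goes through a descriptor lookup plus limit, privilege, and access-rights tests:

```python
from dataclasses import dataclass


class ProtectionFault(Exception):
    """Stands in for the CPU's protection exception in this sketch."""


@dataclass
class Descriptor:
    base: int       # 24-bit base address within the 16MB address space
    limit: int      # highest valid offset (at most 0xFFFF)
    dpl: int        # descriptor privilege level, 0 (most) to 3 (least)
    writable: bool  # simplified access rights


def translate(table, selector, offset, cpl, write):
    """Return the physical address, or raise ProtectionFault (simplified rules)."""
    index = selector >> 3  # upper 13 bits of the selector index the table
    if index >= len(table):
        raise ProtectionFault("selector beyond descriptor table")
    d = table[index]
    if cpl > d.dpl:  # simplified: real hardware also considers RPL
        raise ProtectionFault("insufficient privilege")
    if offset > d.limit:
        raise ProtectionFault("offset beyond segment limit")
    if write and not d.writable:
        raise ProtectionFault("write to read-only segment")
    return d.base + offset


table = [
    Descriptor(0, 0, 0, False),             # placeholder entry 0
    Descriptor(0x100000, 0x0FFF, 3, True),  # 4KB user data segment
]
print(hex(translate(table, 0x08, 0x0010, cpl=3, write=True)))  # 0x100010
```

An out-of-bounds offset such as 0x1000 against this 4KB segment raises ProtectionFault, which is the "hardware assert()" behavior described above.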

80286 was NOT 100%-compatible with 8086. A divide error from integer DIV instruction was originally defined as a trap exception, then 80286 redefined it as a fault exception. PCDOS ignored divide errors by installing a divide exception handler that just executed IRET to resume program, since saved IP pointed after DIV instruction. When divide error occurred on 80286, saved IP pointed at faulting instruction (not after), causing infinite loop.
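
The difference can be shown with a toy model (not an emulator; the address, instruction length, and function name are made up for illustration). The handler is a bare IRET, so execution resumes wherever the saved IP points:

```python
def run_div_with_iret_handler(fault_semantics: bool, max_steps: int = 5) -> int:
    """Count how many times a failing DIV executes when the handler is a bare IRET.

    max_steps caps the loop so the 'infinite loop' case terminates in this model.
    """
    DIV_ADDR, DIV_LEN = 0x100, 2  # hypothetical address and length of the DIV
    ip = DIV_ADDR
    executions = 0
    while ip == DIV_ADDR and executions < max_steps:
        executions += 1  # DIV executes and raises a divide error
        # CPU saves a return address; the handler's IRET restores it into IP.
        saved_ip = DIV_ADDR if fault_semantics else DIV_ADDR + DIV_LEN
        ip = saved_ip
    return executions


print(run_div_with_iret_handler(fault_semantics=False))  # 1: 8086 resumes after DIV
print(run_div_with_iret_handler(fault_semantics=True))   # 5: 80286 re-executes DIV until capped
```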

80286 introduced task-switching. Advertised as built into hardware, actually was microcoded. Task-switching instructions were JMP/CALL/IRET task-gate which were the most complex instructions 80x86 had. Execution time was glacial, hundreds of cycles, slowed by excessive protection checks. (This author microcoded task-switching in an 80x86 imitation which likewise required hundreds of instructions.) Instead, system programmers used regular simple instructions for task-switching.


80386

80386 was a vast improvement. New features were semi-generalizing and extending registers to 32-bits, page mode, virtual 8086 mode.

Most significant improvement in 80386 was "page mode" for virtual memory. 80386 page mode was simple and efficient. 68000 needed MMU coprocessor. PowerPC virtual memory, based on hashing, was so inefficient Apple provided a system call to disable it. ARM virtual memory may be more efficient in some aspects. Page mode is organized as 2 levels: 4KB page directory points to 4KB page tables. Each page table has 1K (1024) page table entries (PTEs), 4 bytes each. 1 page directory entry (PDE) with 1 defined page table (4KB+4KB) maps 4MB of memory. Page mode has hidden costs. To fully map 4GB of memory would require 4MB of page tables. This author, while microcoding page mode for an 80x86 imitation, saw TLB misses occurring very frequently, which stall CPU and increase memory bus traffic (mitigated now by larger pages).
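
The 2-level arithmetic can be checked with a short sketch (names are this sketch's own). It assumes 4-byte entries, so a 4KB table holds 1024 of them, and a 32-bit linear address splits into 10 bits of directory index, 10 bits of table index, and 12 bits of page offset:

```python
PAGE_SIZE = 4096                              # bytes per page and per table
ENTRY_SIZE = 4                                # bytes per PDE or PTE
ENTRIES_PER_TABLE = PAGE_SIZE // ENTRY_SIZE   # 1024


def split_linear(addr: int):
    """Split a 32-bit linear address into (PDE index, PTE index, page offset)."""
    assert 0 <= addr < 2**32
    pde_index = addr >> 22            # top 10 bits select a page directory entry
    pte_index = (addr >> 12) & 0x3FF  # next 10 bits select a page table entry
    offset = addr & 0xFFF             # low 12 bits are the offset within the page
    return pde_index, pte_index, offset


bytes_per_table = ENTRIES_PER_TABLE * PAGE_SIZE    # memory mapped by one page table
tables_for_4gb = 2**32 // bytes_per_table          # page tables needed to map 4GB
table_bytes_for_4gb = tables_for_4gb * PAGE_SIZE   # memory consumed by those tables

print(split_linear(0x00403025))         # (1, 3, 37)
print(bytes_per_table // 2**20)         # 4  (MB mapped per page table)
print(table_bytes_for_4gb // 2**20)     # 4  (MB of page tables to map 4GB)
```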

Microsoft jointly designed 80386 (patents list Microsoft system programmers with Intel engineers). That company tends to ruin anything it meddles with, but in this case, Microsoft's supervision of Intel kept design of 80386 practical. Flaws in IBM/Microsoft OS/2 1.x stemmed from design mistakes in 80286, yet IBM OS/2 2.x proved to be an excellent OS after being rewritten for 80386. This author once asked an IBM architect of OS/2 to what degree OS/2 2.x used 80386's protection and privilege-level features; answer was to a large degree.

Intel claimed 80386 had 64 terabytes (2^46) of virtual memory. Unrealistically, this assumed swapping 16,384 segments, each 4GB in size, to a spinning disk.
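
The 64-terabyte figure is just multiplication. A quick check, assuming the 16,384 segments come from a 13-bit selector index across two descriptor tables (global and local):

```python
SELECTOR_INDEX_BITS = 13   # index field of a segment selector
DESCRIPTOR_TABLES = 2      # global and local descriptor tables

segments = DESCRIPTOR_TABLES * 2**SELECTOR_INDEX_BITS  # 16,384 segments
segment_size = 2**32                                   # 4GB per segment
virtual_bytes = segments * segment_size

print(segments)                  # 16384
print(virtual_bytes == 2**46)    # True
print(virtual_bytes // 2**40)    # 64 (terabytes)
```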

Software was far behind hardware advances in this era. 80386 had been available several years before 32-bit operating-systems became generally available.


80486

80486 was a major leap in performance. 80486 could execute one simple instruction per cycle. CPU, FPU, cache were integrated in one chip. 80486 was the ultimate in-order-execution 80x86.

Whereas difference, at same MHz, between 80286 vs 80386 was slight, 80486 was several times faster than 80386. This was the era when PCs weren't slow anymore.

Price of 80486 was ~$1000 in 1989. Lower-price 80486SX actually had an FPU, but it was defective, so Intel disabled it by cutting a tiny trace with a laser. Some who bought 80486DX2 believed they were getting double-speed CPU; they really got half-speed bus (because mainboard/RAM speed hadn't kept pace).


80586 (Pentium)

Pentium was Intel's first superscalar 80x86 and Intel's last pure CISC design. Pentium had aptly-named 'U' and 'V' pipes. Full 'U' pipe could execute any instruction, limited 'V' pipe only some instructions. Pentium had hard-wired specific rules for pairing execution of two instructions.
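
As a rough illustration of what hard-wired pairing rules look like, here is a deliberately simplified sketch. The names and the `simple` flag are assumptions of this sketch, and real Pentium pairing has more conditions (instruction classes, prefixes, displacement/immediate combinations) than the three tests shown:

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    name: str
    dest: Optional[str]     # register written, if any
    srcs: Tuple[str, ...]   # registers read
    simple: bool            # hypothetical flag: instruction is pairable at all


def can_pair(u: Instr, v: Instr) -> bool:
    """Simplified check: may `v` issue in the V pipe alongside `u` in the U pipe?"""
    if not (u.simple and v.simple):
        return False  # complex instructions issue alone
    if u.dest is not None and u.dest in v.srcs:
        return False  # read-after-write dependency
    if u.dest is not None and u.dest == v.dest:
        return False  # write-after-write dependency
    return True


mov = Instr("mov eax, ebx", "eax", ("ebx",), True)
add_indep = Instr("add ecx, edx", "ecx", ("ecx", "edx"), True)
add_dep = Instr("add ecx, eax", "ecx", ("ecx", "eax"), True)

print(can_pair(mov, add_indep))  # True: no shared registers
print(can_pair(mov, add_dep))    # False: add reads eax written by mov
```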


80686 (Pentium Pro)

Pentium Pro could execute 3 instructions per cycle, out-of-order, accomplished by reorder-buffer and register-renaming. Internally, it was a RISC core with an 80x86 front-end decoder. 80x86 instructions were decomposed into RISC "micro-ops". This idea for translating 80x86-to-RISC was pioneered by an 80x86 imitation company named NexGen. Whereas Pentium dual-issue execution was implemented in a specific hard-wired way, Pentium Pro was implemented in a generalized flexible way, by scanning reorder-buffer for micro-ops with independent operands.
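
The idea of scanning a reorder-buffer for micro-ops with independent operands can be sketched as below. This is a toy model, not Pentium Pro's actual scheduler: names are hypothetical, and it assumes register renaming has already given every micro-op a unique destination, so only true dependencies remain:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MicroOp:
    name: str
    dest: str               # renamed (unique) destination register
    srcs: Tuple[str, ...]   # renamed source registers
    done: bool = False


def ready_micro_ops(rob: List[MicroOp], width: int = 3) -> List[MicroOp]:
    """Scan oldest-first; pick up to `width` micro-ops whose sources are not
    produced by an older micro-op that has not finished yet."""
    ready = []
    pending_dests = set()
    for op in rob:
        if op.done:
            continue
        if len(ready) < width and not any(s in pending_dests for s in op.srcs):
            ready.append(op)
        pending_dests.add(op.dest)  # result not available during this scan
    return ready


rob = [
    MicroOp("load", "p1", ("p0",)),
    MicroOp("add", "p2", ("p1",)),   # waits for load
    MicroOp("mul", "p3", ("p4",)),   # independent, may run ahead of add
    MicroOp("sub", "p5", ("p3",)),   # waits for mul
]
print([op.name for op in ready_micro_ops(rob)])  # ['load', 'mul']
```

Here "mul" executes before the older "add", which is the out-of-order part; marking "load" and "mul" done and scanning again would release "add" and "sub".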

80686, introduced in ~1996, was the last fundamental redesign of 80x86 [2021/06].


64-bit 80x86 (x86-64)

x86-64 is a hack design.

x86-64 originated from an 80x86 imitation company and Microsoft. Then, ridiculously, Intel was forced to imitate an imitation.

x86-64 code density is bloated by 1, 2, or 3 extra bytes per instruction. Accessing full 64 bits of a register needs an entire extra byte (REX prefix). Then it gets worse: VEX/XOP three-byte escape sequences.
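
The REX cost is easy to see in machine code. REX is a single byte of the form 0100WRXB, where W selects 64-bit operand size and R, X, B extend register fields to reach the 8 added registers. A small sketch (helper name is this sketch's own) compares a 32-bit register move with its 64-bit form:

```python
def rex_prefix(w: int = 0, r: int = 0, x: int = 0, b: int = 0) -> int:
    """Build a REX prefix byte: 0100WRXB."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


# Opcode 0x89 with ModRM 0xD8 encodes a register-to-register MOV (eax <- ebx).
mov_eax_ebx = bytes([0x89, 0xD8])                    # mov eax, ebx
mov_rax_rbx = bytes([rex_prefix(w=1), 0x89, 0xD8])   # mov rax, rbx

print(mov_eax_ebx.hex())  # 89d8
print(mov_rax_rbx.hex())  # 4889d8
print(len(mov_rax_rbx) - len(mov_eax_ebx))  # 1 extra byte for 64-bit operands
```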

Architectural improvements in this hack design are few. Set of GPRs was doubled from 8 to 16 (still not enough). 64-bit mode discarded some legacy features originating from 80286 (segment protection, task-switching).


80x87/FPU

Design of 80x87 FPU is horribly inefficient. This author has heard harsh criticism of 80x87 FPU design from an architect of PowerPC AltiVec. Programming model is based on a stack of 8 registers. IOW, 80x87 was a floating-point "stack machine". Since FP stack was so limited, programmers were forced to continually move FP registers to memory (spilling registers), else stack overflow exception. However, for fairness, Velvel Kahan (a designer of 8087 and IEEE-754) foresaw stack design would be a hindrance, but explained it was necessary at that time; one reason was diminishing opcode space.
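
A toy model of the 8-register stack shows why spilling is unavoidable. The class is hypothetical; method names borrow 80x87 mnemonics only loosely, and stack overflow is modelled as a Python exception rather than the FPU's actual exception flags:

```python
class X87StackOverflow(Exception):
    """Stands in for the FPU's stack-overflow condition in this sketch."""


class X87Stack:
    DEPTH = 8  # the 80x87 register stack holds 8 values

    def __init__(self):
        self.regs = []  # last element plays the role of ST(0)

    def fld(self, value: float) -> None:
        """Push a value; fails if all 8 registers are in use."""
        if len(self.regs) == self.DEPTH:
            raise X87StackOverflow("all 8 registers in use; spill to memory first")
        self.regs.append(value)

    def faddp(self) -> None:
        """ST(1) = ST(1) + ST(0), then pop."""
        top = self.regs.pop()
        self.regs[-1] += top

    def fstp(self) -> float:
        """Pop ST(0), as if storing it to memory."""
        return self.regs.pop()


fpu = X87Stack()
fpu.fld(1.5)
fpu.fld(2.25)
fpu.faddp()
print(fpu.fstp())  # 3.75
```

Pushing a ninth value without an intervening pop raises X87StackOverflow in this model, so any expression needing more than 8 live values has to store some of them to memory first.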

Although modern RISC FPUs are faster with many directly-addressable registers, the old 8087 still has one big advantage: greater precision with its 80-bit extended-precision.
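
The precision gap comes from significand width: 64 bits in 80-bit extended format versus 53 bits in 64-bit double. A small sketch (helper name is this sketch's own; it considers significand width only and ignores exponent range) tests whether an integer is exactly representable:

```python
def fits_in_significand(n: int, bits: int) -> bool:
    """True if integer n is exactly representable with a `bits`-bit significand."""
    n = abs(n)
    if n == 0:
        return True
    while n % 2 == 0:  # trailing zero bits are absorbed by the exponent
        n //= 2
    return n.bit_length() <= bits


value = 2**53 + 1
print(fits_in_significand(value, 53))  # False: lost in 64-bit double
print(fits_in_significand(value, 64))  # True: exact in 80-bit extended
print(float(value) == float(2**53))    # True: the double rounds the +1 away
```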

Various incremental hacks were attempted: MMX, SSE1, SSE2, SSE3, SSSE3, SSE4, SSE4.1, ad nauseam. Oh, SSE3 wasn't typed twice, that's Supplemental SSE3.


SIMD is nonsense

SIMD composes instructions that will be decomposed -- nonsense. A proven CPU design principle is instructions should be irreducible (simple). Ultimate solution should've been 32 or 64 FP registers for scheduling simple FP instructions out-of-order to many FP units.


80x86 imitations

80x86 imitation companies (excluding Intel here, though it was forced to imitate x86-64, haha) have tried to compete in various niches. But ultimately, when a post-mortem was done on them, cause-of-death was inability to compete across entire spectrum of models (this author heard that while involved with Chromatic Research's technically-successful Tapestry microprocessor project). For a tiny startup company, competing with a low-end CPU was once barely possible, but now low-end 80x86 "CPUs" have evolved into complex systems-on-chip containing almost everything. Competing at high-end requires owning a chip fab only large companies can afford. One such company that once had (past tense) strong potential to compete against Intel, evidently according to US patent #5826084 (smile), was Texas Instruments.


future of 80x86

[2021/06]

This author predicts 80x86 and ARM very probably will be superseded by RISC-V.