80x86 microprocessors

Intel 80386DX microprocessor

8086

Compared to 6502, 8086 instruction set seemed powerful. 8086 had MUL/DIV which 6502 lacked. 8086 ADD [MEM],REG could do in one instruction what would require multiple LD/ADD/ST 6502 instructions. REP STOS seemed to be a way to fill memory fast.

From 8080, address space of 8086 was increased from 16 bits to 20 bits. Designers made a bad decision: instead of extending registers to 32 bits, Intel chose to add segment registers, which was little better than bank-switching 64KB.
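A minimal sketch of how 8086 forms a 20-bit address from a 16-bit segment and a 16-bit offset. The function name `real_mode_address` is hypothetical, chosen for illustration only.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for an 8086 segment:offset pair."""
    # Segment register value is multiplied by 16 (shifted left 4 bits),
    # then the 16-bit offset is added. 8086 has only 20 address lines,
    # so the sum wraps around at 1MB.
    return ((segment & 0xFFFF) * 16 + (offset & 0xFFFF)) & 0xFFFFF


# Two different segment:offset pairs can name the same physical byte.
print(hex(real_mode_address(0x1234, 0x0010)))  # 0x12350
print(hex(real_mode_address(0x1235, 0x0000)))  # 0x12350
```

Each segment is a 64KB window that can start on any 16-byte boundary, which is why the windows overlap and why the comparison to bank-switching comes up.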

80286

80286 was a horrible design, borderline insane.

Major feature was "protected mode". Its goal was for hardware to help software be more reliable. A segment register was redefined as an index to a descriptor table. A descriptor defined base address of a segment of up to 64KB within 16MB address space, and also privilege levels and access rights, which means accessing a memory segment now required several "protection checks". Segment limit checking could prevent out-of-bounds bugs in software, but it proved completely impractical. Problem was segment descriptor table was privileged, and that ruined OS/2 1.x: user-space programs had to call a system function to access a 64KB segment. 80286 essentially imposed a slow hardware assert() function that could never be disabled.
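The checks described above can be sketched roughly as follows. This is a simplified model, not the full 80286 rules: `Descriptor`, `ProtectionFault`, `split_selector`, and `translate` are hypothetical names, only data-segment access is modeled, and the real chip does the privilege check when a segment register is loaded rather than on every access.

```python
from dataclasses import dataclass


class ProtectionFault(Exception):
    """Stands in for a protection exception raised by the hardware."""


@dataclass
class Descriptor:
    base: int   # 24-bit base address within the 16MB address space
    limit: int  # highest valid offset inside the segment (16-bit)
    dpl: int    # descriptor privilege level: 0 (most) to 3 (least privileged)


def split_selector(selector: int):
    """Split a 16-bit selector into (table index, table indicator, RPL)."""
    return selector >> 3, (selector >> 2) & 1, selector & 3


def translate(table, selector: int, offset: int, cpl: int) -> int:
    """Return the physical address, or raise ProtectionFault (simplified)."""
    index, _table_indicator, rpl = split_selector(selector)
    if index >= len(table):
        raise ProtectionFault("selector is outside the descriptor table")
    desc = table[index]
    # Data segment rule: the less privileged of CPL and RPL must still
    # be at least as privileged as the descriptor's DPL.
    if max(cpl, rpl) > desc.dpl:
        raise ProtectionFault("privilege check failed")
    # The "hardware assert()": every offset is compared with the limit.
    if offset > desc.limit:
        raise ProtectionFault("offset exceeds segment limit")
    return (desc.base + offset) & 0xFFFFFF


table = [Descriptor(0, 0, 0), Descriptor(0x100000, 0x0FFF, 3)]
print(hex(translate(table, 0x0B, 0x0010, cpl=3)))  # 0x100010
```

In this model, calling `translate(table, 0x0B, 0x1000, cpl=3)` raises `ProtectionFault`, because offset 0x1000 is one byte past the 0x0FFF limit.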

80286 was NOT 100%-compatible with 8086. A divide error from integer DIV instruction was originally defined as a trap exception, then redefined as a fault exception on 80286. DOS ignored divide errors: it installed an IRET as divide exception handler, which resumed program, as saved IP pointed after DIV instruction. When divide error occurred on 80286, saved IP pointed at faulting instruction (not after), resulting in an infinite loop that required pressing RESET button.
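A toy model of the difference, assuming a handler that is nothing but IRET. The function `run`, the instruction names, and the step limit are all hypothetical; no real instruction decoding is modeled.

```python
def run(program, saved_ip_is_next: bool, max_steps: int = 20):
    """Run a toy program and return (finished, steps_executed).

    "DIV0" stands for a DIV that raises a divide error. The handler is a
    bare IRET, so execution resumes at whatever IP the CPU saved.
    """
    ip = 0
    steps = 0
    while ip < len(program) and steps < max_steps:
        steps += 1
        if program[ip] == "DIV0":
            # 8086 behavior: saved IP points after DIV, so IRET skips it.
            # 80286 behavior: saved IP points at DIV, so IRET retries it.
            ip = ip + 1 if saved_ip_is_next else ip
        else:
            ip += 1
    return ip >= len(program), steps


program = ["NOP", "DIV0", "NOP"]
print(run(program, saved_ip_is_next=True))   # (True, 3)
print(run(program, saved_ip_is_next=False))  # (False, 20)
```

The second call never gets past the DIV; only the step limit (standing in for RESET button) ends it.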

80286 introduced task-switching supposedly in hardware. Instructions were JMP/CALL/IRET task-gate. Problem was these instructions were terribly slow, executing in hundreds of cycles. This author microcoded these instructions in an 80x86 imitation, which likewise required hundreds of micro-instructions. Because execution was slow and instruction documentation was complex and nearly incomprehensible, system programmers chose to code task-switching with faster simple instructions.

80386

80386 was the final major evolution of the programmer-visible architecture (ignoring extensions). 80386 was a vast improvement. New features were extending registers to 32 bits, page mode, and virtual 8086 mode.

Most significant improvement in 80386 was "page mode" for virtual memory. 80386 page mode is simple, efficient; better than most other virtual memory designs. ARM virtual memory may be more efficient in some aspects. 68000, although superior in every aspect other than virtual memory, required an external MMU chip. PowerPC virtual memory, based on hashing, was so inefficient Apple provided a system call to disable it. Page mode is organized as 2 levels: a 4KB page directory points to 4KB page tables. Each page table has 1024 page table entries (PTEs) of 4 bytes each, and each PTE maps one 4KB page. 1 page directory entry (PDE) with 1 defined page table (4KB+4KB) maps 4MB of memory. Page mode has hidden costs. To fully map 4GB of memory would require 4MB of page tables. This author, while microcoding page mode for an 80x86 imitation, saw TLB misses occurring very frequently, which stall CPU and increase memory bus traffic (mitigated now by large 4MB pages).
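The arithmetic behind those numbers, as a short sketch. The constant and function names are hypothetical; the 10/10/12-bit split of a 32-bit linear address is the standard 80386 layout for 4KB pages.

```python
PAGE_SIZE = 4096                              # bytes per page, and per table
ENTRY_SIZE = 4                                # bytes per PDE or PTE
ENTRIES_PER_TABLE = PAGE_SIZE // ENTRY_SIZE   # 1024 entries


def split_linear(addr: int):
    """Split a 32-bit linear address into (PDE index, PTE index, offset)."""
    directory_index = (addr >> 22) & 0x3FF   # top 10 bits select the PDE
    table_index = (addr >> 12) & 0x3FF       # next 10 bits select the PTE
    page_offset = addr & 0xFFF               # low 12 bits: offset in page
    return directory_index, table_index, page_offset


# One page table maps 1024 pages of 4KB each.
bytes_per_table = ENTRIES_PER_TABLE * PAGE_SIZE
# Mapping all 4GB needs one page table per page directory entry.
tables_for_4gb = 2**32 // bytes_per_table
page_table_bytes = tables_for_4gb * PAGE_SIZE

print(split_linear(0x00403025))   # (1, 3, 37)
print(bytes_per_table // 2**20)   # 4  (MB mapped by one page table)
print(page_table_bytes // 2**20)  # 4  (MB of page tables to map 4GB)
```

A TLB miss in this scheme costs two extra memory reads (one PDE, one PTE) before the actual access, which is where the stalls and extra bus traffic come from.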

Microsoft jointly designed 80386 (patents list Microsoft system programmers with Intel engineers). Anything designed by that company will usually explode, but in this case, Microsoft's involvement kept design of 80386 practical. Flaws in IBM/Microsoft OS/2 1.x stemmed from design mistakes in 80286, yet IBM OS/2 2.x proved to be an excellent OS after being rewritten for 80386. This author once asked an IBM architect of OS/2 to what degree OS/2 2.x used 80386's protection and privilege-level features; answer was: to a large degree.

80486

80486 was a major leap in performance. 80486 could execute one simple instruction per cycle. CPU, FPU, cache were integrated in one chip. Whereas difference, at same MHz, between 80286 vs 80386 was slight, 80486 was several times faster than 80386. For 99% of users, this was when PCs stopped being slow.

Lower-cost 80486SX was actually an 80486 with a defective FPU, which Intel disabled by cutting a trace with a laser. Some who bought 80486DX2 believed they were getting double-speed; what they really got was a CPU that ran its bus at half the speed of its core (mainboard/RAM speed hadn't kept pace, IIRC).

80586 (Pentium)

Pentium was the first superscalar 80x86. Pentium had aptly-named 'U' and 'V' pipes. Full 'U' pipe could execute any instruction, limited 'V' pipe only some instructions. Pentium had hard-wired specific rules for pairing execution of two instructions.

80686 (Pentium Pro)

80686 (Pentium Pro) was the final major evolution of the internal/hidden micro-architecture. Pentium Pro could execute 3 instructions out-of-order, accomplished by reorder-buffer and register-renaming. Internally, it was a RISC core with an 80x86 front-end decoder. 80x86 instructions were decomposed into RISC "micro-ops". Whereas Pentium's ability to execute two instructions was implemented in a specific hard-wired way, Pentium Pro's out-of-order execution was implemented in a flexible way, by scanning reorder-buffer for micro-ops with independent operands.
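A toy sketch of scanning a buffer for micro-ops with independent operands. This is a hypothetical model, not Pentium Pro's actual scheduler: every micro-op is assumed to take one cycle, register names are assumed to be already renamed (each destination written once), and `schedule` and the micro-op names are invented for illustration.

```python
def schedule(micro_ops, width=3):
    """Return a list of cycles; each cycle lists the micro-op names issued.

    micro_ops: list of (name, dest, sources) in program order.
    """
    pending = list(micro_ops)
    cycles = []
    while pending:
        issued = []
        unavailable = set()  # results of earlier micro-ops not yet produced
        for op in pending:
            _name, dest, sources = op
            # Issue if a slot is free and no operand waits on an earlier op.
            if len(issued) < width and not (set(sources) & unavailable):
                issued.append(op)
            # Issued or not, this result is not ready until a later cycle.
            unavailable.add(dest)
        pending = [op for op in pending if op not in issued]
        cycles.append([name for name, _dest, _sources in issued])
    return cycles


ops = [
    ("load_a", "t1", []),
    ("add", "t2", ["t1"]),
    ("load_b", "t3", []),
    ("mul", "t4", ["t3"]),
    ("sum", "t5", ["t2", "t4"]),
]
print(schedule(ops))  # [['load_a', 'load_b'], ['add', 'mul'], ['sum']]
```

Here `load_b` issues ahead of `add` even though it comes later in program order, because nothing it needs is still outstanding. No pairing table is involved; the same scan works for any mix of micro-ops.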

64-bit 80x86

Ridiculously, Intel once lost control of its own 80x86 design, then Intel was forced to imitate an imitation. 64-bit extensions from an 80x86 imitation company were jointly designed and approved by Microsoft. Number of registers was doubled to 16; native 64-bit mode discarded most unused legacy features (segment protection, virtual 8086 mode, task-switching). RISC designs, ideal at 32 bits, become bloated when extended. 64-bit 80x86 code appears to avoid bloat by using IMM8/16/32 operands and IP-relative addressing.

x87/FPU

Design of x87 FPU is horribly inefficient. This author has heard harsh criticism of x87 FPU design from an architect of PowerPC AltiVec. Programming model is based on a stack of 8 registers (x87 was a floating-point "stack machine"). Since FP stack was so limited, programmer was forced to continually move FP registers to memory, else risk a stack overflow exception. This was known as spilling registers. Example of this inefficiency: multiplying a matrix in a 3D program requires many more than 8 FP registers.
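A toy model of the 8-deep FP stack, as a sketch only. The class `X87Stack` and helper `dot3` are hypothetical; method names borrow x87 mnemonics (FLD, FSTP, FMULP, FADDP), and stack overflow is modeled as a Python exception rather than the real FPU status flags.

```python
class X87Stack:
    """Toy model of the x87 register stack (values only, no status word)."""

    DEPTH = 8

    def __init__(self):
        self.regs = []  # regs[-1] plays the role of ST(0)

    def fld(self, value):
        """Push a value; fails when all 8 registers are in use."""
        if len(self.regs) >= self.DEPTH:
            raise OverflowError("x87 stack overflow")
        self.regs.append(float(value))

    def fstp(self):
        """Pop ST(0), as when storing (spilling) it to memory."""
        if not self.regs:
            raise IndexError("x87 stack underflow")
        return self.regs.pop()

    def _binary_pop(self, op):
        top = self.fstp()
        below = self.fstp()
        self.regs.append(op(below, top))

    def fmulp(self):
        self._binary_pop(lambda a, b: a * b)

    def faddp(self):
        self._binary_pop(lambda a, b: a + b)


def dot3(a, b):
    """3-element dot product using only push/pop style operations."""
    fpu = X87Stack()
    for i, (x, y) in enumerate(zip(a, b)):
        fpu.fld(x)
        fpu.fld(y)
        fpu.fmulp()
        if i:
            fpu.faddp()  # fold the new product into the running sum
    return fpu.fstp()


print(dot3((1, 2, 3), (4, 5, 6)))  # 32.0
```

A single dot product fits easily, but a ninth `fld` without an intervening pop raises `OverflowError`. A 4x4 matrix alone holds 16 values, so they cannot all stay on the stack at once.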

Various hack extensions such as MMX and SSE{1,2,3,4...inf} were tried, but ultimate solution would've been 32 or 64 FP registers with a wide out-of-order RISC-style FPU, omitting vector instructions.

A design principle is software should only specify instructions that are irreducible (simple). Problem with vector or "SIMD" instructions is those are essentially yet more CISC instructions. Combining instructions, which then must be decombined so RISC core can execute them, is like pressing accelerator and brake pedals at the same time.


80x86 imitations

80x86 imitation companies have tried to compete in niches, such as previous-gen performance or low-power, but ultimately, when a post-mortem was done on them, cause-of-death was inability to compete across entire spectrum of models (this author heard this directly while at an 80x86 imitation company). Competing at low-end was once possible, but is now far too difficult for any startup company. Low-end 80x86 "CPUs" have evolved into complex systems-on-chip containing almost everything: cache, graphics, audio, USB, SATA, etc., even obsolete UART. Competing at high-end requires designing a microprocessor in conjunction with building a chip fab that can produce the smallest transistors.

Occasionally, a few imitation companies have had temporary success, such as low-price 80386 and 80486 imitations. AMD64X2 was only imitation that ever achieved a major leap past Intel (of which this author has direct knowledge). This author compared 2.2GHz AMD64X2 vs ~3GHz Pentium 4. AMD64X2 ran faster, cooler, quieter -- Pentium 4 ran hot like a heater. Consequences were that typical PCs began running cooler, and Intel reorganized to redevelop a notebook design as their "core" design.


Future of 80x86

Like writing a book, begin at the end -- future of 80x86 is obsolescence.

What will cause obsolescence of 80x86?
Speed? Memory limitation? No.

Duration.

A notebook that runs for 24 hours sounds nice today. But how about a notebook that, used daily, runs 3 months until a recharge becomes necessary?

80x86 microprocessors can never be as power-efficient as a microprocessor designed specifically for power efficiency.

This author's prediction is 80x86 will be obsoleted by an extremely power-efficient microprocessor that emulates 80x86 instructions.