80x86 CPUs index
8086
Compared to the 6502, the 8086 instruction set seemed powerful.
8086 had
Compared with the 8080, the address space of the 8086 was increased from 16 bits to 20 bits through memory segmentation. Using segment registers was a little better than bank-switching 64KB blocks.
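The segment arithmetic behind those 20 bits fits in a few lines. This is a minimal Python sketch assuming plain real-mode addressing (physical = segment × 16 + offset, truncated to 20 bits); `physical_address` is a hypothetical helper name.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real mode: physical = segment * 16 + offset, on a 20-bit address bus."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    # Shifting the segment left by 4 gives a 20-bit base; with only 20 address
    # lines, the 8086 drops any carry out of bit 19, so addresses wrap to 0.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))   # 0x12350
print(hex(physical_address(0x1000, 0x0000)))   # 0x10000
print(hex(physical_address(0x0FFF, 0x0010)))   # 0x10000 (same byte, different pair)
print(hex(physical_address(0xFFFF, 0x0010)))   # 0x0 (wraps past 1MB)
```

Many segment:offset pairs alias the same physical byte, and each segment still only spans 64KB.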
80286
80286 was a horrible design, borderline insane.
The major feature was "protected mode".
Its goal was for hardware to help software be more reliable.
A segment register was redefined as a selector: an index into a descriptor table.
A descriptor defined the base address of a segment (up to 64KB) within the 16MB address space,
as well as privilege levels and access rights,
which meant that accessing a memory segment now required several "protection checks".
Segment limit checking could prevent out-of-bounds bugs in software, but it proved completely impractical.
The problem was that the segment descriptor table was privileged, which ruined OS/2 1.x:
user-space programs had to call a system function to access a 64KB segment.
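The checks involved in one protected-mode segment access can be sketched as follows. This is a simplified Python model, not the full 80286 rule set: it assumes only expand-up data segments, descriptor tables modeled as Python lists rather than packed 8-byte entries, and the rule that max(CPL, RPL) must not exceed the descriptor's DPL. `Descriptor`, `translate`, and `ProtectionFault` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    base: int           # 24-bit base address within the 16MB physical space
    limit: int          # highest valid offset; a segment is at most 64KB
    dpl: int            # descriptor privilege level: 0 (kernel) .. 3 (user)
    present: bool = True


class ProtectionFault(Exception):
    pass


def translate(selector: int, offset: int, cpl: int, gdt: list, ldt: list) -> int:
    """Resolve selector:offset to a 24-bit physical address (expand-up data only)."""
    index = selector >> 3                      # bits 15..3: table index
    use_ldt = bool(selector & 0b100)           # bit 2 (TI): 0 = GDT, 1 = LDT
    rpl = selector & 0b11                      # bits 1..0: requested privilege level
    table = ldt if use_ldt else gdt
    if not use_ldt and index == 0:
        raise ProtectionFault("null selector")
    if index >= len(table):
        raise ProtectionFault("selector beyond descriptor table limit")
    desc = table[index]
    if not desc.present:
        raise ProtectionFault("segment not present")
    # On real hardware this check happens when the selector is loaded into a
    # segment register; it is folded in here for brevity.
    if max(cpl, rpl) > desc.dpl:
        raise ProtectionFault("privilege check failed")
    # The limit check, by contrast, applies to every access.
    if offset > desc.limit:
        raise ProtectionFault("offset beyond segment limit")
    return (desc.base + offset) & 0xFFFFFF


gdt = [
    Descriptor(0, 0, 0),                       # entry 0: the null descriptor
    Descriptor(0x100000, 0xFFFF, 3),           # user data, full 64KB
    Descriptor(0x200000, 0x0FFF, 0),           # kernel-only data, 4KB
    Descriptor(0x300000, 0x00FF, 3),           # small user segment, 256 bytes
]
print(hex(translate(0x0B, 0x20, cpl=3, gdt=gdt, ldt=[])))   # 0x100020
for sel, off in [(0x13, 0x0), (0x1B, 0x100)]:
    try:
        translate(sel, off, cpl=3, gdt=gdt, ldt=[])
    except ProtectionFault as err:
        print(hex(sel), err)
```

Selector 0x0B is GDT index 1 with RPL 3; 0x13 points at the kernel-only descriptor and fails the privilege check; 0x1B with offset 0x100 runs one byte past a 256-byte segment.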
80286 essentially imposed a slow hardware
80286 was NOT 100%-compatible with 8086.
A divide error from integer
80286 introduced task-switching.
It was advertised as built into hardware, but was actually microcoded.
Task-switching instructions were
80386
80386 was a vast improvement. Its new features were semi-generalized registers extended to 32 bits, page mode, and virtual 8086 mode. The most significant improvement in 80386 was "page mode" for virtual memory. 80386 page mode was simple and efficient. By comparison, 68000 needed an MMU coprocessor, and PowerPC virtual memory, based on hashing, was so inefficient that Apple provided a system call to disable it. ARM virtual memory may be more efficient in some aspects.
Page mode is organized as 2 levels: a 4KB page directory points to 4KB page tables. Each page table holds 1024 (1K) 4-byte page table entries (PTEs), each mapping a 4KB page, so 1 page directory entry (PDE) with 1 defined page table (4KB+4KB) maps 4MB of memory. Page mode has hidden costs. Fully mapping 4GB of memory requires 1024 page tables, i.e. 4MB of page tables. While microcoding page mode for an 80x86 imitation, this author saw TLB misses occurring very frequently; each miss stalls the CPU and increases memory bus traffic (mitigated now by larger pages).
Microsoft jointly designed 80386 (patents list Microsoft system programmers alongside Intel engineers). That company tends to ruin anything it meddles with, but in this case, Microsoft's supervision of Intel kept the design of 80386 practical. Flaws in IBM/Microsoft OS/2 1.x stemmed from design mistakes in 80286, yet IBM OS/2 2.x proved to be an excellent OS after being rewritten for 80386. This author once asked an IBM architect of OS/2 to what degree OS/2 2.x used 80386's protection and privilege-level features; the answer was "to a large degree".
Intel claimed 80386 had 64 terabytes (2^46 bytes) of virtual memory, which was unrealistic: it assumed swapping 16,384 segments, each 4GB in size, to a spinning disk. Software was far behind hardware advances in this era: 80386 had been available for several years before 32-bit operating systems became generally available.
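The two-level walk and the sizes quoted for it can be checked with a small Python model. This sketch assumes only the present bit (bit 0) and the frame address (bits 31..12) of each 32-bit entry matter; access-rights bits, the TLB, and larger pages are ignored. `split_linear`, `walk`, and the `tables` dictionary standing in for physical memory are hypothetical.

```python
PAGE_SIZE = 4096
ENTRIES = PAGE_SIZE // 4          # 4-byte entries: 1024 per directory or table
FRAME_MASK = 0xFFFFF000           # bits 31..12 hold a 4KB-aligned physical address
PRESENT = 0x1                     # bit 0: entry is valid


class PageFault(Exception):
    pass


def split_linear(addr: int) -> tuple:
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    return (addr >> 22) & 0x3FF, (addr >> 12) & 0x3FF, addr & 0xFFF


def walk(page_dir: list, tables: dict, addr: int) -> int:
    """Translate a linear address; `tables` maps a page table's physical base to its entries."""
    d, t, offset = split_linear(addr)
    pde = page_dir[d]                       # first extra memory read on a TLB miss
    if not pde & PRESENT:
        raise PageFault(hex(addr))
    pte = tables[pde & FRAME_MASK][t]       # second extra memory read on a TLB miss
    if not pte & PRESENT:
        raise PageFault(hex(addr))
    return (pte & FRAME_MASK) | offset


# Example: map the linear page at 0x00402000 to physical frame 0x00ABC000.
page_dir = [0] * ENTRIES
page_table = [0] * ENTRIES
page_dir[1] = 0x00005000 | PRESENT          # page table assumed to live at physical 0x5000
page_table[2] = 0x00ABC000 | PRESENT
tables = {0x00005000: page_table}
print(hex(walk(page_dir, tables, 0x00402345)))         # 0xabc345

# Sizes quoted in the text.
bytes_per_pde = ENTRIES * PAGE_SIZE                    # 4MB mapped per directory entry
tables_for_4gb = (2**32 // bytes_per_pde) * PAGE_SIZE  # 1024 page tables
print(bytes_per_pde // 2**20, tables_for_4gb // 2**20) # 4 4
print(16384 * 2**32 == 2**46)                          # True: 64TB of segments
```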
80486
80486 was a major leap in performance. 80486 could execute one simple instruction per cycle. CPU, FPU and cache were integrated in one chip. 80486 was the ultimate in-order-execution 80x86. Whereas the difference, at the same MHz, between 80286 and 80386 was slight, 80486 was several times faster than 80386. This was the era when PCs weren't slow anymore. The price of an 80486 was ~$1000 in 1989. The lower-priced 80486SX actually had an FPU, but a defective one, which Intel disabled by cutting a tiny trace with a laser. Some who bought an 80486DX2 believed they were getting a double-speed CPU; they really got a half-speed bus (because mainboard/RAM speed hadn't kept pace).
80586 (Pentium)
Pentium was Intel's first superscalar 80x86 and Intel's last pure CISC design. Pentium had aptly named 'U' and 'V' pipes. The full 'U' pipe could execute any instruction; the limited 'V' pipe could execute only some instructions. Pentium had hard-wired specific rules for pairing the execution of two instructions.
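A toy model of U/V pairing, in Python. The real Pentium rules are longer (flags, displacement plus immediate, prefixes, special cases for jumps and PUSH/POP); this sketch assumes only two hypothetical rules: both instructions must come from a "simple" set, and the V-pipe instruction must not read or write a register the U-pipe instruction writes. `SIMPLE`, `can_pair`, and `schedule` are illustrative names.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    text: str
    op: str
    writes: frozenset
    reads: frozenset


# Hypothetical subset of "simple" instructions that may go to either pipe;
# the real Pentium pairing tables are longer and full of special cases.
SIMPLE = {"mov", "add", "sub", "and", "or", "xor", "inc", "dec", "cmp", "lea"}


def can_pair(u: Insn, v: Insn) -> bool:
    """Can `v` issue in the V pipe in the same cycle as `u` in the U pipe?"""
    if u.op not in SIMPLE or v.op not in SIMPLE:
        return False              # a complex instruction runs alone in the U pipe
    if u.writes & (v.reads | v.writes):
        return False              # V would need U's result (RAW) or overwrite it (WAW)
    return True                   # flags dependencies are ignored in this sketch


def schedule(program: list) -> list:
    """Issue strictly in program order, pairing adjacent instructions when allowed."""
    cycles, i = [], 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            cycles.append([program[i].text, program[i + 1].text])
            i += 2
        else:
            cycles.append([program[i].text])
            i += 1
    return cycles


def insn(text, op, writes, reads):
    return Insn(text, op, frozenset(writes), frozenset(reads))


program = [
    insn("mov eax, ebx", "mov", {"eax"}, {"ebx"}),
    insn("add eax, ecx", "add", {"eax"}, {"eax", "ecx"}),    # needs eax from the mov
    insn("add edx, ecx", "add", {"edx"}, {"edx", "ecx"}),    # independent of the add above
    insn("mul ebx", "mul", {"eax", "edx"}, {"eax", "ebx"}),  # not in SIMPLE: issues alone
]
for cycle, group in enumerate(schedule(program)):
    print(cycle, group)
```

Because issue is strictly in order, the dependent `add eax, ecx` cannot pair with the `mov` in front of it, though it can pair with the independent instruction that follows.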
80686 (Pentium Pro)
Pentium Pro could execute 3 instructions out of order, accomplished by a reorder buffer and register renaming. Internally, it was a RISC core with an 80x86 front-end decoder: 80x86 instructions were decomposed into RISC "micro-ops". This idea of translating 80x86 to RISC was pioneered by an 80x86 imitation company named NexGen. Whereas Pentium's parallel (paired, in-order) execution was implemented in a specific hard-wired way, Pentium Pro's out-of-order execution was implemented in a generalized, flexible way, by scanning the reorder buffer for micro-ops with independent operands. 80686, introduced in late 1995, was the last fundamental redesign of 80x86 [2021/06].
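A toy Python model of those ideas together: a hypothetical decoder splits `add [esi], eax` into load/add/store micro-ops, every result is renamed to a fresh physical tag, and each cycle up to 3 ready micro-ops are issued oldest-first from the window regardless of program order. The decoder table, the 3-cycle load latency, and the issue width are illustrative assumptions, not Pentium Pro specifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Uop:
    text: str
    dest: Optional[str]       # architectural destination (None for a store)
    srcs: tuple               # architectural sources
    latency: int = 1


# Hypothetical decoder table; micro-op names and latencies are illustrative only.
DECODE = {
    "add [esi], eax": [
        Uop("load  tmp <- [esi]", "tmp", ("esi",), latency=3),
        Uop("add   tmp <- tmp + eax", "tmp", ("tmp", "eax")),
        Uop("store [esi] <- tmp", None, ("esi", "tmp")),
    ],
    "mov eax, ecx": [Uop("mov   eax <- ecx", "eax", ("ecx",))],
    "inc edx": [Uop("inc   edx <- edx + 1", "edx", ("edx",))],
}


def run(program: list, width: int = 3) -> list:
    """Return (micro-op, issue cycle) pairs, listed in program order."""
    uops = [u for insn in program for u in DECODE[insn]]

    # Register renaming: every result gets a fresh physical tag, so a later write
    # to eax cannot disturb an earlier micro-op that still reads the old eax.
    rename, renamed = {}, []
    for i, u in enumerate(uops):
        srcs = [rename.get(r, r) for r in u.srcs]   # unrenamed = ready from the start
        tag = f"p{i}" if u.dest else None
        if tag:
            rename[u.dest] = tag
        renamed.append((u, tag, srcs))
    produced = {tag for _, tag, _ in renamed if tag}

    # Out-of-order issue: each cycle, scan the window oldest-first and issue up to
    # `width` micro-ops whose source values are available.
    ready_at, issued_at, cycle = {}, [None] * len(uops), 0
    while None in issued_at:
        issued = 0
        for i, (u, tag, srcs) in enumerate(renamed):
            if issued_at[i] is not None or issued == width:
                continue
            if all(s not in produced or ready_at.get(s, float("inf")) <= cycle for s in srcs):
                issued_at[i], issued = cycle, issued + 1
                if tag:
                    ready_at[tag] = cycle + u.latency
        cycle += 1
    return [(u.text, c) for (u, _, _), c in zip(renamed, issued_at)]


for text, cycle in run(["add [esi], eax", "mov eax, ecx", "inc edx"]):
    print(cycle, text)
```

Here the independent `mov` and `inc` issue in cycle 0, ahead of the dependent add and store, and the `mov` overwrites eax before the earlier add reads it, which is safe only because of renaming. A real reorder buffer would still retire all five micro-ops in program order.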
64-bit 80x86 (x86-64)
x86-64 is a hack design. x86-64 originated from an 80x86 imitation company and Microsoft. Then, ridiculously, Intel was forced to imitate an imitation. x86-64 code density was bloated 1, 2, 3 times. Accessing the full 64 bits of a register needs an entire extra byte (the REX prefix). Then it gets worse: VEX/XOP three-byte escape sequences. Architectural improvements in this hack design are few. The set of GPRs was doubled from 8 to 16 (still not enough). 64-bit mode discarded some legacy features originating from 80286 (segment protection, task-switching).
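The extra REX byte is easy to see by encoding register-to-register `ADD r/m, r` (opcode 01 /r) by hand. This Python sketch handles only that one instruction form; `encode_add` is a hypothetical helper. REX is 0100WRXB: W selects 64-bit operand size, R extends the ModRM reg field, and B extends the ModRM r/m field to reach r8-r15.

```python
# Register numbers 0-15 in encoding order; r8-r15 need a REX extension bit.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def encode_add(dst: str, src: str, wide: bool = True) -> bytes:
    """Encode register-to-register ADD dst, src using opcode 01 /r.

    wide=True gives the 64-bit form; wide=False the 32-bit form (eax, r8d, ...).
    """
    d, s = REGS.index(dst), REGS.index(src)
    modrm = 0b11000000 | ((s & 7) << 3) | (d & 7)              # mod=11, reg=src, r/m=dst
    rex = 0x40 | (int(wide) << 3) | ((s >> 3) << 2) | (d >> 3)  # 0100 W R X B
    prefix = bytes([rex]) if rex != 0x40 else b""              # an empty REX can be omitted here
    return prefix + bytes([0x01, modrm])


print(encode_add("rax", "rbx", wide=False).hex(" "))  # 01 d8     (add eax, ebx)
print(encode_add("rax", "rbx").hex(" "))              # 48 01 d8  (add rax, rbx)
print(encode_add("r8", "r9").hex(" "))                # 4d 01 c8  (add r8, r9)
```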
80x87/FPU
The design of the 80x87 FPU is horribly inefficient. This author has heard harsh criticism of the 80x87 FPU design from an architect of PowerPC AltiVec. The programming model is based on a stack of 8 registers. IOW, 80x87 was a floating-point "stack machine". Since the FP stack was so limited, programmers were forced to continually move FP registers to memory (spilling registers), or else hit a stack overflow exception. However, for fairness, Velvel Kahan (a designer of 8087 and IEEE-754) foresaw that the stack design would be a hindrance, but explained it was necessary at that time; one reason was diminishing opcode space. Although modern RISC FPUs are faster, with many directly addressable registers, the old 8087 still has one big advantage: greater precision, with its 80-bit extended precision. Various incremental hacks were attempted: MMX, SSE1, SSE2, SSE3, SSSE3, SSE4, SSE4.1, ad nauseam. Oh, SSE3 wasn't typed twice; that's Supplemental SSE3.
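A toy Python model of the 8-register stack, evaluating reverse-Polish expressions the way x87 stack code would. It is simplified: real x87 hardware reports a stack fault through its invalid-operation exception rather than raising anything like Python's `OverflowError`, and `X87Stack`/`evaluate` are hypothetical names. The same sum fits or overflows depending only on evaluation order.

```python
class X87Stack:
    """Toy model of the x87 register stack: 8 slots, ST(0) on top."""
    DEPTH = 8

    def __init__(self):
        self.regs = []                    # regs[-1] is ST(0)

    def fld(self, value) -> None:
        """Push a value; a 9th push has nowhere to go."""
        if len(self.regs) == self.DEPTH:
            raise OverflowError("x87 stack overflow: all 8 registers in use")
        self.regs.append(float(value))

    def faddp(self) -> None:
        """ST(1) = ST(1) + ST(0), then pop."""
        top = self.regs.pop()
        self.regs[-1] += top

    def fmulp(self) -> None:
        """ST(1) = ST(1) * ST(0), then pop."""
        top = self.regs.pop()
        self.regs[-1] *= top

    def fstp(self) -> float:
        """Pop ST(0) (storing it to memory on real hardware)."""
        return self.regs.pop()


def evaluate(rpn: str) -> float:
    """Evaluate a reverse-Polish expression such as '2 3 4 * +' on the stack."""
    fpu = X87Stack()
    for tok in rpn.split():
        if tok == "+":
            fpu.faddp()
        elif tok == "*":
            fpu.fmulp()
        else:
            fpu.fld(tok)
    return fpu.fstp()


# 1+2+...+9 evaluated left to right needs only 2 stack slots.
left = "1 2 +" + "".join(f" {i} +" for i in range(3, 10))
print(evaluate(left))                      # 45.0

# The same sum, nested to the right, pushes all 9 operands first.
right = " ".join(str(i) for i in range(1, 10)) + " +" * 8
try:
    evaluate(right)
except OverflowError as err:
    print(err)                             # x87 stack overflow: all 8 registers in use
```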
SIMD is nonsense
SIMD composes instructions that will later be decomposed: nonsense. A proven CPU design principle is that instructions should be irreducible (simple). The ultimate solution should've been 32 or 64 FP registers for scheduling simple FP instructions out of order to many FP units.
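To make the "composed, then decomposed" point concrete, here is a toy Python model: a 4-lane packed add (in the style of SSE's ADDPS) is split into independent per-lane scalar micro-ops, which produce the same result in any order. The micro-op format and function names are hypothetical; this does not describe how any particular CPU cracks SIMD instructions.

```python
LANES = 4   # e.g. four single-precision floats in one 128-bit register


def packed_add(a: list, b: list) -> list:
    """One SIMD instruction: all lanes added at once (like SSE ADDPS)."""
    assert len(a) == len(b) == LANES
    return [x + y for x, y in zip(a, b)]


def decompose(op: str, dst: str, src: str) -> list:
    """Hypothetical cracking of a packed op into one scalar micro-op per lane."""
    return [(op, dst, src, lane) for lane in range(LANES)]


def execute(uops: list, regs: dict) -> None:
    """Run scalar micro-ops; the lanes are independent, so order does not matter."""
    for op, dst, src, lane in uops:
        if op != "add":
            raise ValueError(op)
        regs[dst][lane] += regs[src][lane]


regs = {"xmm0": [1.0, 2.0, 3.0, 4.0], "xmm1": [10.0, 20.0, 30.0, 40.0]}
expected = packed_add(regs["xmm0"], regs["xmm1"])
uops = decompose("add", "xmm0", "xmm1")
execute(list(reversed(uops)), regs)           # deliberately issued in reverse order
print(regs["xmm0"] == expected)                # True
```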
80x86 imitations
80x86 imitation companies (excluding Intel here, though it was forced to imitate x86-64, haha) have tried to compete in various niches. But ultimately, when a post-mortem was done on them, the cause of death was the inability to compete across the entire spectrum of models (this author heard that while involved with Chromatic Research's technically successful Tapestry microprocessor project). For a tiny startup company, competing with a low-end CPU was once barely possible, but now low-end 80x86 "CPUs" have evolved into complex systems-on-chip containing almost everything. Competing at the high end requires owning a chip fab, which only large companies can afford. One company that once had (past tense) strong potential to compete against Intel, evidently according to US patent #5826084, was Texas Instruments.