cmaier said:
In terms of time and effort for designing, I would say that one of the biggest x86 hurdles is what we call “verification.” There is an entire team of people responsible for making sure that the design works properly with a wide range of x86 software, by running thousands and thousands of instruction traces through the design and checking that the results are right. Entire banks of machines run around the clock making sure that a huge library of traces built up over many years, designed to stress the trickiest corner cases, works properly. As far as I know, the only two companies ever to successfully accomplish this are AMD and Intel. Even back when there were only about 20 chip designers, there were probably at least a half dozen verification engineers. By contrast, on at least one RISC chip I worked on, there were only two verification engineers, and we didn’t need to use the entire set of engineering desktop machines to run traces in the background around the clock. And since it was a startup, it obviously didn’t take years to develop the set of traces.

cmaier, I really appreciate you sharing your from-the-trenches experience. Thanks.
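The trace-driven verification flow described above can be sketched in miniature. Here is a toy Python harness (the three-operand toy ISA and every name in it are hypothetical, invented for illustration) that replays an instruction trace through a device-under-test model and checks its architectural state against a golden reference model after every instruction:

```python
# Toy sketch of trace-driven verification: replay an instruction trace
# through a device-under-test (DUT) model and compare architectural
# state against a golden reference model after every instruction.
# The toy ISA and all names here are hypothetical.

def reference_model(state, instr):
    """Golden model: defines correct behavior for each opcode."""
    op, dst, a, b = instr
    if op == "ADD":
        state[dst] = (state[a] + state[b]) & 0xFFFFFFFF
    elif op == "SUB":
        state[dst] = (state[a] - state[b]) & 0xFFFFFFFF
    elif op == "MOVI":
        state[dst] = a & 0xFFFFFFFF
    return state

def dut_model(state, instr):
    """Stand-in for the design under test; a real flow drives RTL or a
    gate-level simulation here, not another software model."""
    return reference_model(state, instr)

def run_trace(trace, num_regs=4):
    ref = {r: 0 for r in range(num_regs)}
    dut = {r: 0 for r in range(num_regs)}
    for i, instr in enumerate(trace):
        ref = reference_model(ref, instr)
        dut = dut_model(dut, instr)
        if ref != dut:
            return f"MISMATCH at instruction {i}: {instr}"
    return "PASS"

trace = [("MOVI", 0, 7, None), ("MOVI", 1, 5, None), ("ADD", 2, 0, 1)]
print(run_trace(trace))  # prints "PASS"
```

The "banks of machines" in the quote are doing essentially this, except the traces number in the millions, the DUT is the actual chip design, and a mismatch kicks off days of debugging.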
I would like to ask you something that is really difficult to evaluate from my armchair perspective - I have often heard that the sheer volume of the x86 ISA, the "accumulated cruft", would make designing new x86 cores require more work/time/expense/debugging than designing, say, a pure 64-bit ARMv8 core.
It sounds plausible, but - by how much? Enough that it significantly affects decision-to-product cycle time, or can it be compensated for by hiring more people? Are there any specific consequences you’d like to mention (apart from the more formal consequences of dependencies you’ve already discussed)?
x86 is particularly tricky because, at least for 32-bit code, software can do things like programmatically modify its own instruction stream (self-modifying code). Lots of weird things to test.
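To see why self-modifying code is such a verification headache, here is a toy Python machine (the tuple instruction encoding is invented for illustration) in which instructions and data share one memory, so a store can overwrite an instruction before the program counter reaches it:

```python
# Toy illustration of self-modifying code: instructions live in the
# same memory as data, and a store can rewrite an instruction before
# the program counter fetches it. The encoding here is hypothetical.

def run(mem, max_steps=100):
    regs = {"r0": 0}
    pc = 0
    for _ in range(max_steps):
        instr = mem[pc]
        op = instr[0]
        if op == "HALT":
            break
        if op == "ADDI":          # r0 += immediate
            regs["r0"] += instr[1]
        elif op == "STORE":       # write a new instruction into memory
            mem[instr[1]] = instr[2]
        pc += 1
    return regs["r0"]

# Address 1 patches address 2 before it is fetched, so the original
# ("ADDI", 100) at address 2 never executes.
program = [
    ("ADDI", 1),
    ("STORE", 2, ("ADDI", 5)),   # patch the next instruction
    ("ADDI", 100),               # overwritten at runtime
    ("HALT",),
]
print(run(program))  # prints 6, not 101: the patched instruction ran
```

A real pipelined x86 core has likely already fetched and decoded that third instruction by the time the store retires, so the hardware must detect stores that hit the instruction stream and flush the stale work - exactly the kind of corner case those trace libraries exist to stress.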
(For this discussion I am leaving out the issue of whether there are fundamental technical constraints that mean you can never have an x86 chip as good as the best possible RISC chip, and focusing just on the design effort.)
From the point of view of design, the design of some blocks is about the same complexity - an ALU is an ALU. Some ALUs have to deal with things like square root and others don’t, but that’s the case both for x86 and non-x86. Other blocks, like the instruction decoder, are much more complex in x86, but that can be compensated for by having more designers.

At the chip level, x86 imposes tougher constraints between blocks - lots of extra control signals, tags, etc. that have to be sent around the chip and make it from place to place in time. This can cause timing issues that result in a slower chip. But every chip has its own quirks that can do the same thing. Based on personal experience, I feel it is harder to solve this on x86, but your mileage may vary.

(My first experience with x86 was trying to speed up an existing design by squashing some of those paths. It took me 6 months, but I got it to the point where our next chip could be 20% faster. At the time I was cursing x86 a lot, because when I had to do the same thing on a PowerPC chip it was a heck of a lot easier. But some of that was likely just the design style of the blocks I inherited.)