This is a very naïve post.
Xeons (and Core) do not natively run the x86 (or even x64) instruction set. Why don't people understand that? x64 is the "bytecode" that the front end decodes into micro-ops for the RISC-like engine the chip actually executes.
And as far as SMT depth goes, the problem is not the instruction set. The problem is available execution units. If you have a load/store unit, an integer/address unit, and an FP unit - you'll get some benefit from SMT. If you have two of each, you'll get more benefit.
8-way SMT is almost ludicrous - instead of putting 8 execution units of each type in each core, use the transistors for four cores with two units per.
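To put rough numbers on the execution-unit argument, here is a toy model, not a simulation of any real core. It assumes in-order issue, one instruction per thread per cycle, single-cycle units, and no caches or dependencies. The function name, the unit labels, and the instruction stream are all made up for illustration.

```python
# Toy model only: in-order issue, one instruction per thread per cycle,
# single-cycle units, no caches or dependencies. All names are hypothetical.
UNIT_KINDS = ("ld", "int", "fp")


def cycles_to_finish(threads, units_per_kind):
    """Count cycles until every thread has issued all of its instructions."""
    next_instr = [0] * len(threads)
    cycles = 0
    while any(pc < len(t) for pc, t in zip(next_instr, threads)):
        # Every unit is free again at the start of each cycle.
        free = {kind: units_per_kind for kind in UNIT_KINDS}
        for i, stream in enumerate(threads):  # fixed priority: thread 0 first
            if next_instr[i] < len(stream):
                kind = stream[next_instr[i]]
                if free[kind] > 0:
                    free[kind] -= 1
                    next_instr[i] += 1
        cycles += 1
    return cycles


# An assumed integer-heavy instruction mix, the same for every thread.
stream = ["int", "int", "ld", "fp"] * 3

for n_threads, units in [(1, 1), (2, 1), (2, 2), (8, 2)]:
    cycles = cycles_to_finish([stream] * n_threads, units)
    ipc = n_threads * len(stream) / cycles
    print(f"{n_threads} thread(s), {units} unit(s) per kind: "
          f"{cycles} cycles, IPC {ipc:.2f}")
```

With this particular stream, half of the instructions are integer ops, so 8 threads on two integer units can never get past 4 instructions per cycle, no matter how the threads are scheduled. That is the point about SMT depth being limited by execution units rather than by the instruction set.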
NOH8 on x64.
BTW - there's one important use case where Intel hyperthreading gives 2X performance, know what it is?
Aiden, I know that transcoding is currently the best mainstream use case for hyperthreading, but SMT goes far beyond hyperthreading. Its purpose is to run a second thread while the first thread is still waiting for I/O or other resources (FP units, integer units, etc.), and even to lend resources to another thread that needs more execution pipelines to complete an instruction.
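A rough way to picture the "run the second thread while the first one waits" part is the sketch below. It is a toy with one issue slot per cycle, where "c" is a cycle of compute that needs the slot and "w" is a cycle spent waiting on memory or I/O that does not. The function name and the workloads are assumptions for illustration only.

```python
# Toy model only: one issue slot per cycle. "c" needs the slot,
# "w" is a wait cycle (memory, I/O) that elapses without using it.
def cycles_needed(threads):
    """Count cycles until every thread has walked through all of its steps."""
    pos = [0] * len(threads)
    cycles = 0
    while any(p < len(t) for p, t in zip(pos, threads)):
        slot_free = True
        for i, steps in enumerate(threads):  # fixed priority: thread 0 first
            if pos[i] >= len(steps):
                continue
            if steps[pos[i]] == "w":
                pos[i] += 1          # waiting overlaps with other threads
            elif slot_free:
                pos[i] += 1          # compute step takes the only issue slot
                slot_free = False
        cycles += 1
    return cycles


stall_heavy = "cwww" * 3   # mostly waiting
compute_bound = "c" * 12   # never waits

for name, work in [("stall-heavy", stall_heavy), ("compute-bound", compute_bound)]:
    one_after_another = 2 * cycles_needed([work])
    with_smt = cycles_needed([work, work])
    print(f"{name}: {one_after_another} cycles back to back, "
          f"{with_smt} cycles with two SMT threads")
```

In this toy the stall-heavy pair overlaps almost completely, while the compute-bound pair gains nothing because both threads fight over the same slot.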
IA-64 theoretically allowed more than 2 SMT threads to execute "efficiently", but compiler issues never allowed more than one extra SMT thread.
Trading SMT for extra cores is logical unless you have reached the maximum theoretical efficiency and need to squeeze more juice from the silicon. Even an advanced post-x86 architecture would foresee some dynamic switching between SMT and out-of-order execution. However, going beyond 2x (which basically splits the FP and integer execution queues) means switching from RISC back to WISC (as you noted, modern x86-64 CPUs translate x86 instructions into equivalent RISC code), but following the Itanium concept rather than translating each instruction into n RISC instructions (a key factor in ARM's efficiency is that it doesn't need to translate CISC to RISC). As I said, that requires a very specialized execution pipeline for each instruction, and it only works together with an efficient compiler.
WISC offers the only theoretical possibility of going further in IPC. It is much like axing all the old x86 instructions and working only with a more general-purpose, AVX-like instruction set (at least Intel seems to understand the concept better), but as with AVX it is not easy to deploy in the field; a lot of work has to be done on compilers.
One example of these concepts is a WISC multi-core CPU with widely shared execution pipelines, where a task in one thread could use one or more idle FPU or integer units from other cores (for instance, when calculating with very large integers, as cryptocurrency requires, you would like to use more integer units). Of course, this requires an even more specialized instruction set and highly optimized compilers, which is where Itanium failed miserably. There is the challenge.
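To show why very large integers are a natural fit for extra integer units, here is a small sketch of schoolbook multiplication on 64-bit limbs. It is only an illustration in Python, not how any CPU or crypto library actually does it. The 64-bit limb size and all function names are assumptions. The point is that every limb-by-limb partial product is independent of the others, so in principle each could be handed to a different integer unit before the final summation.

```python
# Illustrative only: non-negative integers, 64-bit limbs (an assumed size).
LIMB_BITS = 64
MASK = (1 << LIMB_BITS) - 1


def to_limbs(n):
    """Split a non-negative integer into 64-bit limbs, least significant first."""
    limbs = []
    while n:
        limbs.append(n & MASK)
        n >>= LIMB_BITS
    return limbs or [0]


def partial_products(a, b):
    """Return (limb_shift, product) pairs; no pair depends on any other."""
    return [
        (i + j, x * y)
        for i, x in enumerate(to_limbs(a))
        for j, y in enumerate(to_limbs(b))
    ]


def multiply(a, b):
    """Combine the independent partial products into the full result."""
    return sum(p << (LIMB_BITS * shift) for shift, p in partial_products(a, b))


a = 2**200 + 12345
b = 2**130 + 987
print(len(partial_products(a, b)), "independent limb products")
print(multiply(a, b) == a * b)
```

Only the final summation has to be serialized (or done as a reduction tree); the products themselves are the part that idle integer units could pick up.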
As I see it, the long-term future of the last von Neumann computer generation lies in advanced WISC or WISC-like CPUs.