I think everybody is forgetting the most important thing in Zen 2 EPYC CPUs: the I/O die and its role.
It may be more than just Infinity Fabric links. It may actually be a "master" controller for "slave" chiplet CPUs.
There is little indication that the I/O die executes anything at the application level. There is a good chance it contains a secure enclave ARM processor with some boot duties, so it is necessary for the chiplet cores to "start", but after boot there probably isn't much running there. It is a "master" only in the very narrow sense that a CPU core completely starved of I/O data can't do much of anything. Those chiplets can't get to any persistent storage (no firmware, no disk, no storage network data, etc.) without the I/O chip. But as a "master" in the computational dimension, it is more than a bit lacking.
There may be some coordinated sleep/wake power management control (depending on how fine-grained the control given to the OS is). There probably should be some coordinated control over which chiplets are put to sleep in corner cases where the computational workload dries up. However, back in the full computational dimension, "master"/"slave" is the wrong notion.
There is a pretty high chance they have stuffed the PCH into the I/O chip. There is some internal computation for USB, Ethernet, etc., but not at the app level.
If AMD wanted to be really clever, there could be some memory compression/decompression there in a future version, but that isn't master/slave either.
The cores in the chiplets need the functionality in the I/O chip, but that isn't a master/slave relationship.
Not exactly "on die" if it is on a different die in the same package. More like "on chip" (chip == package, i.e. the physical container the die(s) are in).
And probably a hefty chunk of the southbridge, as in the previous Zen implementations.
Probably closer to merging the old classic north and south bridge chips onto a single die and then mounting that die inside the CPU package. Which is why it is on the slippery slope back to "front side bus" problems. You can get into a situation with way too high a ratio of cores to memory channel / controller paths, and too many memory requests stampeding too few "doors" to the RAM chips.
The L1/L2/L3 caches are now substantially larger, the branch predictors are much better, and the cores have multithreading (SMT / "hyperthreads"), so they can hide the logjams a bit better, but core counts going way higher will eventually bring back the problems (unless the workload is constrained to being chopped up into fine-grained pieces).
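To put rough numbers on the cores-versus-"doors" worry, here is a back-of-envelope sketch in Python. The 8-channel figure, the DDR4-3200 speed, and the core counts are all assumptions picked for illustration, not confirmed specs, and the function names are made up for this example. It only looks at peak theoretical bandwidth and ignores caches, SMT, and latency entirely.

```python
# Peak transfer rate of one DDR4-3200 channel:
# 3200 mega-transfers/s * 8 bytes per transfer (64-bit channel) = 25.6 GB/s.
DDR4_3200_GBPS = 3200e6 * 8 / 1e9


def cores_per_channel(cores, channels):
    """How many cores share each memory channel ("door" to the RAM)."""
    if cores <= 0 or channels <= 0:
        raise ValueError("cores and channels must be positive")
    return cores / channels


def per_core_bandwidth(cores, channels, channel_gbps=DDR4_3200_GBPS):
    """Peak DRAM bandwidth in GB/s per core if every core streams at once."""
    if cores <= 0 or channels <= 0:
        raise ValueError("cores and channels must be positive")
    return channels * channel_gbps / cores


if __name__ == "__main__":
    CHANNELS = 8  # assumed channel count, held fixed while cores grow
    for cores in (8, 32, 64, 128):  # hypothetical core counts
        ratio = cores_per_channel(cores, CHANNELS)
        share = per_core_bandwidth(cores, CHANNELS)
        print(f"{cores:4d} cores: {ratio:5.1f} cores/channel, "
              f"{share:5.2f} GB/s peak per core")
```

With the channel count held fixed, every doubling of cores halves each core's peak share of memory bandwidth, which is the stampede at too few doors described above. Bigger caches and SMT hide some of that, but they don't change the ratio.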