Since Carrizo and then Bristol Ridge, AMD has put a lot of effort into developing clever power management features to compensate for the lack of a FinFET process node. Here we describe them and show how they can also improve stability and maximum clock. Finally, we try to infer XFR's features and behaviour from these earlier features.
1) Adaptive Clocking System, first introduced in Carrizo
A paper presented at ISSCC 2014 (http://ieeexplore.ieee.org/document/6757358/) describes this feature. In short, it stretches the clock signal, and thus slows down the CPU, as soon as it detects a voltage droop on the Vcore supplied to the CPU. This is useful because it allows the usual 10-15% margin on the Vcore that the CPU requests from the motherboard's VRMs to be reduced. The paper describes experiments with the algorithm's thresholds and reports the most useful setting: as soon as Vcore droops more than 2.5% below nominal, the clock is slowed down (stretched) by 7%. This in turn allows the nominal Vcore to be lowered by up to 6% at the same frequency, for power savings of up to 15% with a maximum performance drop of 1%. Alternatively, the saved power can be used to raise the clocks. The paper also introduced a circuit called the Power Supply Monitor (PSM), which measures the actual Vcore and is also used by the other technologies described here.
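To make the threshold rule concrete, here is a minimal sketch in Python. It is only an illustration of the 2.5% / 7% figures quoted above, not the actual hardware logic: the function name is ours, and we assume that "slowed down by 7%" means a 7% lower effective frequency.

```python
DROOP_THRESHOLD = 0.025  # stretch when Vcore is more than 2.5% below nominal
STRETCH = 0.07           # assumption: stretching means a 7% lower frequency


def effective_frequency(f_nominal_mhz, v_nominal, v_measured):
    """Return the clock the core would run at for one PSM voltage sample."""
    droop = (v_nominal - v_measured) / v_nominal
    if droop > DROOP_THRESHOLD:
        # Droop detected: stretch the clock until the supply recovers.
        return f_nominal_mhz * (1.0 - STRETCH)
    return f_nominal_mhz


if __name__ == "__main__":
    for v in (1.20, 1.18, 1.16):
        print(v, effective_frequency(3000.0, 1.20, v))
```

Since droops are short and rare, the core spends almost all of its time at the nominal frequency, which is consistent with the small performance drop reported in the paper.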
2) AVFS, first introduced in Carrizo
Various papers on the Carrizo architecture describe and characterize AVFS (Adaptive Voltage and Frequency Scaling). It saves 7-20% of power at the same frequency, but this power budget can also be used to increase frequency. It works by constructing a Voltage Frequency Table (VFT) for each core (10x8 elements of 10 bits each on Carrizo and Bristol Ridge) containing the minimum voltage required for stable operation of that core at each of a set of temperatures. This, in turn, is used to choose the correct Vcore for each core, given the frequency of the currently requested p-state. The calculations use data collected by 10 units spread across the chip, each of which has 50 replica circuits representative of the CPU's critical paths, for a total of 500 replicas (Zen has 1300, but there is no information on the number of units). These units employ a Critical Path Accumulator (CPA) algorithm that exercises the replicas with test signals and measures the actual delay distributions. Statistics are then built from these measurements, also taking into account the actual Vcore (measured by the PSM). Guard bands are added to these statistics, and the VFT is constructed from the result. This calculation is performed periodically by the System Management Unit (SMU) and can also be triggered by microcode. Since the table is relatively big and covers multiple temperatures, it does not need to be recalculated often.
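As a rough illustration of how such a table could be built and used, here is a small Python sketch. Everything in it is hypothetical: the guard band value, the table layout (temperature, then frequency) and the choice of the nearest tabulated temperature are our assumptions, and the CPA measurement step is replaced by ready-made minimum voltages.

```python
GUARD_BAND_V = 0.025  # hypothetical guard band added to every measured Vmin


def build_vft(measured_vmin, guard_band=GUARD_BAND_V):
    """Build a per-core VFT from {temp_c: {freq_mhz: measured_vmin_volts}}."""
    return {
        temp: {freq: round(vmin + guard_band, 4) for freq, vmin in row.items()}
        for temp, row in measured_vmin.items()
    }


def lookup_vcore(vft, temp_c, freq_mhz):
    """Pick the Vcore for a p-state frequency at the nearest tabulated temperature."""
    nearest_temp = min(sorted(vft), key=lambda t: abs(t - temp_c))
    return vft[nearest_temp][freq_mhz]


if __name__ == "__main__":
    # Made-up measurements for one core at two temperatures.
    measured = {
        40: {3000: 1.10, 3400: 1.20},
        80: {3000: 1.15, 3400: 1.25},
    }
    vft = build_vft(measured)
    print(lookup_vcore(vft, temp_c=75, freq_mhz=3400))
```

The point of the sketch is only that the expensive part (building the table) is separate from the cheap part (the lookup on every p-state change).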
3) BTC, first introduced in Carrizo
Various papers on the Carrizo architecture describe and characterize BTC (Boot Time Calibration). It lowers the Vcore margins needed to compensate for aging of the motherboard, VRMs and CPU, and for VRM tolerances, saving power. Again, this power budget can be used to increase frequency. The PSM measures the actual Vcore delivered by the VRMs, so at boot the Vcore is measured and compared with the expected value. The measurement is then repeated with an artificial load on the CPU, to acquire the DC offset and the AC parameters. During normal operation, the VID sent to the VRMs is adjusted to account for the DC offset, the current load on the CPU, temperature and per-part leakage. This corrects the VRM output for these effects so that the CPU receives the intended Vcore, and it removes the margins that are usually added to Vcore to cover them. As a final note, this feature is also present on the Fiji GPU.
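The calibration idea can be sketched in a few lines of Python. This is a simplified model of our own, not AMD's algorithm: it keeps only a static offset and a linear load-dependent drop, ignores temperature and leakage, and assumes the idle measurement is taken at negligible current. All names and numbers are hypothetical.

```python
def calibrate(vid_requested, v_idle_measured, v_load_measured, i_load_amps):
    """Boot-time step: derive the DC offset and a linear drop per ampere."""
    dc_offset = vid_requested - v_idle_measured  # VRM error with no load
    drop_per_amp = (v_idle_measured - v_load_measured) / i_load_amps
    return dc_offset, drop_per_amp


def corrected_vid(v_target, i_now_amps, dc_offset, drop_per_amp):
    """Run-time step: VID to request so that the core really sees v_target."""
    return v_target + dc_offset + drop_per_amp * i_now_amps


if __name__ == "__main__":
    # Requested 1.200 V; the PSM reads 1.190 V idle and 1.170 V at a 20 A test load.
    offset, drop = calibrate(1.200, 1.190, 1.170, 20.0)
    print(corrected_vid(1.100, 30.0, offset, drop))
```

Without this correction, the worst-case offset and drop would have to be added as a fixed margin on every part, whether or not a given board needs it.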
4) Reliability tracker, first introduced on Bristol Ridge
This technology dynamically calculates the maximum voltage that can be applied to the CPU while still letting it last for its intended lifetime. Usually this voltage is fixed and calculated on the assumption that the CPU is loaded at 100% all the time and runs at its highest temperature (e.g. 100 °C). This is almost never the case, so a formula raises Vmax by up to 16%, depending on CPU usage and actual temperature. This technology is useful in Bristol Ridge (and probably in Zen) to keep Vcore at safe levels while increasing the frequency whenever there is power and temperature headroom.
5) Digital LDO, first introduced on Bristol Ridge
The PMOS transistors used to power-gate a core (CC6 state) are repurposed, with little overhead, to form a digital VRM around each CPU core. This makes it possible to have a single power rail for the CPU but a separate Vcore for each core. Each core can then run at its own correct Vcore (calculated by AVFS, which keeps a separate VFT per core), saving power. The digital VRM also smooths out voltage droops, keeping interventions by the Adaptive Clocking System to a minimum. Another feature of this system is the use of a very low Vcore for lightly loaded cores: these cores can enter a very low power state, and the power saved can be used to boost other cores further. Moreover, an idle core can drop to the lowest possible state while still retaining its L1 and L2 caches, drawing only a little more power than a true power-gated CC6 state, which is slower to enter and exit and requires flushing the caches and refilling them on wake-up.
6) Shadow P-states, first introduced on Bristol Ridge
By combining information from BTC, the reliability tracker and AVFS, the SMU calculates the highest feasible p-state under the current conditions, giving an average boost of 100 MHz. This means that Bristol Ridge, if conditions are favorable, can sustain a higher boost state than a previous architecture would under the same load.
7) STAPM, first introduced on Mullins, available on Bristol Ridge too
This technology (Skin Temperature Aware Power Management) models the actual cooling system and calculates the skin temperature of a portable device from the actual power drawn, filtered to smooth out peaks so that they are allowed for short periods. This makes it possible to use more power in temperature-constrained environments.
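The filtering idea can be shown with a short Python sketch. It is a deliberately crude stand-in: the skin-temperature model is reduced to a first-order low-pass filter on power, and the two limits, the filter constant and the function name are all hypothetical.

```python
def stapm_allowed_power(requested_w, sustained_limit_w, boost_limit_w, alpha=0.1):
    """Grant power per time step; bursts are allowed while the filtered power is low."""
    filtered = 0.0  # low-pass filtered power, standing in for skin temperature
    granted = []
    for request in requested_w:
        # Below the sustained limit the chassis is still "cool": allow the burst.
        limit = boost_limit_w if filtered < sustained_limit_w else sustained_limit_w
        power = min(request, limit)
        filtered += alpha * (power - filtered)
        granted.append(power)
    return granted


if __name__ == "__main__":
    # A load that asks for 25 W all the time on a device that can sustain 15 W.
    print(stapm_allowed_power([25] * 20, sustained_limit_w=15, boost_limit_w=25))
```

In this toy model a constant heavy load gets the boost limit for the first few steps and is then held at the sustained limit, which is the "short peaks are allowed" behaviour described above.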
Now let's talk about Ryzen. We can suppose that it has all these features. XFR is not very far from Bristol Ridge's shadow p-states. The only difference is that XFR should move in 25 MHz increments and not be bound to p-states, and thus it has no fixed upper limit. The SMU calculates the VFTs, and if the current power draw is under the TDP limit, the temperature is under its limit and the required Vcore is under the currently calculated Vmax, the frequency increase is granted. If any of these conditions is not met, the frequency is decreased until they all are. The temperature is probably measured at multiple spots, with the maximum being used. This can limit the single-core boost to safe levels in cases where the TDP limit is not reached but the core is too hot. A better cooler will help raise the frequency at which this happens.
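Our guess at this control loop can be written down as a small Python sketch. To be clear, this is speculation turned into code, not a description of the real SMU firmware: the `Limits` class, the `xfr_step` function and the linear voltage model in the example are all hypothetical.

```python
from dataclasses import dataclass

STEP_MHZ = 25  # XFR is expected to move in 25 MHz increments


@dataclass
class Limits:
    tdp_w: float   # power limit
    temp_c: float  # temperature limit
    vmax: float    # current Vmax, as a reliability tracker would provide it


def xfr_step(freq_mhz, power_w, sensor_temps_c, needed_vcore, limits):
    """One decision: step up if every condition holds, otherwise step down."""
    candidate = freq_mhz + STEP_MHZ
    hottest = max(sensor_temps_c)  # several sensors, the worst one counts
    ok = (
        power_w < limits.tdp_w
        and hottest < limits.temp_c
        and needed_vcore(candidate) < limits.vmax  # VFT lookup for the new clock
    )
    return candidate if ok else freq_mhz - STEP_MHZ


if __name__ == "__main__":
    limits = Limits(tdp_w=95.0, temp_c=75.0, vmax=1.45)
    vft = lambda f: 1.0 + f / 10000.0  # made-up voltage/frequency relation
    print(xfr_step(3600, 80.0, [55.0, 60.0], vft, limits))  # headroom on all three
    print(xfr_step(3600, 80.0, [55.0, 80.0], vft, limits))  # one hot spot
```

The second call shows the single-core case discussed above: power is well under the TDP limit, but one hot sensor is enough to push the clock down.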
From this information, we can guess that XFR is a powerful automatic clocking technology that will make it hard to do better with traditional overclocking. Just raise the TDP and temperature limits, fit a good cooler and let Ryzen clock itself higher. The only other knobs we foresee, besides the TDP and temperature limits, are a Vmax offset, to allow a higher Vcore, and a relaxation of the guard bands on the statistics collected to calculate the VFTs. But those knobs are at your own risk. We don't know what will be configurable by users, but XFR looks like an interesting tool for squeezing the maximum out of the hardware.