Parallel single threading seems entirely plausible: phase the clock skew peaks and dips on two chips and synchronize the switching between one and the other. In theory you'd get a 100% increase in performance with two chips like that, but the clock skew frequency oscillation is in constant motion, so you're always moving from peaks to dips. With the switching in mind to maximize both, you end up at 50% in the best case scenario, though the synchronizing and sequencing might not be 100% perfect, so it could be closer to 48%. I don't know if they can execute it perfectly in practice, but in theory it's definitely within the scope of possibilities. You can actually mimic that with a pair of music sequencers, so it's functionally possible.
I mentioned the concept in the Intel bigLITTLE TPU thread not that far back. In theory you can manipulate clock skew or duty cycles in a clever manner to get more performance, in a similar fashion to what was done on MOS Technology's SID chip, where arpeggios were used to simulate playing chords with polyphony. It was a clever hardware trick at the time. It seems far-fetched and somewhat unimaginable to actually be applied, but innovation always is. You have to think outside the box or you'll always be stuck in a box.
This is a quadruple LFO; what is allegedly being done is a twin LFO. If you look at the intersection points, that's half a duty cycle of rising and falling voltages/frequencies. If you look at the blue and green or the yellow and purple, they intersect perfectly. What's being done is switching at the intersection point, so you've got the peaks closer together and the base of the mountain, so to speak, isn't as far down. That's assuming this is in fact being done and put into practice by AMD. I see it within the oscilloscope of possibilities for certain. That's basically what DDR memory did in practice by using both the rising and falling edges. The big question is whether they can pull it off within the dynamic complexity of software. Then again, why can't they!? I can't see why they can't divert it like a railroad track at that intersection point. That nets you a roughly 50% performance gain. With 4 chips the valley dips would be reduced further and the peaks would happen more routinely, and you'd end up with 100% more performance. I think that's the idea behind the phrase quad data rate, though as far as I know DDR5 itself is still double data rate.
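To make the "switch at the intersection point" idea a bit more concrete, here's a rough toy model in Python. It's only a sketch of the LFO analogy, not anything AMD is known to do: every name in it (`lfo`, `switched_envelope`, `summarize`) and the `base`/`depth` values are made up for illustration, and the numbers it prints come from idealized sine waves, so they won't line up with the 50%/100% figures above. It just shows that always riding whichever wave is highest raises the floor, and that four evenly phased waves raise it more than two.

```python
import math


def lfo(t, phase, base=1.0, depth=0.2):
    """One sine LFO: a hypothetical clock wobbling around `base` (arbitrary units)."""
    return base + depth * math.sin(2 * math.pi * t + phase)


def switched_envelope(n_chips, samples=10000, base=1.0, depth=0.2):
    """Sample one full LFO cycle, always riding whichever chip's wave is highest.

    The chips are assumed to be evenly spaced in phase (180 degrees apart for
    two chips, 90 degrees apart for four), so the hand-off from one chip to
    the next happens exactly where two waves intersect.
    """
    phases = [2 * math.pi * k / n_chips for k in range(n_chips)]
    envelope = []
    for i in range(samples):
        t = i / samples
        envelope.append(max(lfo(t, p, base, depth) for p in phases))
    return envelope


def summarize(n_chips):
    """Mean, lowest, and highest value of the switched envelope."""
    env = switched_envelope(n_chips)
    return {
        "mean": sum(env) / len(env),
        "floor": min(env),
        "peak": max(env),
    }


if __name__ == "__main__":
    for n in (1, 2, 4):
        s = summarize(n)
        print(f"{n} chip(s): mean={s['mean']:.4f} "
              f"floor={s['floor']:.4f} peak={s['peak']:.4f}")
```

In this model the peak never changes; what improves is how shallow the dips are and how often you're near the top, which is the "base of the mountain isn't as far down" part. Whether real silicon and real software could switch that cleanly is a separate question.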
Thinking about it further, I really don't see a problem with the I/O die managing that type of load switching quickly in real time, and the data would already be present in the CPU memory; it's not like it gets flushed instantly. So yeah, maybe it could become a bit of a reality. If not now, certainly later. I have to think AMD will incorporate an I/O die for the GPU soon as well if they want to pursue multi-chip GPUs.