Pipeline
In order to sidestep this relentless push toward smaller nodes and more power-hungry processors, some application code can be rewritten for parallel processing, where more than one section of code executes at the same time, increasing overall processing speed. That said, not all code can be rewritten this way. Tasks that can be broken down into smaller "sub-problems" are ideal for parallel processing, but those involving data dependency, or those with sequential bottlenecks (a point where execution stops until a single task in the code stream completes), are very difficult to rewrite, so the push for faster and more powerful processors continues.
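The distinction is easy to see in code. Below is a minimal Python sketch (the function names are our own, purely illustrative): the first task is made of independent sub-problems that can be handed to separate workers, while the second carries a data dependency from one step to the next and must run sequentially.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # Each call depends only on its own input: an ideal "sub-problem".
    return x * x

def parallel_squares(values):
    # Independent sub-problems can be dispatched to workers in any order.
    with ThreadPoolExecutor(max_workers=2) as pool:
        return list(pool.map(square, values))

def running_total(values):
    # Data dependency: each step needs the previous result,
    # so this loop cannot simply be split across cores.
    totals, acc = [], 0
    for v in values:
        acc += v
        totals.append(acc)
    return totals
```

Here `parallel_squares([1, 2, 3, 4])` returns `[1, 4, 9, 16]` and could be split freely, while `running_total([1, 2, 3, 4])` returns `[1, 3, 6, 10]` and cannot, since each element depends on the one before it.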
Intel (INTC) has come up with a process that uses software to create a "super core": essentially two cores operating as one. With the addition of a small bit of hardware, a single sequential program can run as if it were parallel, using two cores as one much larger core, without rewriting the code. The Intel process (software and hardware) lets two cores operate as a single "virtual" core, accomplished with special synchronization circuitry that allows the cores to "speak" to each other through a special section of high-speed memory cordoned off from other functions, but the real magic is in the software stack.
The source code (the program) is compiled into a single-threaded binary by a generic C++ or similar compiler, but instead of running the code at this point, Intel's software runs a JIT (just-in-time) compiler that analyzes the binary and identifies frequently used sections. It converts those sections into a format that can be split and run concurrently, adding "flow control" instructions along the way. As the two cores execute the modified code, the synchronization circuitry follows the flow control instructions, allowing the two cores to work as one, fetching and executing the program segments concurrently.
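The split-and-join idea can be sketched at a very high level. This is not Intel's mechanism (the patent describes hardware-assisted transformation of a binary), only a loose software analogy we wrote for illustration: a frequently executed loop is divided across two workers, and the final combining step plays the role of the synchronization point.

```python
from concurrent.futures import ThreadPoolExecutor

def hot_loop(data):
    # Stands in for a frequently executed section a JIT might single out.
    return sum(x * x for x in data)

def split_and_run(data):
    # Illustrative stand-in for the JIT transformation: the iteration
    # space is divided in half and dispatched to two workers; joining
    # the two partial results acts as the synchronization point.
    mid = len(data) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(hot_loop, data[:mid])
        second = pool.submit(hot_loop, data[mid:])
        return first.result() + second.result()
```

The transformed version produces the same result as the sequential loop (`split_and_run(list(range(10)))` equals `hot_loop(range(10))`), which is the whole point: the program behaves as written, but two cores share the work.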
But wait, there’s more… Each core has a BPU (branch prediction unit) that tracks where potential splits in the program’s instruction stream occur. This helps the cores guess where to find the next instruction block, keeping both cores active as much as possible and keeping them from acting on the same data at the same time. This matters because an incorrect guess means the entire pipeline of instructions must be flushed (emptied) and restarted from the point of the miscalculation, which is a big performance penalty. Newer Intel processors have two (hybrid) branch prediction systems to reduce such issues.
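A toy cost model shows why prediction accuracy matters so much. All numbers here are assumed for illustration, not Intel figures: roughly one instruction in five is a branch, and each misprediction costs a fixed pipeline-flush penalty in cycles.

```python
def effective_cycles_per_instr(base_cpi, branch_freq, miss_rate, flush_penalty):
    # Toy model (assumed numbers, not Intel figures): every mispredicted
    # branch empties the pipeline, adding a fixed penalty in cycles,
    # spread across all instructions as an average cost.
    return base_cpi + branch_freq * miss_rate * flush_penalty

# With ~20% branches and a 15-cycle flush penalty:
worse  = effective_cycles_per_instr(1.0, 0.20, 0.05, 15)  # 5% miss rate
better = effective_cycles_per_instr(1.0, 0.20, 0.01, 15)  # 1% miss rate
```

Dropping the miss rate from 5% to 1% takes the average cost from 1.15 to 1.03 cycles per instruction in this model, which is why better predictors pay off even when mispredictions are already rare.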
So, instead of looking toward a smaller CPU node, a larger number of cores, or higher-power cores, Intel has made modifications to the cores and the execution stack that make two cores act as one larger, more efficient core. This software-defined supercore (SDC) approach boosts single-threaded performance without the inefficiency of building larger, more power-hungry cores. Since Intel’s E-cores (efficiency) are already small and power efficient, combining them creates a more powerful “supercore” that rivals Intel’s P-cores (performance) but with greater power efficiency.
Of course, much of this information comes from Intel patents, so there is no yardstick for when, or if, this might find its way into consumer products, but it represents a way for Intel to simplify core architecture from two types (E & P) to a single high-performance core type that could replace the need for two and simplify chip design. That said, there is no free lunch, and we would be remiss if we did not also point to some of the issues that make this seemingly simple process more complex.
In this case the bigger problem is not hardware but software, as both the JIT compiler and the operating system scheduler must be intelligent enough not only to identify code sections for SDC execution but also to manage the complex inter-core synchronization with minimal overhead and latency. This would require considerable cooperation between Intel and operating system developers (Microsoft (MSFT), Linux, etc.) and a deep understanding of the processor’s architecture by the OS. The latency (communication between the cores) is particularly critical: if it exceeds the performance gained by running the cores in parallel, the system would provide no benefit (similar to the Intel Itanium processor). If Intel is able to work through those difficulties, the benefits of a single-core-type architecture would be substantial, reducing the constant push for smaller CPU production nodes and higher-power CPUs.
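That break-even point can be made concrete with a back-of-the-envelope model (our own simplification, with made-up numbers): an ideally split task takes half its sequential time plus the synchronization overhead, so the pairing only helps while the overhead stays below half the sequential runtime.

```python
def sdc_speedup(t_seq, sync_overhead):
    # Toy break-even model (illustrative assumptions, not Intel data):
    # an ideally split task runs in half its sequential time plus the
    # inter-core synchronization overhead. Speedup > 1.0 only while
    # the overhead stays below half the sequential runtime.
    return t_seq / (t_seq / 2 + sync_overhead)

sdc_speedup(100, 10)  # overhead small enough: speedup above 1
sdc_speedup(100, 60)  # overhead too large: slower than one core
```

With zero overhead the model gives the ideal 2x; at 10 units of overhead on a 100-unit task the gain shrinks to about 1.67x; at 60 units the "supercore" is actually slower than a single core, which is the Itanium-style failure mode the text describes.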