The Future of AI in Computing: Moving Beyond CISC towards ARM-based Architecture

As the computing landscape evolves, Intel and other tech giants are exploring new architectures to better serve the demands of Artificial Intelligence (AI). One such shift is the move from traditional x86 CISC (Complex Instruction Set Computer) cores towards the ARM architecture. Let's delve into why this shift matters and how it affects various computing systems.

Understanding CISC vs. RISC: A Simplified Overview

First, it's important to clarify the difference between CISC and RISC. A CISC design like Intel's original 8086 offers many complex instructions, including ones that read and modify memory directly within a single instruction. RISC (Reduced Instruction Set Computer) designs, typified by the ARM architecture, use a smaller set of simpler, fixed-format instructions and touch memory only through dedicated load and store operations.

Because a single CISC instruction can do the work of several RISC instructions, CISC code tends to be more compact. The trade-off is decoding cost: variable-length, multi-step CISC instructions are harder to fetch, decode, and schedule, whereas simple fixed-length RISC instructions are easier to feed through a superscalar pipeline that issues several instructions per cycle. (Modern x86 cores are superscalar too; the point is that RISC makes that parallel front end cheaper to build.)
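A minimal C sketch makes the load/store distinction concrete. The assembly in the comments is typical, simplified compiler output, not the exact code any particular compiler will emit:

```c
/* Illustrative sketch: the same C statement compiles very differently
 * on a CISC ISA vs. a load/store (RISC) ISA. */
void increment(int *counter, int delta) {
    *counter += delta;
    /* x86-64 (CISC): one instruction reads, modifies, and writes memory:
     *     add dword ptr [rdi], esi
     *
     * AArch64 (RISC): memory is touched only by explicit loads/stores:
     *     ldr w8, [x0]        // load current value
     *     add w8, w8, w1      // modify in a register
     *     str w8, [x0]        // store the result back
     */
}
```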

Focusing on Microarchitecture: Beyond Instructions

When it comes to modern high-performance cores, the practical difference between CISC and RISC is small. All modern cores, whether x86 or ARM, are RISC-like in their microarchitecture: x86 front ends decode complex instructions into simpler micro-operations (µops), and ARM cores likewise crack or fuse instructions into micro/macro-ops before execution. The performance of a core is defined by how "wide" it is: how many execution units (EUs) it has, the size of its reorder buffer (ROB), and the sizes of its caches. The focus is no longer on the complexity of the instruction set, but on the efficiency and parallelism of the microarchitecture.
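Here is a minimal C sketch of why width matters (illustrative only; real compilers and cores add many complications). A wide out-of-order core can overlap independent operations, but a dependent chain serializes no matter how many execution units are available:

```c
/* Sketch: instruction-level parallelism is what a "wide" core exploits. */

/* The four multiplies below are independent, so a core with several
 * multiply-capable execution units can, in principle, issue them in the
 * same cycle and combine the results afterwards. */
long independent(const long *a) {
    long p0 = a[0] * a[1];
    long p1 = a[2] * a[3];
    long p2 = a[4] * a[5];
    long p3 = a[6] * a[7];
    return (p0 + p1) + (p2 + p3);
}

/* Here each multiply needs the previous result, so the chain runs one
 * step at a time regardless of how wide the core is. */
long dependent(const long *a) {
    long x = a[0];
    x *= a[1];
    x *= a[2];
    x *= a[3];
    return x;
}
```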

CISC Architecture: x86's Edge in Microarchitecture Features

While ARM focuses on simplicity, x86/x64 architectures offer wider SIMD (Single Instruction, Multiple Data) units. Intel has shipped 256-bit SIMD (AVX) since 2011 and 512-bit AVX-512 in server parts since around 2017, and AMD's Ryzen added AVX-512 support with Zen 4. ARM's baseline NEON vectors are only 128 bits wide; the newer SVE/SVE2 extensions allow wider implementations, but 128-bit designs remain the norm. Wider vectors let an x86 core perform more data operations per instruction, which helps in throughput-heavy general computing tasks.
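To make the width difference concrete, here is a hedged C sketch using the standard AVX and NEON intrinsics (compile with AVX enabled on x86, or natively on AArch64). One 256-bit AVX instruction processes eight 32-bit floats, while a 128-bit NEON instruction processes four; handling of leftover tail elements is omitted for brevity:

```c
#include <stddef.h>

#if defined(__AVX__)
#include <immintrin.h>
void add_arrays(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i + 8 <= n; i += 8) {   /* 8 floats per instruction */
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
    }
}
#elif defined(__ARM_NEON)
#include <arm_neon.h>
void add_arrays(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i + 4 <= n; i += 4) {   /* 4 floats per instruction */
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
}
#endif
```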

The Role of AI in Future Computing

AI models spend most of their time on enormous numbers of simple operations, chiefly multiply-accumulates used to compute probabilities. These calculations increasingly use narrow data types such as FP8 or INT8, both 8-bit formats, rather than FP32, the 32-bit floating-point type that dominates general-purpose computing. AI workloads thrive on parallelism and on cheap, efficient execution of this basic math.
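As a sketch of what narrowing to INT8 means in practice, here is a minimal symmetric quantization routine in C. The scale parameter and rounding scheme are illustrative choices, not any specific framework's API:

```c
#include <math.h>
#include <stdint.h>

/* Map an FP32 value onto the INT8 grid defined by `scale`. */
int8_t quantize(float x, float scale) {
    float q = roundf(x / scale);        /* FP32 -> integer grid */
    if (q > 127.0f)  q = 127.0f;        /* clamp to INT8 range  */
    if (q < -128.0f) q = -128.0f;
    return (int8_t)q;
}

/* Recover an approximation of the original FP32 value. */
float dequantize(int8_t q, float scale) {
    return (float)q * scale;
}

/* Usage: for values in [-1, 1], pick scale = 1.0f / 127.
 * quantize(0.5f, 1.0f / 127) yields 64, which dequantizes
 * back to roughly 0.504 -- close, but not exact. */
```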

Traditionally, GPUs and NPUs (Neural Processing Units) have been the go-to solutions for AI workloads because they execute thousands of operations in parallel. For instance, the AMD Instinct MI300X packs 19,456 stream processors and peaks at roughly 5.22 PFLOPS (PetaFLOPS) of FP8 with sparsity; its MI300A sibling pairs CDNA 3 GPU cores with 24 Zen 4 CPU cores on the same package to feed them data. NVIDIA's H100 offers up to 16,896 CUDA cores (14,592 in the PCIe variant), and in the Grace Hopper superchip it is paired with a 72-core Arm Neoverse V2 Grace CPU, a server-class descendant of ARM's mobile core designs.
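As a back-of-envelope illustration of where such throughput figures come from, the sketch below multiplies unit count, FLOPs per cycle, and clock speed. The clock value is an assumption, and vendors' headline FP8 PFLOPS numbers come from dedicated matrix engines (often assuming sparsity), not from this simple vector formula:

```c
#include <stdio.h>

int main(void) {
    double units         = 19456;   /* stream processors (MI300X class) */
    double ops_per_cycle = 2;       /* one fused multiply-add = 2 FLOPs */
    double clock_hz      = 2.1e9;   /* assumed ~2.1 GHz boost clock     */

    double flops = units * ops_per_cycle * clock_hz;
    printf("Vector FP32 peak: %.1f TFLOPS\n", flops / 1e12);  /* ~81.7 */
    return 0;
}
```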

Power Consumption and Efficiency

When it comes to power efficiency, Japan's Fugaku supercomputer, built from 7,630,848 ARM cores, draws roughly 29.9 MW to deliver about 442 PFLOPS on the HPL benchmark. Frontier, built on AMD Epyc CPUs and MI250X accelerators (each accelerator with a 560 W TDP), delivers about 2.5 times that performance, roughly 1.1 EFLOPS, at around 21.1 MW. Per watt, the x86-plus-GPU design is more than three times as efficient as the ARM-based machine.
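A quick worked comparison using the approximate public TOP500 figures makes the efficiency gap explicit:

```c
#include <stdio.h>

int main(void) {
    /* Approximate TOP500 figures: Rmax in GFLOPS, power in watts. */
    double fugaku_gflops   = 442e6,  fugaku_watts   = 29.9e6;
    double frontier_gflops = 1102e6, frontier_watts = 21.1e6;

    printf("Fugaku:   %.1f GFLOPS/W\n",
           fugaku_gflops / fugaku_watts);     /* ~14.8 */
    printf("Frontier: %.1f GFLOPS/W\n",
           frontier_gflops / frontier_watts); /* ~52.2 */
    return 0;
}
```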

Apple and Qualcomm's Leadership in AI SoCs

Meanwhile, Apple and Qualcomm have taken a lead in integrating dedicated AI accelerators, Apple's Neural Engine and Qualcomm's Hexagon NPU, into their Systems on Chips (SoCs). These blocks are designed specifically for AI inference and are considerably more capable than the NPUs Intel and AMD have only recently begun adding to their client processors. This reflects a strategic bet on an AI-focused future, as these companies aim to capture the growing market for AI-powered devices.

In conclusion, while the ARM architecture offers advantages in simplicity and low-power design, moving large-scale data centers and supercomputers to it requires rethinking how those systems are designed and optimized. x86, with mature microarchitecture features such as wide SIMD, retains an edge where complex, parallel operations dominate. As AI continues to evolve, the choice between CISC and RISC architectures will remain consequential, and the future likely belongs to designs that strike the best balance of performance and efficiency.