NVIDIA Brings the CPU Back to Center Stage — While Pushing Intel to the Margins

NVIDIA yesterday unveiled, for the first time, the full architecture of its Vera Rubin platform at GTC — the company’s next-generation AI infrastructure designed for the era of agentic AI. Unlike previous generations built around discrete servers, Rubin is presented as a complete rack-scale system, where multiple specialized racks operate together as what NVIDIA describes as an “AI factory.”

The architecture consists of several types of racks, each responsible for a different layer of the system. GPU racks, powered by Rubin processors, handle the most compute-intensive workloads — training large models and running real-time inference. CPU racks manage environments, agents and orchestration logic. Storage racks handle memory and large-scale context management, while networking racks connect the system through high-speed interconnects. In addition, dedicated inference accelerators are integrated to optimize response generation.

The CPU Returns to Center Stage

One of the most notable innovations in the architecture is the Vera CPU rack — a dedicated system designed to address the emerging workloads of autonomous AI agents.

In traditional AI systems, GPUs handled most of the heavy lifting, while CPUs played a supporting role. In the agentic era, however, a growing share of the workload shifts to the CPU: running code, invoking tools, orchestrating workflows, validating outputs and managing simulations.

According to NVIDIA, each CPU rack can include up to 256 Vera processors, capable of running tens of thousands of independent CPU environments in parallel. The processor itself features 88 custom-designed cores, with a strong emphasis on single-thread performance and high memory bandwidth — reaching up to 1.2 TB/s — alongside significant gains in energy efficiency.

A key architectural element is the direct connection between CPU and GPU via NVLink, enabling far faster data exchange than traditional server interconnects. As a result, the CPU is no longer just coordinating GPU workloads — it becomes an integral part of the computation pipeline itself.

Meanwhile, Intel Remains — But in a Reduced Role

At the same time, Intel announced that its Xeon 6 processors have once again been selected as the host CPU for NVIDIA’s DGX Rubin NVL8 systems — servers equipped with eight GPUs that serve as a fundamental building block of the platform.

In these systems, Xeon continues to perform its traditional role: managing GPUs, scheduling workloads and handling data movement. Its strengths remain in large memory capacity, high I/O bandwidth and broad compatibility with existing infrastructure.

However, within the broader Rubin architecture, this role is becoming increasingly limited.

From Blackwell to Rubin: A Shift in Power

The transition from Blackwell to Rubin clearly illustrates the shift. In previous generations, AI infrastructure was largely built around DGX or HGX servers, each combining NVIDIA GPUs with CPUs — typically from Intel. In practice, Xeon processors were present in nearly every system.

With Rubin, the architecture evolves. Instead of uniform servers, the system becomes a heterogeneous, rack-scale infrastructure, where each component is purpose-built. The CPU takes on a more central role — but not necessarily through Intel.

The Vera CPU rack now serves as the layer responsible for running agents and orchestrating the system, while Xeon processors are largely confined to host roles within NVL8 systems. In other words, Intel remains inside the system — but no longer defines its foundation.

The Big Picture: Vertical Integration of AI Infrastructure

The broader picture is clear: NVIDIA is steadily moving toward full-stack control of AI infrastructure — from GPUs to CPUs, networking and storage.

While its partnership with Intel continues, NVIDIA is simultaneously building an internal alternative that strengthens its long-term position. Where the CPU once represented a critical external dependency, it is now becoming another layer under NVIDIA’s own control.

For the data center market, this marks a fundamental shift: from general-purpose servers to AI-native infrastructure — where even the CPU is purpose-built for the agentic era. The result could redefine the balance of power among chipmakers in the years ahead.

Intel Cancels Manufacturing Deal With Tower Semiconductor

Photo above: Intel’s Fab 11X facility, where Tower’s chips were manufactured. Photo: Intel

Intel has withdrawn from its joint manufacturing agreement with Tower Semiconductor at Intel’s Fab 11X chip manufacturing facility in New Mexico, USA. Following the cancellation, the two companies entered arbitration proceedings. Tower has begun transferring customer production from Intel’s facility to its own Fab7 plant in Japan. The development was disclosed in a brief note at the end of Tower’s quarterly report released yesterday.

The manufacturing agreement was signed in September 2023, about three weeks after Intel’s planned acquisition of Tower was terminated. The deal allowed Intel to utilize a largely idle factory producing older-generation technologies. At the time, Intel Foundry Services (IFS) senior executive Stuart Pann said Tower’s investment would enable the equipment to be activated while Intel provided manufacturing services at the site.

Under the agreement, Tower committed to invest approximately $300 million to transfer processes and install production equipment, which would remain its property. In return, IFS would provide manufacturing services for power devices and RF SOI wireless solutions at volumes exceeding 60,000 wafers per month. For Tower, the deal provided significant capacity without building a new fab, using 300mm wafers that offer lower overhead and higher profitability.

Tower now says it transferred manufacturing processes originally developed at its Fab7 facility in Japan to New Mexico, qualified them and began serving customers. The company is currently moving those customers back to Fab7 in order to maintain supply continuity and service levels.

Record 2025 revenue

In the fourth quarter of 2025, Tower’s revenue grew about 14% year over year to approximately $440 million. Full-year 2025 revenue reached a record $1.57 billion, representing 9% growth compared with $1.44 billion in 2024. The company expects first-quarter 2026 revenue of about $412 million, roughly 15% growth year over year.

While the global semiconductor market expanded by more than 26% during this period, the primary growth driver was large advanced-node chips for data centers and AI — areas outside Tower’s core business.

Nearly $1 billion investment in capacity expansion

Tower is currently expanding manufacturing infrastructure for silicon photonics (SiPho) and silicon-germanium (SiGe) components, key technologies for communications and high-frequency RF applications. The company recently added another $270 million to the project, bringing total investment to about $920 million.

The goal is to complete installation and qualification by the fourth quarter of 2026 and begin full mass production in 2027. The project is expected to increase SiGe and SiPho production capacity fivefold compared with the fourth quarter of 2025.

Tower’s improving performance has been reflected in its stock price over the past year. The company now trades at roughly $140 per share on Nasdaq, compared with less than $50 a year ago. Even the dispute with Intel has not shaken the stock, which currently values Tower at about $15.1 billion.

Intel Unveils Its 1.8-Nanometer Panther Lake Processor

[Photo: Intel’s Panther Lake processor, produced at the company’s Oregon fab. Credit: Intel]

Intel has unveiled Panther Lake, its next-generation personal-computing architecture built for the AI era. The new processors, which will launch under the Intel Core Ultra brand, are the company’s first high-volume products manufactured using the Intel 18A process — representing a node size of just 1.8 nanometers.
This advanced production technology introduces two major innovations: RibbonFET transistors, offering faster switching speeds, and PowerVia, a backside power-delivery network that separates power and signal routing layers. Together, these improvements boost performance and transistor density across the chip.

While Panther Lake was a global engineering effort, Intel’s Israeli development team played a leading role, with several hundred engineers collaborating closely with Intel’s Oregon facility.
According to Zohar Tzeva, Intel’s Panther Lake project manager, the team combined two complementary CPU architectures: Lunar Lake, designed in Israel and optimized for power efficiency, and Arrow Lake, introduced earlier in 2025 to deliver high performance for demanding workloads.

A Modular Multi-Chip Design

The new processor adopts a multi-chip module structure built around three main silicon tiles.
The Compute Tile, designed entirely in Israel and fabricated using the 18A process, packs roughly three billion transistors and integrates up to 16 CPU cores alongside an AI engine capable of up to 180 TOPS (trillion operations per second)—enabling complex AI models to run directly on laptops and desktops.
A separate GPU tile, based on Intel’s new Xe3 architecture, is produced by TSMC, while a control tile handles system-level connectivity, supporting interfaces such as PCIe, Wi-Fi, Thunderbolt, and Bluetooth.

From Doubt to Delivery

“Many in the industry doubted Intel’s ability to bring a sub-2 nanometer processor to market,” said Tzeva. “But we proved it can be done. Hundreds of thousands of Panther Lake chips have already been shipped to customers, and we’ll move to full-scale production by the end of 2025. The first computers powered by Panther Lake will hit the market in early 2026.”

Intel declined to specify which processors will underpin its recently announced partnership with NVIDIA. However, the timing suggests Panther Lake will play a central role.

Intel + NVIDIA: A New Era for Laptops

Roughly three weeks ago, Intel and NVIDIA signed a sweeping technology and business collaboration agreement. One of its most intriguing aspects concerns the laptop market: Intel will produce and sell x86-based SoCs that integrate NVIDIA’s RTX GPUs.
Given Panther Lake’s modular design and its positioning for high-performance, energy-efficient laptops, analysts believe it will serve as the foundation for this joint platform—marking NVIDIA’s first-ever direct entry into the $150 billion-a-year PC market.

Intel Seeks Patent for Software-Defined “Supercore”

[Image: Intel Xeon 6 server processors]

By Yohai Schwiger

Intel has filed a U.S. patent application describing a new technology it calls the “Software Defined Supercore.” According to the filing, the company envisions a way to link several physical cores so they function as a single, massive core capable of executing many instructions in parallel.

The idea is to push CPUs closer to the kind of parallel processing long associated with GPUs—without the cost and complexity of designing a physically enormous core. A CPU core is the basic unit that executes software instructions. In early computers, there was only one. Today, most processors include multiple cores, allowing them to run different tasks at once.

Intel’s proposal would make multicore processing more flexible and dynamic: when an application demands concentrated compute power, several cores could be fused into one broad “supercore.” Once the demand subsides, they would return to operating independently.

The implications are especially relevant for artificial intelligence. GPUs have become the workhorses of AI training and inference thanks to their ability to handle thousands of parallel calculations simultaneously. Intel’s approach aims to give CPUs a similar advantage—enabling a core to “scale up” by tapping into additional cores to handle complex workloads, from AI to simulations and high-performance computing.

A Software-First Mindset
In some ways, the concept echoes NVIDIA’s CUDA software environment, which allowed developers to tap into GPU architecture in smarter ways and helped transform GPUs into essential engines for AI and advanced computation. Intel is seeking to provide a comparable software layer, though here the goal is to orchestrate CPU cores rather than hundreds or thousands of GPU threads.

What makes this effort especially noteworthy is the signal it sends about Intel’s software ambitions. In the company’s most recent earnings call, new CEO Lip-Bu Tan admitted Intel had lost its edge in software innovation in recent years, vowing to bring it back to the forefront. The Supercore patent filing may be an early sign of that strategy, reminding the industry that Intel’s focus extends beyond silicon into the software that directs it.

Still, it is important to stress that this is only a patent application—not a finished product. Implementing such a concept would require significant changes at multiple levels, from hardware design to operating systems and developer tools. In other words, the Supercore remains an intriguing idea with clear potential, but one that could take many years to materialize—if it ever does—in Intel’s commercial processors.

Washington Grabs 10% Stake, Trump Emerges as Intel’s New Boss

By Yohai Schwiger

After days of tense negotiations, Intel announced on Friday that the U.S. government will acquire a 9.9% stake in the company, valued at $8.9 billion. While Intel presented it as a fresh investment, the reality is more nuanced: no new money is coming in. Instead, grants already promised under the Biden administration—$5.7 billion from the CHIPS Act and $3.2 billion from the Secure Enclave program—are being converted into direct equity.

In effect, Trump rewrote the rules after the fact. Instead of receiving cash grants as originally intended, Intel was forced to accept government ownership. The deal is unprecedented: at the outset, no such equity requirement existed. The shares were issued at $20.47 apiece—below the current market price of $24.80—making the U.S. government Intel’s largest shareholder, ahead of major investors Vanguard and BlackRock.

Under the agreement, Washington holds no direct management rights but must vote in line with the board’s recommendations, with limited exceptions. That makes the government a “passive” shareholder in name, yet with potential to influence corporate decisions in the future.

Trump’s Victory

The deal is being cast as a personal triumph for President Donald Trump and another showcase of his negotiating prowess. Just last week, Trump publicly demanded that CEO Lip-Bu Tan—who only recently took the helm—resign, citing close ties with China and potential conflicts of interest. The demand weakened Tan’s standing and left him cornered in negotiations. The compromise: Tan keeps his job, but the U.S. becomes Intel’s dominant shareholder.

Trump wasted no time in claiming the win. “He walked in wanting to keep his job and he ended up giving us $10 billion for the United States. So we picked up $10 billion,” Trump said on Friday. The sarcastic tone turned the deal into not just a financial arrangement but also a political statement—reinforcing Trump as the ultimate victor and Intel as the subordinate.

Tan’s Defeat

For Tan, the episode is a major blow. He has been trying to drive a sweeping restructuring at Intel, with an emphasis on rebuilding the foundry business. But the public call for his resignation has shaken his authority inside and outside the company.

Recently, Tan unveiled a cost-cutting program alongside Intel’s Q2 earnings report and stressed a cautious approach to expanding production capacity in order to protect the company’s finances. He canceled projects in Europe and announced a pause in expanding Intel’s Ohio fabs—moves at odds with the administration’s stated goal of ramping up domestic manufacturing. Now, with Washington as Intel’s largest shareholder, the company may find itself pressured to resume the Ohio expansion to satisfy political demands. From here on, every quarterly report and strategic move will be scrutinized not only by investors and Intel’s board, but also by the U.S. government, now a de facto partner.

What It Means for Intel

Government backing brings obvious advantages: better odds of winning defense and federal contracts, regulatory favoritism, political support, and an image of stability that could attract more U.S. corporate customers. It also signals to Wall Street that Intel has a government safety net, potentially lowering its risk profile.

But the downside is real. Market watchers fear Washington may not remain a “passive” owner forever. Even a symbolic presence could evolve into real influence over strategy—blurring the line between corporate independence and state control.

The Death of the CHIPS Act as We Knew It

The Intel deal could set a precedent. If Washington can turn CHIPS Act grants into equity stakes, the same could happen to other companies—Micron, GlobalFoundries, even TSMC and Samsung’s U.S. operations. This fundamentally alters the nature of the CHIPS Act. What began as a straightforward subsidy program to incentivize semiconductor investment is morphing into a mechanism for government leverage, forcing companies to surrender ownership in exchange for support.

It’s worth remembering that Trump himself opposed the CHIPS Act when it was passed, calling it a waste of taxpayer money and vowing to repeal it. Instead of scrapping the law, he has reshaped it—transforming grants into equity stakes and turning the program into a direct instrument of ownership and control. For Intel and its peers, “free” money is no longer free.

Tan Reverses Course; Slams Intel’s Former Management

Photo above: Lip-Bu Tan, leading Intel’s broad restructuring plan

Intel CEO Lip-Bu Tan presented his new vision and roadmap for the company Thursday, marking a dramatic departure from the direction taken by Intel in recent years. Tan is spearheading a sweeping operational and cultural restructuring, which includes massive layoffs, the cancellation of mega-projects in Europe, and a sharp cut in development budgets.

Initial reports of Tan’s cost-cutting strategy emerged when he took the reins in April 2025, but with the release of Intel’s Q2 report this week, the scale became clear: the company officially confirmed the largest layoff in its history, which will reduce Intel’s workforce by approximately 15%.

The restructuring plan includes cutting around 24,000 jobs. By year-end, Intel will employ about 75,000 people. The initiative aims to slash operating expenses to $17 billion in 2025 and $16 billion in 2026. Layoffs are already underway, with reports surfacing in recent weeks of business unit closures and staff exits. In Q2 2025, Intel posted $12.9 billion in revenue and a staggering GAAP net loss of $2.9 billion. The company also issued a disappointing Q3 forecast: revenue of $12.6–13.6 billion with zero adjusted earnings per share.

18A Failed, Servers Were Complex and Pricey

In the earnings call with investors and analysts, Tan delivered scathing criticism of the company’s previous management. “We have a lot to fix to move this company forward,” he said, referring to the failed 18A process node, once considered Intel’s flagship engineering initiative. “We’ve learned a lot from the mistake we made with 18A. We’re applying those lessons now to 14A,” he added. Tan emphasized that 14A investments would only proceed if there’s tangible demand: “I will invest only when I’m convinced the returns are there.”

He was equally blunt about the company’s server processor strategy, which focused on multi-core, multi-threaded chips with dozens of cores and hundreds of threads—regardless of actual market demand. “That approach led to overly complex and expensive CPUs whose performance didn’t justify the cost,” he said. “We are now shifting to a leaner, more focused product line that addresses real customer needs. I will not greenlight chips just because we can build them—only if the market justifies it. I’m fixing the mistakes made in recent years.”

CEO Will Personally Approve All Major Silicon Designs

Tan said Intel still maintains a strong position in the traditional server market. “We’re seeing healthy demand, but we need to improve performance-per-watt for our hyperscale server CPUs,” he explained. “I’ve already taken steps to undo mistakes in multi-threading architecture and am now in the process of bringing in new leadership for our Data Center Group. Expect announcements in the coming months.”

He added: “My directive for future silicon designs is clear: products must feature clean, simple architectures and better cost structures. From now on, every major CPU design will require my personal review and approval before tape-out. This will enhance execution speed, sharpen our focus, and reduce development costs.”

A Software-First Shift in AI Strategy

One of Tan’s most significant announcements was a major strategic shift in Intel’s approach to software, particularly in artificial intelligence. “In the past, we approached AI with a narrow focus on silicon and training—without building an integrated hardware-software stack,” he said. “Our AI strategy must now center on the x86 CPU architecture and Xe GPU architecture, but we must rise to a higher level of abstraction—offering full-stack solutions that include both hardware and software. This is an area where Intel was weak or entirely absent. Under my leadership, that will change.”

“To be the preferred computing platform, we need to deeply understand the most important computing trends and respond with an integrated approach—developing both software and silicon. In the coming months, we’ll provide more details on our efforts to build unified AI capabilities across hardware and software. It will take time, but it’s essential if Intel is to remain relevant in the next computing wave.”

“An Engineering Vision With No Commercial Spine”

At the core of Tan’s critique was the previous CEO’s manufacturing strategy, which centered on building massive fabs for the Foundry Services division in Ohio, Germany, Poland, and Costa Rica—before securing sufficient customers. “We need to build manufacturing capacity wisely and cautiously, aligned with customer demand and business needs,” Tan said. “The investment in recent years far exceeded actual demand and was done in an unwise and excessive manner. Our manufacturing footprint became too dispersed. Going forward, we will grow capacity only when we have volume commitments and will allocate resources gradually based on milestones.”

Tan reiterated this principle throughout the call: “I don’t believe in the ‘if we build it, they will come’ mindset. Under my leadership, we will build what customers need, when they need it—and we will earn back their trust. That applies both to Foundry projects and to future process nodes. We cannot afford an engineering vision with no commercial backbone.”

Mega Projects Canceled, Engineers Back to Office

Tan’s words have already translated into concrete actions. Intel has canceled the planned €10 billion fab in Germany and the assembly plant in Poland. The Costa Rica site will focus only on R&D, with assembly operations shifting to lower-cost countries like Vietnam and Malaysia. Intel will significantly cut its capital expenditures and slow geographic expansion.

“Our operational metrics already reflect the impact of the changes we’ve started implementing,” Tan said. Going forward, Intel will concentrate resources on just three areas: Foundry-as-a-Service, AI chips, and enhancing existing products. Management layers will be reduced by 50%, and engineers will be required to work on-site at least four days a week. “We need to become a fast, precise, and lean company—like our competitors in Asia,” he said. “This is a strategic shift. We’re not measuring ourselves by near-term earnings, but by our ability to stay relevant over the next two years. The transition to AI and foundry services isn’t a luxury—it’s a necessity.”

Intel and Weizmann Institute Remove the “Speculative Decoding” Bottleneck

Top image: Nadav Timor (right) and Prof. David Harel. Photo: Weizmann Institute of Science

A joint team from Intel Labs and the Weizmann Institute of Science has presented a groundbreaking method for significantly accelerating AI processing based on large language models (LLMs). The research was showcased this week at ICML 2025 in Vancouver, Canada—one of the world’s top AI and machine learning conferences. The paper was selected for oral presentation, a rare honor granted to only 1% of the approximately 15,000 submissions.

The work was led by Prof. David Harel and PhD student Nadav Timor from the Weizmann Institute, in collaboration with Moshe Wasserblat, Oren Pereg, Daniel Korat, and Moshe Bartchansky from Intel, along with Gaurav Jain from d-Matrix. While LLMs like ChatGPT and Gemini are powerful, they are also slow and resource-hungry. As early as 2022, the industry began exploring ways to speed up inference by splitting tasks between different algorithms. This led to the emergence of Speculative Decoding, in which a smaller, faster model “guesses” the next output tokens, and the larger model only verifies the guess instead of computing it from scratch.

A Fast, Lightweight Helper Model

How does it work? In the standard process, an LLM must run a full, computationally heavy pass for every word it generates. For example, to complete the sentence “The capital of France is…”, a model might generate “Paris”, then read “The capital of France is Paris” and compute again to generate “a”, then once more to generate “city”. In total, it performs three heavy compute steps for three words.

With speculative decoding, a fast auxiliary model first drafts the entire phrase—“Paris”, “a”, “city”. Then the larger model checks the full draft in a single validation step. If the guess is correct, all three words are accepted, drastically reducing processing time.

From right to left: Moshe Wasserblat, Oren Pereg, Daniel Korat, and Moshe Bartchansky. Photo: Intel

The Bottleneck That Held the Industry Back

Although speculative decoding has been known for over three years, real-world adoption has been difficult. That’s because LLMs don’t truly “understand” words—they operate based on statistical relationships between tokens. Each model develops its own internal “digital language” of token IDs. For example, the word “apple” might be token #123 in one model and #987 in another.

Until now, speculative decoding only worked when both models (large and auxiliary) used the exact same tokenizer and architecture—usually only possible if they were built by the same company. Developers couldn’t simply pair any fast model with any LLM; they were locked into specific ecosystems.
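A tiny, hypothetical example makes the mismatch concrete. The vocabularies and token IDs below are invented for illustration: the same words map to different IDs in each model, so comparing raw IDs across models fails even when the models agree on the text.

```python
# Two hypothetical tokenizers assign different IDs to the same words, so
# comparing raw token IDs across models is meaningless, while comparing
# decoded text is not.

vocab_large = {"apple": 123, "pie": 124}
vocab_small = {"apple": 987, "pie": 988}

def decode(vocab, ids):
    """Map token IDs back to strings for one model's vocabulary."""
    inv = {i: w for w, i in vocab.items()}
    return [inv[i] for i in ids]

draft_ids = [987, 988]    # the small model's guess: "apple pie"
target_ids = [123, 124]   # what the large model would emit

# Naive ID comparison rejects a perfectly good draft...
assert draft_ids != target_ids
# ...but decoding both to text shows the models actually agree.
assert decode(vocab_small, draft_ids) == decode(vocab_large, target_ids)
```

Bridging this gap in the general case, across real vocabularies that only partially overlap, is the problem the new algorithms address.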

This created a major bottleneck. The Israeli team overcame this with a new class of algorithms that decouple helper models from LLM architectures, making them cross-compatible across platforms, vocabularies, and companies.

A Surprising Solution to the Compatibility Problem

To bridge this gap, the researchers developed two key techniques. First, an algorithm that enables an LLM to translate its “thoughts” into a language understood by other models. Second, an algorithm that ensures both the large and small models rely primarily on token cognates—tokens with equivalent meanings across different token vocabularies.

“At first we feared that too much would get ‘lost in translation’ and the models wouldn’t sync,” said Nadav Timor, a PhD student in Prof. Harel’s lab and lead author of the paper. “But our fears proved unfounded.”

According to Timor, the algorithms achieved up to 2.8× speedups in LLM performance—resulting in dramatic compute cost savings. “This makes speculative decoding accessible to any developer,” he said. “Until now, only companies with the resources to train custom small models could benefit from these techniques. For a startup, building such a model would have required deep expertise and significant investment.”

Now Available on Hugging Face

The new algorithms have already been integrated into the open-source platform Hugging Face, making them freely available to developers worldwide.

Read the full research paper:
https://arxiv.org/pdf/2502.05202