sourcegraph
February 8, 2023

Last May, Sandra Rivera, an executive at chip giant Intel, got some shocking news.

Engineers had spent more than five years developing a powerful new microprocessor to perform computing tasks in data centers and were confident they had finally gotten the product right. But signs of a potentially serious technical flaw emerged during a routine morning meeting to discuss the project.

The problem was so troubling that the microprocessor, code-named Sapphire Rapids, had to be delayed, the latest in a series of setbacks for one of Intel’s most important products in years.

“We’re very frustrated,” said Ms. Rivera, executive vice president in charge of Intel’s data center and artificial intelligence group. “It was a painful decision.”

Sapphire Rapids’ launch was ultimately pushed from mid-2022 to Tuesday, almost two years later than originally expected. The product’s protracted development, which involved combining four chips in a single package, highlights some of the challenges facing Intel’s turnaround as the U.S. tries to assert its dominance in basic computer technology.

Since the 1970s, Intel has been a leader in the small silicon chips that run most electronic devices, most notably a variety called a microprocessor, which acts as the electronic brain in most computers. But in recent years the Silicon Valley company has lost its long-standing lead in the manufacturing technology that helps determine the computing speed of chips.

Patrick Gelsinger, who became Intel’s chief executive in 2021, has vowed to restore its manufacturing edge and build a new factory in the United States. He was a leading figure last summer as Congress debated and passed legislation to reduce U.S. reliance on chip manufacturing in Taiwan, which China claims as its territory.

Sapphire Rapids’ bumpy ride has implications for whether Intel can bounce back to deliver future chips on time. It’s an issue that could affect many computer makers and cloud service providers, not to mention the millions of consumers who use online services that may be powered by Intel technology.

“What we want is a predictable and steady pace,” said Kirk Skaugen, executive vice president of server sales at Lenovo, a Chinese company that plans to roll out 25 new systems based on the new processor. “Sapphire Rapids is where the journey begins.”

For Intel, the pressure is on. The company faces stiff competition in server chips, its most profitable business, as demand for chips used in personal computers falls. Those problems have worried Wall Street, with Intel’s market value plummeting more than $120 billion since Mr. Gelsinger took over.

Intel plans to host an online event on Tuesday to discuss Sapphire Rapids, named after a part of the Colorado River. More formally, the products are known as the 4th Generation Intel Xeon Scalable processors.

Mr. Gelsinger said in an interview that despite the delays, Sapphire Rapids had what it took to succeed. He chose Ms. Rivera in 2021 to take over the division that developed it, and she is using lessons learned to change the way Intel designs and tests its products. He said Intel has conducted several internal reviews of what happened with Sapphire Rapids, “and we’re not done yet.”

Sapphire Rapids started in 2015 as a discussion among a small group of Intel engineers. The product is the company’s first attempt at a new approach to chip design. Companies now typically pack tens of billions of tiny transistors onto each silicon chip, but competitors such as Advanced Micro Devices have begun making processors by bundling multiple chips in a plastic package.

Intel engineers came up with a design consisting of four chips, each with 15 processor “cores,” which act like stand-alone calculators for general-purpose computing work. The company also decided to add circuit blocks for special tasks, including artificial intelligence and encryption, and for communicating with other components, such as chips that store data.

Shlomit Weiss, who co-leads Intel’s design engineering team, said the interaction of so many elements is “very complex,” adding, “Complexity often creates problems.”

The Sapphire Rapids team worked hard to resolve defects, flaws caused by designer errors or manufacturing failures that can make chips miscalculate, run slowly or stop functioning. The team was also held back by delays in the product’s manufacturing process.

But in December 2019, engineers reached a milestone called “tapeout.” At that point, the electronic files containing the complete design are transferred to the factory to make sample chips.

Sample chips arrived in early 2020, just as Covid-19 forced lockdowns. Engineers quickly got the computing cores on Sapphire Rapids communicating with each other, said Nevine Nassif, the project’s chief engineer. But more work than expected remained.

A key chore was “validation,” a testing process in which Intel and its customers run software on sample chips to simulate computing workloads and find bugs. Once a defect is found and fixed, the design may be returned to the factory to make new test chips, which typically takes more than a month.

Repeating the process resulted in missed deadlines. Ms. Nassif said Sapphire Rapids was designed to counter AMD’s Milan processors, which launched in March 2021. But it still wasn’t ready that June, when Intel announced a delay until the following year to allow more validation.

That’s when Ms. Rivera stepped in. The longtime Intel executive had successfully built a networking products business before being named chief human resources officer in 2019.

“We have to restore execution,” Mr. Gelsinger said. “I need someone to step up and fix this for me.”

In October 2021, Ms. Rivera and a senior design executive established a weekly Sapphire Rapids status meeting, held every Monday at 7 a.m. Those meetings showed steady progress in finding and fixing bugs, boosting confidence that production could start in the second quarter of 2022, she said.

Then came the flaw discovered last May. Ms. Rivera would not elaborate, but said it affected the performance of the processor. In June, she used an investor event to announce a delay of at least a quarter, which pushed Sapphire Rapids past November, when AMD launched a competing chip.

“We are ready to ship,” Ms. Nassif said. The final delay “is just so sad considering all the work that has gone into it.”

Ms. Rivera sees a series of lessons in the setback. One is simply that Intel packed too much innovation into Sapphire Rapids instead of delivering a less ambitious product earlier.

She also concluded that the team should have spent more time refining and testing its design using computer simulations. Ms. Rivera said it is less expensive to catch errors before they show up in sample chips, and that features could have been removed to simplify the product. She has since moved to strengthen Intel’s simulation and verification capabilities.

“We used to have a lot of these muscles and we let them atrophy,” Ms. Rivera said. “Now we’re rebuilding.”

She also determined that Intel was scheduling more products than its engineers and customers could easily handle. As a result, she simplified the product roadmap, including delaying the Sapphire Rapids successor from 2023 to 2024.

More broadly, Ms. Rivera and other Intel executives have pushed the organization to develop better processes for documenting technical issues and sharing that information inside and outside the company.

Some Intel customers say communication has gotten better.

“Is everything going well? No,” said Lenovo’s Mr. Skaugen, who formerly ran Intel’s server chip business. “But we’re a lot less surprised than we were in the past.”


