Habana Labs Announces Super-fast AI Processor
17 June, 2019
Gaudi represents a novel architecture designed for the needs of Artificial Intelligence. The processor includes on-chip 100Gb Ethernet ports.
The semiconductor newcomer, Habana Labs from Israel, is determined to rewrite the rules of the AI processor market. The company announced this week the Habana Gaudi AI Training Processor, which is expected to deliver up to four times the throughput of systems built with an equivalent number of GPUs. Gaudi is based on an innovative architecture that enables near-linear scaling of training-system performance and high throughput even at small batch sizes. This allows scaling from a single device to very large systems consisting of hundreds of Gaudi processors.
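The near-linear scaling claim can be illustrated with a toy model: aggregate throughput grows almost proportionally with device count, discounted by a scaling-efficiency factor. The efficiency figure and per-device baseline below are hypothetical assumptions for illustration, not published Habana numbers.

```python
# Toy model of near-linear training scaling.
# scaling_efficiency is a hypothetical assumption, not a Habana figure.

def estimated_throughput(single_device_tput: float,
                         num_devices: int,
                         scaling_efficiency: float = 0.95) -> float:
    """Estimate aggregate throughput for a cluster that scales
    near-linearly: each added device contributes a fixed fraction
    (scaling_efficiency) of a single device's throughput."""
    return single_device_tput * num_devices * scaling_efficiency

# A hypothetical 128-processor system with a baseline of
# 1,000 samples/s per device:
print(estimated_throughput(1000, 128))  # 121600.0
```

Perfectly linear scaling would correspond to an efficiency of 1.0; real clusters fall short of that as communication overhead grows with size, which is what the integrated high-speed interconnect is meant to minimize.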
Computing and Connectivity in a Single Chip
Gaudi brings another industry first to AI training: the processor includes on-chip integration of RDMA over Converged Ethernet (RoCE v2) functionality, enabling AI systems to scale to any size using standard Ethernet. Unlike GPU-based systems, which rely on proprietary system interfaces, Habana’s systems rely on a standard interface. Ethernet switches are multi-sourced, offer virtually unlimited scalability in speeds and port count, and are already used in data centers to scale compute and storage systems.
“Gaudi offers strong performance and industry-leading power efficiency among AI training accelerators,” commented Linley Gwennap, principal analyst of The Linley Group. “As the first AI processor to integrate 100G Ethernet links with RoCE support, it enables large clusters of accelerators built using industry-standard components.” The Gaudi processor includes 32GB of HBM-2 memory and is currently offered in two forms: the HL-200, a PCIe card supporting eight ports of 100Gb Ethernet, and the HL-205, a mezzanine card compliant with the OCP-OAM specification, supporting 10 ports of 100Gb Ethernet or 20 ports of 50Gb Ethernet.
Habana is also introducing an 8-Gaudi system called HLS-1, which includes eight HL-205 mezzanine cards, PCIe connectors for external host connectivity, and 24 100Gbps Ethernet ports for connecting to off-the-shelf Ethernet switches, allowing scale-up within a standard 19-inch rack by populating it with multiple HLS-1 systems. Gaudi is the second purpose-built AI processor to be launched by Habana Labs. Last year it announced the Habana Goya AI Inference Processor. Goya has been shipping since Q4 2018 and has demonstrated industry-leading inference performance, with the industry’s highest throughput, highest power efficiency (images-per-second per Watt), and real-time latency.
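The port counts and speeds quoted above translate into the following aggregate Ethernet bandwidth per card and per system. This is simple back-of-the-envelope arithmetic based solely on the figures in the article.

```python
# Aggregate Ethernet bandwidth implied by the article's figures.

PORT_SPEED_GBPS = 100  # each external link runs at 100Gb/s

hl_200_ports = 8           # HL-200 PCIe card: 8 x 100GbE
hl_205_ports = 10          # HL-205 mezzanine card: 10 x 100GbE (or 20 x 50GbE)
hls_1_external_ports = 24  # HLS-1 system: 24 x 100GbE to external switches

print(hl_200_ports * PORT_SPEED_GBPS)           # 800 (Gb/s per HL-200)
print(hl_205_ports * PORT_SPEED_GBPS)           # 1000 (Gb/s per HL-205)
print(hls_1_external_ports * PORT_SPEED_GBPS)   # 2400 (Gb/s per HLS-1)
```

Note that the HL-205's alternative configuration of 20 x 50Gb ports yields the same 1,000Gb/s aggregate, so the two modes trade port count against per-port speed rather than total bandwidth.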
Facebook is using Habana’s Inference Processor
“Training AI models requires exponentially higher compute every year, so it’s essential to address the urgent needs of the datacenter and cloud for radically improved productivity and scalability. Gaudi’s innovative architecture delivers the highest performance while integrating standards-based Ethernet connectivity for unlimited scale,” said David Dahan, CEO and Co-founder of Habana Labs. “Gaudi will disrupt the status quo of the AI Training processor landscape.”
“Facebook is seeking to provide open platforms for innovation around which our industry can converge,” said Vijay Rao, Director of Technology, Strategy at Facebook. “We are pleased that the Habana Goya AI inference processor has implemented and open-sourced the back-end for the Glow machine learning compiler and that the Habana Gaudi AI training processor is supporting the OCP Accelerator Module (OAM) specification.”
Habana will be sampling Gaudi to selected customers in the second half of 2019. Habana Labs was founded in 2016 in Tel Aviv by silicon experts. In November 2018, it secured $75 million in Series B funding led by Intel Capital, bringing total investment in the company to $120 million.