NeuReality launches its much-anticipated, fully integrated NR1™ AI Inference solution next week at the international SC23 Conference – a long-awaited cure for the ailments of today's big CPU-centric data centers, which suffer from high inefficiency and expense. Delivering 10x performance, savings of 90 cents on every dollar spent on AI operations, and backed by a line-up of business partners and customers, NeuReality will demonstrate the world's first affordable, ultra-scalable AI-centric servers designed purely for inference, that is, the day-to-day operation of trained AI models.
As expensive as it is to run live AI data through the world's data centers, AI inferencing remains a blind spot in our industry, according to NeuReality Co-founder and CEO Moshe Tanach. “ChatGPT is a new and popular example, of course, but generative AI is in its infancy. Today's businesses are already struggling to run everyday AI applications affordably – from voice recognition systems and recommendation engines to computer vision and risk management,” says Tanach. “Generative AI is on their horizon too, so it's a compounding problem that requires an entirely new AI-centric design built for inferencing. Our customers will benefit immediately from deploying our easy-to-install, easy-to-use solution alongside established hardware and solution providers.”
The need for more affordable, faster, and more scalable AI inference was anticipated even before NeuReality was founded in 2019. The company focuses on one of the biggest problems in artificial intelligence: making the inference phase both economically sustainable and scalable enough to support consumer and enterprise demand as AI adoption accelerates. For every $1 spent on training an AI model today, businesses spend about $8 to run those models, according to Tanach. “That astronomical energy and financial cost will only grow as AI software, applications and pipelines ramp up in the years to come on top of larger, more sophisticated AI models.”
With the NR1 system, future AI-centric data centers will see a 10x gain in performance capability, empowering financial, healthcare, government and small-business customers to create better customer experiences with more AI inside their products. That, in turn, can help companies generate more top-line revenue while cutting bottom-line costs by 90 percent.
“NeuReality’s AI inference system comes at the right time when customers not only desire scalable performance and lower total cost of ownership, but also want open-choice, secure and seamless AI solutions that meet their unique business needs,” says Scott Tease, Vice President, General Manager, Artificial Intelligence and HPC WW at Lenovo.
“NeuReality is bringing highly efficient and easy-to-use AI innovation to the data center. Working together with NeuReality, Lenovo looks forward to extending this transformative AI solution to customer data and delivering rapid AI adoption for all. As a leader in our Lenovo AI Innovators Program, NeuReality’s technologies will help us to deliver proven cognitive solutions to customers as they embark on their AI journeys,” says Tease.
At SC23 next week, NeuReality will demonstrate its easy-to-deploy software development kit, APIs, and two flavors of hardware technology: the NR1-M™ AI Inference Module and the NR1-S™ AI Inference Appliance. Presented together with OEM and deep learning accelerator (DLA) partners, each demo addresses specific market sectors and AI applications, showcasing the breadth of NeuReality's technology stack and its compatibility with all DLAs. The system architecture will feature one-of-a-kind, patented technologies including:
- NR1 AI-Hypervisor™ hardware IP: a novel hardware sequencer that offloads data movement and processing from the CPU, an architectural cornerstone of the heterogeneous compute semiconductor device;
- NR1 AI-over-Fabric™ network engine: an embedded NIC (Network Interface Controller) with offload capabilities for an optimized network protocol dedicated to inference. The AIoF™ (AI-over-Fabric) protocol optimizes networking between AI clients and servers, as well as between connected servers forming a large language model (LLM) cluster or other large AI pipelines;
- NR1 NAPU™ (Network Addressable Processing Unit): a network-attached heterogeneous chip for complete AI-pipeline offloading, leveraging Arm cores to host Linux-based server applications with native Kubernetes for cloud and data center orchestration.
“The next era of AI relies on broad deployment of ML inference in order to unlock the power of LLMs and other maturing models in new and existing applications,” says Mohamed Awad, Senior Vice President and General Manager, Infrastructure Line of Business, Arm. “Arm Neoverse delivers a versatile and flexible technology platform to enable innovative custom silicon such as NeuReality’s NR1™ NAPU, which brings to market a powerful and efficient form of specialized processing for the AI-centric data center.”
NeuReality is shipping by the end of 2023 with an established value chain of software partners, original equipment manufacturers (OEMs), deep learning accelerator (DLA) suppliers, cloud service providers, and enterprise IT solution companies, including Arm, AMD, CBTS, Cirrascale, IBM, Lenovo, Qualcomm, Supermicro, and more. As a result, financial services, healthcare, government and smaller businesses can expect access to easy-to-deploy, easy-to-use AI inference solutions from NeuReality with profitable performance.
“We are thrilled to be working with NeuReality to deliver inference-as-a-service in banking, insurance and investment services,” says PJ Go, CEO, Cirrascale Cloud Services. “As a specialized cloud and managed services provider deploying the latest training and inference compute with high-speed storage at scale, we focus on helping customers choose the right platform and performance criteria for their cloud service needs. Working with NeuReality to help solve for inference – arguably the biggest issue facing AI companies today – will undoubtedly unlock new experiences and revenue streams for our customers.”
Tanach adds: “Along with our partners, we have re-imagined inference and, in the process, set the standard for the future of AI: more cost-effective, carbon-conscious and performance-driven. The NAPU is the Swiss army knife of AI-inference servers – easily integrated into any existing system architecture and with any DLA. So no one needs to wait two or three years for someone to invent the ideal AI inference chip. We already have it.”