NeuReality Launches NR1 AI Inference Solution in Time to Unleash Generative AI Across All Industries

NeuReality launches its much-anticipated, fully integrated NR1™ AI Inference solution next week at the international SC23 Conference – a long-awaited cure for the ailments of today's big CPU-centric data centers, which suffer from high inefficiency and expense. Now with 10x performance, a 90 percent reduction in the cost of AI operations, and a line-up of business partners and customers, NeuReality will demonstrate the world's first affordable, ultra-scalable AI-centric servers designed purely for inference: the daily operation of a trained AI model.

As expensive as it is to run live AI data in the world's data centers, AI inferencing remains a blind spot in our industry, according to NeuReality Co-founder and CEO Moshe Tanach. "ChatGPT is a new and popular example, of course, but generative AI is in its infancy. Today's businesses are already struggling to run everyday AI applications affordably – from voice recognition systems and recommendation engines to computer vision and risk management," says Tanach. "Generative AI is on their horizon too, so it's a compounding problem that requires an entirely new AI-centric design ideal for inferencing. Our customers will benefit immediately from deploying our easy-to-install and easy-to-use solution with established hardware and solution providers."

NeuReality anticipated the need for more affordable, faster, and scalable AI inference even before its founding in 2019. The company focuses on one of the biggest problems in artificial intelligence: making the inference phase both economically sustainable and scalable enough to support consumer and enterprise demand as AI accelerates. For every $1 spent on training an AI model today, businesses spend about $8 to run those models, according to Tanach. "That astronomical energy and financial cost will only grow as AI software, applications and pipelines ramp up in the years to come on top of larger, more sophisticated AI models."

With the NR1 system, future AI-centric data centers will see 10x performance capability, empowering financial, healthcare, government and small businesses to create better customer experiences with more AI inside their products. That in turn can help companies generate more top-line revenue while decreasing bottom-line costs by 90 percent.
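The 10x-performance and 90-percent-cost figures are two views of the same arithmetic: if a server of comparable cost handles ten times the inference volume, the cost attributable to each inference falls by 90 percent. A minimal back-of-envelope sketch, using assumed numbers rather than vendor data:

```python
# Illustrative arithmetic only: the dollar and throughput figures below are
# assumptions chosen to show the 10x-throughput / 90%-savings relationship,
# not measured NeuReality benchmarks.

def cost_per_inference(server_cost_per_hour: float, inferences_per_hour: float) -> float:
    """Amortized cost attributed to a single inference."""
    return server_cost_per_hour / inferences_per_hour

# Same hourly server cost, 10x the inference throughput.
baseline   = cost_per_inference(server_cost_per_hour=10.0, inferences_per_hour=1_000_000)
ai_centric = cost_per_inference(server_cost_per_hour=10.0, inferences_per_hour=10_000_000)

savings = 1 - ai_centric / baseline
print(f"baseline:   ${baseline:.8f} per inference")
print(f"AI-centric: ${ai_centric:.8f} per inference")
print(f"savings:    {savings:.0%}")
```

The same relationship holds regardless of the absolute server cost, which is why the headline claims can be stated as a ratio.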

“NeuReality’s AI inference system comes at the right time when customers not only desire scalable performance and lower total cost of ownership, but also want open-choice, secure and seamless AI solutions that meet their unique business needs,” says Scott Tease, Vice President, General Manager, Artificial Intelligence and HPC WW at Lenovo.

"NeuReality is bringing highly efficient and easy-to-use AI innovation to the data center. Working together with NeuReality, Lenovo looks forward to extending this transformative AI solution to customers and delivering rapid AI adoption for all. As a leader in our Lenovo AI Innovators Program, NeuReality's technologies will help us deliver proven cognitive solutions to customers as they embark on their AI journeys," says Tease.

At SC23 next week, NeuReality will demonstrate its easy-to-deploy software development kit, APIs, and two flavors of hardware technology: the NR1-M™ AI Inference Module and the NR1-S™ AI Inference Appliance. Presented alongside OEM and Deep Learning Accelerator (DLA) partners, each demo addresses specific market sectors and AI applications, showcasing the breadth of NeuReality's technology stack and its compatibility with all DLAs. The systems architecture features one-of-a-kind, patented technologies including:

  1. NR1 AI-Hypervisor™ hardware IP: a novel hardware sequencer that offloads data movement and processing from the CPU – an architectural cornerstone for heterogeneous compute semiconductor devices;
  2. NR1 AI-over-Fabric network engine: an embedded NIC (Network Interface Controller) with offload capabilities for an optimized network protocol dedicated to inference. The AIoF™ (AI-over-Fabric) protocol optimizes networking between AI clients and servers as well as between connected servers forming a large language model (LLM) cluster or other large AI pipelines;
  3. NR1 NAPU (Network Addressable Processing Unit): a network-attached heterogeneous chip for complete AI-pipeline offloading, leveraging Arm cores to host Linux-based server applications with native Kubernetes for cloud and data center orchestration.
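As a rough conceptual sketch of what "network-addressable" means in the NAPU description above: the device is itself a network endpoint that terminates the inference request and runs the whole pipeline on-chip, rather than routing every request through a host CPU. All class and method names below are hypothetical, invented for illustration; they are not NeuReality's actual SDK.

```python
# Conceptual model only (hypothetical names, not a real API): a
# network-addressable inference device that owns the full request pipeline,
# as opposed to a CPU-centric server where the host mediates every hop.

class NetworkAddressableDevice:
    """Models a NAPU-style device: reachable by address, no host CPU in the data path."""

    def __init__(self, address: str):
        self.address = address  # the device is a first-class network endpoint

    def infer(self, model_id: str, payload: str) -> dict:
        # CPU-centric path would be: NIC -> kernel -> CPU -> PCIe -> DLA -> back.
        # The AIoF idea is that the request terminates on the device itself,
        # so every stage below runs on-chip.
        request = {"model": model_id, "input": payload}
        return self._run_pipeline(request)

    def _run_pipeline(self, request: dict) -> dict:
        decoded = request["input"]                    # stage 1: decode / pre-process
        raw = f"dla({request['model']}, {decoded})"   # stage 2: DLA compute (stubbed)
        return {"output": raw}                        # stage 3: post-process / reply

device = NetworkAddressableDevice("10.0.0.42:7000")
print(device.infer("resnet50", "image-bytes"))
```

The point of the sketch is the shape of the data path, not the stubbed stages: clients address the device directly over the fabric, which is what removes the host CPU as a bottleneck.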

"The next era of AI relies on broad deployment of ML inference in order to unlock the power of LLMs and other maturing models in new and existing applications," says Mohamed Awad, Senior Vice President and General Manager, Infrastructure Line of Business, Arm. "Arm Neoverse delivers a versatile and flexible technology platform to enable innovative custom silicon such as NeuReality's NR1™ NAPU, which brings to market a powerful and efficient form of specialized processing for the AI-centric data center."

NeuReality is shipping by the end of 2023 with an established value chain of software partners, original equipment manufacturers (OEMs), semiconductor deep learning accelerator (DLA) suppliers, cloud service providers, and enterprise IT solution companies such as Arm, AMD, CBTS, Cirrascale, IBM, Lenovo, Qualcomm, Supermicro, and more. As a result, financial services, healthcare, government and smaller businesses can expect to access easy-to-deploy and easy-to-use AI inference solutions from NeuReality with profitable performance.

“We are thrilled to be working with NeuReality to deliver inference-as-a-service in banking, insurance and investment services,” says PJ Go, CEO, Cirrascale Cloud Services. “As a specialized cloud and managed services provider deploying the latest training and inference compute with high-speed storage at scale, we focus on helping customers choose the right platform and performance criteria for their cloud service needs. Working with NeuReality to help solve for inference – arguably the biggest issue facing AI companies today – will undoubtedly unlock new experiences and revenue streams for our customers.”

Tanach adds: “Along with our partners, we have re-imagined inference and, in the process, have set the standard for the future of AI which is more cost effective, carbon-conscious and performance-driven. The NAPU is the Swiss army knife of AI-inference servers – easily integrated into any existing system architecture and with any DLA. So, no one needs to wait two or three years for someone to invent the ideal AI inference chip. We already have it.”

NeuReality’s First AI Inference Server-on-a-Chip Validated and Moved to Production

NeuReality has moved the final, validated design of its 7nm AI-centric NR1 chip to TSMC for manufacturing, creating the world's first AI-centric server-on-a-chip (SoC). In a major step for the semiconductor industry, NeuReality will transform AI inference solutions used in a wide range of applications – from natural language processing and computer vision to speech recognition and recommendation systems.

With the mass deployment of AI as a service (AIaaS) and infrastructure-hungry applications such as ChatGPT, NeuReality's solution is crucial for an industry urgently in need of affordable access to modernized AI inference infrastructure. In trials with AI-centric server systems, NeuReality's NR1 chip demonstrated 10 times the performance at the same cost when compared to conventional CPU-centric systems. These remarkable results signal NeuReality's technology as a bellwether for achieving cost-effective, highly efficient execution of AI inference.

AI inference traditionally requires significant software activity at eye-watering cost. NeuReality's final step from validated design to manufacturing – known in the industry as "tape-out" – signals a new era of highly integrated, highly scalable AI-centric server architecture.

The NR1 chip represents the world's first NAPU (Network Addressable Processing Unit) and will be seen as an antidote to an outdated CPU-centric approach to inference AI, according to Moshe Tanach, Co-Founder and CEO of NeuReality. "Our solution stack, coupled with any DLA technology out there, frees inference-specific deep learning accelerators (DLAs) to perform at full capacity – without the bottlenecks and high overheads of existing systems – so AI service requests are processed faster and more efficiently," said Tanach.

"Function for function, hardware runs faster and parallelizes much more than software. As an industry, we've proven this model, offloading the deep learning processing function from CPUs to DLAs such as GPU or ASIC solutions. As in Amdahl's law, it is time to shift the acceleration focus to the other functions of the system to optimize the whole of AI inference processing. NR1 offers an unprecedented competitive alternative to today's general-purpose server solutions, setting a new standard for the direction our industry must take to fully support the AI Digital Age," added Tanach.

NeuReality is moving the dial for the industry, empowering the transition from a largely software-centric approach to a hardware-offloading approach in which multiple NR1 chips work in parallel to avoid system bottlenecks. Each NR1 chip is a network-attached heterogeneous compute device with multiple tiers of programmable compute engines, including a PCIe interface to host any DLA, an embedded Network Interface Controller (NIC), and an embedded AI-hypervisor – a hardware-based sequencer that controls the compute engines and shifts data structures between them. Hardware acceleration throughout NeuReality's automated SDK flow lowers the barrier to entry for small, medium, and large organizations that need excellent performance, low power consumption and affordable infrastructure – as well as ease of use for inferencing AI services.
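The sequencer idea described above – fixed pipeline stages handed from engine to engine without a CPU in the loop, with many chips draining work in parallel – can only be approximated in software, but a sketch conveys the shape. Everything here is a hypothetical illustration, not NeuReality code: a thread pool stands in for multiple NR1 devices, and string-wrapping stands in for the compute engines.

```python
# Illustrative sketch (all names hypothetical): a fixed sequence of compute
# engines applied per request, with parallel workers modeling multiple
# network-attached NR1 chips servicing a shared request stream.

from concurrent.futures import ThreadPoolExecutor

# The fixed pipeline the hardware sequencer would walk for each request.
STAGES = ["decode", "preprocess", "dla_compute", "postprocess"]

def sequencer(request: str) -> str:
    """Pass one request through every engine in order, like the AI-hypervisor
    shifting data structures between compute engines."""
    result = request
    for stage in STAGES:
        result = f"{stage}({result})"  # each engine transforms the data
    return result

requests = [f"req{i}" for i in range(8)]

# Four workers model four NR1 devices handling requests in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(sequencer, requests))

print(results[0])  # postprocess(dla_compute(preprocess(decode(req0))))
```

The design point the sketch mirrors is that the stage order is fixed in hardware, so scaling is achieved by adding devices rather than by adding CPU scheduling work.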

"We are excited about our first-generation NAPU product – proven, tested, and ready to move to manufacture. It's full steam ahead as we reach this highly anticipated manufacturing stage with our TSMC partners. Our plan remains to start shipping product directly to customers by the end of the year," says Tanach.

NeuReality Appoints Lynn Comp, Semiconductor Veteran and AMD Corporate VP, to its Board

NeuReality, industry experts in AI-centric inference, has announced the appointment of Lynn A. Comp to its Board of Directors. A semiconductor veteran and leader with vast experience in data center infrastructure, Lynn joins NeuReality's experienced board alongside industry heavyweights CJ Bruno, who brings a wealth of sales and marketing experience from a distinguished career at Intel, and AI luminary Naveen Rao, a leading industry voice in generative AI and deep learning. Lynn brings invaluable knowledge of the technology underpinning inference and the virtual cloud, as well as the sector acumen needed to help guide NeuReality through complex market dynamics and drive the company through its next integral period of growth.

As AI inference becomes highly sought after, in major part due to the proliferation of Large Language Models (LLMs) such as ChatGPT, AI computation systems must adapt to keep performing at higher efficiency and optimal Total Cost of Ownership (TCO). NeuReality is building a first-in-class capability to fortify the future of AI inference, improving the performance of machine learning systems by offloading from general-purpose CPUs the data-movement and processing tasks required to host Deep Learning Accelerators. Moving from the time- and resource-intensive CPU-centric model to NeuReality's AI-centric model and server-on-a-chip solution helps lower cost, reduce required energy input and increase throughput. Lynn's proven track record in go-to-market strategies, as well as her intimate understanding of and relationships within the compute industry, fits hand in glove with NeuReality's vision and solutions, and will help ensure NeuReality's solutions realize their full potential.

"I'm excited to be joining NeuReality's board and welcome the challenge to help steer the company through its next phase of growth," said Comp. "The company's innovative approach to AI-focused computing platforms offers the market an easily integrated, end-to-end inference solution that speeds up AI deployments while reducing data center footprint and power. I am passionate about simplifying the application of AI to IT infrastructure and take on this role with the intention of helping make this a reality," added Lynn.

"We are thrilled to welcome Lynn to our board," said Moshe Tanach, Co-Founder and CEO of NeuReality. "Adding Lynn to our board of directors, with her extensive business experience with semiconductors at AMD as well as data centers for various market sectors, is one of multiple important steps we are taking as we shift from development to production. Drawing on Lynn's unique expertise and leadership in the semiconductor market, we are looking forward to working together and refining the GTM strategy and execution for our disruptive AI inference products and services. AMD's ambitions and priorities chime with our own vision of the market and make Lynn's appointment, with her background at AMD, invaluable."

During her 25-year tenure, Lynn Comp has held a number of key positions, including Vice President of the Data Center Group and General Manager of the Visual Cloud Division at Intel, as well as Corporate VP of the Cloud Business Unit and Server BU Marketing at AMD. She has been instrumental in helping Intel build and foster strong relationships with top-tier customers and partners, and has taken an active role in promoting the company's environmental and social responsibilities. Lynn was also recognized as one of the "2020 Women of Technology" by Connected World.

NeuReality raises $35M to bring its inferencing chip to the market

NeuReality, an AI hardware startup that specializes in the next generation of AI inferencing platforms, is announcing a $35M Series A funding round. The round was led by Samsung Ventures, Cardumen Capital, Varana Capital, OurCrowd and XT Hitech. SK Hynix, Cleveland Avenue, Korean Investment Partners, StoneBridge, and Glory Ventures also participated. The round brings NeuReality's total funding to $48 million.

The new fundraising will support NeuReality's plans to start deploying its inference solutions in 2023. NeuReality's system is built for optimized deployment in data centers and near-edge on-premises locations in need of higher performance, lower latency, and much higher efficiency compared to existing technologies. The company has already reported its close collaboration with leading AI ecosystem partners and customers such as IBM, AMD, and Lenovo.

NeuReality makes AI easier to deploy in terms of usability, cost, scalability, and sustainability. The company uses a new type of Network Addressable Processing Unit (NAPU) optimized for deep learning inference use cases such as computer vision, natural language processing, and recommendation engines. The NAPU will bring AI to a broader set of less technical companies. It is also set to allow large-scale users such as hyperscalers and next-wave data center customers to support their growing scale of AI usage. NeuReality's disruptive system approach includes hardware, software and tools that work together to boost efficiency and simplify the adoption and deployment of AI in a wider range of real-world applications.

The funding will allow NeuReality to bring its first-in-class NAPU, the NR1, to the global market. This next-generation AI chip, based on NeuReality's AI-centric architecture, pushes the performance and efficiency of AI systems to the next level. It does so by removing existing system bottlenecks to increase the utilization of current deep-learning processors, lowering the latency of AI operations, and saving on overall system cost and power consumption – factors critical to an improved total cost of ownership (TCO) for data centers and on-premises large-scale compute systems, and vital to the business models of many applications. NeuReality also raises the bar on the power consumption of these power-hungry applications, a stepping stone toward the sustainability requirements being driven by governments around the globe.

“This investment is another sign of confidence in the talent and innovation that NeuReality and the Israeli tech industry offer the world,” stated Moshe Tanach, CEO and co-founder of NeuReality. “The high-profile investors in our Series A fundraising prove that NeuReality’s value proposition, architecture and flagship product are a viable reality which will transform the AI market.”

Yoav Sebba, Managing Director at XT Hi-Tech, added: "We are honored to join the NeuReality journey led by this exceptional team, headed by Moshe, Tzvika, Yossi and Lior. High-performance and sustainable inference computing is critical for growth in the day-to-day usage of AI. We are seeing endless AI opportunities developed by software companies, but the existing hardware infrastructure is limiting the deployment of those use cases. Recently, new deep learning processors have been developed by multiple companies, serving as a new type of 'brains'. NeuReality is the full body surrounding those brains, allowing them to serve intelligent voice, language, vision, and recommendation applications."

Dr. Mingu Lee, Managing Partner at Cleveland Avenue Technology Investments, noted: "NeuReality is bringing ease of use and scalability into the deployment of AI inference solutions, and we see great synergy between their promising technology and the Fortune 500 enterprise companies we communicate with. We feel that investing in companies such as NeuReality is vital, not only to ensure the future of technology, but also in terms of sustainability."

NeuReality, which raised its seed round in early 2020, is led by a seasoned management team with extensive experience in AI, data-center architecture, systems, and software. NeuReality was co-founded by CEO Moshe Tanach, formerly Director of Engineering at Marvell and Intel and AVP R&D at DesignArt-Networks (acquired by Qualcomm); VP Operations Tzvika Shmueli, formerly VP of Backend at Mellanox Technologies and VP of Engineering at Habana Labs; and VP VLSI Yossi Kasus, formerly Senior Director of Engineering at Mellanox and head of VLSI at EZChip. The company's leading team also includes CTO Lior Khermosh, former co-founder and Chief Scientist of ParallelM and a fellow at PMC Sierra.

Samsung Ventures invests in Israeli AI company NeuReality

NeuReality, an Israeli AI systems and semiconductor company, announced that Samsung Ventures has made an investment in the company. NeuReality makes inference technologies, such as computer vision, natural language processing, and recommendation engines easier to implement for a broader set of less technical companies.

More than just a chip company, NeuReality offers a comprehensive solution including hardware, software and tools that work together to simplify and accelerate AI deployment. The company currently has more than 30 employees and plans to double in size, recruiting talent in VLSI chip design, AI, software, and hardware.

Ori Kirshner, head of Samsung Ventures in Israel, stated: “We see substantial and immediate need for higher efficiency and easy-to-deploy inference solutions for data centers and on-premises use cases, and this is why we are investing in NeuReality. The company’s innovative disaggregation, data movement and processing technologies improve computation flows, compute-storage flows, and in-storage compute – all of which are critical for the ability to adopt and grow AI solutions. Samsung Ventures is committed to invest in companies with strong synergy to Samsung’s core business and NeuReality fits well into this commitment.”

The adoption and growth of AI solutions face various obstacles, which prevent retail, manufacturing, healthcare, and other sectors from deploying AI inference capabilities into business workflows. While the company is new, NeuReality’s team draws from decades of experience in AI, data center systems, hardware design, and software development. As a result, NeuReality uses a system level approach that combines easy-to-use software with high efficiency deep learning and data handling acceleration hardware. This holistic approach dramatically simplifies and accelerates the adoption and mass deployment of inference technology.

Focusing on the growth of real-life AI applications, NeuReality's solutions are purpose-built for a wide variety of sectors including public safety, e-commerce, social networks, medical and healthcare, digital personal assistants, and more. The NR1 solution targets cloud and enterprise data centers, alongside carriers, telecom operators and other near-edge compute solutions.

NeuReality emerged out of stealth last year with $8 million seed funding from Cardumen Capital, OurCrowd and Varana Capital. In November 2021, NeuReality signed an agreement with IBM to develop the next generation of high-performance AI inference platforms that will deliver disruptive cost and power consumption improvements for deep learning use cases. NeuReality is also collaborating with AMD to deliver its first-generation AI-centric FPGA based platforms for Inference acceleration to customers.

NeuReality is creating purpose-built AI platforms for ultra-scalability of real-life AI applications and has positioned itself as a pioneer in the deep learning and AI solutions market. The NR1 is the company's next-generation integrated circuit device, based on its AI-centric architecture. The SoC improves the utilization of currently deployed AI compute resources by removing existing system bottlenecks, lowering the latency of AI operations, and saving on overall system cost and power consumption. The company also develops complementary software tools and runtime libraries that make it easy for customers at various skill levels, and in various deployment topologies, to adopt new AI-based services in their business workflows.

Moshe Tanach, CEO and co-founder of NeuReality, stated: “The investment from Samsung Ventures is a big vote of confidence in NeuReality’s technology. The funds will help us take the company to the next level and take our NR1 SoC to production. This will enable our customers to evolve their system architecture, and this evolution will make it easier for them to scale and maintain their AI infrastructure, whether it is in their data center, in a cloud or on-premises.”

NeuReality was founded in 2019 and is led by a seasoned management team with extensive experience in AI, data-center architecture, system, and software. NeuReality was co-founded by CEO Moshe Tanach, formerly Director of Engineering at Marvell and Intel and AVP R&D at DesignArt-Networks (later acquired by Qualcomm); VP Operations Tzvika Shmueli, formerly VP of Backend at Mellanox Technologies and VP of Engineering at Habana Labs; and VP VLSI Yossi Kasus, formerly Senior Director of Engineering at Mellanox and the head of VLSI at EZChip. The company’s leading team also includes CTO Lior Khermosh, former co-founder and Chief Scientist of ParallelM and a fellow at PMC Sierra.

IBM and NeuReality team up to build AI Server-on-a-Chip

IBM and NeuReality, an Israeli AI systems and semiconductor company, have signed an agreement to develop the next generation of high-performance AI inference platforms that will deliver disruptive cost and power consumption improvements for deep learning use cases. IBM and NeuReality will enable critical sectors such as finance, insurance, healthcare, manufacturing, and smart cities to deploy computer vision, natural language processing, recommendation systems, and other AI use cases. The collaboration is also aimed at accelerating deployments in today's ever-growing AI use cases that are already deployed in public and private cloud data centers.

The agreement involves NR1, NeuReality's first Server-on-a-Chip ASIC implementation of its revolutionary AI-centric architecture. NR1 is based on NeuReality's first-generation FPGA-based NR1-P prototype platform, introduced earlier this year. The NR1 will be a new type of integrated circuit device with native AI-over-Fabric networking, full AI pipeline offload and hardware-based AI hypervisor capabilities. These capabilities remove the system bottlenecks of today's solutions and provide disruptive cost and power consumption benefits for inference systems and services. The NR1-P platform will support software integration and system-level validation prior to the availability of the NR1 production platform next year.

This partnership also marks NeuReality as the first start-up semiconductor product member of the IBM Research AI Hardware Center and a licensee of the Center's low-precision, high-performance Digital AI Cores. As part of the agreement, IBM becomes a design partner of NeuReality and will work on the product requirements for the NR1 chip, system, and SDK, which will be implemented in the next revision of the architecture. Together the two companies will evaluate NeuReality's products for use in IBM's Hybrid Cloud, including AI use cases, system flows, virtualization, networking, security and more.

The agreement with IBM marks the continued momentum for NeuReality. In February this year, the company emerged from stealth announcing its first-of-a-kind AI-centric architecture and laying out its roadmap, where NR1-P will be followed by NR1. Shortly after, in September, NeuReality announced that it is collaborating with Xilinx to deliver their new AI-centric FPGA based NR1-P platforms to market.

Moshe Tanach, CEO and co-founder of NeuReality, stated: "We are excited and deeply satisfied that a world-class multinational innovator like IBM is partnering with us. We believe our collaboration is a vote of confidence in our AI-centric technology and architecture and in its potential to power real-life AI use cases with unprecedented deep learning capabilities." Tanach added: "Having the NR1-P FPGA platform available today allows us to develop IBM's requirements and test them before the NR1 Server-on-a-Chip's tape-out. Being able to develop, test and optimize complex datacenter distributed features, such as Kubernetes, networking, and security before production is the only way to deliver high quality to our customers. I am extremely proud of our engineering team, who will deliver a new reality to datacenters and near-edge solutions. This new reality will allow many new sectors to deploy AI use cases more efficiently than ever before."

Dr. Mukesh Khare, Vice President of Hybrid Cloud research at IBM Research, said: “In light of IBM’s vision to deliver the most advanced Hybrid Cloud and AI systems and services to our clients, teaming up with NeuReality, which brings a disruptive AI-centric approach to the table, is the type of industry collaboration we are looking for. The partnership with NeuReality is expected to drive a more streamlined and accessible AI infrastructure, which has the potential to enhance people’s lives.”

NeuReality collaborates with Xilinx to deliver first AI-centric server

NeuReality, an Israeli AI startup developing high performance AI compute platforms for cloud data centers and edge nodes, is collaborating with Xilinx, Inc. to deliver new AI-centric platforms that empower, optimize, and tune real-world AI applications. The collaboration is based on NeuReality’s novel AI-centric inference platform NR1-P, which includes a new type of AI Server-on-Chip (SoC) developed by NeuReality and delivers all components necessary to deploy a complete inference solution. The platform targets high volume AI applications in various fields such as public safety, e-commerce, healthcare, retail, and many other computer-vision use cases.

NeuReality has worked closely with Xilinx to deliver the world's first fully functional AI-centric server to the market. This breakthrough prototype platform will reduce the two key barriers that inhibit customers' AI deployment today: cost and complexity. As part of the collaboration, the NR1-P platform, based on the Xilinx Versal ACAP architecture, can be purchased directly from NeuReality and will be fulfilled through Colfax International.

NR1-P is NeuReality’s first implementation of the company’s new AI-centric architecture, with other implementations to follow. The new prototype platform is accessible for testing and evaluation via remote access. NeuReality’s NR1-P was built upon the Xilinx Versal VCK5000 development card and can deliver up to 3X greater performance/Dollar/Watt compared to the latest Nvidia A100 or T4 based systems, according to NeuReality. The complete prototype solution includes a 4xRU server chassis, 16 AI-centric modules based on the Xilinx development cards, an embedded Linux software stack with Kubernetes support, orchestration functionality and a model database.

Moshe Tanach, CEO and co-founder of NeuReality, noted: “Working closely with Xilinx, the market leader of FPGAs for AI, has taken us one step closer to a new reality of AI-centric Server-on-Chip silicon devices that deliver best in class TCO, linear scalability, reduced latency and a simple user interface and experience. These can enable use cases such as healthcare, public safety and other applications that depend on higher efficiency and much lower cost solutions compared to the existing CPU-centric offerings from companies such as Nvidia.”

NeuReality Ltd. is an AI technology innovation company creating purpose-built AI-platforms for ultra-scalability of real-life AI applications. NeuReality is a pioneer in the deep learning and AI solutions market.

NeuReality was founded in 2019 and is led by a seasoned management team with extensive experience in data centers architecture, system, and software. The co-founders are CEO Moshe Tanach, VP Operations Tzvika Shmueli and VP VLSI Yossi Kasus. Prior to founding NeuReality, Tanach served in several executive roles as Director of Engineering at Marvell and Intel and AVP R&D at DesignArt-Networks (later acquired by Qualcomm). Tzvika Shmueli served as VP of Backend at Mellanox Technologies and VP of Engineering at Habana Labs. Yossi Kasus served as Senior Director of Engineering at Mellanox and the head of VLSI at EZChip.