Deci launches new models for enhancing deep learning on CPUs

Photo above: Deci’s founders (from left to right): Jonathan Elial, COO; Yonatan Geifman, CEO; and Ran El-Yaniv, Chief Scientist. Credit: Deci

Deci, the deep learning company harnessing Artificial Intelligence (AI) to build AI, announced a new set of image classification models, dubbed DeciNets, for Intel Cascade Lake CPUs. According to Deci, the new models were generated automatically by its proprietary Automated Neural Architecture Construction (AutoNAC) technology and significantly outperform all published models, delivering more than a 2x improvement in runtime coupled with improved accuracy, compared to the most powerful publicly available models, such as Google’s EfficientNets.

While GPUs have traditionally been the hardware of choice for running convolutional neural networks (CNNs), CPUs, which are already widely used for general computing tasks, would be a much cheaper alternative. Although it is possible to run deep learning inference on CPUs, they are generally far less powerful than GPUs; consequently, deep learning models typically run 3-10x slower on a CPU than on a GPU.

As explained by Deci, its DeciNets significantly close the gap between GPU and CPU performance for CNNs. With DeciNets, tasks that previously could not be carried out on a CPU because they were too resource-intensive are now possible. These tasks also see a marked performance improvement: by leveraging DeciNets, the gap between a model’s inference performance on a GPU versus a CPU is cut in half, without sacrificing the model’s accuracy.

“As deep learning practitioners, our goal is not only to find the most accurate models, but to uncover the most resource-efficient models which work seamlessly in production – this combination of effectiveness and accuracy constitutes the ‘holy grail’ of deep learning,” said Yonatan Geifman, co-founder and CEO of Deci. “AutoNAC creates the best computer vision models to date, and now, the new class of DeciNets can be applied to effectively run AI applications on CPUs.”

All networks were compiled and quantized using OpenVINO, with latency measured on an AWS c5.4xlarge instance with a Cascade Lake CPU (16 vCPUs, batch size = 1).
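The benchmark setup above comes down to measuring batch-1 inference latency. As a rough sketch of how such a measurement loop works, the snippet below times a stand-in workload; a real benchmark would instead invoke an OpenVINO-compiled model, which requires the toolkit and a model file.

```python
import time
import statistics

def measure_latency_ms(infer, warmup=10, runs=100):
    """Median batch-1 latency in milliseconds of a zero-argument callable."""
    for _ in range(warmup):                # warm caches before timing
        infer()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)      # median is robust to outlier runs

# Stand-in compute workload; substitute a compiled model call in practice.
latency_ms = measure_latency_ms(lambda: sum(i * i for i in range(10_000)))
```

The median (rather than the mean) is typically reported because occasional OS scheduling hiccups skew individual runs upward.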

“There is a commercial, as well as academic, desire to tackle increasingly difficult AI challenges. The result is a rapid increase in the complexity and size of deep neural models that are capable of handling those challenges,” said Prof. Ran El-Yaniv, co-founder and Chief Scientist of Deci and Professor of Computer Science at the Technion – Israel Institute of Technology. “The hardware industry is in a race to develop dedicated AI chips that will provide sufficient compute to run such models; however, with model complexity increasing at a staggering pace, we are approaching the limit of what hardware can support using current chip technology. Deci’s AutoNAC creates powerful models automatically, giving users superior accuracy and inference speed even on low-cost devices, including traditional CPUs.”

In March 2021, Deci and Intel announced a broad strategic collaboration to optimize deep learning inference on Intel Architecture (IA) CPUs. Prior to this, Deci and Intel worked together on MLPerf, where on several popular Intel CPUs, Deci’s AutoNAC technology accelerated the inference speed of the well-known ResNet50 neural network, reducing the submitted models’ latency by a factor of up to 11.8x and increasing throughput by up to 11x.

Deci enables deep learning to live up to its true potential by using AI to build better AI. With the company’s end-to-end deep learning development platform, AI developers can build, optimize, and deploy faster and more accurate models for any environment, including cloud, edge, and mobile, allowing them to revolutionize industries with innovative products. The platform is powered by Deci’s proprietary Automated Neural Architecture Construction (AutoNAC) technology, which automatically generates and optimizes deep learning model architectures and allows teams to accelerate inference performance, enable new use cases on limited hardware, shorten development cycles, and reduce computing costs. Founded by Yonatan Geifman, Jonathan Elial, and Professor Ran El-Yaniv, Deci’s team of deep learning engineers and scientists is dedicated to eliminating production-related bottlenecks across the AI lifecycle.

Data Centers to Adopt High-Voltage DC Power Sources

Above: IBM’s Blue Gene computer paved the way for powerful AI Data Centers

“The large data centers are undergoing a deep structural change. We anticipate more data centers moving away from Alternating Current (AC) in favor of 260-410V DC infrastructures to better cope with the massive increases in power needs of high-performance computing,” said Lev Slutskiy, Vicor’s EMEA Business Development Manager for High Performance Computing. “Google started testing the concept secretly back in 2015, and today companies like Nvidia are performing high-voltage experiments that have not yet been published.”

According to Slutskiy, the Open Compute Project Foundation is also testing the new approach. The OCP was established 10 years ago by Facebook and now brings together the biggest manufacturers of processors, servers, and data center infrastructure. They are tackling an old electrical dilemma: since electric power is the product of voltage and current (P = V·I), delivering the same power at a higher DC voltage means a lower current, and the resistive losses in the cabling fall with the square of the current (P_loss = I²R). Until recently the problem was marginal: standard data center servers consumed approximately 5kW each, and power systems that fed the server circuits at a voltage of 12V and a current of 416 Amps were good enough.

What can be done with 1,000 Amps

But times change, and around 2015 the average power consumption of data center servers increased to 12kW, with currents ranging up to 1,000 Amps. Most manufacturers dealt with the high currents using very large conduction cables, but this solution is reaching its limits, especially in the last year, in which the growing use of artificial intelligence and machine learning multiplied the power usage of data center servers: Vicor reports that in the large data centers, consumption increased to about 20kW, and in some cases even to 100kW.

This means that the power distribution systems need to deal with huge currents of approximately 1,000 Amps. At this point the OCP consortium started to define a server format working at a higher voltage of 48V. This decreases the current in the circuit by a factor of 4 and reduces the power loss in the conduction cables by a factor of 16. Thus, for instance, the current required for a 12kW server will be only 250 Amps.
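The arithmetic behind the 4x and 16x figures follows directly from the two formulas cited above; the cable resistance used here is an arbitrary illustrative value, since only the ratio between the two cases matters.

```python
def cable_stats(power_w, voltage_v, resistance_ohm=0.001):
    """Bus current and resistive cable loss for a given delivered power."""
    current_a = power_w / voltage_v            # P = V * I  ->  I = P / V
    loss_w = current_a ** 2 * resistance_ohm   # P_loss = I^2 * R
    return current_a, loss_w

i12, loss12 = cable_stats(12_000, 12)   # legacy 12V bus: 1000 A
i48, loss48 = cable_stats(12_000, 48)   # OCP 48V bus: 250 A
ratio = loss12 / loss48                 # 4x the voltage -> 1/16 the I^2*R loss
```

Quadrupling the bus voltage cuts the current to a quarter, and since loss scales with the square of the current, the cable loss drops sixteen-fold.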

Vicor’s approach for direct power supply to the processors in the data center

According to Jain Ajithkumar, Vicor’s Sr. Director for Strategic Accounts in Data Center, HPC and AI Business, this is just the first move in a larger trend, and it holds further technological implications. “We are now at the beginning of a new era. The computer rooms will receive 350V DC, which will be converted to 48V at the rack level, and then to the exact voltage needed by each specific chip in the server.
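To see why this multi-stage chain matters, consider the currents at each voltage level. The rack and chip power figures below are illustrative assumptions (the 20kW rack figure is cited earlier in the article; the 500W chip is a hypothetical high-end accelerator), not numbers supplied by Vicor.

```python
def current_at(power_w, voltage_v):
    return power_w / voltage_v   # I = P / V

rack_power = 20_000   # assumed 20kW AI rack, per the consumption figures above
chip_power = 500      # assumed power draw of a single high-end processor

feed_a = current_at(rack_power, 350)  # facility 350V DC feed: ~57 A
bus_a  = current_at(rack_power, 48)   # 48V rack bus: ~417 A
rail_a = current_at(chip_power, 0.8)  # 0.8V core rail: 625 A for one chip
```

The striking point is the last number: even a single 500W chip at a 0.8V core voltage draws 625 Amps, which is why the final conversion stage has to sit immediately next to the processor rather than elsewhere in the rack.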

“We are dealing here with two additional issues: today the processors work at 1.8V and 0.8V. When transistor dimensions shrink to 5 nanometers, the voltage may reach down to 0.4V. There is a need for an advanced technology to answer this need – and to do it without compromising the space dedicated to the multitude of dense data links populating modern processors.”

Looking for the next Startup

This is where Vicor’s Factorized Power Architecture technology, originally developed for IBM’s 2007 Blue Gene supercomputer, comes into the picture. Blue Gene was powered by 350V, delivered through Vicor’s power distribution system, starting with high-voltage power rails, passing through intermediate conversion stages, and ending with a dedicated chip directly connected to the CPU’s power connections. “We have developed a technique for pushing the power supply to each processor of Blue Gene.

“Now we are making that technology available for startup companies too, and therefore it is very important for us to have a presence in the Israeli market. Out of all the influential startup companies in the world, just 5% come from Europe; thus Israel’s foothold in the European market is very large. Today there are over 300 HPC startup companies in Israel, and most of the startups being sold to global companies are Israeli. We are offering to supply them with chips and full planning methodologies – including converting supply networks to voltages of hundreds of volts, bringing energy to the point of consumption, stepping down to 48 volts, and supplying the required voltage for each processor.”

x86 continues to be the mainstream for server CPUs

Despite ARM and other newcomers, x86 continues to be the mainstream architecture for server CPUs this year, with Intel and AMD the market leaders. According to a new study by DRAMeXchange, a division of TrendForce, Intel represents around 98% of total server CPU shipments worldwide in 2018. For the next year, the market share of AMD x86 server CPUs is expected to rise to 5%, following the company’s introduction of 7nm CPUs to the market.

“Intel will remain the leader in the server CPU market,” says Mark Liu, senior analyst at DRAMeXchange. “However, competition will come from AMD, which is on pace to migrate to more advanced processes and to offer solutions with better performance and lower power consumption.” AMD’s solutions have been adopted by a small number of cloud service providers, such as Baidu, Ali Cloud, and AWS. “AMD may have a chance to scale up in volume after 7nm products are launched in the future.”

AMD 7nm solutions may be released in 2H19

The survey finds that the market penetration rate of product lines based on Intel Purley Gen 1 (Skylake) has already reached 60% and is estimated to reach 65% by the end of 2018. As for next year’s product plan, Purley Gen 2 (Cascade Lake) will still be produced on the 14nm process, the same as Gen 1. The new solution will not become the market mainstream until 2H19.

Around 70% of AMD’s product lines have been transferred to new EPYC systems this year, while the company’s Naples solutions have migrated from the 28/32nm to the 14nm process, with significantly improved computing performance. However, AMD holds only about 2% of the current server CPU market. This is because AMD provides mainly 1-socket solutions, and the limited offerings have constrained the company’s market expansion compared with Intel.

AMD’s new solution, Rome, will migrate to the 7nm process. The company will also switch from GlobalFoundries to TSMC for 7nm product development. Previous problems of high power consumption are also expected to be corrected. The Rome platform server processors are expected to enter production in 1Q19 and could come onto the market in the second half of next year.

DRAMeXchange notes that the penetration of new platforms may drive up the average DRAM content per box in servers. In 2019, the average density of DRAM in a server will see annual growth of 25% YoY, significantly lower than the almost 40% growth in 2018.