{"id":14933,"date":"2025-03-16T04:14:27","date_gmt":"2025-03-16T04:14:27","guid":{"rendered":"https:\/\/dmsretail.com\/RetailNews\/cisco-it-deploys-ai-ready-data-center-in-weeks-while-scaling-for-the-future\/"},"modified":"2025-03-16T04:14:27","modified_gmt":"2025-03-16T04:14:27","slug":"cisco-it-deploys-ai-ready-data-center-in-weeks-while-scaling-for-the-future","status":"publish","type":"post","link":"https:\/\/dmsretail.com\/RetailNews\/cisco-it-deploys-ai-ready-data-center-in-weeks-while-scaling-for-the-future\/","title":{"rendered":"Cisco IT deploys AI-ready data center in weeks, while scaling for the future"},"content":{"rendered":"<p> <p><a href=\"https:\/\/dmsretail.com\/online-workshops-list\/\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-496\" src=\"https:\/\/dmsretail.com\/RetailNews\/wp-content\/uploads\/2022\/05\/RETAIL-ONLINE-TRAINING-728-X-90.png\" alt=\"Retail Online Training\" width=\"729\" height=\"91\" srcset=\"https:\/\/dmsretail.com\/RetailNews\/wp-content\/uploads\/2022\/05\/RETAIL-ONLINE-TRAINING-728-X-90.png 729w, https:\/\/dmsretail.com\/RetailNews\/wp-content\/uploads\/2022\/05\/RETAIL-ONLINE-TRAINING-728-X-90-300x37.png 300w\" sizes=\"auto, (max-width: 729px) 100vw, 729px\" \/><\/a><\/p><br \/>\n<\/p>\n<div>\n<p><i><span data-contrast=\"auto\">Cisco IT designed AI-ready infrastructure with Cisco compute, best-in-class NVIDIA GPUs, and Cisco networking that supports AI model training and inferencing across dozens of use cases for Cisco product and engineering teams.<\/span><\/i><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">It\u2019s no secret that the pressure to implement AI across the business presents challenges for IT teams. It pushes us to deploy new technology faster than ever before and to rethink how data centers are built to meet increasing demands across compute, networking, and storage. 
While the pace of innovation and business advancement is exhilarating, it can also feel daunting.\u00a0<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">How do you quickly build the data center infrastructure needed to power AI workloads and keep up with critical business needs? This is exactly the challenge our team, Cisco IT, was facing.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<h2><strong>The ask from the business<\/strong><\/h2>\n<p><span class=\"TextRun SCXW34251422 BCX4\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\">A product team approached us needing a way to run AI workloads that would be used to develop and test new AI capabilities for Cisco products. The environment would eventually support model training and inferencing for multiple teams and dozens of use cases across the business. 
And they needed it done quickly. Because the product teams needed to get innovations to our customers as quickly as possible, we had to deliver the new environment in just three months.<\/span><span class=\"EOP SCXW34251422 BCX4\" data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<h2><strong>The technology requirements<\/strong><\/h2>\n<p><span data-contrast=\"auto\">We began by mapping out the requirements for the new AI infrastructure. A non-blocking, lossless network was essential for the AI compute fabric to ensure reliable, predictable, and high-performance data transmission within the AI cluster. <\/span><span data-contrast=\"none\">Ethernet<\/span><span data-contrast=\"auto\"> was the clear choice. 
Other requirements included:<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<ul>\n<li><b><span data-contrast=\"none\">Intelligent buffering, low latency: <\/span><\/b><span data-contrast=\"none\">As in any well-run data center, these are essential for maintaining smooth data flow, minimizing delays, and keeping the AI fabric responsive.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<li><b><span data-contrast=\"none\">Dynamic congestion avoidance for various workloads: <\/span><\/b><span data-contrast=\"none\">AI workloads can vary significantly in their demands on network and compute resources. Dynamic congestion avoidance would help allocate resources efficiently, prevent performance degradation during peak usage, maintain consistent service levels, and avoid bottlenecks that could disrupt operations.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<li><b><span data-contrast=\"none\">Dedicated front-end and back-end networks, non-blocking fabric: <\/span><\/b><span data-contrast=\"none\">Because our goal was to build scalable infrastructure, a non-blocking fabric would provide enough bandwidth for data to flow freely and enable the high-speed data transfer that is crucial for handling the large data volumes typical of AI applications. 
By separating our front-end and back-end networks, we could improve security, performance, and reliability.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<li><b><span data-contrast=\"none\">Automation for Day 0 to Day 2 operations: <\/span><\/b><span data-contrast=\"none\">From initial deployment and configuration through ongoing management, we needed to minimize manual intervention to keep processes fast and reduce human error.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<li><b><span data-contrast=\"none\">Telemetry and visibility: <\/span><\/b><span data-contrast=\"none\">Together, these capabilities would provide insight into system performance and health, enabling proactive management and troubleshooting.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/li>\n<\/ul>\n<h2><strong>The plan \u2013 with a few challenges to overcome<\/strong><\/h2>\n<p><span data-contrast=\"auto\">With the requirements in place, we began figuring out where the cluster could be built. <\/span><span data-contrast=\"none\">Our existing data center facilities were not designed to support AI workloads, and we knew that building from scratch with a full data center refresh would take 18 to 24 months, which was not an option. We needed to deliver an operational AI infrastructure in a matter of weeks, so we repurposed an existing facility, making minor changes to cabling and device distribution to accommodate the cluster.\u00a0<\/span><\/p>\n<p>Our next concern was the data used to train models. Because some of that data would not be stored in the same facility as our AI infrastructure, we decided to replicate it from other data centers into the AI infrastructure\u2019s storage systems to avoid performance issues caused by network latency. 
Our network team had to ensure sufficient network capacity to handle this data replication into the AI infrastructure.<\/p>\n<p><span data-contrast=\"none\">Now, to the infrastructure itself. <\/span><span data-contrast=\"auto\">We designed the heart of the AI infrastructure around Cisco compute, best-in-class GPUs from NVIDIA, and Cisco networking. On the networking side, we built a front-end Ethernet network and a back-end lossless Ethernet network. With this model, we were confident that we could quickly deploy advanced AI capabilities in any environment and continue to add them as we brought more facilities online.<\/span><\/p>\n<h2><strong>Supporting a growing environment<\/strong><\/h2>\n<p><span data-contrast=\"none\">After we made the initial infrastructure available, the business added more use cases each week, and we added AI clusters to support them. We needed a way to make it all easier to manage, including switch configuration management and packet-loss monitoring. We used Cisco Nexus Dashboard, which dramatically streamlined operations and positioned us to keep scaling. We were already using it in other parts of our data center operations, so extending it to our AI infrastructure was easy and didn\u2019t require the team to learn an additional tool.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<h2><strong>The results<\/strong><\/h2>\n<p><span data-contrast=\"none\">Our team moved fast and overcame several hurdles in designing the solution. We designed and deployed the back end of the AI fabric in under three hours and deployed the entire AI cluster and fabrics in three months, 80% faster than a full data center rebuild.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">Today, the environment supports more than 25 use cases across the business, with more added each week. 
These include:<\/span><\/p>\n<ul>\n<li><span data-ccp-props=\"{}\">Webex Audio: Improving codec development for noise cancellation and lower-bandwidth data prediction<\/span><\/li>\n<li><span data-ccp-props=\"{}\">Webex Video: Model training for background replacement, gesture recognition, and face landmarks<\/span><\/li>\n<li><span data-ccp-props=\"{}\">Custom LLM training for cybersecurity products and capabilities<\/span><\/li>\n<\/ul>\n<p><span data-ccp-props=\"{}\"><span class=\"TextRun SCXW124278832 BCX4\" lang=\"EN-US\" xml:lang=\"EN-US\" data-contrast=\"auto\">Not only were we able to support the business\u2019s needs today, but we\u2019re also designing how our data centers must evolve for the future. We are actively building out more clusters and will share additional details on our journey in future blogs. The modularity and flexibility of Cisco\u2019s networking, compute, and security give us confidence that we can keep scaling with the business.<\/span><span class=\"EOP SCXW124278832 BCX4\" data-ccp-props=\"{}\">\u00a0<\/span><\/span><\/p>\n<\/div>\n","protected":false},
"excerpt":{"rendered":"<p>Cisco IT designed AI-ready infrastructure with Cisco compute, best-in-class NVIDIA GPUs, and Cisco networking that supports AI model training and inferencing across dozens of use [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":14934,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[],"class_list":["post-14933","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/posts\/14933","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/comments?post=14933"}],"version-history":[{"count":0,"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/posts\/14933\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/media\/14934"}],"wp:attachment":[{"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/media?parent=14933"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/categories?post=14933"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/d
msretail.com\/RetailNews\/wp-json\/wp\/v2\/tags?post=14933"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}