{"id":16902,"date":"2026-03-25T14:08:26","date_gmt":"2026-03-25T14:08:26","guid":{"rendered":"https:\/\/dmsretail.com\/RetailNews\/fine-tuning-embedding-models-for-enterprise-retrieval-a-practical-guide-with-nvidia-nemotron-recipe\/"},"modified":"2026-03-25T14:08:26","modified_gmt":"2026-03-25T14:08:26","slug":"fine-tuning-embedding-models-for-enterprise-retrieval-a-practical-guide-with-nvidia-nemotron-recipe","status":"publish","type":"post","link":"https:\/\/dmsretail.com\/RetailNews\/fine-tuning-embedding-models-for-enterprise-retrieval-a-practical-guide-with-nvidia-nemotron-recipe\/","title":{"rendered":"Fine-Tuning Embedding Models for Enterprise Retrieval: A Practical Guide with NVIDIA Nemotron Recipe"},"content":{"rendered":"<p> <p><a href=\"https:\/\/dmsretail.com\/online-workshops-list\/\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-496\" src=\"https:\/\/dmsretail.com\/RetailNews\/wp-content\/uploads\/2022\/05\/RETAIL-ONLINE-TRAINING-728-X-90.png\" alt=\"Retail Online Training\" width=\"729\" height=\"91\" srcset=\"https:\/\/dmsretail.com\/RetailNews\/wp-content\/uploads\/2022\/05\/RETAIL-ONLINE-TRAINING-728-X-90.png 729w, https:\/\/dmsretail.com\/RetailNews\/wp-content\/uploads\/2022\/05\/RETAIL-ONLINE-TRAINING-728-X-90-300x37.png 300w\" sizes=\"auto, (max-width: 729px) 100vw, 729px\" \/><\/a><\/p><br \/>\n<\/p>\n<div>\n<p data-ttstextid=\"4\"><em>This blog is jointly written by Md Rahman, Arkaprabho Ghosh, Navin Bilwar, and Desh Shukla.<\/em><\/p>\n<h2>Executive summary<\/h2>\n<p>Cisco IT recently evaluated fine-tuning embedding models using the NVIDIA Nemotron RAG fine-tuning recipe as part of an effort to improve retrieval accuracy for domain-specific enterprise data. The objective was not to redesign existing retrieval-augmented generation (RAG) systems, but to understand whether targeted embedding fine-tuning could materially improve semantic search quality with reasonable effort and fast turnaround. 
Through this experiment, Cisco validated firsthand that embedding fine-tuning, combined with synthetic data generation, can deliver measurable accuracy gains within a short time frame. The experiment also demonstrated strong time-to-value, enabling rapid iteration and clear performance signals without long training cycles or extensive manual labeling. A key outcome of this collaboration was that it took only a few days to see the immediate benefits.<br \/>The embedding model training and evaluation workflow was executed on Cisco AI PODs running Cisco UCS 885A infrastructure powered by the NVIDIA HGX platform.<\/p>\n<h2>Problem statement<\/h2>\n<p>Before this experiment, Cisco had run similar embedding fine-tuning experiments using earlier-generation models and smaller-scale infrastructure. These prior efforts required significant manual tuning of hyperparameters such as batch size and number of epochs, and results were often difficult to stabilize. Iteration cycles were long, making it challenging to explore different configurations or scale experiments. Despite some localized improvements, keyword search remained necessary for many domain-specific retrieval scenarios. There was also no standardized, end-to-end workflow that engineering teams could execute quickly and evaluate consistently across runs. These efforts often took weeks to months of manual work for uncertain gains.<\/p>\n<h2>How the fine\u2011tuning went and time to value<\/h2>\n<p>In this experiment, Cisco used the NVIDIA NeMo Retriever embedding fine-tuning recipe, leveraging synthetic data generation to produce training signals from existing corpora. The recipe runs through five distinct stages: synthetic data generation (SDG), data preparation with hard-negative mining, contrastive fine-tuning, BEIR evaluation, and ONNX model export. The workflow ran end-to-end successfully. 
All experiments ran on a single NVIDIA H200 143 GB GPU hosted within Cisco AI Pods built on Cisco UCS 885A systems. Fine-tuning runs completed within hours, enabling rapid experimentation across multiple dataset sizes and configurations. The use of synthetic data generation eliminated the need for manual labeling, significantly reducing overhead. This approach allowed Cisco to iterate quickly, observe performance trends early, and validate whether embedding fine-tuning was worth further investment. The overall time-to-value was substantially shorter than previous efforts, with meaningful insights gained after only a small number of runs.<\/p>\n<p>The five-stage pipeline architecture:<\/p>\n<p style=\"text-align: left;\"><img fetchpriority=\"high\" decoding=\"async\" class=\"lazy lazy-hidden aligncenter size-medium_large wp-image-488467\" data-lazy-type=\"image\" src=\"https:\/\/blogs.cisco.com\/gcs\/ciscoblogs\/1\/2026\/03\/image11-768x164.png\" alt=\"\" width=\"768\" height=\"164\"\/><noscript><img fetchpriority=\"high\" decoding=\"async\" class=\"aligncenter size-medium_large wp-image-488467\" src=\"https:\/\/blogs.cisco.com\/gcs\/ciscoblogs\/1\/2026\/03\/image11-768x164.png\" alt=\"\" width=\"768\" height=\"164\"\/><\/noscript><em>Timings based on ~925 documents \/ ~9,200 QA pairs \/ ~7,800 training examples on a single NVIDIA H200 GPU running on Cisco AI Pods with Cisco UCS 885A infrastructure. Actual duration scales with data volume.<\/em><\/p>\n<h2>Accuracy gains observed<\/h2>\n<p>Across multiple experiments, the results showed consistent, measurable improvements. Fine-tuning the NVIDIA 1-billion-parameter NV-EmbedQA model on synthetic domain-specific data yielded gains across all retrieval metrics, with NDCG@1 gains of +7.1 to +7.3 absolute points (+9.9% to +11.1% relative). Recall@10 improved by up to +6.8 points (+8.5%), and MAP@10 by up to +6.5 points (+9.7%). 
Using an on-premises 120B-parameter LLM for synthetic data generation, the entire pipeline ran with zero external API costs, and keeping the data entirely on premises ensured data privacy. These gains held even as dataset size increased and retrieval tasks became more challenging. Importantly, improvements were observed on domain-specific queries that previously performed poorly with base embedding models. While these results represent an initial baseline rather than a fully optimized outcome, they provided strong confirmation that embedding fine-tuning can materially improve retrieval quality for enterprise-specific data.<\/p>\n<h3 style=\"text-align: left;\"><strong>\u00a0<\/strong><\/h3>\n<h3><strong>\u00a0 \u00a0Summary of experiments<\/strong><\/h3>\n<p style=\"text-align: left;\"><img loading=\"lazy\" decoding=\"async\" class=\"lazy lazy-hidden aligncenter size-medium_large wp-image-488470\" data-lazy-type=\"image\" src=\"https:\/\/blogs.cisco.com\/gcs\/ciscoblogs\/1\/2026\/03\/qkGx1a86-image21-768x224.png\" alt=\"\" width=\"768\" height=\"224\"\/><noscript><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-medium_large wp-image-488470\" src=\"https:\/\/blogs.cisco.com\/gcs\/ciscoblogs\/1\/2026\/03\/qkGx1a86-image21-768x224.png\" alt=\"\" width=\"768\" height=\"224\"\/><\/noscript><em><strong>Table 1.<\/strong> Retrieval performance comparison between the base embedding model and the contrastively fine-tuned model across two dataset sizes (334 and 925 documents). 
Fine-tuning consistently improves ranking quality across all BEIR evaluation metrics.<\/em><\/p>\n<h3><strong>\u00a0 \u00a0<\/strong><\/h3>\n<h3><strong>\u00a0 \u00a0Key Observations: <\/strong><\/h3>\n<ul>\n<li>Fine-tuning consistently improved retrieval quality across all metrics.<\/li>\n<li>NDCG@1 showed the largest improvement in top-level relevance.<\/li>\n<li>Gains were stable across the two dataset sizes (334 and 925 documents).<\/li>\n<li>Improved Recall@10 and MAP@10 indicate better coverage and ranking than the base embedding model.<\/li>\n<\/ul>\n<h2>What surprised us<\/h2>\n<p>The most unexpected finding was how quickly the recipe delivered actionable results. Within a few days of starting the experiment, we had measurable accuracy improvements \u2014 a stark contrast to previous efforts that took weeks to months. The synthetic data generation approach produced training signals of sufficient quality to drive meaningful gains without a single manually labeled example. We were also surprised by how well the improvements generalized across query types, including the rare-token identifier queries that had historically been the weakest point for semantic search.<\/p>\n<h2>Next steps with engagement<\/h2>\n<p>Building on these results, Cisco <strong>will continue<\/strong> working with NVIDIA to systematically push accuracy further. 
The next phase of work <strong>will focus<\/strong> on:<\/p>\n<ul>\n<li><strong>Using<\/strong> a fixed evaluation set across runs so that metrics <strong>will be<\/strong> directly comparable<\/li>\n<li><strong>Tuning<\/strong> the learning rate (trying default, half, and double) and <strong>increasing<\/strong> epochs from 3 to 5<\/li>\n<li><strong>Scaling<\/strong> training data to ~100K QA pairs to find the saturation point for the domain<\/li>\n<li><strong>Using<\/strong> a larger or higher-quality LLM for synthetic data generation to improve QA pair fidelity<\/li>\n<li><strong>Applying<\/strong> 10% warmup with cosine decay for more stable convergence<\/li>\n<li><strong>Increasing<\/strong> hard-negative mining from 5 to 10 negatives per query for a stronger contrastive signal<\/li>\n<li><strong>Refining<\/strong> synthetic data generation prompts to better emphasize rare and domain-specific terms \u2014 bug IDs, product identifiers, firmware versions \u2014 where base models struggle most<\/li>\n<li><strong>Exploring<\/strong> chunk-aware training: using real document chunks from a production vector database as the retrieval corpus, generating questions against those chunks via the LLM, and mapping each question to its positive chunk and hard-negative chunks \u2014 training the model on the same data distribution it <strong>will encounter<\/strong> in production, where answers <strong>may be<\/strong> buried in longer text and chunking strategies <strong>will vary<\/strong><\/li>\n<\/ul>\n<p>Longer term, the engagement <strong>will expand<\/strong> to include re-ranker fine-tuning and broader retrieval optimization as part of a full end-to-end RAG improvement effort.<\/p>\n<h2><em><strong>Value of the fine-tuning embedding model<\/strong><\/em><\/h2>\n<p><em>This experiment suggests that the embedding fine-tuning recipe can accelerate time to production by providing a validated, end-to-end workflow that delivers measurable improvements in 
days rather than months. The ideas and findings from this work are actively shaping the recipe\u2019s evolution, while Cisco gains early access to a maturing pipeline that shortens the path from experimentation to production. The work also demonstrates how Cisco AI Pods based on Cisco UCS 885A systems and NVIDIA H200 GPUs can provide an effective enterprise infrastructure foundation for rapid embedding model adaptation<\/em><em>.<\/em><\/p>\n<h2>Key fine-tuning embedding model benefits for businesses<\/h2>\n<ul>\n<li>Protect proprietary data (on-premises execution)<\/li>\n<li>Reduce support costs (faster resolution, fewer escalations)<\/li>\n<li>No cloud API dependency (zero external costs)<\/li>\n<li>Fast time-to-value (<em>full end-to-end pipeline \u2014 all 5 stages including SDG, mining, training, evaluation, and export \u2014 completes in 2-5 hours on a single GPU<\/em>)<\/li>\n<\/ul>\n<h2>\u00a0Key fine-tuning embedding model benefits for developers<\/h2>\n<ul>\n<li>No manual annotation required (synthetic data generation)<\/li>\n<li>Modular, hackable architecture (<em>5 distinct stages: SDG \u2192 Data Prep \u2192 Fine-Tune \u2192 Evaluate \u2192 Export<\/em>)<\/li>\n<li>Production-ready outputs (ONNX export)<\/li>\n<li>Built-in evaluation (BEIR \u2014 Benchmarking Information Retrieval \u2014 framework)<\/li>\n<li>Hard negative mining included (automatic quality boost)<\/li>\n<\/ul>\n<h2>Get started<\/h2>\n<p>The fine-tuning recipe for\u00a0Llama Nemotron Embed 1B\u00a0model is available now as a complete, production-ready pipeline. 
Whether you\u2019re building enterprise search, RAG applications, or domain-specific retrieval systems, this recipe provides a clear path from raw documents to deployed, domain-adapted embeddings.<\/p>\n<p><strong>Ready to fine-tune your own embedding model?<\/strong><\/p>\n<p>\ud83d\udc49\u00a0Explore the Nemotron Embed Fine-Tuning Recipe on GitHub<\/p>\n<h3>From local fine-tuning to secure agent execution, keep sensitive data local and protected\u2014powered by NVIDIA and secured with Cisco AI Defense on AI PODs.<\/h3>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>This blog is jointly written by Md Rahman, Arkaprabho Ghosh, Navin Bilwar, and Desh Shukla. 
Executive summary Cisco IT recently evaluated fine-tuning embedding models using [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":16903,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[],"class_list":["post-16902","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/posts\/16902","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/comments?post=16902"}],"version-history":[{"count":0,"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/posts\/16902\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/media\/16903"}],"wp:attachment":[{"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/media?parent=16902"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/categories?post=16902"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dmsretail.com\/RetailNews\/wp-json\/wp\/v2\/tags?post=16902"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}