
VMware Private AI Foundation: notes from a fintech operator

Production patterns for deploying NVIDIA AI Enterprise on vSphere 8 in environments where audit matters. Configurations, licensing math, and operator-side lessons.

Most AI infrastructure content assumes a greenfield deployment. Spin up some H100s on a cloud provider, configure Kubernetes, ship a model. Done in a weekend.

This is not the reality I work in.

I operate infrastructure for a fintech: ten VMware clusters, Dell VxRail and HPE Synergy 12000 hardware, and increasingly NVIDIA A100 and H100 nodes for AI workloads. Every system I run is audited annually for PCI DSS, ISO 27001, and SBV (State Bank of Vietnam) compliance. I sit on the auditee side: preparing evidence, proving controls, surviving findings.

For environments like mine, the question is not “should we use AI?” It is “how do we deploy AI within our existing constraints?”

This note covers what I have learned about VMware Private AI Foundation with NVIDIA: what it actually is, the configuration decisions that matter, the licensing math the vendor will not volunteer, and the patterns that have held up in production.

This is operator perspective, not consultant theory. Treat it as starting points to validate against your own environment.

What VMware Private AI Foundation actually is

Marketing materials describe VMware Private AI Foundation as a “turnkey solution.” Real deployments are more nuanced.

The Foundation is an architectural pattern, not a single product. It combines:

  1. VMware Cloud Foundation (VCF) — the underlying private cloud platform
  2. vSphere 8 — hypervisor with GPU virtualization support
  3. vSphere with Tanzu — Kubernetes orchestration on top of vSphere
  4. NVIDIA AI Enterprise — software stack including vGPU drivers, frameworks, and tools
  5. NVIDIA-Certified Systems — hardware validated for the stack

What is not included but you will need anyway:

  • A model repository or vector database (depends on use case)
  • MLOps tooling (W&B, MLflow, or similar)
  • Monitoring beyond vCenter (DCGM, Prometheus)
  • Cost allocation and showback (often custom)

This is important to understand upfront. “VMware Private AI Foundation” is a starting point, not a complete platform.

The hardware decision

Before any software discussion, you make a hardware choice. For most enterprise environments, this means one of three paths:

Path 1: Existing VxRail or vSAN ReadyNode + GPU expansion

This is the lowest-friction path if you already operate VxRail clusters. You add GPU-equipped nodes to existing clusters or build new GPU-specific clusters.

Pros from my experience:

  • Familiar operational model — your team already knows VxRail Manager
  • Existing lifecycle workflows apply
  • Compliance frameworks already approved for the platform
  • vSAN provides distributed storage

Cons I have hit:

  • Limited GPU density per node (typically 2-4 GPUs)
  • vSAN performance impact under heavy GPU I/O
  • Hardware refresh tied to VxRail certification cycles

Typical configuration we use: VxRail V670F nodes with 4× NVIDIA L40S or 2× H100 PCIe.

Path 2: HPE Synergy or Dell PowerEdge with composable approach

For higher GPU density and more flexibility, composable platforms like HPE Synergy 12000 frames work well. This is what we have moved to for newer AI clusters.

What I like:

  • Higher GPU density (up to 8 GPUs per node)
  • Composable design allows GPU/CPU/storage independence
  • Strong fit for both AI workloads and traditional VMs
  • Better suited for InfiniBand fabrics

What I have struggled with:

  • Higher complexity than VxRail
  • Composable management adds a layer (OneView, Synergy Composer)
  • More manual lifecycle work
  • Auditors initially asked more questions about the composable layer

Typical configuration: HPE Synergy 480 Gen11 compute modules with 4× H100 SXM or 8× H100 PCIe.

Path 3: Purpose-built NVIDIA DGX or BasePOD

For organizations going all-in on AI, NVIDIA reference architectures provide the highest performance. I have not personally deployed these but have evaluated them.

Tradeoffs:

  • Maximum performance (NVLink, InfiniBand)
  • Validated reference architecture
  • Best for large model training

But:

  • Expensive ($300K+ per DGX)
  • Separate operational model from rest of VMware estate
  • Compliance frameworks may need new approval

My take: For fintech AI workloads (mostly inference, RAG, smaller fine-tuning), Path 1 or 2 makes more sense than Path 3.

NVIDIA AI Enterprise licensing — the hidden math

NVIDIA AI Enterprise is a perpetual source of confusion in budget planning. Here is the math from our procurement cycles.

License model

NVIDIA AI Enterprise is licensed per GPU per year. As of 2026 pricing:

  • NVAIE Standard: ~$2,400/GPU/year (5-year subscription)
  • NVAIE Essentials: ~$1,000/GPU/year (basic vGPU only)

For a cluster with 16 H100 GPUs:

  • 16 × $2,400 = $38,400/year in NVAIE licensing alone

This is on top of:

  • vSphere licensing (per-CPU)
  • VxRail or Synergy support
  • NVIDIA hardware purchase ($30-40K per H100)

5-year TCO for a 16-GPU H100 cluster based on our planning numbers:

Item                                      | Cost
16× H100 GPUs ($35K avg)                  | $560,000
4× HPE Synergy compute modules            | $200,000
Networking (InfiniBand or Spectrum-X)     | $150,000
NVAIE licensing (5 years)                 | $192,000
vSphere/vSAN licensing                    | $80,000
Power & cooling (5 years)                 | $150,000
Operations staff (allocated)              | $300,000
5-year TCO                                | $1,632,000

This is a meaningful budget. Plan accordingly.
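A minimal sketch of the arithmetic behind these figures, useful for re-running the numbers with your own quotes. The dollar values are the planning figures from the table above; the function and key names are illustrative only.

```python
# Planning figures from the table above (USD). Replace with your own quotes.
GPU_COUNT = 16
NVAIE_PER_GPU_PER_YEAR = 2_400   # NVAIE Standard, approximate
YEARS = 5


def nvaie_cost(gpus: int, per_gpu_per_year: int, years: int) -> int:
    """Subscription cost when every GPU is licensed for the full term."""
    return gpus * per_gpu_per_year * years


def five_year_tco() -> dict[str, int]:
    """Line items plus their sum, mirroring the table rows."""
    items = {
        "gpus": GPU_COUNT * 35_000,
        "compute_modules": 200_000,
        "networking": 150_000,
        "nvaie_licensing": nvaie_cost(GPU_COUNT, NVAIE_PER_GPU_PER_YEAR, YEARS),
        "vsphere_vsan_licensing": 80_000,
        "power_cooling": 150_000,
        "operations_staff": 300_000,
    }
    # The right-hand side is evaluated before the "total" key is added.
    items["total"] = sum(items.values())
    return items


if __name__ == "__main__":
    tco = five_year_tco()
    for name, cost in tco.items():
        print(f"{name:26s} ${cost:>10,}")
    print(f"per GPU over {YEARS} years: ${tco['total'] // GPU_COUNT:,}")
```

Dividing the total by the GPU count gives a per-GPU figure that is easier to compare against cloud pricing than the raw cluster total.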

Licensing gotchas

A few licensing details that caught us off guard:

  1. All GPUs in a host need to be licensed. You cannot license 2 of 4 GPUs in a server.
  2. vGPU profiles count differently. Sharing one physical GPU among 4 VMs requires one license, not four.
  3. MIG instances are treated as fractional licenses (depending on profile size).
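  4. License server (NLS) is mandatory. Licensed clients hold a time-limited lease that has to be renewed against the server. In our configuration, VMs kept running for 7 days without license server contact; after that the lease expired and the GPU workloads went down. We learned this during a network maintenance window.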
  5. Audit your usage. NVIDIA can request usage reports. Mismatch becomes a compliance issue with our auditors.
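The license-server gotcha is the one worth automating a check for. Below is a minimal sketch that classifies a VM by how long it has been since its last successful license server contact. It assumes the 7-day window described above (confirm the lease settings on your own NLS instance), a hypothetical 2-day warning margin, and that you already collect a last-contact timestamp per VM from somewhere; none of these names come from NVIDIA tooling.

```python
from datetime import datetime, timedelta, timezone

LEASE_WINDOW = timedelta(days=7)   # from the text; verify against your NLS settings
WARN_BEFORE = timedelta(days=2)    # hypothetical alerting margin


def lease_status(last_contact: datetime, now: datetime) -> str:
    """Return 'ok', 'warning', or 'expired' for one licensed VM."""
    remaining = LEASE_WINDOW - (now - last_contact)
    if remaining <= timedelta(0):
        return "expired"
    if remaining <= WARN_BEFORE:
        return "warning"
    return "ok"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical inventory: VM name -> last successful license renewal.
    inventory = {
        "inference-01": now - timedelta(hours=3),
        "rag-02": now - timedelta(days=6),
    }
    for vm, seen in inventory.items():
        print(vm, lease_status(seen, now))
```

Running a check like this before a planned network maintenance window shows which VMs would cross the limit if the license server stays unreachable for the duration.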

Architecture: how we layered it

For our fintech environment, the architecture follows a layered isolation model. This evolved over several iterations as auditors raised questions.

Layer 1: Physical isolation

Production AI workloads run on dedicated clusters, separate from general VMware estate. Two reasons:

  1. Performance: GPU workloads have different I/O profiles
  2. Audit clarity: Our auditors prefer physical isolation for AI workloads handling regulated data

Our current setup: three dedicated GPU clusters, 4-8 nodes each, depending on workload class.

Layer 2: Network isolation

GPU clusters need their own VLANs and subnets:

  • vMotion network (10/25 GbE) — for VM mobility
  • vSAN network (25/100 GbE) — if using vSAN for AI cluster
  • GPU compute network (InfiniBand 200/400 Gb or RoCE) — for multi-GPU workloads
  • Management network (1 GbE) — isolated from production
  • Storage network (32 Gb FC or NVMe-oF) — for shared model repositories
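A separation plan like this is easy to break with one copy-paste error, so a small consistency check helps. The sketch below uses Python's standard `ipaddress` module to flag reused VLAN IDs and overlapping subnets. All VLAN IDs and CIDR ranges are hypothetical, and it assumes every network is IP-addressed (a Fibre Channel storage fabric would not appear in such a table).

```python
import ipaddress
from itertools import combinations

# Hypothetical VLAN IDs and subnets; substitute your own addressing plan.
NETWORKS = {
    "vmotion":     (110, "10.20.10.0/24"),
    "vsan":        (120, "10.20.20.0/24"),
    "gpu_compute": (130, "10.20.30.0/24"),
    "management":  (140, "10.20.40.0/24"),
    "storage":     (150, "10.20.50.0/24"),
}


def plan_conflicts(networks: dict[str, tuple[int, str]]) -> list[str]:
    """Return readable conflicts: reused VLAN IDs or overlapping subnets."""
    conflicts = []
    for (name_a, (vlan_a, cidr_a)), (name_b, (vlan_b, cidr_b)) in combinations(
        networks.items(), 2
    ):
        if vlan_a == vlan_b:
            conflicts.append(f"{name_a} and {name_b} share VLAN {vlan_a}")
        net_a = ipaddress.ip_network(cidr_a)
        net_b = ipaddress.ip_network(cidr_b)
        if net_a.overlaps(net_b):
            conflicts.append(f"{name_a} and {name_b} have overlapping subnets")
    return conflicts


if __name__ == "__main__":
    print(plan_conflicts(NETWORKS) or "no conflicts")
```

The output of a check like this also makes reasonable audit evidence that the segmentation was validated rather than assumed.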

InfiniBand vs Ethernet (RoCE) is its own decision. For the workloads we run (inference, RAG, smaller fine-tuning), Spectrum-X Ethernet works fine and integrates better with our existing network operations. InfiniBand justified its complexity for only one large training cluster.

Layer 3: Tenant isolation via vGPU profiles

Within a cluster, multiple business units share GPU resources via vGPU profiles. Key concepts as I understand them in practice:

Time-slicing (default vGPU mode):

  • Multiple VMs share one physical GPU through round-robin scheduling
  • Best for inference workloads with low utilization
  • Cheap to implement, no special hardware requirements

MIG (Multi-Instance GPU):

  • Hardware-level partitioning of A100/H100 GPUs into separate instances
  • Each instance has dedicated compute, memory, and L2 cache
  • Better audit story — auditors accept hardware-level isolation more readily
  • Required in our environment whenever PII or PCI data is processed on shared GPUs

For our fintech with multiple LOBs, MIG was the right answer. The slight overhead is worth the audit clarity. When our PCI DSS QSA reviewed the architecture, MIG-based isolation passed without follow-up questions. Time-slicing would have required additional compensating controls documentation.
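When planning which tenants share a physical GPU, a quick capacity check avoids promising partitions that cannot coexist. This sketch assumes an 80 GB A100 or H100, which exposes 7 compute slices and 8 memory slices, and uses the standard MIG profile names for those cards. It is a necessary check only: real MIG placement has additional position constraints, and the vGPU profile names you see in vSphere differ from these, so validate the final layout on the hardware.

```python
# (compute slices, memory slices) consumed by each MIG profile on an 80 GB GPU.
MIG_PROFILES = {
    "1g.10gb": (1, 1),
    "2g.20gb": (2, 2),
    "3g.40gb": (3, 4),
    "4g.40gb": (4, 4),
    "7g.80gb": (7, 8),
}
MAX_COMPUTE_SLICES = 7
MAX_MEMORY_SLICES = 8


def fits_on_one_gpu(requested: list[str]) -> bool:
    """True if the requested instances stay within the GPU's slice totals."""
    compute = sum(MIG_PROFILES[p][0] for p in requested)
    memory = sum(MIG_PROFILES[p][1] for p in requested)
    return compute <= MAX_COMPUTE_SLICES and memory <= MAX_MEMORY_SLICES


if __name__ == "__main__":
    # Hypothetical tenant requests for a single physical GPU.
    for layout in (["3g.40gb", "3g.40gb"], ["4g.40gb", "4g.40gb"]):
        print(layout, fits_on_one_gpu(layout))
```

Note that two 3g.40gb instances use all 8 memory slices while leaving one compute slice idle, which is the kind of stranded capacity that shows up later in utilization reviews.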

Layer 4: Workload isolation via Tanzu namespaces

vSphere with Tanzu provides Kubernetes namespaces with IAM policies. Each business unit gets:

  • Dedicated namespace
  • Quota on GPU resources (vGPU profile mapped to vmclass)
  • Network policies preventing cross-tenant traffic
  • Separate model registry/repository access

This pattern, combined with MIG underneath, provides defense-in-depth that has held up across our last two audit cycles.
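Inside a tenant's Kubernetes cluster, the GPU quota and the cross-tenant traffic block can be expressed as two standard Kubernetes objects. The sketch below builds them as Python dictionaries and prints JSON that `kubectl apply -f -` accepts. It assumes GPUs are advertised as the `nvidia.com/gpu` extended resource (as the NVIDIA device plugin does) and uses hypothetical object and namespace names. Quotas on the Supervisor itself are set through vCenter, so this applies only within the guest cluster.

```python
import json


def tenant_manifests(namespace: str, gpu_limit: int) -> list[dict]:
    """ResourceQuota capping GPU requests, plus a same-namespace-only NetworkPolicy."""
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "gpu-quota", "namespace": namespace},
        # Extended resources are quota-limited through the "requests." prefix.
        "spec": {"hard": {"requests.nvidia.com/gpu": str(gpu_limit)}},
    }
    isolate = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "same-namespace-only", "namespace": namespace},
        "spec": {
            "podSelector": {},  # applies to every pod in the namespace
            "policyTypes": ["Ingress", "Egress"],
            # An empty podSelector in a peer matches pods in this namespace only.
            "ingress": [{"from": [{"podSelector": {}}]}],
            "egress": [{"to": [{"podSelector": {}}]}],
        },
    }
    return [quota, isolate]


if __name__ == "__main__":
    for manifest in tenant_manifests("lob-payments", gpu_limit=4):
        print(json.dumps(manifest, indent=2))
```

The egress rule as written also blocks cluster DNS and any shared model registry, so a real policy needs explicit allowances for those endpoints.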

Deployment walkthrough

Our typical deployment proceeded through these phases. Yours will vary.

Phase 1: Foundation (weeks 1-4)

  1. Procure NVIDIA-Certified hardware (8-12 week lead time was typical for us)
  2. Procure NVAIE licenses through Dell, HPE, or NVIDIA partner
  3. Set up NVIDIA License Server (NLS) — we use on-prem appliance because our segment is air-gapped from internet
  4. Deploy VCF with separate workload domain for AI
  5. Validate hardware against NVAIE compatibility matrix

Phase 2: GPU configuration (weeks 5-8)

  1. Install NVIDIA AI Enterprise VIB on ESXi hosts
  2. Configure default graphics type to “Shared Direct” (required for vGPU)
  3. Set up vGPU profiles for tenant pattern
  4. Configure MIG if using A100/H100 with multi-tenancy
  5. Validate license check-in/check-out with NLS

Phase 3: Tanzu and platform services (weeks 9-12)

  1. Enable vSphere with Tanzu on AI workload domain
  2. Create Supervisor Cluster with GPU-aware nodes
  3. Deploy Tanzu Kubernetes Grid clusters for tenants
  4. Install NVIDIA GPU Operator on TKG clusters
  5. Configure DCGM exporter for monitoring

Phase 4: Audit preparation (weeks 13-16)

This is the phase most teams underestimate.

  1. Document architecture in audit-ready format
  2. Map controls to compliance framework (we do PCI DSS, ISO 27001, SBV)
  3. Integrate with SIEM (Splunk in our case) for audit logging
  4. Configure HSM integration for model encryption keys
  5. Run internal review before external audit

Our 16-week timeline was realistic for our first AI cluster. Subsequent clusters deployed in 4-8 weeks once the pattern was established.

Monitoring: the critical layer

GPU operations are not like CPU operations. You need GPU-specific monitoring from day one.

Our minimum stack:

  1. NVIDIA DCGM exporter — exports GPU metrics in Prometheus format
  2. Prometheus — scrapes and stores metrics (scrape interval of 15 seconds or shorter)
  3. Grafana — visualization with NVIDIA dashboards
  4. Alertmanager — alerts on GPU faults, thermal issues, license issues

Metrics we alert on:

  • DCGM_FI_DEV_GPU_TEMP > 85°C (thermal warning)
  • DCGM_FI_DEV_GPU_UTIL < 5% for >24h (unused capacity, costing money)
  • DCGM_FI_DEV_GPU_UTIL > 95% sustained (capacity warning)
  • DCGM_FI_DEV_ECC_DBE_VOL_TOTAL > 0 (memory errors — investigate immediately)
  • DCGM_FI_DEV_XID_ERRORS (any non-zero value, especially XID 79 — we have seen these in production)
  • NLS license check-out failures
  • VM-level: vGPU utilization mismatches
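The threshold logic in this list can be sketched in a few lines. In production these would normally live as Prometheus alerting rules; the Python below only illustrates the conditions, using the thresholds from the list. The class and function names are illustrative, and the samples are assumed to have been scraped already.

```python
from dataclasses import dataclass


@dataclass
class GpuSample:
    gpu_temp_c: float     # DCGM_FI_DEV_GPU_TEMP
    gpu_util_pct: float   # DCGM_FI_DEV_GPU_UTIL
    ecc_dbe_total: int    # DCGM_FI_DEV_ECC_DBE_VOL_TOTAL
    xid: int              # DCGM_FI_DEV_XID_ERRORS (0 means no error reported)


def instant_alerts(sample: GpuSample) -> list[str]:
    """Alerts that fire on a single sample, without a time window."""
    alerts = []
    if sample.gpu_temp_c > 85:
        alerts.append("thermal")
    if sample.ecc_dbe_total > 0:
        alerts.append("ecc_double_bit")
    if sample.xid != 0:
        alerts.append(f"xid_{sample.xid}")
    return alerts


def idle_for_window(util_history: list[float], threshold: float = 5.0) -> bool:
    """True when every utilization sample in the window is below the threshold."""
    return bool(util_history) and all(u < threshold for u in util_history)


def saturated_for_window(util_history: list[float], threshold: float = 95.0) -> bool:
    """True when every utilization sample in the window is above the threshold."""
    return bool(util_history) and all(u > threshold for u in util_history)


if __name__ == "__main__":
    print(instant_alerts(GpuSample(88.0, 40.0, 0, 79)))
```

The windowed checks take whatever history the caller passes in, so the 24-hour idle rule is a matter of supplying 24 hours of samples. An empty history deliberately returns False, so missing data never looks like an idle or saturated GPU.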

For audit preparation, we feed these metrics into Splunk alongside infrastructure logs. Auditors appreciate seeing GPU health metrics as part of overall monitoring, not as a separate AI-specific silo.

Audit-readiness considerations

This section will trigger your compliance team. That is a good thing.

Data residency

If your operations span regions with data residency requirements (we deal with SBV Vietnam requirements, plus GDPR for European customers), your AI infrastructure must enforce them.

What we do:

  • GPU clusters geographically tagged
  • Tanzu namespace node selectors enforce workload placement
  • Model artifacts stay in the region (encrypted storage with geo-locked keys)
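As a sketch of the placement control, the snippet below builds a node selector on the well-known `topology.kubernetes.io/region` node label and lists workloads that are not pinned to the required region. The region value, workload names, and the simplified pod dictionaries are hypothetical. A real check would read pod specs from the cluster API and would also need to confirm that the nodes themselves carry the correct label.

```python
REGION_LABEL = "topology.kubernetes.io/region"  # well-known Kubernetes node label


def node_selector_for(region: str) -> dict[str, str]:
    """nodeSelector fragment that pins a pod to nodes labeled with the region."""
    return {REGION_LABEL: region}


def placement_violations(pods: list[dict], required_region: str) -> list[str]:
    """Names of pods whose nodeSelector does not pin them to the required region."""
    return [
        pod["name"]
        for pod in pods
        if pod.get("nodeSelector", {}).get(REGION_LABEL) != required_region
    ]


if __name__ == "__main__":
    # Hypothetical, simplified pod records.
    pods = [
        {"name": "kyc-rag", "nodeSelector": node_selector_for("vn-hanoi-1")},
        {"name": "fraud-scoring", "nodeSelector": {}},
    ]
    print(placement_violations(pods, "vn-hanoi-1"))
```

A report of violations, ideally an empty one, is the kind of evidence that answers residency questions more convincingly than an architecture diagram alone.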

Audit trail

Every AI operation must be auditable. We capture:

  • API gateway logs (which user, which model, when)
  • Model versioning (which exact model version processed each request)
  • Data lineage (what data went into what model output)
  • GPU resource allocation logs (which VM/namespace used which GPU at which time)

This required custom work. NVAIE does not provide this out of the box. We built an integration with our existing SIEM.
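To make the four capture points concrete, here is a minimal sketch of one audit record serialized as a single JSON line, the shape most SIEM ingestion pipelines accept. Every field name and example value is hypothetical; the point is that user, model version, data lineage, and GPU allocation travel together in one event.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class InferenceAuditEvent:
    timestamp: str          # ISO 8601, UTC
    user_id: str            # from the API gateway
    model_name: str
    model_version: str      # exact version that served the request
    input_dataset_ref: str  # data lineage pointer
    namespace: str          # tenant namespace
    gpu_uuid: str           # GPU or MIG instance that ran the request


def to_siem_line(event: InferenceAuditEvent) -> str:
    """Serialize one event as a single JSON line with stable key order."""
    return json.dumps(asdict(event), sort_keys=True)


if __name__ == "__main__":
    event = InferenceAuditEvent(
        timestamp=datetime.now(timezone.utc).isoformat(),
        user_id="svc-kyc",
        model_name="doc-classifier",
        model_version="2.3.1",
        input_dataset_ref="lineage://kyc/batch-0042",
        namespace="lob-payments",
        gpu_uuid="GPU-hypothetical-0000",
    )
    print(to_siem_line(event))
```

Sorting the keys keeps the output byte-stable for identical events, which helps when auditors ask for evidence that log records were not altered.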

Encryption

  • At rest: Model files, embeddings, training data — encrypted with HSM-backed keys
  • In transit: TLS everywhere, including internal API calls between Tanzu pods
  • In use: Limited. We use NVIDIA Confidential Computing on H100 for sensitive workloads, but it comes with tradeoffs (performance impact, limited tooling)

Key management

We integrate NVAIE deployment with our existing KMS/HSM:

  • vSphere encryption for VMs containing model artifacts
  • vSAN encryption (where applicable)
  • TLS certificates managed via existing PKI
  • API keys for model access tied to identity provider

What auditors actually asked us

When our first AI cluster went through PCI DSS audit, here are the questions we got (paraphrased):

  1. “How do you ensure GPU memory is wiped between tenants?” → We explained MIG hardware isolation; QSA accepted.
  2. “Where are model files stored, and how are they protected?” → Storage encryption, key management, evidence walkthrough.
  3. “What is the change management process for model updates?” → CI/CD pipeline with approval gates, change tickets.
  4. “How do you log AI inference requests?” → API gateway logging into Splunk, retention 1 year.
  5. “What happens if a GPU fails mid-inference?” → DCGM monitoring, automatic VM recovery, incident response process.

ISO 27001 surveillance audit added:

  6. “How is GPU resource consumption tracked for capacity planning?” → Prometheus retention, monthly capacity reviews.
  7. “Are NVIDIA AI Enterprise licenses included in your software asset management?” → Yes, tracked in our SAM tool with renewal calendar.

SBV inspection focused on:

  8. “Is sensitive customer data processed by AI models stored in Vietnam?” → Yes, demonstrated through node selectors and storage location evidence.

This is not the comprehensive auditor checklist. It is what we encountered. Your audits will differ.

Common mistakes I have observed

After several internal AI deployments and conversations with peers at other fintechs:

1. Under-provisioning networking

Teams focus on GPUs and forget that AI workloads stress networks. 10 GbE is insufficient for serious AI deployments. We learned this the hard way. Plan for 25 GbE at minimum, and 100 GbE where the budget allows.
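A back-of-the-envelope calculation shows why link speed matters. The sketch below estimates how long it takes to move a set of model weights across a link. The 140 GB artifact size and the 80% usable-bandwidth factor are assumptions chosen for illustration, not measurements from our clusters.

```python
def transfer_seconds(size_gb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Seconds to move size_gb gigabytes over a link_gbps link.

    efficiency is the assumed usable fraction of line rate after protocol overhead.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    return (size_gb * 8) / (link_gbps * efficiency)


if __name__ == "__main__":
    model_size_gb = 140  # hypothetical weights file
    for link in (10, 25, 100):
        print(f"{link:>3} GbE: {transfer_seconds(model_size_gb, link):6.1f} s")
```

Under these assumptions the same artifact takes roughly 140 seconds on 10 GbE and about 14 seconds on 100 GbE. That gap is multiplied by every node that loads the model and every restart during an incident.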

2. Ignoring power and cooling

H100 SXM draws up to 700 W per GPU. An 8-GPU node draws 6-8 kW. Our facilities team needed to validate this before hardware arrived. One peer at another bank had to delay their deployment by 6 weeks because facility power was insufficient.
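The power arithmetic is simple enough to hand to a facilities team as a small calculation. This sketch uses the 700 W per-GPU figure from above; the 1,400 W allowance for CPUs, memory, NICs, and fans and the 17 kW rack budget are hypothetical placeholders to replace with vendor and facility numbers.

```python
GPU_WATTS = 700  # H100 SXM maximum per GPU, from the text


def node_power_kw(gpus: int, non_gpu_watts: float) -> float:
    """Peak node draw in kW: GPUs plus everything else in the chassis."""
    return (gpus * GPU_WATTS + non_gpu_watts) / 1000


def nodes_per_rack(rack_budget_kw: float, node_kw: float) -> int:
    """How many nodes fit in the rack power budget, rounded down."""
    return int(rack_budget_kw // node_kw)


if __name__ == "__main__":
    node_kw = node_power_kw(gpus=8, non_gpu_watts=1_400)  # hypothetical overhead
    print(f"node peak draw: {node_kw:.1f} kW")
    print(f"nodes per 17 kW rack: {nodes_per_rack(17, node_kw)}")
```

The GPUs alone account for 5.6 kW of an 8-GPU node, so the 6-8 kW range depends almost entirely on the non-GPU overhead, which is the number to get from the hardware vendor.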

3. Treating AI clusters like regular VMware clusters

GPU clusters need different operational practices. DRS algorithms do not optimize for GPU placement. vMotion of vGPU-attached VMs has limitations. Backup strategies need adjustment.

4. Skipping compliance review until later

The pattern: build the technical solution first, then ask compliance to bless it. This always ends in expensive rework. We engage compliance from week 1 now.

5. Mixing production and dev workloads

We tried mixing AI dev/test with production initially. It quickly led to noisy-neighbor problems. We have kept the clusters separate ever since.

When VMware Private AI is the wrong choice

To be honest, VMware Private AI Foundation is not optimal for every workload. From what I have seen:

Skip it if:

  • Your workload is pure model training at scale (NVIDIA DGX BasePOD is better)
  • You have no existing VMware investment (Kubernetes-native is simpler)
  • Your team has zero VMware experience (learning curve is significant)
  • Your AI use case fits within public cloud constraints (use Azure/AWS)

Use it when:

  • You have substantial VMware operational maturity (we do)
  • Your workloads are mostly inference, RAG, or small-scale fine-tuning
  • Compliance requires private infrastructure
  • You need to integrate AI with existing enterprise systems

For our fintech, the answer was clearly “use it.” But we went in with eyes open about the complexity.

What I am reading next

If you are starting a VMware Private AI deployment, here is what I prioritized:

  1. NVIDIA AI Enterprise Compatibility Matrix (verify your hardware)
  2. VMware Cloud Foundation 5.x Architecture Guide
  3. NVIDIA AI Enterprise VMware Deployment Guide
  4. Your hardware vendor’s reference architecture (Dell, HPE, etc.)
  5. Your auditor’s controls catalog (so you know what they will ask)

Future notes in this series will cover specific deployment patterns, MIG configuration for tenant isolation, DCGM monitoring at scale, and audit evidence preparation workflows.


These notes are based on operating fintech AI infrastructure since 2024. Configurations are specific to my environment; validate against your own hardware, software versions, and compliance requirements. I am an architect, not a consultant — these are notes, not advice.
