VMware Private AI Foundation: notes from a fintech operator

Core Engine Component	Architectural Layer	Primary Performance Metric	Compliance Validation
vSphere Pod Service	Compute Virtualization	Scheduler latency	Kernel Separation Isolation
NSX-T Transport Node	Network Fabric	Line-rate RoCEv2 encapsulation	PCI Ingress Segmentation
vSAN ESA Storage Pool	Storage Ingestion	Direct NVMe pathing via NVMe-oF	HSM-Backed Cryptographic Boundary

🧭 Architectural TL;DR

VMware Private AI architecture decouples raw compute from multi-tenant control planes. By pairing VMware Cloud Foundation (VCF) with the NVIDIA AI Enterprise (NVAIE) software stack, enterprise deployments can approach bare-metal performance while maintaining strict physical and cryptographic security boundaries for sensitive regulated datasets.

Most AI infrastructure content assumes a greenfield deployment. Spin up some H100s on a cloud provider, configure Kubernetes, ship a model. Done in a weekend.

This is not the reality I work in.

I operate infrastructure for a fintech: ten VMware clusters, Dell VxRail and HPE Synergy 12000 hardware, and a growing number of NVIDIA A100 and H100 nodes for AI workloads. Every system I run is audited annually for PCI DSS, ISO 27001, and SBV (State Bank of Vietnam) compliance. I sit on the auditee side: preparing evidence, proving controls, surviving findings.

For environments like mine, the question is not “should we use AI?” It is “how do we deploy AI within our existing constraints?”

This note covers what I have learned about VMware Private AI Foundation with NVIDIA: what it actually is, the configuration decisions that matter, the licensing math the vendor will not volunteer, and the patterns that have held up in production.

This is operator perspective, not consultant theory. Treat it as starting points to validate against your own environment.

What VMware Private AI Foundation actually is

Marketing materials describe VMware Private AI Foundation as a “turnkey solution.” Real deployments are more nuanced.

The Foundation is an architectural pattern, not a single product. It combines:

VMware Cloud Foundation (VCF) — the underlying private cloud platform
vSphere 8 — hypervisor with GPU virtualization support
vSphere with Tanzu — Kubernetes orchestration on top of vSphere
NVIDIA AI Enterprise — software stack including vGPU drivers, frameworks, and tools
NVIDIA-Certified Systems — hardware validated for the stack

What is not included but you will need anyway:

A model repository or vector database (depends on use case)
MLOps tooling (W&B, MLflow, or similar)
Monitoring beyond vCenter (DCGM, Prometheus)
Cost allocation and showback (often custom)

This is important to understand upfront. “VMware Private AI Foundation” is a starting point, not a complete platform.

The hardware decision

Before any software discussion, you make a hardware choice. For most enterprise environments, this means one of three paths:

Path 1: Existing VxRail or vSAN ReadyNode + GPU expansion

This is the lowest-friction path if you already operate VxRail clusters. You add GPU-equipped nodes to existing clusters or build new GPU-specific clusters.

Pros from my experience:

Familiar operational model — your team already knows VxRail Manager
Existing lifecycle workflows apply
Compliance frameworks already approved for the platform
vSAN provides distributed storage

Cons I have hit:

Limited GPU density per node (typically 2-4 GPUs)
vSAN performance impact under heavy GPU I/O
Hardware refresh tied to VxRail certification cycles

Typical configuration we use: VxRail V670F nodes with 4× NVIDIA L40S or 2× H100 PCIe.


Path 2: HPE Synergy or Dell PowerEdge with composable approach

For higher GPU density and more flexibility, composable platforms like HPE Synergy 12000 frames work well. This is what we have moved to for newer AI clusters.

What I like:

Higher GPU density (up to 8 GPUs per node)
Composable design allows GPU/CPU/storage independence
Strong fit for both AI workloads and traditional VMs
Better suited for InfiniBand fabrics

What I have struggled with:

Higher complexity than VxRail
Composable management adds a layer (OneView, Synergy Composer)
More manual lifecycle work
Auditors initially asked more questions about the composable layer

Typical configuration: HPE Synergy 480 Gen11 compute modules with 4× H100 SXM or 8× H100 PCIe.

Path 3: Purpose-built NVIDIA DGX or BasePOD

For organizations going all-in on AI, NVIDIA reference architectures provide the highest performance. I have not personally deployed these but have evaluated them.

Tradeoffs:

Maximum performance (NVLink, InfiniBand)
Validated reference architecture
Best for large model training

But:

Expensive ($300K+ per DGX)
Separate operational model from rest of VMware estate
Compliance frameworks may need new approval

My take: For fintech AI workloads (mostly inference, RAG, smaller fine-tuning), Path 1 or 2 makes more sense than Path 3.

NVIDIA AI Enterprise licensing — the hidden math

NVIDIA AI Enterprise is a perpetual source of confusion in budget planning. Here is the math from our procurement cycles.

License model

NVIDIA AI Enterprise is licensed per GPU per year. As of 2026 pricing:

NVAIE Standard: ~$2,400/GPU/year (5-year subscription)
NVAIE Essentials: ~$1,000/GPU/year (basic vGPU only)

For a cluster with 16 H100 GPUs:

16 × $2,400 = $38,400/year in NVAIE licensing alone

This is on top of:

vSphere licensing (per-CPU)
VxRail or Synergy support
NVIDIA hardware purchase ($30-40K per H100)

5-year TCO for a 16-GPU H100 cluster based on our planning numbers:

Item	Cost
16× H100 GPUs ($35K avg)	$560,000
4× HPE Synergy compute modules	$200,000
Networking (InfiniBand or Spectrum-X)	$150,000
NVAIE licensing (5 years)	$192,000
vSphere/vSAN licensing	$80,000
Power & cooling (5 years)	$150,000
Operations staff (allocated)	$300,000
5-year TCO	$1,632,000

This is a meaningful budget. Plan accordingly.
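
The table is easier to keep honest when the arithmetic lives in a script rather than a spreadsheet cell. The sketch below only re-adds the planning figures from the table above; the variable names are mine, and every number should be replaced with your own quotes.

```python
# Planning figures copied from the 5-year TCO table (USD).
GPU_COUNT = 16
NVAIE_PER_GPU_PER_YEAR = 2_400
YEARS = 5

line_items = {
    "16x H100 GPUs ($35K avg)": GPU_COUNT * 35_000,
    "4x HPE Synergy compute modules": 200_000,
    "Networking (InfiniBand or Spectrum-X)": 150_000,
    "NVAIE licensing (5 years)": GPU_COUNT * NVAIE_PER_GPU_PER_YEAR * YEARS,
    "vSphere/vSAN licensing": 80_000,
    "Power & cooling (5 years)": 150_000,
    "Operations staff (allocated)": 300_000,
}

total = sum(line_items.values())

# Print each item with its share of the total, then the summary rows.
for name, cost in line_items.items():
    print(f"{name:<40} ${cost:>10,}  {cost / total:6.1%}")
print(f"{'5-year TCO':<40} ${total:>10,}")
print(f"Per GPU per year: ${total / (GPU_COUNT * YEARS):,.0f}")
```

On these inputs the all-in cost works out to $20,400 per GPU per year, which is a more useful number for showback conversations than the headline total.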


Licensing gotchas

A few licensing details that caught us off guard:

All GPUs in a host need to be licensed. You cannot license 2 of 4 GPUs in a server.
vGPU profiles count differently. Sharing one physical GPU among 4 VMs requires one license, not four.
MIG instances are treated as fractional licenses (depending on profile size).
License server (NLS) is mandatory. Your VMs can run for 7 days without license server contact, then they shut down. We learned this during a network maintenance window.
Audit your usage. NVIDIA can request usage reports. A mismatch becomes a compliance issue with our auditors.
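
A small reconciliation script catches a licensing mismatch before an auditor does. This is a minimal sketch of the counting rules as described above (every physical GPU in a host is licensed, and vGPU sharing does not multiply the count). The `Host` record, the host names, and the `reconcile` helper are hypothetical, and MIG fractional counting is deliberately not modeled; confirm the rules against your own NVAIE agreement.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    physical_gpus: int  # all GPUs installed in the server
    vgpu_vms: int = 0   # VMs sharing those GPUs; informational only


def licenses_required(hosts):
    """Per-GPU counting: every installed GPU needs a license,
    regardless of how many VMs share it through vGPU."""
    return sum(h.physical_gpus for h in hosts)


def reconcile(hosts, licenses_owned, price_per_gpu_year=2_400):
    """Compare owned licenses with what the inventory requires."""
    required = licenses_required(hosts)
    shortfall = max(0, required - licenses_owned)
    return {
        "required": required,
        "owned": licenses_owned,
        "shortfall": shortfall,
        "annual_cost_to_close": shortfall * price_per_gpu_year,
    }


# Hypothetical inventory: two 4-GPU hosts, six licenses purchased.
inventory = [Host("gpu-01", 4, vgpu_vms=12), Host("gpu-02", 4, vgpu_vms=3)]
print(reconcile(inventory, licenses_owned=6))
```

In practice the inventory would come from vCenter or your CMDB rather than a hand-written list.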

Architecture: how we layered it

For our fintech environment, the architecture follows a layered isolation model. This evolved over several iterations as auditors raised questions.

Layer 1: Physical isolation

Production AI workloads run on dedicated clusters, separate from general VMware estate. Two reasons:

Performance: GPU workloads have different I/O profiles
Audit clarity: Our auditors prefer physical isolation for AI workloads handling regulated data

Our current setup: three dedicated GPU clusters, 4-8 nodes each, depending on workload class.

Layer 2: Network isolation

GPU clusters need their own VLANs and subnets:

vMotion network (10/25 GbE) — for VM mobility
vSAN network (25/100 GbE) — if using vSAN for AI cluster
GPU compute network (InfiniBand 200/400 Gb or RoCE) — for multi-GPU workloads
Management network (1 GbE) — isolated from production
Storage network (32 Gb FC or NVMe-oF) — for shared model repositories
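
Before handing a plan like this to the network team, it is worth checking mechanically that no two networks share a VLAN ID or an overlapping subnet. The sketch below assumes the Ethernet variants (RoCE and NVMe-oF); InfiniBand and Fibre Channel fabrics are addressed differently and would not appear in such a table. All VLAN IDs and subnets are hypothetical.

```python
import ipaddress
from itertools import combinations

# Hypothetical VLAN IDs and subnets, for illustration only.
NETWORKS = {
    "vmotion":     {"vlan": 110, "subnet": "10.20.10.0/24"},
    "vsan":        {"vlan": 120, "subnet": "10.20.20.0/24"},
    "gpu-compute": {"vlan": 130, "subnet": "10.20.30.0/24"},
    "management":  {"vlan": 140, "subnet": "10.20.40.0/24"},
    "storage":     {"vlan": 150, "subnet": "10.20.50.0/24"},
}


def plan_errors(networks):
    """Return a list of human-readable problems in the network plan."""
    errors = []
    # Check every pair of networks once for shared VLANs or overlapping ranges.
    for (a, net_a), (b, net_b) in combinations(networks.items(), 2):
        if net_a["vlan"] == net_b["vlan"]:
            errors.append(f"{a} and {b} share VLAN {net_a['vlan']}")
        subnet_a = ipaddress.ip_network(net_a["subnet"])
        subnet_b = ipaddress.ip_network(net_b["subnet"])
        if subnet_a.overlaps(subnet_b):
            errors.append(f"{a} ({subnet_a}) overlaps {b} ({subnet_b})")
    return errors


print(plan_errors(NETWORKS) or "plan is consistent")
```

The same dictionary can double as audit evidence of intended segmentation, provided it is generated from, or compared against, the real switch configuration.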

InfiniBand vs Ethernet (RoCE) is its own decision. For the workloads we run (inference, RAG, smaller fine-tuning), Spectrum-X Ethernet works fine and integrates better with our existing network operations. InfiniBand justified its complexity only for one large training cluster.

Layer 3: Tenant isolation via vGPU profiles

Within a cluster, multiple business units share GPU resources via vGPU profiles. Key concepts as I understand them in practice:

Time-slicing (default vGPU mode):

Multiple VMs share one physical GPU through round-robin scheduling
Best for inference workloads with low utilization
Cheap to implement, no special hardware requirements

MIG (Multi-Instance GPU):

Hardware-level partitioning of A100/H100 GPUs into separate instances
Each instance has dedicated compute, memory, and L2 cache
Better audit story — auditors accept hardware-level isolation more readily
Required in our environment whenever PII or PCI data is processed on a GPU shared between tenants

For our fintech with multiple LOBs, MIG was the right answer. The slight overhead is worth the audit clarity. When our PCI DSS QSA reviewed the architecture, MIG-based isolation passed without follow-up questions. Time-slicing would have required additional compensating controls documentation.
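
The decision rule above is simple enough to encode, which helps when placement requests arrive through a ticket or pipeline. This is a sketch of our rule of thumb, not an NVIDIA or VMware API: the function name, the data-class labels, and the "dedicated" outcome for single-tenant GPUs are my own assumptions.

```python
MIG_CAPABLE = {"A100", "H100"}  # GPU models named above as supporting MIG
REGULATED = {"PII", "PCI"}      # data classes that trigger hardware isolation


def sharing_mode(gpu_model, data_classes, tenants):
    """Pick a GPU sharing mode for a workload.

    Returns "dedicated", "mig", or "time-slicing". Raises ValueError when
    regulated data would land on a shared GPU that cannot be partitioned.
    """
    if tenants <= 1:
        return "dedicated"  # nothing is shared, so no partitioning needed
    if REGULATED & set(data_classes):
        if gpu_model not in MIG_CAPABLE:
            raise ValueError(
                f"{gpu_model} cannot provide MIG isolation for regulated data"
            )
        return "mig"
    return "time-slicing"


print(sharing_mode("H100", {"PCI"}, tenants=3))   # regulated and shared
print(sharing_mode("L40S", set(), tenants=4))     # low-risk inference
```

Raising an error for regulated data on a non-MIG GPU is a deliberate choice: a failed request is easier to explain to an auditor than a silent fallback to time-slicing.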

Layer 4: Workload isolation via Tanzu namespaces

vSphere with Tanzu provides Kubernetes namespaces with IAM policies. Each business unit gets:

Dedicated namespace
Quota on GPU resources (vGPU profile mapped to a VM class)
Network policies preventing cross-tenant traffic
Separate model registry/repository access

This pattern, combined with MIG underneath, provides defense-in-depth that has held up across our last two audit cycles.
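
Inside a tenant's Kubernetes cluster, two of these controls can be expressed as standard manifests. The sketch below builds a ResourceQuota and a same-namespace-only NetworkPolicy as Python dictionaries. It assumes GPUs are advertised under the `nvidia.com/gpu` resource name (what the NVIDIA device plugin uses by default) and that your CNI enforces NetworkPolicy. The namespace and object names are hypothetical, and the Supervisor-level VM class quota is configured separately in vCenter.

```python
import json


def tenant_manifests(namespace, gpu_limit):
    """Build a GPU quota and an ingress-isolation policy for one tenant."""
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "gpu-quota", "namespace": namespace},
        # Extended resources can only be capped through the "requests." prefix.
        "spec": {"hard": {"requests.nvidia.com/gpu": str(gpu_limit)}},
    }
    policy = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "same-namespace-only", "namespace": namespace},
        "spec": {
            "podSelector": {},  # applies to every pod in the namespace
            "policyTypes": ["Ingress"],
            # An empty podSelector in "from" matches pods in this namespace only.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }
    return [quota, policy]


for manifest in tenant_manifests("lob-payments", gpu_limit=2):
    print(json.dumps(manifest, indent=2))
```

The JSON output can be applied with `kubectl apply -f -`, since Kubernetes accepts JSON as well as YAML.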

Deployment walkthrough

Our typical deployment proceeded through these phases. Yours will vary.

Phase 1: Foundation (weeks 1-4)

Procure NVIDIA-Certified hardware (8-12 week lead time was typical for us)
Procure NVAIE licenses through Dell, HPE, or NVIDIA partner
Set up NVIDIA License Server (NLS) — we use on-prem appliance because our segment is air-gapped from internet
Deploy VCF with separate workload domain for AI
Validate hardware against NVAIE compatibility matrix

Phase 2: GPU configuration (weeks 5-8)

Install NVIDIA AI Enterprise VIB on ESXi hosts
Configure default graphics type to “Shared Direct” (required for vGPU)
Set up vGPU profiles for tenant pattern
Configure MIG if using A100/H100 with multi-tenancy
Validate license check-in/check-out with NLS
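
A quick post-configuration check is to confirm that MIG mode is actually enabled on every GPU that is supposed to be partitioned. The sketch below parses the CSV that `nvidia-smi --query-gpu=index,name,mig.mode.current --format=csv,noheader` produces. The sample text is illustrative rather than captured from our hosts, and the helper name is hypothetical.

```python
import csv
import io

# Illustrative output of:
#   nvidia-smi --query-gpu=index,name,mig.mode.current --format=csv,noheader
# The GPU names and values below are made up for the example.
SAMPLE = """\
0, NVIDIA H100 PCIe, Enabled
1, NVIDIA H100 PCIe, Disabled
"""


def gpus_without_mig(report):
    """Return the indexes of GPUs whose current MIG mode is not Enabled."""
    missing = []
    reader = csv.reader(io.StringIO(report), skipinitialspace=True)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        index, _name, mode = row
        if mode.strip() != "Enabled":
            missing.append(int(index))
    return missing


print(gpus_without_mig(SAMPLE))
```

Feeding the function real command output (for example through `subprocess.run`) turns it into a gate that can run before tenants are onboarded.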

Phase 3: Tanzu and platform services (weeks 9-12)

Enable vSphere with Tanzu on AI workload domain
Create Supervisor Cluster with GPU-aware nodes
Deploy Tanzu Kubernetes Grid clusters for tenants
Install NVIDIA GPU Operator on TKG clusters
Configure DCGM exporter for monitoring

Phase 4: Audit preparation (weeks 13-16)

This is the phase most teams underestimate.

Document architecture in audit-ready format
Map controls to compliance framework (we do PCI DSS, ISO 27001, SBV)
Integrate with SIEM (Splunk in our case) for audit logging
Configure HSM integration for model encryption keys
Run internal review before external audit

Our 16-week timeline was realistic for our first AI cluster. Subsequent clusters deployed in 4-8 weeks once the pattern was established.

Monitoring: the critical layer

GPU operations are not like CPU operations. You need GPU-specific monitoring from day one.

Our minimum stack:

NVIDIA DCGM exporter — exports GPU metrics in Prometheus format
Prometheus — scrapes and stores metrics (scrape interval of 15 seconds or shorter)
Grafana — visualization with NVIDIA dashboards
Alertmanager — alerts on GPU faults, thermal issues, license issues

Metrics we alert on:

DCGM_FI_DEV_GPU_TEMP > 85°C (thermal warning)
DCGM_FI_DEV_GPU_UTIL < 5% for >24h (unused capacity, costing money)
DCGM_FI_DEV_GPU_UTIL > 95% sustained (capacity warning)
DCGM_FI_DEV_ECC_DBE_VOL_TOTAL > 0 (memory errors — investigate immediately)
DCGM_FI_DEV_XID_ERRORS (any non-zero value, especially XID 79 — we have seen these in production)
NLS license check-out failures
VM-level: vGPU utilization mismatches
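
The GPU thresholds above can be expressed as a small evaluation function, which is handy for testing alert logic before committing it to Alertmanager rules. This sketch uses the DCGM field names from the list; the `util_avg_24h` and `util_avg_15m` keys are hypothetical pre-aggregated averages, and the 15-minute window for "sustained" is my assumption. The NLS and vGPU mismatch alerts come from other sources and are not modeled here.

```python
def gpu_alerts(sample):
    """Return alert names for one GPU.

    `sample` maps DCGM field names to their latest values. The keys
    `util_avg_24h` and `util_avg_15m` are hypothetical averages (percent)
    computed upstream, for example by Prometheus recording rules.
    """
    alerts = []
    if sample.get("DCGM_FI_DEV_GPU_TEMP", 0) > 85:
        alerts.append("thermal_warning")
    if sample.get("util_avg_24h", 100) < 5:
        alerts.append("idle_capacity")
    if sample.get("util_avg_15m", 0) > 95:
        alerts.append("capacity_warning")
    if sample.get("DCGM_FI_DEV_ECC_DBE_VOL_TOTAL", 0) > 0:
        alerts.append("double_bit_ecc_error")
    xid = sample.get("DCGM_FI_DEV_XID_ERRORS", 0)
    if xid != 0:
        alerts.append(f"xid_{xid}")  # the field reports the last XID code seen
    return alerts


print(gpu_alerts({
    "DCGM_FI_DEV_GPU_TEMP": 88,
    "util_avg_24h": 40,
    "util_avg_15m": 50,
    "DCGM_FI_DEV_XID_ERRORS": 79,
}))
```

Missing keys default to non-alerting values, so a partially populated sample never raises a false alarm. Whether a missing metric should itself alert is a separate decision.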

For audit preparation, we feed these metrics into Splunk alongside infrastructure logs. Auditors appreciate seeing GPU health metrics as part of overall monitoring, not as a separate AI-specific silo.

Audit-readiness considerations

This section will trigger your compliance team. That is a good thing.

Data residency

If your operations span regions with data residency requirements (we deal with SBV Vietnam requirements, plus GDPR for European customers), your AI infrastructure must enforce them.

What we do:

GPU clusters geographically tagged
Tanzu namespace node selectors enforce workload placement
Model artifacts stay in the region (encrypted storage with geo-locked keys)
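
Placement enforcement through node selectors can be sketched as two small helpers: one that pins a pod spec to a region and one that reports pods that are not pinned. The example assumes nodes carry the well-known `topology.kubernetes.io/region` label. The region value, pod names, and function names are hypothetical.

```python
REGION_LABEL = "topology.kubernetes.io/region"  # well-known Kubernetes node label


def pin_to_region(pod_spec, region):
    """Return a copy of pod_spec whose nodeSelector requires `region`."""
    pinned = dict(pod_spec)
    pinned["nodeSelector"] = {**pod_spec.get("nodeSelector", {}), REGION_LABEL: region}
    return pinned


def unpinned_pods(pods, region):
    """Names of pods whose spec does not select the required region."""
    return [
        name
        for name, spec in pods.items()
        if spec.get("nodeSelector", {}).get(REGION_LABEL) != region
    ]


# Hypothetical pod specs: one pinned, one left as submitted.
raw = {"containers": [{"name": "inference", "image": "registry.local/model:1"}]}
pods = {"scoring": pin_to_region(raw, "vn-south"), "batch": raw}
print(unpinned_pods(pods, "vn-south"))
```

A report like `unpinned_pods` is useful as recurring evidence, because a selector that was correct at deployment can be dropped in a later manifest change.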

Audit trail

Every AI operation must be auditable. We capture:

API gateway logs (which user, which model, when)
Model versioning (which exact model version processed each request)
Data lineage (what data went into what model output)
GPU resource allocation logs (which VM/namespace used which GPU at which time)
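
One way to tie these four sources together is a single structured record per inference request, emitted as a JSON line that the SIEM can index. The sketch below is illustrative only: the field names, the example values, and the record layout are assumptions, not the schema we run or anything NVAIE provides.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class InferenceAuditRecord:
    timestamp: str             # ISO 8601, UTC
    user: str                  # from the API gateway
    model: str
    model_version: str         # exact version that served the request
    input_dataset_ids: tuple   # data lineage references
    namespace: str             # tenant namespace
    gpu_uuid: str              # GPU or MIG instance that ran the request


def to_log_line(record):
    """Serialize a record as one JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


record = InferenceAuditRecord(
    timestamp=datetime.now(timezone.utc).isoformat(),
    user="svc-credit-scoring",
    model="risk-scorer",
    model_version="2.3.1",
    input_dataset_ids=("ds-0042",),
    namespace="lob-payments",
    gpu_uuid="GPU-00000000-0000-0000-0000-000000000000",
)
print(to_log_line(record))
```

Keeping the model version and GPU identifier in the same record as the user makes "which model, on which hardware, for whom" a single search rather than a three-way log join.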

This required custom work. NVAIE does not provide this out of the box. We built an integration with our existing SIEM.

Encryption

At rest: Model files, embeddings, training data — encrypted with HSM-backed keys
In transit: TLS everywhere, including internal API calls between Tanzu pods
In use: Limited. We use NVIDIA Confidential Computing on H100 for sensitive workloads, but it comes with tradeoffs (performance impact, limited tooling)

Key management

We integrate NVAIE deployment with our existing KMS/HSM:

vSphere encryption for VMs containing model artifacts
vSAN encryption (where applicable)
TLS certificates managed via existing PKI
API keys for model access tied to identity provider

What auditors actually asked us

When our first AI cluster went through PCI DSS audit, here are the questions we got (paraphrased):

“How do you ensure GPU memory is wiped between tenants?” → We explained MIG hardware isolation; QSA accepted.
“Where are model files stored, and how are they protected?” → Storage encryption, key management, evidence walkthrough.
“What is the change management process for model updates?” → CI/CD pipeline with approval gates, change tickets.
“How do you log AI inference requests?” → API gateway logging into Splunk, retention 1 year.
“What happens if a GPU fails mid-inference?” → DCGM monitoring, automatic VM recovery, incident response process.

ISO 27001 surveillance audit added:

“How is GPU resource consumption tracked for capacity planning?” → Prometheus retention, monthly capacity reviews.
“Are NVIDIA AI Enterprise licenses included in your software asset management?” → Yes, tracked in our SAM tool with renewal calendar.

SBV inspection focused on:

“Is sensitive customer data processed by AI models stored in Vietnam?” → Yes, demonstrated through node selectors and storage location evidence.

This is not the comprehensive auditor checklist. It is what we encountered. Your audits will differ.

Common mistakes I have observed

After several internal AI deployments and conversations with peers at other fintechs:

1. Under-provisioning networking

Teams focus on GPUs and forget that AI workloads stress networks. 10 GbE is insufficient for serious AI deployments. We learned this the hard way. Plan for 25/100 GbE minimum.

2. Ignoring power and cooling

H100 SXM draws up to 700 W per GPU. An 8-GPU node draws 6-8 kW once CPUs, memory, NICs, and fans are included. Our facilities team needed to validate this before hardware arrived. One peer at another bank had to delay their deployment 6 weeks because facility power was insufficient.
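
The power arithmetic is worth handing to facilities as a formula they can rerun. In the sketch below, the 700 W per GPU comes from the text above. The 1,500 W of non-GPU overhead per node, the 20 kW rack, and the 80% headroom factor are assumptions for illustration, so substitute measured values from your own hardware and facility.

```python
def node_power_kw(gpus, gpu_watts=700, overhead_watts=1_500):
    """Estimated peak draw of one node in kW.

    overhead_watts (CPUs, memory, NICs, fans) is an assumed figure.
    """
    return (gpus * gpu_watts + overhead_watts) / 1000


def nodes_per_rack(rack_kw, node_kw, headroom=0.8):
    """How many nodes fit in a rack while using only `headroom` of its budget."""
    return int((rack_kw * headroom) // node_kw)


node_kw = node_power_kw(gpus=8)
print(f"8-GPU node: {node_kw:.1f} kW")
print(f"Nodes per 20 kW rack at 80% headroom: {nodes_per_rack(20, node_kw)}")
```

With these assumptions an 8-GPU node lands at 7.1 kW, inside the 6-8 kW range quoted above, and a 20 kW rack holds two such nodes.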

3. Treating AI clusters like regular VMware clusters

GPU clusters need different operational practices. DRS algorithms do not optimize for GPU placement. vMotion of vGPU-attached VMs has limitations. Backup strategies need adjustment.

4. Skipping compliance review until later

Building the technical solution first, then asking compliance to bless it. This always ends in expensive rework. We engage compliance from week 1 now.

5. Mixing production and dev workloads

We tried mixing AI dev/test with production initially. It quickly led to noisy-neighbor problems. We have kept them on separate clusters ever since.

When VMware Private AI is the wrong choice

To be honest, VMware Private AI Foundation is not optimal for every workload. From what I have seen:

Skip it if:

Your workload is pure model training at scale (NVIDIA DGX BasePOD is better)
You have no existing VMware investment (Kubernetes-native is simpler)
Your team has zero VMware experience (learning curve is significant)
Your AI use case fits within public cloud constraints (use Azure/AWS)

Use it when:

You have substantial VMware operational maturity (we do)
Your workloads are mostly inference, RAG, or small-scale fine-tuning
Compliance requires private infrastructure
You need to integrate AI with existing enterprise systems

For our fintech, the answer was clearly “use it.” But we went in with eyes open about the complexity.

What I am reading next

If you are starting a VMware Private AI deployment, here is what I prioritized:

NVIDIA AI Enterprise Compatibility Matrix (verify your hardware)
VMware Cloud Foundation 5.x Architecture Guide
NVIDIA AI Enterprise VMware Deployment Guide
Your hardware vendor’s reference architecture (Dell, HPE, etc.)
Your auditor’s controls catalog (so you know what they will ask)

Future notes in this series will cover specific deployment patterns, MIG configuration for tenant isolation, DCGM monitoring at scale, and audit evidence preparation workflows.

These notes are based on operating fintech AI infrastructure since 2024. Configurations are specific to my environment; validate against your own hardware, software versions, and compliance requirements. I am an architect, not a consultant — these are notes, not advice.