
Spring AI + Milvus: Self-Hosted Vector Database Setup

By Jeff Taakey, 21+ Year CTO & Multi-Cloud Architect

Introduction: The Shift to Data Sovereignty in AI

In the rapidly evolving landscape of Generative AI, Retrieval-Augmented Generation (RAG) has emerged as the standard architecture for grounding Large Language Models (LLMs) with proprietary data. While services like Pinecone or Weaviate Cloud offer convenience, they introduce significant challenges regarding data privacy, latency, and, most notably, cost at scale.

For enterprise Java developers, Spring AI combined with a self-hosted Milvus instance represents the “Holy Grail” of vector search: open-source, massive scalability, and complete control over your data infrastructure.

This guide provides a comprehensive, step-by-step walkthrough of setting up a production-ready Milvus node using Docker and integrating it with a Spring Boot application using Spring AI. We will go beyond “Hello World” and discuss architectural trade-offs, index types, and metadata filtering strategies.


Why Milvus and Spring AI?

Before writing code, it is crucial to understand why this specific stack is gaining traction in the US and EU enterprise sectors.

1. Milvus: The Cloud-Native Vector Database

Milvus is not just a wrapper around Lucene. It is a cloud-native vector database built from the ground up to separate storage and computation.

  • Scalability: It can handle billions of vectors.
  • Performance: It utilizes advanced indexing algorithms (HNSW, IVF_FLAT) accelerated by SIMD instructions.
  • Ecosystem: It supports a rich set of SDKs and integrates seamlessly with the broader AI ecosystem.

2. Spring AI: The Portable Service Abstraction

Spring AI brings the “Write Once, Run Anywhere” philosophy to AI engineering. By implementing the VectorStore interface, Spring AI allows you to switch between vector databases (e.g., from simple in-memory testing to Milvus production) with zero code changes—only configuration tweaks.
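The portability idea can be sketched in plain Java. The interfaces below are illustrative stand-ins, not Spring AI's actual classes (those live in `org.springframework.ai.vectorstore`); the point is that application code depends only on an abstraction, so the backing store is a wiring decision:

```java
import java.util.List;

// Illustrative stand-ins for Spring AI's Document/VectorStore abstractions.
public class PortabilityDemo {

    record Doc(String content) {}

    interface Store {
        void add(List<Doc> docs);
        List<Doc> search(String query);
    }

    // Test-time implementation: everything stays in memory.
    static class InMemoryStore implements Store {
        private final List<Doc> docs = new java.util.ArrayList<>();
        public void add(List<Doc> d) { docs.addAll(d); }
        public List<Doc> search(String q) {
            return docs.stream().filter(doc -> doc.content().contains(q)).toList();
        }
    }

    // Application code sees only the interface, so swapping the backing
    // store (in-memory -> Milvus) is a configuration change, not a code change.
    static List<Doc> findContracts(Store store, String query) {
        return store.search(query);
    }

    public static void main(String[] args) {
        Store store = new InMemoryStore(); // swap for a Milvus-backed impl in prod
        store.add(List.of(new Doc("2023 supplier contract"), new Doc("holiday memo")));
        System.out.println(findContracts(store, "contract").size()); // prints 1
    }
}
```

With the real `VectorStore` interface, the swap happens through the starter on the classpath plus `application.yml`, exactly as the rest of this guide shows.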


Part 1: Infrastructure Setup (Docker & Milvus)

To simulate a production environment, we will not run Milvus in “Embedded” mode (which is for testing). We will set up a Standalone Milvus instance using Docker Compose.

The Architecture of Standalone Milvus

A standalone Milvus deployment consists of three cooperating components:

  1. Milvus: The core engine handling vector computation.
  2. Etcd: Stores metadata and handles service discovery.
  3. MinIO (S3 compatible): Stores the actual persistence data (logs and index files).

docker-compose.yml

Create a directory named milvus-env and add the following docker-compose.yml inside it. We will also include Attu, an excellent GUI for managing Milvus.

version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.3.13
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: milvus-etcd:2379
      MINIO_ADDRESS: milvus-minio:9000
      MINIO_ACCESS_KEY_ID: minioadmin
      MINIO_SECRET_ACCESS_KEY: minioadmin
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"

  attu:
    container_name: attu
    image: zilliz/attu:v2.3.10
    environment:
      MILVUS_URL: milvus-standalone:19530
    ports:
      - "8000:3000"
    depends_on:
      - "standalone"

networks:
  default:
    name: milvus

Launching the Stack

Run the following command in your terminal:

docker-compose up -d

Once running, verify the installation:

  1. MinIO Console: http://localhost:9001 (User/Pass: minioadmin)
  2. Attu UI: http://localhost:8000
    • Connect using the address milvus-standalone:19530. Since Attu runs inside the same Docker network, the container name resolves automatically; if you connect from the host instead, verify the port mappings above.

Note: If you are running on an Apple Silicon (M1/M2/M3) chip, Milvus runs via Rosetta 2 seamlessly, but ensure your Docker Desktop allows experimental features if you encounter platform warnings.


Part 2: Spring Boot Project Configuration
#

Now that our database is running, let’s configure the application. We assume you are using Java 17+ and Spring Boot 3.2+.

1. Maven Dependencies
#

We need two primary starters: one for the embedding model (to turn text into vectors) and one for the Milvus integration.

<dependencyManagement>
    <dependencies>
        <!-- Spring AI BOM: pins compatible versions of all spring-ai artifacts.
             A BOM import must live in dependencyManagement, not dependencies. -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0-SNAPSHOT</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <!-- OpenAI for Embeddings (You can also use Ollama or Transformers) -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>

    <!-- Milvus Vector Store -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-milvus-store-spring-boot-starter</artifactId>
    </dependency>

    <!-- Spring Boot Starter Web -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
</dependencies>

Note: As Spring AI is rapidly evolving, ensure you have the Spring Milestones and Snapshots repositories configured in your pom.xml.

2. Application Configuration (application.yml)
#

This is where the magic happens. We configure Spring AI to talk to our local Milvus instance.

spring:
  application:
    name: spring-ai-milvus-demo

  ai:
    openai:
      api-key: ${OPENAI_API_KEY} 
      embedding:
        options:
          model: text-embedding-3-small # Cost-effective model

    vectorstore:
      milvus:
        client:
          host: localhost
          port: 19530
          username: "" # Default is empty for standalone
          password: "" 
        collection-name: vector_store
        embedding-dimension: 1536 # CRITICAL: Must match OpenAI model dimension
        index-type: IVF_FLAT    # Index algorithm
        metric-type: COSINE     # Similarity metric

Configuration Deep Dive:

  • embedding-dimension: This is the most common source of errors. If you use OpenAI’s text-embedding-3-small, the dimension is 1536. If you use llama3 via Ollama, it might be 4096. If this doesn’t match the collection schema, Milvus will reject the insert.
  • metric-type: COSINE is generally preferred for NLP tasks because it measures the angle between vectors (semantic similarity) regardless of magnitude. L2 (Euclidean distance) is a better fit when vector magnitude carries meaning and you want strict geometric closeness.
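The difference between the two metrics is easy to see numerically. A quick self-contained sketch in plain Java (no Spring AI dependencies; class and method names are ours):

```java
// Demonstrates why COSINE ignores magnitude while L2 does not.
public class MetricDemo {

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    static double l2(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] v      = {1.0, 2.0};
        double[] scaled = {2.0, 4.0}; // same direction, double the magnitude

        System.out.println(cosine(v, scaled)); // 1.0 -> identical "meaning"
        System.out.println(l2(v, scaled));     // non-zero -> different points in space
    }
}
```

Two embeddings pointing the same way score a perfect cosine similarity even when their lengths differ, which is exactly the behavior you want for "these sentences mean the same thing."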

Part 3: Implementation - The Vector Service

Let’s create a service that handles document ingestion (ETL) and retrieval (Search).

1. The Service Layer

We will inject the VectorStore interface. This is the beauty of Spring AI: our code doesn’t technically know it’s using Milvus.

package com.springdevpro.milvus.service;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.Map;

@Service
public class RagService {

    private final VectorStore vectorStore;

    @Autowired
    public RagService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    /**
     * Ingests data into Milvus.
     * In a real app, this would parse PDFs or JSONs.
     */
    public void loadKnowledgeBase(List<String> textChunks) {
        List<Document> documents = textChunks.stream()
                .map(content -> new Document(content, Map.of("ingestion_date", "2024-05-22")))
                .toList();
        
        // This call:
        // 1. Calls OpenAI to get embeddings
        // 2. Connects to Milvus
        // 3. Checks if collection exists (creates if not)
        // 4. Inserts vectors
        vectorStore.add(documents);
    }

    /**
     * Semantic Search
     */
    public List<Document> search(String query) {
        // Search for top 5 results with a similarity threshold of 0.7
        return vectorStore.similaritySearch(
                SearchRequest.query(query)
                        .withTopK(5)
                        .withSimilarityThreshold(0.7)
        );
    }
}

2. The Controller

Expose endpoints to test our setup.

package com.springdevpro.milvus.controller;

import com.springdevpro.milvus.service.RagService;
import org.springframework.ai.document.Document;
import org.springframework.web.bind.annotation.*;

import java.util.List;
import java.util.stream.Collectors;

@RestController
@RequestMapping("/api/rag")
public class MilvusController {

    private final RagService ragService;

    public MilvusController(RagService ragService) {
        this.ragService = ragService;
    }

    @PostMapping("/ingest")
    public String ingest(@RequestBody List<String> chunks) {
        ragService.loadKnowledgeBase(chunks);
        return "Indexed " + chunks.size() + " documents into Milvus.";
    }

    @GetMapping("/search")
    public List<String> search(@RequestParam String query) {
        List<Document> results = ragService.search(query);
        return results.stream()
                .map(Document::getContent)
                .collect(Collectors.toList());
    }
}

Part 4: Advanced Milvus Setup & Tuning

Getting it running is one thing; making it production-ready is another. Here are the critical considerations for tuning a self-hosted Milvus behind Spring AI.

1. Understanding Index Types

In application.yml, we specified IVF_FLAT. Why?

  • FLAT: 100% recall (perfect accuracy) but slow on large datasets because it brute-force scans every vector.
  • IVF_FLAT (Inverted File): Divides vectors into clusters (Voronoi cells). Search only checks the closest clusters. Much faster, slight loss in accuracy.
  • HNSW (Hierarchical Navigable Small World): The industry standard for high performance. It builds a multi-layer graph. It is incredibly fast but consumes more memory.

Recommendation: For datasets < 1M vectors, IVF_FLAT is fine. For > 1M or high-concurrency low-latency needs, switch config to HNSW.
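The pruning idea behind IVF_FLAT can be sketched in plain Java. This toy 1-D example (our own illustrative code, not Milvus internals; real indexes use trained centroids and a tunable nprobe) buckets vectors by their nearest centroid and searches only the query's bucket:

```java
import java.util.*;

// Toy illustration of the IVF idea: FLAT scans everything,
// the inverted-file variant scans only the closest cluster.
public class IvfDemo {

    // FLAT: check every vector -> exact nearest neighbour, O(n) per query.
    static double flatSearch(double[] data, double q) {
        double best = data[0];
        for (double v : data) if (Math.abs(v - q) < Math.abs(best - q)) best = v;
        return best;
    }

    // IVF: bucket vectors by nearest centroid, then search only the query's bucket.
    static double ivfSearch(double[] data, double[] centroids, double q) {
        Map<Integer, List<Double>> buckets = new HashMap<>();
        for (double v : data)
            buckets.computeIfAbsent(nearest(centroids, v), k -> new ArrayList<>()).add(v);
        List<Double> bucket = buckets.getOrDefault(nearest(centroids, q), List.of());
        double best = Double.NaN;
        for (double v : bucket)
            if (Double.isNaN(best) || Math.abs(v - q) < Math.abs(best - q)) best = v;
        return best;
    }

    static int nearest(double[] centroids, double v) {
        int idx = 0;
        for (int i = 1; i < centroids.length; i++)
            if (Math.abs(centroids[i] - v) < Math.abs(centroids[idx] - v)) idx = i;
        return idx;
    }

    public static void main(String[] args) {
        double[] data = {0.1, 0.2, 0.9, 1.1, 5.0, 5.2};
        double[] centroids = {0.0, 1.0, 5.0};
        // Both return 1.1 for a query of 1.05, but IVF scanned one
        // two-element bucket instead of all six vectors.
        System.out.println(flatSearch(data, 1.05));            // 1.1
        System.out.println(ivfSearch(data, centroids, 1.05));  // 1.1
    }
}
```

The slight accuracy loss mentioned above happens when the true nearest neighbour sits in a bucket the search never visits; raising nprobe (more buckets inspected) trades speed back for recall.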

To use HNSW in Spring AI, you simply update the YAML:

        index-type: HNSW
        index-parameters: '{"M":16,"efConstruction":200}'

2. Metadata Filtering (The “Hybrid Search”)

Pure vector search isn’t enough. Often, you want to “Find contracts similar to X but only from year 2023”.

Spring AI supports the Filter Expression Language. Milvus handles this efficiently by using scalar indexes alongside vector indexes.

import org.springframework.ai.vectorstore.filter.Filter;
import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;

public List<Document> searchWithFilter(String query, String date) {
    FilterExpressionBuilder b = new FilterExpressionBuilder();
    // eq() matches the stored metadata value exactly (e.g. "2024-05-22")
    Filter.Expression filter = b.eq("ingestion_date", date).build();

    return vectorStore.similaritySearch(
            SearchRequest.query(query)
                    .withTopK(5)
                    .withFilterExpression(filter)
    );
}

Note: Ensure your metadata keys in the Document object do not contain special characters that conflict with Milvus schema rules.
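Milvus identifiers may contain only letters, digits, and underscores, and must not start with a digit, so a defensive key normalizer can save debugging time during ingestion. The helper below is our own illustrative utility, not part of Spring AI:

```java
import java.util.regex.Pattern;

// Defensive helper: normalize metadata keys into identifiers Milvus accepts
// (letters, digits and underscores, not starting with a digit).
public class MetadataKeys {

    private static final Pattern INVALID = Pattern.compile("[^A-Za-z0-9_]");

    static String sanitize(String key) {
        // Replace every disallowed character with an underscore.
        String cleaned = INVALID.matcher(key).replaceAll("_");
        if (cleaned.isEmpty()) return "_";
        // Prefix keys that would otherwise start with a digit.
        return Character.isDigit(cleaned.charAt(0)) ? "_" + cleaned : cleaned;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("ingestion-date")); // ingestion_date
        System.out.println(sanitize("2024.version"));   // _2024_version
    }
}
```

Run keys through a helper like this before building the `Map` you pass to `new Document(content, metadata)`, and the schema Milvus derives from your metadata stays valid.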

3. Consistency Levels

Milvus offers tunable consistency (Strong, Bounded, Session, Eventually). By default, Milvus might prioritize speed over immediate consistency. If you insert a document and immediately search for it, you might miss it. To fix this in testing, you often need to force a sync or wait a few milliseconds. In Spring AI, the default implementation typically handles the necessary “flush” operations for you, but be aware of the “Bounded Staleness” concept in distributed systems.


Troubleshooting Common Issues

1. io.milvus.exception.ServerException: dimension mismatch

Cause: You created a collection with OpenAI embeddings (1536 dim), then tried to switch to a local Ollama model (4096 dim) without dropping the collection. Fix: Connect to Attu (localhost:8000), delete the collection vector_store, and restart your Spring Boot app. Milvus collections are immutable regarding dimension.
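A cheap development-time guard is to fail fast on the client when the model's output length and the configured dimension disagree, instead of waiting for the server-side error. This is an illustrative helper of our own, not a Spring AI API:

```java
// Fails fast with an actionable message instead of a server-side dimension error.
public class DimensionGuard {

    static void checkDimension(float[] embedding, int configuredDimension) {
        if (embedding.length != configuredDimension) {
            throw new IllegalStateException(
                "Embedding dimension " + embedding.length
                + " does not match spring.ai.vectorstore.milvus.embedding-dimension="
                + configuredDimension
                + ". If you switched embedding models, drop and recreate the collection.");
        }
    }

    public static void main(String[] args) {
        checkDimension(new float[1536], 1536); // OpenAI text-embedding-3-small: passes
    }
}
```

Calling a check like this once at startup (with a probe embedding of a short test string) surfaces the model/collection mismatch before any user request hits it.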

2. Connection Refused

Cause: Docker networking issues. Fix: Ensure host: localhost works if running the JAR outside Docker. If running the Spring App inside a container, use host: milvus-standalone.

3. Metadata Search Fails

Cause: Milvus requires explicit scalar indexing for efficient filtering on some fields in older versions, though newer versions handle dynamic schemas better. Fix: Enable dynamic schema in Milvus configuration if you have unpredictable metadata fields. Spring AI enables Dynamic Schema by default for Milvus.


Conclusion: The Business Case for Self-Hosting

Integrating Spring AI with Milvus moves your organization from “AI experimentation” to “AI ownership.”

  1. Cost Control: You are not paying per read/write unit (like in AWS DynamoDB or Pinecone). Your cost is simply the EC2/VM cost.
  2. Privacy: Your vectors (mathematical representations of your IP) never leave your VPC.
  3. Performance: Network latency is minimized when your Vector Store sits in the same Kubernetes cluster as your Spring Boot services.

This setup forms the backbone of a robust RAG pipeline. In upcoming articles, we will explore how to add Redis for caching vector results and how to use Spring Cloud Gateway to rate-limit access to your expensive LLM APIs.

Stay tuned to Spring DevPro for more architecture drills.

