Thiago Villani

Spring AI and Ollama - Building an Open-Source Chatbot

We'll use the newly released Spring AI 1.0 GA version, ready for production, to build a chat application with Spring AI, Ollama, and Docker, with out-of-the-box chat memory management.
Let's enable Java developers to quickly and easily add AI to their projects.

Dependencies:

  • Java 21
  • Spring Web
  • Ollama
  • H2 Database
  • JDBC Chat Memory Repository
  • Spring Boot Actuator

Why Spring AI + Ollama?

The AI engineering landscape is no longer just Python-centric. With Spring AI, Java developers can now build AI-powered applications using open-source models like Llama 3, Gemma, Deepseek-R1 and many more!
And the best part: you can host them locally via Ollama.

In this article, you’ll learn how to:

  1. Set up Ollama as your local LLM inference server (Docker).
  2. Integrate it with Spring AI as a Java-based AI engineering framework.
  3. Create multi-user sessions with conversation history.
  4. Build a streaming chatbot using Server-Sent Events (SSE) and a simple frontend (HTML/CSS/JS).
  5. Dockerize everything for local development and usage.

Let’s dive in!

Architecture

(Architecture diagram: SpringAI-Ollama)

1. Setting Up Ollama & Downloading Models

We can start with compact models of roughly 1B parameters. For text-generation tasks, small models are a good choice to get started.
Ollama lets you run open-source LLMs locally. Here’s how to get started:
Install Ollama (via Docker)

docker run -d -v ./ollama/ollama-server:/root/.ollama -p 11434:11434 --name ollama ollama/ollama 

Download Models (pick one or more)

docker exec ollama ollama pull llama3.2:1b      # Meta's Llama 3.2
docker exec ollama ollama pull gemma3:1b        # Google's Gemma
docker exec ollama ollama pull deepseek-r1:1.5b # Deepseek's R1

Verify it’s running:

curl http://localhost:11434/api/generate -d '{  
  "model": "llama3.2:1b",  
  "prompt": "Hello, world!"  
}'

2. AI Engineering 101: Beyond Python

While Python dominates AI tooling, Java is catching up with frameworks like Spring AI.
Key concepts:

  • Foundation Models: Pre-trained LLMs (e.g., Llama 3) that you can fine-tune.
  • Inference APIs: Tools like Ollama let you run these models locally.
  • AI Engineering: The art of integrating LLMs into real-world apps (e.g., chatbots, RAG systems).
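
To make the "inference API" idea concrete, here is a minimal sketch (plain Java 21, no Spring AI yet) that calls the Ollama /api/generate endpoint we verified with curl above. It assumes Ollama is running locally on port 11434 and that llama3.2:1b has been pulled:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaRawClient {

    public static void main(String[] args) throws Exception {
        // Same request as the earlier curl example, with streaming disabled
        // so Ollama returns a single JSON document.
        String body = """
                {"model": "llama3.2:1b", "prompt": "Hello, world!", "stream": false}
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The generated text is in the "response" field of the returned JSON.
        System.out.println(response.body());
    }
}

Spring AI wraps exactly this kind of HTTP plumbing behind a portable API, which is what the rest of this article uses.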


3. Spring AI + Ollama: Java Meets LLMs

Spring AI is the go-to choice for bringing AI capabilities to the Spring ecosystem. Here's how to connect it to Ollama:

  • Step 3.1: Add Spring AI to Your Project
<!-- use Ollama as LLM inference server and Model provider -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>
<!-- use JDBC to store messages in a relational database. -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-chat-memory-repository-jdbc</artifactId>
</dependency>
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <scope>runtime</scope>
</dependency>
  • Step 3.2: Configure Ollama in application.yml
spring:
  application:
    name: demo-chatbot
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        model: llama3.2:1b # deepseek-r1:1.5b, gemma3:1b
    chat:
      memory:
        repository:
          jdbc:
            # https://docs.spring.io/spring-ai/reference/1.0/api/chat-memory.html#_schema_initialization
            initialize-schema: always
            schema: classpath:sql/schema-h2.sql
  datasource:
    url: jdbc:h2:mem:~/demo-chatbot
    driverClassName: org.h2.Driver
    username: sa
    password: password
  h2:
    console:
      enabled: true
      path: /h2
  • Step 3.3: Call the LLM from Java

The ChatClient offers a fluent API for communicating with an AI model.
The default system prompt creates a simple prompt template and sets the tone for responses.
The Advisors API provides a flexible way to intercept, modify, and enhance interactions with a model.
LLMs are stateless, meaning they do not retain information about previous interactions.
Spring AI auto-configures a ChatMemory bean that allows you to store and retrieve messages across multiple interactions. For H2 you have to create the schema yourself. Place it in src/main/resources/sql/schema-h2.sql:

CREATE TABLE IF NOT EXISTS SPRING_AI_CHAT_MEMORY (
    conversation_id VARCHAR(36) NOT NULL,
    content TEXT NOT NULL,
    type VARCHAR(10) NOT NULL CHECK (type IN ('USER', 'ASSISTANT', 'SYSTEM', 'TOOL')),
    "timestamp" TIMESTAMP NOT NULL
    );

CREATE INDEX IF NOT EXISTS SPRING_AI_CHAT_MEMORY_CONVERSATION_ID_TIMESTAMP_IDX
ON SPRING_AI_CHAT_MEMORY(conversation_id, "timestamp");
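
Spring AI's auto-configuration wires a ChatMemory bean on top of the JDBC repository above. If you want to control how many messages are kept in the prompt window, you can declare the bean yourself. Here is a minimal sketch, assuming the Spring AI 1.0 MessageWindowChatMemory builder and that the auto-configuration backs off when you provide your own bean (the window size of 20 is just an example value):

@Configuration
public class ChatMemoryConfig {

    @Bean
    public ChatMemory chatMemory(ChatMemoryRepository chatMemoryRepository) {
        // Limits how many past messages are injected into the prompt for each conversation.
        return MessageWindowChatMemory.builder()
                .chatMemoryRepository(chatMemoryRepository)
                .maxMessages(20)
                .build();
    }
}

With the schema (and, optionally, the memory window) in place, wire up the ChatClient with a default system prompt and advisors: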
@Configuration
public class ChatConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder, ChatMemory chatMemory) {
        String defaultSystemPrompt = """
                You are a helpful AI assistant; your responsibility is to answer users' questions
                about a variety of topics.
                When answering a question, always greet first and state your name as JavaChat.
                When unsure about the answer, simply state that you don't know.
                """;
        return builder
                .defaultSystem(defaultSystemPrompt)
                .defaultAdvisors(
                        new SimpleLoggerAdvisor(),               // simply logs requests to and responses from the model
                        new PromptChatMemoryAdvisor(chatMemory)  // lets Spring AI manage long-term memory in the DB
                        )
                .build();
    }
}

@RequestMapping("/api/chat")
@RestController
public class ChatController {

    @Autowired
    private ChatClient chatClient;

    @GetMapping
    public String chat(@RequestParam String question, @RequestParam String chatId) {
        return chatClient
                .prompt()
                .user(question)
                .advisors(advisor -> advisor
                        .param(ChatMemory.CONVERSATION_ID, chatId))
                .call()
                .content();
    }
}

Test it: curl "http://localhost:8080/api/chat?question=Tell%20me%20a%20joke&chatId=1" (both question and chatId are required request parameters).


4. Streaming Chat with Server-Sent Events (SSE)

SSE is a lightweight protocol for real-time, one-way streaming from server to client (perfect for chatbots). Unlike WebSockets (bidirectional), SSE is simpler for use cases like LLM streaming.
SSE also provides a better UX for end users, because partial responses are delivered as soon as they're ready (some complex replies can take a minute or more). Let's stream responses using SSE:

@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ChunkResponseDTO> streamChat(@RequestParam String question, @RequestParam String chatId) {
    return chatClient
            .prompt()
            .user(question)
            .advisors(advisor -> advisor
                .param(ChatMemory.CONVERSATION_ID, chatId))
            .stream()
            .content()
            .map(chunk -> new ChunkResponseDTO(chunk));
}

Key Details:

  • TEXT_EVENT_STREAM_VALUE: Header text/event-stream enables SSE.
  • SSE Format: Each message must end with \n\n. Prefix with data: for compliance.
  • Reactive Streams: Flux (from Project Reactor) handles asynchronous streaming.
public record ChunkResponseDTO(String value) {}
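
If you want to consume the stream from another Java service, or just test it without a browser, here is a minimal sketch using Spring's reactive WebClient. It assumes spring-boot-starter-webflux is on the classpath and that the streaming endpoint above is running on localhost:8080:

import org.springframework.http.MediaType;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;

public class ChatStreamClient {

    // Mirrors the server-side DTO so Jackson can decode each SSE data payload.
    record ChunkResponseDTO(String value) {}

    public static void main(String[] args) {
        Flux<ChunkResponseDTO> chunks = WebClient.create("http://localhost:8080")
                .get()
                .uri(uriBuilder -> uriBuilder
                        .path("/api/chat/stream")
                        .queryParam("chatId", "1")
                        .queryParam("question", "Tell me about Java")
                        .build())
                .accept(MediaType.TEXT_EVENT_STREAM)
                .retrieve()
                .bodyToFlux(ChunkResponseDTO.class);

        // Print each chunk as it arrives; blockLast() keeps the JVM alive until the stream completes.
        chunks.doOnNext(chunk -> System.out.print(chunk.value()))
              .blockLast();
    }
}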

Limitations with HTTP/1.1

Connection Limits:

  • Browsers allow only 6 concurrent HTTP/1.1 connections per domain.
  • SSE consumes one connection per stream, which can block other requests.

Upgrading to HTTP/2 for Performance

HTTP/2 fixes SSE bottlenecks with:

Multiplexing: Multiple streams over a single TCP connection. The maximum number of simultaneous HTTP streams is negotiated between the server and the client (defaults to 100)

How to Enable HTTP/2 in Spring Boot

  • Step 4.1: Configure HTTP/2 in application.yml
server:
  http2:
    enabled: true
  ssl:
    enabled: true
    key-store: classpath:keystore.p12
    key-store-password: yourpassword
  • Step 4.2: Generate a Self-Signed Certificate (for testing only):
keytool -genkeypair -alias mydomain -keyalg RSA -keysize 2048 -storetype PKCS12 -keystore keystore.p12 -validity 365

Verify HTTP/2 is active (-k tells curl to trust the self-signed certificate):

curl --head -k https://localhost:8080/actuator/health


Frontend:

Starting with the JavaScript:

const chatStream = (question) => {
  const eventSource = new EventSource(`https://localhost:8080/api/chat/stream?chatId=1&question=${encodeURIComponent(question)}`);

  eventSource.onmessage = (e) => {

    console.log('New message:', JSON.parse(e.data).value);
    // Append to UI (e.g., a chat div)
    document.getElementById('messages').innerHTML += JSON.parse(e.data).value;
  };

  eventSource.onerror = (e) => {
    console.error('SSE error:', e);
    eventSource.close();
  };
};

// Usage
chatStream("Tell me about Java");

Key Details:

  • EventSource: Native browser API for SSE (no libraries needed).
  • Automatic Reconnection: Built-in retry logic if the connection drops.


Secure Frontend Rendering for LLM Output

LLM responses often include Markdown or HTML (e.g., **bold**, <script>), which can lead to XSS vulnerabilities if rendered naively.
Here’s how to secure your frontend:

  • Step 4.3: Sanitize Markdown/HTML (Critical!) Use DOMPurify to sanitize raw LLM output before rendering:
<script src="https://cdnjs.cloudflare.com/ajax/libs/dompurify/3.0.6/purify.min.js"></script>
eventSource.onmessage = (e) => {
    const chunkResponse = JSON.parse(e.data).value;
    console.log('New message:', chunkResponse);
    const sanitized = DOMPurify.sanitize(chunkResponse); // Strips malicious scripts
    // Append to UI (e.g., a chat div)
    document.getElementById('messages').innerHTML += sanitized;
  };
  • Step 4.4: For Markdown Support (Optional)

If you want to render Markdown safely, use a library like Marked + DOMPurify:

<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
  let chunkResponses = '';
  eventSource.onmessage = (e) => {
    chunkResponses += JSON.parse(e.data).value;

    // Sanitize all chunks received so far.
    DOMPurify.sanitize(chunkResponses);

    // Check if the output was insecure.
    if (DOMPurify.removed.length) {
      // If the output was insecure, immediately stop what you were doing.
      // Reset the parser and flush the remaining Markdown.
      chunkResponses = '';
      return;
    }
    // Parse the Markdown and sanitize the resulting HTML before rendering it into the chat div.
    document.getElementById('messages').innerHTML = DOMPurify.sanitize(marked.parse(chunkResponses));
  };

Key Security Considerations: Never Trust LLM Output (nor users' input)

  • Assume all LLM responses may contain malicious code (even unintentionally).
  • Assume users will try to break your code and test your security.
  • Example attack: Hey <script>fetch('/steal-cookie')</script>
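
As a complementary, server-side line of defense, you can also escape HTML before it ever leaves the backend. This only makes sense if the client renders plain text (it breaks Markdown/HTML rendering), so treat the following as a sketch of an optional variant of the streaming endpoint above, using Spring's HtmlUtils.htmlEscape (from org.springframework.web.util, which ships with spring-web), not as a replacement for client-side sanitization:

// Variant of the streaming endpoint: escape HTML server-side as defense in depth.
// Suitable only when the client renders plain text.
@GetMapping(value = "/stream-escaped", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ChunkResponseDTO> streamChatEscaped(@RequestParam String question, @RequestParam String chatId) {
    return chatClient
            .prompt()
            .user(question)
            .advisors(advisor -> advisor
                .param(ChatMemory.CONVERSATION_ID, chatId))
            .stream()
            .content()
            .map(chunk -> new ChunkResponseDTO(HtmlUtils.htmlEscape(chunk)));
}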

Limitations with EventSource API

Even though using SSE on the client side is easy, the EventSource API has some restrictions:

  • No Custom Request Headers: Custom request headers are not allowed.
  • HTTP GET Only: There is no way to specify another HTTP method.
  • No Request Body: All the chat messages must go in the URL, whose practical length limit is commonly around 2,000 characters across browsers and proxies.
  • Check extension libraries for EventSource and SSE: Fetch Event Source, Fetch API + getReader()

  • Step 4.5: Starting the HTML Structure

Here's the HTML structure that includes a form for user input, a container to display the streamed data, and a sidebar for message history.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Spring AI Chat</title>
    <link rel="stylesheet" href="layout.css">
</head>
<body>
<!-- Sidebar for chat history -->
<div id="sidebar">
    <h3>Chat History</h3>
    <ul id="history-list"></ul>
</div>

<!-- Main chat area -->
<div id="chat-container">
    <div id="messages"></div>
    <form id="input-form">
        <input type="text" id="prompt" placeholder="Type your message..." autocomplete="off">
        <button type="submit">Send</button>
    </form>
</div>

<script src="main.js"> </script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/dompurify/3.0.6/purify.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
</body>
</html>


Place the HTML in src/main/resources/static/index.html
Put the JavaScript in src/main/resources/static/js/main.js


5. Deploy Your Spring AI + Ollama Chatbot with Docker 🚀

  • Step 5.1 Docker Compose Setup

Create ollama-docker-compose.yaml
(P.S. If your machine supports GPU, you can enable GPU acceleration inside Docker containers. Ollama Image docs)

services:
  # Ollama LLM inference server
  ollama:
    volumes: # Ollama with persistent storage (no redownloading models).
      - ./ollama/ollama-server:/root/.ollama
    container_name: ollama
    pull_policy: always
    tty: true
    restart: unless-stopped
    image: docker.io/ollama/ollama:latest
    ports:
      - 11434:11434
    environment:
      - OLLAMA_KEEP_ALIVE=24h
    # Enable GPU support (optional: remove this deploy block if you don't have an NVIDIA GPU)
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  # Spring AI Backend
  chat-app:
    build:
      context: . # Dockerfile in the root folder
    container_name: chat-app
    ports:
      - "8080:8080"
    environment:
      - SPRING_AI_OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama

  • Step 5.2 Spring Boot Dockerfile
# Maven build stage
FROM maven:3.9.9-eclipse-temurin-21-alpine AS build
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src/ ./src/
RUN mvn clean package

# Spring Boot package stage
FROM eclipse-temurin:21-jre-alpine
COPY --from=build /app/target/*.jar app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]

  • Start everything using docker-compose build && docker-compose up -d
  • Navigate to: https://localhost:8080 and start your chat session

  • If you want to see how messages are stored in the Database, navigate to H2 console https://localhost:8080/h2

(Screenshot: chat messages stored in the SPRING_AI_CHAT_MEMORY table, viewed in the H2 console)


Conclusion: Your Java AI Future

You just built a locally hosted, open-source chatbot with Spring AI and Ollama, no OpenAI API costs or Python required!
SSE + HTTP/2 + Spring AI = scalable, real-time LLM streaming.

Where to Go Next?

  • Check out the full code.
  • Experiment with RAG (Retrieval-Augmented Generation) using Spring AI's embedding model API and vector databases.

What’ll you build? Share your thoughts in the comments! 👇

(P.S. Follow me for more Java + AI tutorials!)
