
Spring AI tutorial: Get started with Spring AI



Use the following context to answer the user's question.
If the question cannot be answered from the context, state that clearly.

Context:
{context}

Question:
{question}

Then I created a new SpringAIRagService:

package com.infoworld.springaidemo.service;

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;

@Service
public class SpringAIRagService {
    @Value("classpath:/templates/rag-template.st")
    private Resource promptTemplate;
    private final ChatClient chatClient;
    private final VectorStore vectorStore;

    public SpringAIRagService(ChatClient.Builder chatClientBuilder, VectorStore vectorStore) {
        this.chatClient = chatClientBuilder.build();
        this.vectorStore = vectorStore;
    }

    public String query(String question) {
        // Retrieve the documents most similar to the question
        SearchRequest searchRequest = SearchRequest.builder()
                .query(question)
                .topK(2)
                .build();
        List<Document> similarDocuments = vectorStore.similaritySearch(searchRequest);
        String context = similarDocuments.stream()
                .map(Document::getText)
                .collect(Collectors.joining("\n"));

        // Fill the prompt template with the retrieved context and the question
        Prompt prompt = new PromptTemplate(promptTemplate)
                .create(Map.of("context", context, "question", question));

        return chatClient.prompt(prompt)
                .call()
                .content();
    }
}

The SpringAIRagService wires in a ChatClient.Builder, which we use to build a ChatClient, along with our VectorStore. The query() method accepts a question and uses the VectorStore to build the context. First, we need to build a SearchRequest, which we do by:

  • Invoking its static builder() method.
  • Passing the question as the query.
  • Using the topK() method to specify how many documents we want to retrieve from the vector store.
  • Calling its build() method.

In this case, we want to retrieve the top two documents that are most similar to the question. In practice, you'll use something larger, such as the top three or top five, but since we only have three documents, I limited it to two.

Next, we invoke the vector store's similaritySearch() method, passing it our SearchRequest. The similaritySearch() method will use the vector store's embedding model to create a multidimensional vector of the question. It will then compare that vector to each document and return the documents that are most similar to the question. We stream over all similar documents, get their text, and build a context String.
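Vector stores typically measure "most similar" as cosine similarity between embedding vectors. The stand-alone sketch below illustrates that comparison; the three-dimensional vectors are made up for the example (real embeddings have hundreds or thousands of dimensions), and this is not the vector store's actual implementation:

```java
// Illustration only: the kind of cosine-similarity comparison a vector
// store performs internally when ranking documents against a question.
public class CosineSimilarityDemo {
    // Cosine similarity: dot product of the vectors divided by the
    // product of their magnitudes; closer to 1.0 means more similar.
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] question = {0.9, 0.1, 0.2};
        double[] docA = {0.8, 0.2, 0.1};  // points in a similar direction
        double[] docB = {0.1, 0.9, 0.7};  // points in a different direction
        // docA scores higher, so it would be ranked above docB
        System.out.println(cosineSimilarity(question, docA) > cosineSimilarity(question, docB)); // true
    }
}
```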

Next, we create our prompt, which tells the LLM to answer the question using the context. Note that it is important to tell the LLM to use the context to answer the question and, if it cannot, to state that it cannot answer the question from the context. If we don't provide these instructions, the LLM will fall back on the data it was trained on to answer the question, which means it will use information not in the context we've provided.
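To make the template expansion concrete, here is a stand-alone sketch of what filling the template amounts to. Spring AI's PromptTemplate handles this for us; this plain-string version is purely illustrative:

```java
// Toy version of prompt-template expansion: replace each {placeholder}
// in the template text with its value from a map.
import java.util.Map;

public class TemplateDemo {
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            // Swap "{key}" for its value wherever it appears
            result = result.replace("{" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String template = "Context:\n{context}\n\nQuestion:\n{question}";
        String prompt = fill(template, Map.of(
                "context", "Spring AI supports RAG.",
                "question", "Does Spring AI support RAG?"));
        System.out.println(prompt);
    }
}
```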

Finally, we build the prompt, setting its context and question, and invoke the ChatClient. I added a SpringAIRagController to handle POST requests and pass them to the SpringAIRagService:

package com.infoworld.springaidemo.web;

import com.infoworld.springaidemo.model.SpringAIQuestionRequest;
import com.infoworld.springaidemo.model.SpringAIQuestionResponse;
import com.infoworld.springaidemo.service.SpringAIRagService;

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SpringAIRagController {
    private final SpringAIRagService springAIRagService;

    public SpringAIRagController(SpringAIRagService springAIRagService) {
        this.springAIRagService = springAIRagService;
    }

    @PostMapping("/springAIQuestion")
    public ResponseEntity<SpringAIQuestionResponse> askAIQuestion(@RequestBody SpringAIQuestionRequest questionRequest) {
        String answer = springAIRagService.query(questionRequest.question());
        return ResponseEntity.ok(new SpringAIQuestionResponse(answer));
    }
}

The askAIQuestion() method accepts a SpringAIQuestionRequest, which is a Java record:

package com.infoworld.springaidemo.model;

public record SpringAIQuestionRequest(String question) {
}

The askAIQuestion() method responds with a SpringAIQuestionResponse, also a record:

package com.infoworld.springaidemo.model;

public record SpringAIQuestionResponse(String answer) {
}
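If Java records are new to you, the compiler generates the canonical constructor and accessor methods for us, which is all Spring needs to bind the request and response JSON. This stand-alone sketch (no Spring involved) shows the two records in use:

```java
// Demonstrates that records provide a constructor and accessors for free.
public class RecordDemo {
    record SpringAIQuestionRequest(String question) {}
    record SpringAIQuestionResponse(String answer) {}

    public static void main(String[] args) {
        SpringAIQuestionRequest request = new SpringAIQuestionRequest("Does Spring AI support RAG?");
        SpringAIQuestionResponse response = new SpringAIQuestionResponse("Yes.");
        // The accessor is named after the record component
        System.out.println(request.question());
        System.out.println(response.answer());
    }
}
```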

Now restart your application and execute a POST to /springAIQuestion. In my case, I sent the following request body:

{
    "question": "Does Spring AI support RAG?"
}

And received the following response:

{
    "answer": "Yes. Spring AI explicitly supports Retrieval Augmented Generation (RAG), including chat memory, integrations with major vector stores, a portable vector store API with metadata filtering, and a document injection ETL framework for building RAG pipelines."
}

As you can see, the LLM used the context of the documents we loaded into the vector store to answer the question. We can further test whether it's following our instructions by asking a question that's not in our context:

{
    "question": "Who created Java?"
}

Here is the LLM's response:

{
    "answer": "The provided context does not include information about who created Java."
}

This is an important validation that the LLM is only using the provided context to answer the question and not falling back on its training data or, worse, attempting to make up an answer.

Conclusion

This article introduced you to using Spring AI to incorporate large language model capabilities into Spring-based applications. You can configure LLMs and other AI technologies using Spring's standard application.yaml file, then wire them into Spring components. Spring AI provides an abstraction for interacting with LLMs, so you don't need to use LLM-specific SDKs. For experienced Spring developers, this whole process is similar to how Spring Data abstracts database interactions using Spring Data interfaces.

In this example, you saw how to configure and use a large language model in a Spring MVC application. We configured OpenAI to answer simple questions, introduced prompt templates to externalize LLM prompts, and concluded by using a vector store to implement a simple RAG service in our example application.

Spring AI has a robust set of capabilities, and we've only scratched the surface of what you can do with it. I hope the examples in this article provide enough foundational knowledge to help you start building AI applications using Spring. Once you're comfortable with configuring and accessing large language models in your applications, you can dive into more advanced AI programming, such as building AI agents to improve your business processes.

