
Commit 457679c

committed
adding recipe for cesc
1 parent aaefabc commit 457679c

23 files changed: +1411 −0 lines changed
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
# --- Build stage ---
FROM eclipse-temurin:21-jdk AS builder

WORKDIR /app
COPY . .

# Use parallel threads and configure Gradle for speed
RUN ./gradlew :artificial-intelligence:context-enabled-semantic-caching-with-spring-ai:bootJar \
    --no-daemon \
    --parallel \
    --build-cache \
    --configuration-cache \
    --max-workers=$(nproc)

# --- Runtime stage ---
FROM eclipse-temurin:21-jre

WORKDIR /app
COPY --from=builder /app/artificial-intelligence/context-enabled-semantic-caching-with-spring-ai/build/libs/*.jar app.jar

EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
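To build and run this image on its own, something along these lines should work (the tag name and the Dockerfile path are assumptions; Docker Compose, described in the README below, does this for you, and the app still needs a reachable Redis instance):

```bash
# Run from the repository root, since the build copies the whole repo into the image
docker build -f artificial-intelligence/context-enabled-semantic-caching-with-spring-ai/Dockerfile -t cesc-demo .
docker run --rm -p 8080:8080 -e OPENAI_API_KEY=sk-your-api-key cesc-demo
```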
Lines changed: 276 additions & 0 deletions
@@ -0,0 +1,276 @@
### Context-Enabled Semantic Caching with Spring AI Demo

Semantic caching is a technique that enhances Large Language Model (LLM) applications by caching responses based on the semantic meaning of queries rather than exact matches.

Even though semantic caching can help us save costs and time, it may come with downsides depending on the business to which it's applied.

Sometimes prompts may be similar but refer to different contexts. For example: `What kind of beer goes well with meat?` and `What kind of beer goes well with pizza?`

These two prompts are semantically similar but refer to two different contexts: `meat` and `pizza`. This is where context-enabled semantic caching can help.

Instead of relying solely on the semantic cache, we can hand the cached response to a less capable, cheaper, and faster model together with the newly provided information, so that it can generate a response that satisfies the new prompt with the information, tone, and other characteristics that came from the more capable model.

This demo showcases how to implement context-enabled semantic caching using Spring AI and the Redis Vector Store to improve performance and reduce costs in a beer recommendation system.
## Learning resources

- Video: [What is semantic caching?](https://www.youtube.com/watch?v=AtVTT_s8AGc)
- Video: [What is an embedding model?](https://youtu.be/0U1S0WSsPuE)
- Video: [Exact vs Approximate Nearest Neighbors - What's the difference?](https://youtu.be/9NvO-VdjY80)
- Video: [What is a vector database?](https://youtu.be/Yhv19le0sBw)
## Requirements

To run this demo, you'll need the following installed on your system:

- Docker – [Install Docker](https://docs.docker.com/get-docker/)
- Docker Compose – included with Docker Desktop or available via the CLI installation guide
- An OpenAI API key – you can get one from [platform.openai.com](https://platform.openai.com)
## Running the demo

The easiest way to run the demo is with Docker Compose, which sets up all required services in one command.

### Step 1: Clone the repository

If you haven't already:

```bash
git clone https://github.com/redis-developer/redis-springboot-recipes.git
cd redis-springboot-recipes/artificial-intelligence/context-enabled-semantic-caching-with-spring-ai
```
### Step 2: Configure your environment
43+
44+
You can pass your OpenAI API key in two ways:
45+
46+
#### Option 1: Export the key via terminal
47+
48+
```bash
49+
export OPENAI_API_KEY=sk-your-api-key
50+
```
51+
52+
#### Option 2: Use a .env file
53+
54+
Create a `.env` file in the same directory as the `docker-compose.yml` file:
55+
56+
```env
57+
OPENAI_API_KEY=sk-your-api-key
58+
```
59+
60+
### Step 3: Start the services

```bash
docker compose up --build
```

This will start:

- redis: for storing both vector embeddings and chat history
- redis-insight: a UI to explore the Redis data
- semantic-caching-app: the Spring Boot app that implements the RAG application
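For reference, a minimal `docker-compose.yml` wiring these services together would look roughly like this (image tags, ports, and the `REDIS_HOST` variable are assumptions illustrating the setup, not the exact file shipped with the repo):

```yaml
services:
  redis:
    # Vector search requires Redis 8+ (or Redis Stack)
    image: redis:8.0.0
    ports:
      - "6379:6379"

  redis-insight:
    image: redis/redisinsight:latest
    ports:
      - "5540:5540"

  semantic-caching-app:
    build: .
    ports:
      - "8080:8080"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - REDIS_HOST=redis   # hypothetical variable name for the app's Redis host
    depends_on:
      - redis
```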
## Using the demo

When all of your services are up and running, go to `localhost:8080` to access the demo.

![Screenshot of a web app titled “Semantic Caching with Spring AI.” It features a Beer Knowledge Assistant chat interface with a welcome message, input box, and “Start New Chat” and “Clear Chat” buttons. The footer displays “Powered by Redis.”](readme-assets/1_home.png)

If you click on `Start Chat` while the embeddings are still being created, you will see a message asking you to wait for this operation to complete. This is the step in which the documents we'll search through are turned into vectors and stored in the database. It runs only the first time the app starts up and is required regardless of the vector database you use.

![Popup message stating that embeddings are still being created (14,472 of 20,000 completed), with an estimated duration of three minutes and a “Close” button.](readme-assets/2_embeddings_being_created.png)

Once all the embeddings have been created, you can start asking the chatbot questions. It will semantically search through the stored documents, try to find the best answer to your question, and cache the response semantically in Redis:

![Animated screen recording of a user typing “What kind of beer goes well with smoked meat?” into the Beer Knowledge Assistant in the Semantic Caching with Spring AI demo. The interface shows the question being sent, demonstrating semantic search in action.](readme-assets/3_asking_a_question.gif)

If you ask something similar to a question that has already been asked, the chatbot will find the earlier answer in the cache and hand it to the cheaper model instead of sending the query to the expensive one, returning a response much faster:

![Animated screen recording showing a user asking a similar follow-up question, “What type of beer is a good combination with smoked beef?” The assistant instantly retrieves a cached answer from Redis, demonstrating faster response through semantic caching.](readme-assets/4_retrieving_from_cache.gif)
## How It Is Implemented

The application uses Spring AI's `RedisVectorStore` to store and retrieve responses from a semantic cache.

### Configuring the Chat Models

Two chat models are configured: a more capable, expensive one that answers new prompts, and a cheaper, faster one that adapts cached answers on a cache hit.
```kotlin
@Bean
fun openAiExpensiveChatModel(): OpenAiChatModel {
    // More capable model, used when there is no cache hit
    val modelName = "gpt-5-2025-08-07"
    return openAiChatModel(modelName)
}

@Bean
fun openAiCheapChatModel(): OpenAiChatModel {
    // Cheaper, faster model, used to adapt cached answers on a cache hit
    val modelName = "gpt-5-nano-2025-08-07"
    return openAiChatModel(modelName)
}

private fun openAiChatModel(modelName: String): OpenAiChatModel {
    val openAiApi = OpenAiApi.builder()
        .apiKey(System.getenv("OPENAI_API_KEY"))
        .build()
    val openAiChatOptions = OpenAiChatOptions.builder()
        .model(modelName)
        .temperature(0.4)
        .build()

    return OpenAiChatModel.builder()
        .openAiApi(openAiApi)
        .defaultOptions(openAiChatOptions)
        .build()
}
```
### Configuring the Semantic Cache

```kotlin
@Bean
fun semanticCachingVectorStore(
    embeddingModel: TransformersEmbeddingModel,
    jedisPooled: JedisPooled
): RedisVectorStore {
    return RedisVectorStore.builder(jedisPooled, embeddingModel)
        .indexName("semanticCachingIdx")
        .contentFieldName("content")
        .embeddingFieldName("embedding")
        .metadataFields(
            RedisVectorStore.MetadataField("answer", Schema.FieldType.TEXT),
        )
        .prefix("semantic-caching:")
        .initializeSchema(true)
        .vectorAlgorithm(RedisVectorStore.Algorithm.HSNW)
        .build()
}
```
Let's break this down:

- **Index Name**: `semanticCachingIdx` - Redis will create an index with this name for searching cached responses
- **Content Field**: `content` - the raw prompt that will be embedded
- **Embedding Field**: `embedding` - the field that will store the resulting vector embedding
- **Metadata Fields**: `answer` - a TEXT field to store the LLM's response
- **Prefix**: `semantic-caching:` - all keys in Redis will be prefixed with this to organize the data
- **Vector Algorithm**: `HSNW` - Spring AI's enum name for HNSW, the Hierarchical Navigable Small World algorithm for efficient approximate nearest-neighbor search
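The bean above assumes a `JedisPooled` connection and a `TransformersEmbeddingModel` are already in the application context. A minimal sketch of those supporting beans (the host and port are assumptions matching the Docker Compose setup):

```kotlin
@Bean
fun jedisPooled(): JedisPooled =
    // Assumes Redis is reachable on localhost:6379
    JedisPooled("localhost", 6379)

@Bean
fun embeddingModel(): TransformersEmbeddingModel =
    // Local ONNX embedding model (all-MiniLM-L6-v2 by default); no API calls needed
    TransformersEmbeddingModel()
```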
### Storing Responses in the Semantic Cache

When a user asks a question and the system generates a response, it stores the prompt and response in the semantic cache:

```kotlin
fun storeInCache(prompt: String, answer: String) {
    semanticCachingVectorStore.add(listOf(Document(
        prompt,
        mapOf(
            "answer" to answer
        )
    )))
}
```
This method:

1. Creates a `Document` with the prompt as the content
2. Adds the answer as metadata
3. Stores the document in the vector store, which automatically generates and stores the embedding
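With this configuration, each cache entry lands in Redis as a JSON document under the `semantic-caching:` prefix, roughly shaped like this (the key suffix and values are illustrative, and the embedding vector is truncated):

```
semantic-caching:5a1e... → {
  "content": "What kind of beer goes well with smoked meat?",
  "embedding": [0.0132, -0.0457, ...],
  "answer": "A smoky porter or a rauchbier pairs nicely with smoked meat..."
}
```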
### Retrieving Responses from the Semantic Cache

When a user asks a question, the system first checks whether a semantically similar question is already in the cache. The method returns the cached prompt and its answer as a pair (both `null` on a cache miss), matching how the RAG service consumes it below:

```kotlin
fun getFromCache(prompt: String, similarityThreshold: Double): Pair<String?, String?> {
    val results = semanticCachingVectorStore.similaritySearch(
        SearchRequest.builder()
            .query(prompt)
            .topK(1)
            .build()
    )

    if (results?.isNotEmpty() == true) {
        val score = results[0].score ?: 0.0
        if (score > similarityThreshold) {
            logger.info("Returning cached answer. Similarity score: $score")
            return results[0].text to (results[0].metadata["answer"] as String?)
        }
    }

    return null to null
}
```
This method:

1. Performs a vector similarity search for the most similar prompt in the cache
2. Checks whether the similarity score is above the threshold (0.8 in this demo)
3. If a match is found, returns the cached prompt and answer so the system can have the cheaper model compute a new response from the new documents and the previously generated answer
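A quick usage sketch (the prompt is illustrative):

```kotlin
val (cachedPrompt, cachedAnswer) =
    semanticCachingService.getFromCache("What beer pairs well with smoked beef?", 0.8)

if (cachedPrompt != null && cachedAnswer != null) {
    // Cache hit: route to the cheap model with the cached context
} else {
    // Cache miss: fall back to the full RAG flow with the expensive model
}
```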
### Integrating with the RAG System

The RAG service ties the semantic cache into the RAG pipeline:

```kotlin
// Regular system prompt, plus a suffix appended in case of a cache hit

private val systemBeerPrompt = """
    You're assisting with questions about products in a beer catalog.
    Use the information from the DOCUMENTS section to provide accurate answers.
    If the answer involves referring to the ABV or IBU of the beer, include the beer name in the response.
    If unsure, simply state that you don't know.

    DOCUMENTS:
    {documents}
""".trimIndent()

private val semanticCachedAnswerPromptSuffix = """
    A similar prompt has been processed before. Use it as the base for your response with the new document selection and new prompt:

    SIMILAR PROMPT ALREADY PROCESSED:
    SIMILAR PROMPT:
    {similarPrompt}

    SIMILAR ANSWER:
    {similarAnswer}
""".trimIndent()

fun retrieve(message: String): RagResult {
    // Get documents
    val docs = getDocuments(message)

    // Get potential cached answer
    val (cachedQuestion, cachedAnswer) = semanticCachingService.getFromCache(message, 0.8)

    // Generate system prompt
    val systemMessage = if (cachedQuestion != null && cachedAnswer != null) {
        getSystemMessage(docs, cachedQuestion, cachedAnswer)
    } else {
        getSystemMessage(docs)
    }

    val userMessage = UserMessage(message)

    val prompt = Prompt(listOf(systemMessage, userMessage))

    // Call the cheap model on a cache hit, the expensive one otherwise
    val response: ChatResponse = if (cachedQuestion != null && cachedAnswer != null) {
        openAiCheapChatModel.call(prompt)
    } else {
        openAiExpensiveChatModel.call(prompt)
    }

    // Store the new response in the semantic cache
    semanticCachingService.storeInCache(message, response.result.output.text ?: "")

    return RagResult(
        generation = response.result
    )
}
```
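`getDocuments` and `getSystemMessage` are not shown in this README. A minimal sketch of `getSystemMessage`, assuming Spring AI's `SystemPromptTemplate` and the two templates above (the helper's name and signature simply mirror the calls in `retrieve`):

```kotlin
private fun getSystemMessage(
    docs: List<Document>,
    similarPrompt: String? = null,
    similarAnswer: String? = null
): Message {
    // Concatenate the retrieved documents into the {documents} placeholder
    val documents = docs.joinToString("\n") { it.text ?: "" }

    return if (similarPrompt != null && similarAnswer != null) {
        // Cache hit: append the suffix so the cheap model can reuse the cached answer
        SystemPromptTemplate(systemBeerPrompt + "\n\n" + semanticCachedAnswerPromptSuffix)
            .createMessage(mapOf(
                "documents" to documents,
                "similarPrompt" to similarPrompt,
                "similarAnswer" to similarAnswer
            ))
    } else {
        SystemPromptTemplate(systemBeerPrompt)
            .createMessage(mapOf("documents" to documents))
    }
}
```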
This orchestrates the entire process:

1. Retrieve relevant documents using vector similarity search
2. Check whether a semantically similar prompt is already in the cache
3. If one is found, have the cheaper model adapt the cached answer to the new prompt and document selection; if not, generate a fresh response with the more capable model
4. Store the prompt and the new response in the semantic cache for future use

This approach significantly improves performance and reduces costs by avoiding expensive-model calls for semantically similar queries, while still providing accurate and contextually relevant responses.
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
plugins {
    kotlin("jvm") version "1.9.25"
    kotlin("plugin.spring") version "1.9.25"
    id("org.springframework.boot") version "3.5.5"
    id("io.spring.dependency-management") version "1.1.7"
}

group = "com.redis"
version = "0.0.1-SNAPSHOT"
description = "context-enabled-semantic-caching"

java {
    toolchain {
        languageVersion = JavaLanguageVersion.of(21)
    }
}

repositories {
    mavenCentral()
}

extra["springAiVersion"] = "1.0.1"

dependencies {
    implementation("org.springframework.boot:spring-boot-starter")
    implementation("org.springframework.boot:spring-boot-starter-web")
    implementation("org.springframework.ai:spring-ai-transformers:1.0.0")
    implementation("org.springframework.ai:spring-ai-starter-vector-store-redis:1.0.0")
    implementation("org.springframework.ai:spring-ai-starter-model-openai:1.0.0")

    implementation("com.redis.om:redis-om-spring:1.0.0")

    implementation("org.jetbrains.kotlin:kotlin-reflect")
    testImplementation("org.springframework.boot:spring-boot-starter-test")
    testImplementation("org.jetbrains.kotlin:kotlin-test-junit5")
    testRuntimeOnly("org.junit.platform:junit-platform-launcher")
}

dependencyManagement {
    imports {
        mavenBom("org.springframework.ai:spring-ai-bom:${property("springAiVersion")}")
    }
}

kotlin {
    compilerOptions {
        freeCompilerArgs.addAll("-Xjsr305=strict")
    }
}

tasks.withType<Test> {
    useJUnitPlatform()
}
