
TLDR
- Use a RAG-style technique (Retrieval-Augmented Generation), pulling data from an on-device Knowledge Base to help the AI answer more accurately.
- Create a chatbot that runs 100% in the user's browser, no server required.
- Use Transformers.js to run an embedding model (converting text into vectors) on the client side.
- All user data stays private because it is never sent anywhere, and responses are near-instantaneous because there is no network latency.
Lately, I’ve been really into On-Device / Built-in AI. I even gave a talk about it at a Google I/O Extended event. This got me thinking about what server-side AIs can typically do and whether it’s possible to replicate that in a browser instead.
I figured, ‘Well, there are chatbots…’ — one of those should need little more than an embedding model and the Prompt API. So I decided to write a prototype to see whether building an LLM-based chatbot this way is feasible, and if so, how to go about it.
A Personal AI That Doesn’t Rely on a Server
First of all, many people might wonder why we would use On-device AI at all and what's so good about it. The main idea of this project is to create a chatbot that runs entirely on the user's device, without depending on any server. By achieving this, we get a whole set of benefits:
- Extremely Private: The data and questions you discuss with the bot will never leave the user’s machine.
- Instantly Fast: Because there’s no data being sent back and forth over the internet, the response is almost instantaneous.
- Cost-Effective: There are no server costs at all. Hosting it on simple static web hosting is sufficient.
The working principle of this project is to retrieve the most relevant information from a prepared set of knowledge (a Knowledge Base) and then feed it to the AI to help it formulate an answer. It's just like typical RAG, except the entire process completes inside the browser.
How does it work?
To make it more visual, I’ll break down the process into four main steps that all happen in the browser.
1. Convert the Knowledge Base to vectors: I originally wanted to handle this step with the built-in AI, but its model doesn't support embeddings, so I had to find an embedding model that could run in the browser. Size was a big concern, because if the model were too large the project would likely fail. After some research, I found all-MiniLM-L6-v2, which works with Transformers.js and is incredibly small at just ~20MB; when I actually tried it, it loaded in just a few seconds. For this prototype, I took a mocked FAQ JSON, converted each entry to a vector, and put it all in LocalStorage. I couldn't find a local DB that supported vectors, so I just naively saved it that way. To do this for real, I'd want an IndexedDB-backed store that supports vectors (there actually is one, but it hasn't been maintained in a long time, so I skipped it: https://github.com/PaulKinlan/idb-vector).
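Here's a minimal sketch of this step. It assumes the FAQ is an array of `{ question, answer }` objects and uses the Xenova/all-MiniLM-L6-v2 checkpoint (the Transformers.js port of the model); the `faq-index` key and helper names are just illustrative:

```javascript
import { pipeline } from '@xenova/transformers';

// Load the embedding pipeline once; the quantized model is only ~20MB.
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

async function embed(text) {
  // Mean-pool the token embeddings into one 384-dimensional sentence vector.
  const output = await embedder(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data); // plain array, so it survives JSON.stringify
}

// Embed every FAQ entry and naively dump the vectors into LocalStorage.
async function indexFaq(faq) {
  const indexed = [];
  for (const entry of faq) {
    indexed.push({ ...entry, vector: await embed(`${entry.question} ${entry.answer}`) });
  }
  localStorage.setItem('faq-index', JSON.stringify(indexed));
}
```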
2. Convert the question to a vector and find the best matches: when the user types a question, I use the Transformers.js library to load all-MiniLM-L6-v2 again (the same model that indexed the FAQ) and run it in the browser. The model converts the question into a set of numbers (a vector) that captures the meaning of the sentence. With that vector in hand, we compute the cosine similarity against the docs stored in LocalStorage and keep the top 10 with the highest scores. As for the cosineSimilarity function, I just grabbed an implementation I found while searching around; anyone who prefers a library can use one instead.
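A sketch of the retrieval step, reusing the `embed()` helper from step 1 (the function names are again just for illustration):

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embed the question, score it against every stored doc, and keep the top 10.
async function retrieve(question, topK = 10) {
  const queryVector = await embed(question);
  const index = JSON.parse(localStorage.getItem('faq-index'));
  return index
    .map((doc) => ({ ...doc, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```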
3. Assembling the Prompt: Instead of throwing the raw question at the AI, I incorporated the “relevant information” found in step 2 into the prompt. This gives the AI the context and the right information to use when generating the answer. I also included an initial prompt to make the task clearer.
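Something along these lines; the exact wording of the initial prompt below is my own placeholder, not the project's:

```javascript
// Combine an instruction, the retrieved FAQ entries, and the user's question.
function buildPrompt(question, docs) {
  const context = docs
    .map((doc, i) => `${i + 1}. Q: ${doc.question}\n   A: ${doc.answer}`)
    .join('\n');
  return (
    `Answer the user's question using only the FAQ entries below. ` +
    `If the answer is not in them, say you don't know.\n\n` +
    `FAQ:\n${context}\n\nQuestion: ${question}`
  );
}
```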
4. Generate a response with the Prompt API: Finally, I send the completed prompt to the Prompt API, the built-in AI in Chrome/Edge, which generates a response consistent with the data we've provided. Here's what the code looks like when calling the Prompt API:
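(A minimal sketch, assuming the current shape of the experimental API: a global `LanguageModel` object with `create()`, and sessions exposing `prompt()`. Older builds exposed this under `window.ai` instead, so check which surface your browser ships.)

```javascript
// Tie the steps together: retrieve context, build the prompt, ask the built-in model.
async function answer(question) {
  const docs = await retrieve(question);        // step 2
  const session = await LanguageModel.create(); // may download the model on first use
  const reply = await session.prompt(buildPrompt(question, docs)); // step 3
  session.destroy();
  return reply;
}
```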
Limitations
- Scope of Knowledge: The chatbot only knows what's in the knowledge base we've prepared. It can't answer questions beyond that, because the model running behind it is very small.
- ⚠️⚠️⚠️ Browser Compatibility: Currently, the core feature relies on the Prompt API. This is an experimental feature that isn't generally available on the web yet: you must enable a flag or register for the Origin Trial, and it only works in Chrome and Microsoft Edge. If you want to try it out, enable the feature flag first (a quick way to detect support is sketched after this list):
- chrome://flags/#prompt-api-for-gemini-nano
- edge://flags/#edge-llm-prompt-api-for-phi-mini
- Device Resources: Currently, the built-in AI feature is only supported on desktops with more than 4 GB of VRAM and 22 GB or more of available storage.
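If you want to fail gracefully, a rough feature-detection sketch, assuming the current `LanguageModel.availability()` surface from the explainer (return values as documented at the time of writing):

```javascript
// Rough feature detection for the built-in Prompt API.
async function promptApiStatus() {
  if (!('LanguageModel' in self)) return 'unsupported';
  // Resolves to 'unavailable', 'downloadable', 'downloading', or 'available'.
  return await LanguageModel.availability();
}
```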
It’s not finished yet
As I was doing this, I started thinking: what if there were no Prompt API? I remembered the previous article where we ran Gemma 3 1B in a browser, so I thought, let's try combining the two, and wrote the same kind of code as in that article to run Gemma 3 1B.
What I found was not very impressive: however small the model is, it still takes up 500MB of space. A 1B model isn't really enough, and it doesn't compare to the Prompt API, because the model behind the Prompt API, if I remember correctly, is Gemma E4B, which is much larger. But if you ask whether it's usable, the answer is that it's usable.
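For reference, here's one possible shape of that fallback. The previous article's exact setup isn't reproduced here; this sketch assumes MediaPipe's LLM Inference API, one common way to run Gemma in the browser, and the model path is hypothetical:

```javascript
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

// Load the WASM runtime, then a locally hosted Gemma .task bundle (~500MB download).
const genai = await FilesetResolver.forGenAiTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm'
);
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: '/models/gemma3-1b-it-int4.task' }, // hypothetical path
});

// Same prompt as before, just a different engine behind it.
const reply = await llm.generateResponse('Hello!'); // swap in buildPrompt(...) from earlier
```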
Conclusion
If you ask whether it's possible to create a browser-based chatbot, the answer is yes. A simple chatbot, or even a fixed-response one, would be fine. But if you're going for a 100% generative approach, you'll need to manage the local database and the context window properly, not to mention browser compatibility. This method could actually be applied to Android or iOS applications as well, not necessarily just the web.
If you want to see the code I created, you can check it out on Github at https://github.com/thangman22/ondevice-chatbot. Just don’t expect it to be great, haha.
