
TLDR
- Use a RAG-style technique (Retrieval-Augmented Generation), pulling data from an on-device Knowledge Base to help the AI answer more accurately.
- Create a chatbot that runs 100% in the user's browser, no server required.
- Use Transformers.js to run an embedding model (converting text into vectors) on the client side.
- All user data stays private because it is never sent anywhere, and responses are near-instantaneous because there is no network latency.
Lately, I’ve been really into On-Device / Built-in AI. I even gave a talk about it at a Google I/O Extended event. This got me thinking about what server-side AIs can typically do and whether it’s possible to replicate that in a browser instead.
I figured, ‘Well, there are chatbots…’ — one of those should need little more than an embedding model and the Prompt API. So I decided to write a prototype to see whether building an LLM-based chatbot this way is feasible, and if so, how to go about it.
A Personal AI That Doesn’t Rely on a Server
First of all, many people might wonder why we would use On-device AI at all and what's so good about it. The main idea of this project is to create a chatbot that runs entirely on the user's device, without depending on any server. By achieving this, we get a whole set of benefits:
- Extremely Private: The data and questions you discuss with the bot will never leave the user’s machine.
- Instantly Fast: Because there’s no data being sent back and forth over the internet, the response is almost instantaneous.
- Cost-Effective: There are no server costs at all. Hosting it on simple static web hosting is sufficient.
The working principle of this project is to retrieve the most relevant information from a prepared set of knowledge (a Knowledge Base) and then feed it to the AI to help it formulate an answer. It's just like typical RAG, except the entire process completes inside the browser.
How does it work?
To make it more visual, I’ll break down the process into four main steps that all happen in the browser.
1. Convert the Knowledge Base to vectors: I originally wanted to handle this step with the built-in AI, but its model doesn't support embeddings, so I had to find an embedding model that could run in the browser. Size was a big concern, because if the model were too large the project would likely fail. After some research, I found all-MiniLM-L6-v2, which works with Transformers.js and is incredibly small at just ~20MB; when I actually tried it, it loaded in just a few seconds. For this prototype, I took a mocked FAQ JSON, converted each entry to a vector, and put it all in LocalStorage. I couldn't find a local DB that supported vectors, so I just naively saved it that way. To do this for real, I'd want an IndexedDB-backed store that supports vectors (there actually is one, but it hasn't been maintained in a long time, so I skipped it: https://github.com/PaulKinlan/idb-vector).
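Here's a minimal sketch of this step. It assumes the FAQ is an array of `{ question, answer }` objects and uses the Xenova/all-MiniLM-L6-v2 checkpoint (the Transformers.js port of the model); the `faq-index` key and helper names are just illustrative:

```javascript
import { pipeline } from '@xenova/transformers';

// Load the embedding pipeline once; the quantized model is only ~20MB.
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

async function embed(text) {
  // Mean-pool the token embeddings into one 384-dimensional sentence vector.
  const output = await embedder(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data); // plain array, so it survives JSON.stringify
}

// Embed every FAQ entry and naively dump the vectors into LocalStorage.
async function indexFaq(faq) {
  const indexed = [];
  for (const entry of faq) {
    indexed.push({ ...entry, vector: await embed(`${entry.question} ${entry.answer}`) });
  }
  localStorage.setItem('faq-index', JSON.stringify(indexed));
}
```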
2. Convert the question to a vector and find the best matches: when the user types a question, I use the Transformers.js library to load all-MiniLM-L6-v2 again (the same model that indexed the FAQ) and run it in the browser. The model converts the question into a set of numbers (a vector) that captures the meaning of the sentence. With that vector in hand, we compute the cosine similarity against the docs stored in LocalStorage and keep the top 10 with the highest scores. As for the cosineSimilarity function, I just grabbed an implementation I found while searching around; anyone who prefers a library can use one instead.
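A sketch of the retrieval step, reusing the `embed()` helper from step 1 (the function names are again just for illustration):

```javascript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embed the question, score it against every stored doc, and keep the top 10.
async function retrieve(question, topK = 10) {
  const queryVector = await embed(question);
  const index = JSON.parse(localStorage.getItem('faq-index'));
  return index
    .map((doc) => ({ ...doc, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```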
3. Assembling the Prompt: Instead of throwing the raw question at the AI, I incorporated the “relevant information” found in step 2 into the prompt. This gives the AI the context and the right information to use when generating the answer. I also included an initial prompt to make the task clearer.
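Something along these lines; the exact wording of the initial prompt below is my own placeholder, not the project's:

```javascript
// Combine an instruction, the retrieved FAQ entries, and the user's question.
function buildPrompt(question, docs) {
  const context = docs
    .map((doc, i) => `${i + 1}. Q: ${doc.question}\n   A: ${doc.answer}`)
    .join('\n');
  return (
    `Answer the user's question using only the FAQ entries below. ` +
    `If the answer is not in them, say you don't know.\n\n` +
    `FAQ:\n${context}\n\nQuestion: ${question}`
  );
}
```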
4. Generate a response with the Prompt API: Finally, I send the completed prompt to the Prompt API, the built-in AI in Chrome/Edge, which generates a response consistent with the data we've provided. Here's what the code looks like when calling the Prompt API:
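(A minimal sketch, assuming the current shape of the experimental API: a global `LanguageModel` object with `create()`, and sessions exposing `prompt()`. Older builds exposed this under `window.ai` instead, so check which surface your browser ships.)

```javascript
// Tie the steps together: retrieve context, build the prompt, ask the built-in model.
async function answer(question) {
  const docs = await retrieve(question);        // step 2
  const session = await LanguageModel.create(); // may download the model on first use
  const reply = await session.prompt(buildPrompt(question, docs)); // step 3
  session.destroy();
  return reply;
}
```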
Limitations
- Scope of Knowledge: The chatbot only knows what's in the knowledge base we've prepared. It can't answer questions beyond that, because the model running behind it is very small.
- ⚠️⚠️⚠️ Browser Compatibility: Currently, the core feature relies on the Prompt API. This is an experimental feature that isn't generally available on the web yet: you must enable a flag or register for the Origin Trial, and it only works in Chrome and Microsoft Edge. If you want to try it out, enable the feature flag first (a quick way to detect support is sketched after this list):
- chrome://flags/#prompt-api-for-gemini-nano
- edge://flags/#edge-llm-prompt-api-for-phi-mini
- Device Resources: Currently, the built-in AI feature is only supported on desktops with more than 4 GB of VRAM and 22 GB or more of available storage.
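If you want to fail gracefully, a rough feature-detection sketch, assuming the current `LanguageModel.availability()` surface from the explainer (return values as documented at the time of writing):

```javascript
// Rough feature detection for the built-in Prompt API.
async function promptApiStatus() {
  if (!('LanguageModel' in self)) return 'unsupported';
  // Resolves to 'unavailable', 'downloadable', 'downloading', or 'available'.
  return await LanguageModel.availability();
}
```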
It’s not finished yet
As I was doing this, I started thinking: what if there were no Prompt API? I remembered the previous article where we ran Gemma 3 1B in a browser, so I thought, let's try combining the two, and wrote the same kind of code as in that article to run Gemma 3 1B.
What I found was not very impressive: however small the model is, it still takes up 500MB of space. A 1B model isn't really enough, and it doesn't compare to the Prompt API, because the model behind the Prompt API, if I remember correctly, is Gemma E4B, which is much larger. But if you ask whether it's usable, the answer is that it's usable.
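For reference, here's one possible shape of that fallback. The previous article's exact setup isn't reproduced here; this sketch assumes MediaPipe's LLM Inference API, one common way to run Gemma in the browser, and the model path is hypothetical:

```javascript
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

// Load the WASM runtime, then a locally hosted Gemma .task bundle (~500MB download).
const genai = await FilesetResolver.forGenAiTasks(
  'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm'
);
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: '/models/gemma3-1b-it-int4.task' }, // hypothetical path
});

// Same prompt as before, just a different engine behind it.
const reply = await llm.generateResponse('Hello!'); // swap in buildPrompt(...) from earlier
```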
Conclusion
If you ask whether it's possible to create a browser-based chatbot, the answer is yes. A simple chatbot, or even a fixed-response one, would be fine. But if you're going for a 100% generative approach, you'll need to manage the local database and the context window properly, not to mention browser compatibility. This method could actually be applied to Android or iOS applications as well, not necessarily just the web.
If you want to see the code I created, you can check it out on Github at https://github.com/thangman22/ondevice-chatbot. Just don’t expect it to be great, haha.
