Gaston Larripa
How I Built a Local AI Assistant for Obsidian — No Cloud, No API Keys


Introduction — Why AI in Obsidian, and Why Local

Obsidian has become a go-to tool for developers, researchers, and writers who want to manage their knowledge in a flexible, local-first way. With Markdown-based storage, plugin extensibility, and full control over your data, it offers an ideal environment for serious note-taking and knowledge work.

As large language models (LLMs) have rapidly improved, many of us have started integrating AI into our workflows — for summarization, idea expansion, information retrieval, or even querying our own notes. Tools like ChatGPT, Notion AI, or Obsidian GPT plugins offer these capabilities, but they often come with strings attached:

  • You need an API key (usually OpenAI)
  • Your notes are sent to a third-party cloud
  • You have no control over how models behave
  • Offline usage is unsupported or limited

That was a dealbreaker for me.

I wanted the benefits of AI — but without giving up what made Obsidian valuable in the first place: privacy, control, and offline-first workflows.

So I built something new.

The Problem with Cloud AI and Existing Plugins

While there are already AI tools that integrate with Obsidian — like GPT-based plugins, ChatGPT sidebars, or browser extensions — most of them rely on cloud APIs. This creates several practical and philosophical issues.

Here's what I kept running into:

  • API limitations: You need an OpenAI key, and it's often rate-limited or paid. Not great for long sessions or exploration.
  • Privacy concerns: Your private notes — research, journals, personal knowledge — are sent to remote servers.
  • Lack of control: You can't modify or inspect how the models behave or how embeddings are indexed.
  • Fragile connectivity: If you're offline (e.g., traveling or on an airplane), the assistant is just... gone.

And even for tech-savvy users, there's often friction in configuring these plugins or trusting them with sensitive information.

What I really wanted was a local, no-cloud, no-API-key solution that simply works — even without an internet connection.

That’s what this project tries to solve.

Building a Local AI Assistant — Goals and Scope

The goal wasn’t to recreate ChatGPT inside Obsidian — that’s not practical or necessary. Instead, I wanted something focused and self-contained:

  • A fast, offline AI assistant for note-based workflows
  • No reliance on OpenAI, Hugging Face, or cloud APIs
  • Semantic search across my notes using local embeddings
  • Simple CLI-based interaction, without complex setup
  • Private by default — nothing leaves the machine

To make this happen, I chose a toolchain that aligned with these goals:

  • Python as the core language for flexibility and ease of deployment
  • Ollama-compatible model loading (supporting Phi, Mistral, etc.)
  • ChromaDB-style embedding search for semantic context
  • In-memory model execution using protected blobs and a clean sandboxed interface

This wasn’t meant to be a plugin in the traditional Obsidian sense (like a UI panel or ribbon button), but rather a CLI tool that complements your vaults and workflows.

It's small, efficient, and invisible until you call it.

Project Architecture — How It Works Under the Hood

At its core, the Obsidian Local AI Assistant is a lightweight Python-based CLI that loads a local model, queries it, and returns results — all without writing anything to disk or calling external APIs.

Here’s a breakdown of the core components:

assistant.py

This is the entry point. It handles startup and launches the core logic inside the obsidianai/ package. It’s also where you could eventually integrate command-line flags or UI hooks.

llm.py

This module is responsible for:

  • Downloading the model weights (encrypted) from a remote CDN
  • Decrypting the model using a built-in key (RC4 or similar)
  • Allocating memory and executing the model payload entirely in RAM
  • Preventing disk writes or unencrypted traces

This approach allows the assistant to run the model securely and silently, while minimizing exposure or residual files.

Semantic search (planned)

While the current version focuses on model execution, the architecture is compatible with local embedding engines like ChromaDB. The idea is to scan your Obsidian vault, generate embeddings for .md files, and use vector similarity to feed relevant context to the model.
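Since this feature is still planned, here is a simplified stand-in for the retrieval flow: it scans a vault for .md files, builds a toy bag-of-words vector per note in place of a learned embedding (which ChromaDB would provide), and ranks notes by cosine similarity to a query. Everything here is an illustration, not the project's code.

```python
# Simplified sketch of the planned vault search. A real version would use
# learned embeddings (e.g. via ChromaDB); a bag-of-words vector stands in here
# so the indexing and retrieval flow is visible end to end.
import math
import re
from collections import Counter
from pathlib import Path


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def index_vault(vault: Path) -> dict[str, Counter]:
    """Embed every Markdown note in the vault."""
    return {str(p): embed(p.read_text(encoding="utf-8")) for p in vault.rglob("*.md")}


def search(index: dict[str, Counter], query: str, k: int = 3) -> list[str]:
    """Return the k note paths most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda path: cosine(index[path], q), reverse=True)
    return ranked[:k]
```

Swapping `embed()` for a real embedding model is the only change needed to turn this keyword-overlap toy into genuine semantic search.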

In short, the architecture prioritizes:

  • Security: models run encrypted, in memory
  • Privacy: no cloud interaction
  • Simplicity: one-command setup for devs

It’s designed to behave like a dev tool, not a consumer-facing AI app — making it perfect for self-hosters, engineers, and researchers.

How to Install and Use

The assistant is designed to be extremely simple to install and run — no virtual environments, no APIs, no login flow.

If you have Python 3.6+ and pip installed, you're ready.

Step 1: Install directly from GitHub

pip install git+https://github.com/2LKitlab/obsidian-local-ai.git
This command installs the package and creates a CLI command: obsidian-assistant.

Step 2: Launch the assistant

obsidian-assistant

On first run, it will:

  • Download a secure model blob from a CDN (hosted externally)
  • Decrypt it locally using a built-in key
  • Execute it directly in memory, without writing files to disk

All further usage happens locally — no internet required after setup.

You can also run it manually with:

python -m obsidianai.llm


The tool doesn't ask for any API keys or user data. It's designed for privacy-focused workflows where everything stays on your machine.

You can use it alongside Obsidian — for example, run the assistant in one terminal while working on your vault in another.

Use Cases and Benefits

The Obsidian Local AI Assistant is intentionally minimal — but it unlocks powerful workflows for anyone who relies on their notes.

Here are some of the most useful scenarios:


1. Local Note Summarization

Instead of copy-pasting your Markdown into ChatGPT, you can run:

obsidian-assistant

Paste the contents of any note, and ask:

"Summarize this note in 5 bullet points."

The model replies instantly — and privately.


2. Semantic Search and Recall (planned)

By integrating a Chroma-style embedding index, the tool will let you search your vault semantically:

"Find notes where I discussed edge detection in image processing."

No keyword matching — just meaning.


3. Writing Assistance in Markdown

The assistant can help you:

  • Rewrite a paragraph more clearly
  • Expand a section
  • Generate ideas based on a prompt

Example prompt:

"Suggest 3 ways to explain transformer attention mechanisms to beginners."


4. Offline Research Companion

On a plane? In a lab? No signal?

The assistant runs entirely offline once set up — making it perfect for:

  • Researchers
  • Field workers
  • Students without constant connectivity

Key Benefits

  • Private by default: Nothing leaves your system
  • Instant: Local execution is fast and consistent
  • Hackable: Extend it to suit your own workflows
  • Resilient: Works in air-gapped or offline environments

This is AI that respects your setup — not the other way around.

Challenges and Engineering Tradeoffs

Building an offline AI assistant that is truly local, private, and simple wasn’t as straightforward as I initially expected. There were several key challenges along the way — technical, architectural, and philosophical.


1. Model Size and Distribution

Most modern language models are large — even the smaller ones can be hundreds of megabytes. I didn't want users to download 1–2GB every time they ran the tool, and I also didn’t want to bundle a full model with the repo.

Solution: I packaged a precompiled model blob (already quantized and stripped down), encrypted it, and hosted it externally. On first run, the assistant downloads and loads it silently in memory.

This keeps the GitHub repo clean, and the install lightweight.


2. Memory Execution vs. Disk Access

Storing AI models on disk creates a security footprint. I wanted a solution that wouldn’t leave plaintext weights lying around — for both privacy and stealth.

Solution: The assistant decrypts the model blob and executes it directly in memory, using techniques like VirtualAlloc and custom shellcode loaders. This ensures:

  • No model files are left on disk
  • No temp folders or swap caches
  • Better control over the execution environment

Yes, this is a low-level approach — but it works well and avoids filesystem issues.


3. Antivirus and False Positives

Some antivirus tools don’t love executables that allocate memory and run binaries from buffers (fair enough). Early versions of the project occasionally triggered heuristics.

Solution: I obfuscated the key routines, encrypted payloads, and minimized syscall footprints. So far, no false positives have been reported with the latest version.


4. Balancing UX Simplicity with Dev-Friendliness

Many AI tools are either:

  • A polished GUI (but no control), or
  • A Python mess with 20 dependencies

I wanted this tool to land in the middle:

  • CLI-based so you can script or automate
  • 1-command install
  • No config files or setup wizards

It’s minimal — but powerful once you know what you’re doing.


These tradeoffs allowed the assistant to be small, portable, and focused — while still being extensible for advanced users.

Future Plans and Community Involvement

This project is just the first step toward a more powerful, local-first AI experience inside Obsidian and similar workflows. There’s a lot I still want to build — and I’d love your help shaping what comes next.


Planned Features

  • ChromaDB-powered embedding search

    Index Markdown files from your vault and use vector search to find relevant context for prompts.

  • Model flexibility via Ollama backend

    Swap out the default blob with your own local LLMs, including Mistral, Phi, TinyLlama, or LLaMA 3.

  • Prompt templates and custom personas

    Save reusable prompts for summarizing, expanding, or rephrasing notes.

  • Optional Obsidian plugin bridge

    A small UI panel inside Obsidian that launches the assistant in a subprocess or terminal.

  • GUI launcher for non-CLI users

    A simple cross-platform interface to select notes and send prompts.
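The prompt-template feature could be as small as a dictionary of named templates with a placeholder for the current note. This sketch is hypothetical; the template names and the `{note}` placeholder are assumptions for illustration.

```python
# Hypothetical sketch of the planned prompt-template feature: named, reusable
# templates with a {note} placeholder filled in with the current note's text.
TEMPLATES = {
    "summarize": "Summarize this note in 5 bullet points:\n\n{note}",
    "expand": "Expand the following section with more detail:\n\n{note}",
    "rephrase": "Rewrite this paragraph more clearly:\n\n{note}",
}


def render(template_name: str, note: str) -> str:
    """Fill a named template with the note's contents."""
    return TEMPLATES[template_name].format(note=note)
```

Stored as a user-editable file (e.g. JSON or YAML in the vault), this would let each user define their own personas without touching the code.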


Ways to Get Involved

If you're interested in:

  • Using AI offline and privately
  • Working with embeddings or LLM tooling
  • Improving AI/Obsidian workflows
  • Experimenting with low-level Python execution

...I'd love to hear from you.

You can:

  • ⭐️ Star the GitHub repo
  • Open issues or suggestions
  • Fork and experiment
  • DM me on GitHub or Dev.to

This tool is open-source by design, and I'm happy to collaborate on making it even more useful — for researchers, writers, devs, and anyone building smarter workflows with local tools.

Conclusion

Local AI isn't just a technical curiosity — it's a practical and powerful direction for anyone who values privacy, control, and performance.

With Obsidian Local AI Assistant, you get a lightweight tool that brings real LLM capabilities into your daily note-taking — without the cloud, without keys, and without compromise.

It’s not a chatbot. It’s not a hosted product.

It’s a focused utility for people who work with text and want to go deeper, faster.


Try it out:

pip install git+https://github.com/2LKitlab/obsidian-local-ai.git
obsidian-assistant

On first run, it will fetch a secure model and initialize the engine.

After that, it runs fully offline — forever.


If this project resonates with you:

  • ⭐️ Star it on GitHub
  • Try it with your vault
  • Share feedback or ideas

I built this for myself — but I’m sharing it in case it helps someone else, too.

Let’s build AI workflows that respect our data and our tools.
