Run Advisor Prep Hero with a local Ollama model.

Updated 2026-06-04 · 10 min read

The short version. Ollama runs an AI model directly on your computer. When you point Advisor Prep Hero at it, your files, your prompts, and your AI responses stay entirely on your machine. No API key. No provider account. No outbound network traffic. This is the setup for patent prosecution, matter-sensitive drafting, and any professional work where even encrypted transit to a cloud provider is not acceptable.

Why this matters

When you use a cloud AI provider, your prompt travels from your machine to a server, gets processed, and the response travels back. The connection is encrypted and reputable providers have strong terms, but the data still leaves your machine. For most work, that is fine.

For some work, it is not. Patent prosecution is the clearest case: transmitting an invention disclosure to any external party before filing can raise questions under absolute-novelty regimes, including at the European Patent Office. An attorney or agent working on pre-filing prosecution wants the analysis to stay local. A local model means there is no API call, no outbound packet, no third-party server involved at any point.

That is what Ollama gives you. The model runs on your CPU or GPU. Advisor Prep Hero connects to it over localhost. Nothing leaves the machine.

What you need before starting

A computer with at least 8GB of RAM (16GB recommended)
About 5GB of free disk space for the model files
Advisor Prep Hero installed and open

The setup takes about 15 minutes, mostly spent waiting for the model to download. Installation differs by platform, but the ollama commands in the later steps are the same on Mac, Windows, and Linux.

Step 1: Install Ollama

1 Download and install Ollama

Go to ollama.com/download.

Mac: Download the Mac app and drag it to Applications. Or run this in Terminal:

curl -fsSL https://ollama.com/install.sh | sh

Linux: Same one-command install:

curl -fsSL https://ollama.com/install.sh | sh

Windows: Download the Windows installer from the download page and run it. Ollama then runs in the background, with an icon in the system tray.

Step 2: Pull a model

2 Download a capable model

Open Terminal (Mac/Linux) or PowerShell (Windows) and run one of these commands.

Recommended for most professional work:

ollama pull llama3.1:8b

Llama 3.1 8B is a capable general-purpose model. It handles drafting, summarizing, structuring, and editing well. The download is about 4.7GB. It runs on most modern laptops without a dedicated GPU.

Lighter alternative if RAM is limited:

ollama pull mistral

Mistral 7B is a slightly smaller model with somewhat lower RAM requirements. It is a good fit for editing and shorter drafts.

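You can pull more than one model and switch between them inside Advisor Prep Hero. Models are stored in ~/.ollama/models on Mac and in the .ollama\models folder of your user directory on Windows. On Linux the location depends on how Ollama runs: ~/.ollama/models when you start it yourself, or /usr/share/ollama/.ollama/models when the install script set it up as a system service.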

Step 3: Verify Ollama is running

3 Confirm the local API is up

Mac: Ollama starts automatically when you open the app. You will see a llama icon in the menu bar when it is running.

Linux and Windows: Run this in your terminal if Ollama did not start automatically:

ollama serve

Ollama's API listens at http://localhost:11434. You can confirm it is up by opening that address in a browser. You should see the plain-text message "Ollama is running". If you do not, check that the Ollama process is running.
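The same check can be scripted. The sketch below uses only Python's standard library and assumes the default address given above. The function name ollama_is_up is illustrative and not part of Ollama or Advisor Prep Hero.

```python
import http.client
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address stated in this guide


def ollama_is_up(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers 200 at the base URL."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a non-HTTP reply: treat as "not up".
        return False
```

Calling ollama_is_up() returns False when nothing is listening on the port, which usually means the Ollama process has not been started.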

Step 4: Connect Advisor Prep Hero to Ollama

4 Add the local provider in Advisor Prep Hero

Open Advisor Prep Hero
Go to Settings (gear icon or Ctrl+, on Windows/Linux, Cmd+, on Mac)
Click AI Models
Click Add Provider
Select Local (Ollama) from the provider list
Enter http://localhost:11434 as the base URL
Click Fetch Models — Advisor Prep Hero will query Ollama and populate the model dropdown automatically
Select the model you pulled in Step 2
Click Save

No API key field will appear — Ollama does not use one.
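How Advisor Prep Hero implements Fetch Models is not described here. Ollama does expose the list of pulled models at its /api/tags endpoint, so a request along the following lines is a plausible equivalent. Treat it as a sketch: the function names are illustrative, and only the "models" list and each entry's "name" field are read from the response.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON body returned by /api/tags."""
    return sorted(m["name"] for m in tags_payload.get("models", []))


def fetch_model_names(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask the local Ollama server which models have been pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))
```

If fetch_model_names() returns an empty list, no model has been pulled yet, and the dropdown in Advisor Prep Hero would have nothing to show.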

Step 5: Test with a prompt

5 Confirm it works

Open any file in Advisor Prep Hero and start an AI chat. Send a short message like "Summarize this document in two sentences." If the model responds, you are running fully local.

Response time will be slower than a cloud model, especially the first prompt after startup. That is normal. The model loads into memory on the first call. Subsequent prompts in the same session are faster.
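To see the first-call delay for yourself, you can send a prompt to Ollama directly, outside Advisor Prep Hero. The sketch below posts to Ollama's /api/generate endpoint with streaming turned off and reads the timing fields from the reply (eval_count, eval_duration, and load_duration, with durations reported in nanoseconds). The helper names are illustrative, and the model name assumes you pulled llama3.1:8b in Step 2.

```python
import json
import urllib.request


def build_request(model: str, prompt: str,
                  base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generation_speed(result: dict) -> float:
    """Tokens generated per second (eval_duration is in nanoseconds)."""
    return result["eval_count"] / (result["eval_duration"] / 1e9)


def load_seconds(result: dict) -> float:
    """Seconds spent loading the model into memory for this call."""
    return result["load_duration"] / 1e9


def ask(model: str, prompt: str) -> dict:
    """Send one prompt and return the parsed JSON reply."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=300) as resp:
        return json.load(resp)
```

Running ask("llama3.1:8b", "Say hello.") twice and comparing load_seconds() for the two replies should show most of the startup cost on the first call.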

Hardware notes

The model size you can run depends on how much RAM your machine has. Here is a practical guide:

Model size	RAM needed	GPU needed	Suitable for
7B or 8B (e.g., Llama 3.1 8B, Mistral 7B)	~8GB	No — runs on CPU	Drafting, editing, summarizing, structuring. Most professional writing tasks.
13B (e.g., Llama 2 13B)	~16GB	Helpful, not required	More nuanced reasoning. Still practical on a modern laptop with 16GB RAM.
70B (e.g., Llama 3.1 70B)	~40GB VRAM	Required (high-end GPU)	Near-cloud-quality output. Needs a workstation-class GPU.
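The table can be read as a simple lookup. The sketch below encodes its approximate thresholds as written. The function name and the returned labels are hypothetical, and real memory use varies with quantization and context length.

```python
def largest_model_tier(ram_gb: float, vram_gb: float = 0.0) -> str:
    """Map available memory to the largest model tier from the table above."""
    if vram_gb >= 40:      # 70B: needs a high-end GPU with ~40GB VRAM
        return "70B"
    if ram_gb >= 16:       # 13B: ~16GB RAM, GPU helpful but not required
        return "13B"
    if ram_gb >= 8:        # 7B or 8B: ~8GB RAM, runs on CPU
        return "7B-8B"
    return "none"          # below the smallest tier listed in this guide
```

For example, largest_model_tier(16) returns "13B", while a laptop with 8GB of RAM and no GPU lands in the "7B-8B" tier.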

For most patent drafting, CPA work, and contract editing, an 8B model is sufficient. The tasks that benefit most from larger models are complex legal reasoning, multi-step analysis, and broad knowledge retrieval. For those, a reasonable hybrid is to keep sensitive drafts in a local Ollama session and use a cloud model for the non-sensitive analysis.

If responses are slow on CPU: An 8B model on CPU typically generates 5-20 tokens per second. A 300-word response is roughly 400 tokens, so it takes about 20-80 seconds. That is workable for drafting. If you need faster responses, consider a machine with a GPU, or use Ollama for sensitive drafts and a cloud model for faster iteration on non-sensitive work.
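The time estimate is plain arithmetic: tokens to generate divided by tokens per second. The sketch below makes the words-to-tokens ratio explicit, because that ratio is an assumption. About 1.3 tokens per English word is a common rule of thumb, and a ratio of 1.0 gives shorter estimates.

```python
def response_seconds(words: int, tokens_per_second: float,
                     tokens_per_word: float = 1.3) -> float:
    """Estimate generation time for a response of the given length.

    tokens_per_word is an assumed average, not a measured value.
    """
    return words * tokens_per_word / tokens_per_second


# A 300-word response at the slow and fast ends of the CPU range.
slow = response_seconds(300, tokens_per_second=5)    # about 78 seconds
fast = response_seconds(300, tokens_per_second=20)   # about 19.5 seconds
```

The estimate covers generation only. It leaves out the one-time model load on the first prompt and the time spent reading a long input document.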

Privacy summary

What stays local with Ollama

No API key required. There is no account to create, no key to manage, no per-token billing.

No outbound AI traffic. Every prompt and response stays on your machine. The model runs locally. No data leaves.

No provider logging. There is no external server to log requests. Nothing is retained anywhere outside your machine.

Works offline. Once the model is downloaded, Ollama works without any internet connection.

Frequently asked questions

Does Ollama send my prompts to the internet?

No. Ollama runs entirely on your machine. When you send a prompt, it goes from Advisor Prep Hero to Ollama's local API at 127.0.0.1:11434. No outbound network traffic, no external server, no provider logging.
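This guarantee depends on the base URL pointing at your own machine. If you want to verify a configured URL, a check like the one below confirms the host is a loopback address. It is a sketch with an illustrative function name. It inspects the address only and does not monitor network traffic.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback IP."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Host is a DNS name other than localhost, so not provably local.
        return False
```

Both http://localhost:11434 and http://127.0.0.1:11434 pass this check. A base URL that points at another machine on your network would not.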

Do I need an API key?

No. Ollama does not use API keys. There is no provider account, no billing, and no sign-up required beyond installing the software and pulling a model.

Can I use Ollama for patent prosecution work?

Yes. With Ollama, your invention disclosures, draft claims, and prior art notes never leave your machine. No API call means no outbound transmission, which addresses concerns about pre-filing disclosure under absolute-novelty regimes. It is still worth discussing the specifics with IP counsel for your jurisdiction and client situation.

How does an 8B local model compare to Claude or GPT-4?

Cloud models are more capable for complex reasoning, broad knowledge retrieval, and tasks that require deep domain expertise. An 8B local model is sufficient for drafting, structuring, editing, and synthesizing documents where you are providing the substantive content and asking the model to organize or refine it. Many practitioners use Ollama for sensitive first drafts and a cloud model for research and analysis.

What if Advisor Prep Hero can't connect to Ollama?

Check that Ollama is running: look for the menu bar icon on Mac, or run ollama serve on Linux/Windows. Confirm the base URL in Advisor Prep Hero is exactly http://localhost:11434 with no trailing slash. If you are on Windows and using Windows Subsystem for Linux, the localhost address may differ — check Ollama's documentation for the WSL networking note.
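The base URL mistakes mentioned above are easy to check mechanically. The sketch below flags the common ones (stray whitespace, a trailing slash, the wrong scheme, the wrong port). The function name and messages are hypothetical, and the expected port assumes Ollama's default of 11434.

```python
from urllib.parse import urlparse

EXPECTED_PORT = 11434  # Ollama's default port, per this guide


def base_url_problems(url: str) -> list[str]:
    """Return human-readable problems found in an Ollama base URL."""
    problems = []
    if url != url.strip():
        problems.append("remove surrounding whitespace")
    url = url.strip()
    if url.endswith("/"):
        problems.append("remove the trailing slash")
    parsed = urlparse(url)
    if parsed.scheme != "http":
        problems.append("start the URL with http://")
    try:
        port = parsed.port
    except ValueError:  # raised for non-numeric or out-of-range ports
        port = None
    if port != EXPECTED_PORT:
        problems.append(f"use port {EXPECTED_PORT}")
    return problems
```

An empty list means the URL has the expected shape. It does not prove that Ollama is actually running at that address.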

Can I switch between Ollama and a cloud model in Advisor Prep Hero?

Yes. Advisor Prep Hero lets you add multiple providers. You can have Claude, OpenAI, and Ollama all configured. Each AI chat session uses whichever provider and model you select when starting it. You can keep Ollama as your default for sensitive work and switch to a cloud model when you want faster or more capable responses on non-sensitive tasks.

Try Advisor Prep Hero with your own model

Solo is $468/yr. Professional (with practice pack) is $948/yr. Works with Ollama, Claude, OpenAI, and Gemini. Your data never touches our servers.

Get Advisor Prep Hero