
How to Build a Local AI Agent with Ollama and LangChain

A practical guide to building a local AI agent using Ollama and LangChain — no API costs, runs entirely on your machine.

March 8, 2026

What You Need Before Starting

You'll need Python 3.11 or newer installed on your machine. Download Ollama — it runs open-source LLMs like Llama 3.2 and Mistral entirely offline on your computer. No GPU required for smaller models, though a GPU speeds up inference noticeably.

Install the required libraries:

  • pip install langchain langchain-community langchain-ollama python-dotenv
  • For document search: pip install faiss-cpu pypdf

Step 1: Set Up Your Local LLM

Pull a model and start the Ollama server:

ollama pull llama3.2
ollama serve

Now create your agent file and initialize the model in Python:

from langchain_ollama import ChatOllama

# ChatOllama (rather than OllamaLLM) is required here: the tool-calling
# agent in Step 3 needs a chat model that supports bind_tools.
llm = ChatOllama(
    model="llama3.2",
    temperature=0.2,
    num_predict=1000
)

Low temperature gives more consistent, accurate responses — important for an agent that needs to reason through tasks.

Step 2: Add Tools for Real Actions

Agents need capabilities beyond chatting. Define tools using LangChain's @tool decorator:

from langchain.tools import tool

@tool
def calculator(expression: str) -> str:
    """Evaluate math expressions."""
    try:
        # Note: eval() executes arbitrary code — fine for a local demo,
        # but never expose it to untrusted input.
        return str(eval(expression))
    except Exception:
        return "Invalid expression."

@tool
def search_files(query: str) -> str:
    """Search local files in current directory."""
    import glob
    files = glob.glob(f"*{query}*")
    return f"Found files: {files}" if files else "No matches."

tools = [calculator, search_files]

You can extend this with file readers, API calls, or any Python function.
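Since eval() executes arbitrary code, one possible hardening (not part of the original tutorial, just a sketch) is a restricted evaluator built on the stdlib ast module that only allows plain arithmetic:

```python
import ast
import operator

# Whitelisted operators; anything else raises ValueError.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_eval(expression: str):
    """Evaluate a plain arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval"))

print(safe_eval("15 * 23"))  # 345
```

Swapping this in for eval() inside the calculator tool keeps the same interface while rejecting anything that isn't arithmetic.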

Step 3: Build the Agent Loop

Set up the prompt template and memory, then create the agent:

from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.memory import ConversationBufferMemory

system_prompt = """You are a helpful local AI agent. Use tools only when needed.
Respond concisely. You have access to calculator and file search."""

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

memory = ConversationBufferMemory(return_messages=True, memory_key="chat_history")
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent, 
    tools=tools, 
    memory=memory, 
    verbose=True,
    handle_parsing_errors=True
)

Run it:

response = agent_executor.invoke({
    "input": "What's 15*23? Then search for 'tutorial' in files."
})
print(response['output'])

The agent reasons step-by-step: it recognizes it needs math, calls the calculator tool, then proceeds to file search.
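For intuition, the executor's behavior boils down to a loop like the library-free sketch below. The model stub here is hypothetical — in the real setup the LLM emits these tool-call decisions, and AgentExecutor handles parsing, retries, and memory:

```python
# Toy agent loop: the "model" is a stub that decides which tool to call.
def calculator(expression: str) -> str:
    return str(eval(expression))

def search_files(query: str) -> str:
    return f"(pretend results for {query!r})"

TOOLS = {"calculator": calculator, "search_files": search_files}

def fake_model(task, observations):
    """Stand-in for the LLM: plan tool calls, then finish."""
    if not observations:
        return ("call", "calculator", "15*23")
    if len(observations) == 1:
        return ("call", "search_files", "tutorial")
    return ("finish", f"Answer: {observations[0]}; {observations[1]}")

def run_agent(task):
    observations = []
    while True:
        action = fake_model(task, observations)
        if action[0] == "finish":
            return action[1]
        _, tool_name, tool_input = action
        observations.append(TOOLS[tool_name](tool_input))

print(run_agent("What's 15*23? Then search for 'tutorial'."))
```

Each iteration either calls a tool and records the observation, or returns a final answer — the same decide/act/observe cycle the real agent runs.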

Step 4: Add Document Search with RAG

For grounding responses in your own documents, add retrieval-augmented generation:

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings

loader = PyPDFLoader("your_document.pdf")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
splits = splitter.split_documents(docs)

# llama3.2 works for embeddings, though a dedicated embedding model
# (e.g. "ollama pull nomic-embed-text") is typically faster and more accurate.
embeddings = OllamaEmbeddings(model="llama3.2")
vectorstore = FAISS.from_documents(splits, embeddings)
retriever = vectorstore.as_retriever()

@tool
def rag_search(query: str) -> str:
    """Search local documents for relevant information."""
    docs = retriever.invoke(query)
    return "\n".join(doc.page_content for doc in docs[:3])

tools.append(rag_search)

Rebuild the agent with this new tool. Now it can answer questions about your documents.
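Conceptually, retrieval splits documents into chunks, scores each chunk against the query, and returns the closest matches. A dependency-free sketch using word overlap makes the idea concrete (FAISS does the same ranking, but over embedding vectors):

```python
def split_text(text: str, chunk_size: int = 50) -> list[str]:
    """Naive fixed-size splitter (RecursiveCharacterTextSplitter is smarter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Rank chunks by how many words they share with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

chunks = [
    "Ollama runs open-source LLMs locally.",
    "FAISS stores embedding vectors for similarity search.",
    "LangChain agents call tools in a loop.",
]
print(retrieve("how does similarity search work", chunks, k=1))
```

Embeddings replace the word-overlap score with vector distance, which is what lets the real pipeline match on meaning rather than exact words.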

Deployment and Next Steps

To run: python your_agent_file.py

From here, you can:

  • Add guardrails — refine prompts to prevent unwanted behavior
  • Scale with LangGraph — build multi-agent workflows
  • Wrap in FastAPI — create an API endpoint for other applications
  • Add persistent memory — use SQLite for long-term context
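For the persistent-memory idea above, a minimal SQLite-backed chat log could look like the sketch below (class and method names are illustrative, not a LangChain API):

```python
import sqlite3

class SQLiteChatHistory:
    """Append-only chat log; reload it into memory on startup."""

    def __init__(self, path: str = "agent_memory.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages (role TEXT, content TEXT)"
        )

    def add(self, role: str, content: str) -> None:
        self.conn.execute("INSERT INTO messages VALUES (?, ?)", (role, content))
        self.conn.commit()

    def load(self) -> list[tuple[str, str]]:
        return self.conn.execute("SELECT role, content FROM messages").fetchall()

history = SQLiteChatHistory(":memory:")  # in-memory DB for the demo
history.add("user", "What's 15*23?")
history.add("assistant", "345")
print(history.load())
```

On startup you would call load() and replay the rows into the agent's chat_history, so conversations survive across runs.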

This setup keeps everything local — no API costs, no data leaves your machine. It's a foundation you can expand into complex automation, research assistants, or domain-specific agents.

Source: AI Haven