What You Need Before Starting
You'll need Python 3.11 or newer installed on your machine. Download Ollama — it runs open-source LLMs like Llama 3.2 and Mistral entirely offline on your computer. No GPU required for smaller models, though a GPU speeds up inference noticeably.
Install the required libraries:
pip install langchain langchain-community langchain-ollama python-dotenv

For document search:
pip install faiss-cpu pypdf
Step 1: Set Up Your Local LLM
Pull a model and start the Ollama server:
ollama pull llama3.2
ollama serve
Now create your agent file and initialize the model in Python:
from langchain_ollama import ChatOllama

# Use the chat model class: tool calling (Step 3) requires a model with bind_tools,
# which ChatOllama supports and the plain OllamaLLM class does not.
llm = ChatOllama(
    model="llama3.2",
    temperature=0.2,
    num_predict=1000
)
Low temperature gives more consistent, accurate responses — important for an agent that needs to reason through tasks.
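To see why: temperature divides the model's logits before the softmax, so low values concentrate probability mass on the top token while high values flatten the distribution. A quick stdlib-only illustration (the logits are made up):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaled by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)
hot = softmax_with_temperature(logits, 1.5)
print("T=0.2:", [round(p, 3) for p in cold])  # almost all mass on the top token
print("T=1.5:", [round(p, 3) for p in hot])   # much flatter distribution
```

At temperature 0.2 the top token gets nearly all the probability, which is why agent outputs become more deterministic.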
Step 2: Add Tools for Real Actions
Agents need capabilities beyond chatting. Define tools using LangChain's @tool decorator:
from langchain.tools import tool
@tool
def calculator(expression: str) -> str:
    """Evaluate math expressions."""
    try:
        # Fine for a local demo, but note that eval executes arbitrary Python
        return str(eval(expression))
    except Exception:
        return "Invalid expression."
@tool
def search_files(query: str) -> str:
    """Search local files in current directory."""
    import glob
    files = glob.glob(f"*{query}*")
    return f"Found files: {files}" if files else "No matches."
tools = [calculator, search_files]
You can extend this with file readers, API calls, or any Python function.
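One caveat on the calculator above: `eval` runs whatever the model passes in, which is risky if the agent ever sees untrusted input. If that worries you, a hypothetical `safe_eval` that walks only arithmetic AST nodes can replace the `eval` call inside the tool:

```python
import ast
import operator

# Map AST operator nodes to the corresponding arithmetic functions
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    """Evaluate an arithmetic expression without eval's code-execution risk."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)

print(safe_eval("15*23"))       # 345
print(safe_eval("-2 + 3**2"))   # 7
```

Anything that isn't a number or arithmetic operator, such as a function call or attribute access, raises `ValueError` instead of executing.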
Step 3: Build the Agent Loop
Set up the prompt template and memory, then create the agent:
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.memory import ConversationBufferMemory
system_prompt = """You are a helpful local AI agent. Use tools only when needed.
Respond concisely. You have access to calculator and file search."""
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])
memory = ConversationBufferMemory(return_messages=True, memory_key="chat_history")
agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    memory=memory,
    verbose=True,
    handle_parsing_errors=True
)
Run it:
response = agent_executor.invoke({
    "input": "What's 15*23? Then search for 'tutorial' in files."
})
print(response['output'])
The agent reasons step-by-step: it recognizes it needs math, calls the calculator tool, then proceeds to file search.
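If you're curious what that loop looks like stripped of LangChain's plumbing, here's a toy version. Everything below is illustrative, not LangChain's actual internals, and `fake_model` stands in for the LLM's decisions:

```python
def fake_model(messages):
    """Stands in for the LLM. A real agent gets these decisions from the model."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "calculator", "input": "15*23"}   # first pass: request a tool
    return {"answer": f"The result is {messages[-1]['content']}"}

def run_agent(user_input, tools, model, max_steps=5):
    """The core agent loop: think, act, observe, repeat."""
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        decision = model(messages)
        if "answer" in decision:                          # model is done reasoning
            return decision["answer"]
        result = tools[decision["tool"]](decision["input"])  # execute the tool
        messages.append({"role": "tool", "content": result}) # feed observation back
    return "Gave up after too many steps."

tools = {"calculator": lambda expr: str(eval(expr))}  # toy tool table
print(run_agent("What's 15*23?", tools, fake_model))  # The result is 345
```

The `max_steps` cap mirrors what `AgentExecutor` does with `max_iterations`: without it, a confused model could loop on tool calls forever.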
Step 4: Add Document Search with RAG
For grounding responses in your own documents, add retrieval-augmented generation:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings
loader = PyPDFLoader("your_document.pdf")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
splits = splitter.split_documents(docs)
# A dedicated embedding model works better than reusing the chat model;
# pull it first with: ollama pull nomic-embed-text
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = FAISS.from_documents(splits, embeddings)
retriever = vectorstore.as_retriever()
@tool
def rag_search(query: str) -> str:
    """Search local documents for relevant information."""
    docs = retriever.invoke(query)  # get_relevant_documents is deprecated
    return "\n".join(doc.page_content for doc in docs[:3])
tools.append(rag_search)
Rebuild the agent with this new tool. Now it can answer questions about your documents.
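Conceptually, what FAISS does here is nearest-neighbor search over embedding vectors: the chunk whose vector points in the most similar direction to the query's wins. A stdlib-only sketch with made-up 3-dimensional "embeddings" shows the idea:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors; in the real pipeline OllamaEmbeddings produces these.
chunks = {
    "Ollama runs models locally":  [0.9, 0.1, 0.0],
    "FAISS indexes dense vectors": [0.1, 0.9, 0.2],
    "Agents call tools in a loop": [0.0, 0.2, 0.9],
}

def retrieve(query_vec, k=2):
    """Rank chunks by cosine similarity to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(chunks[c], query_vec), reverse=True)
    return ranked[:k]

print(retrieve([0.2, 0.8, 0.1]))  # the FAISS chunk ranks first
```

FAISS does the same ranking, just with approximate indexes that stay fast across millions of vectors instead of a linear scan over three.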
Deployment and Next Steps
To run: python your_agent_file.py
From here, you can:
- Add guardrails — refine prompts to prevent unwanted behavior
- Scale with LangGraph — build multi-agent workflows
- Wrap in FastAPI — create an API endpoint for other applications
- Add persistent memory — use SQLite for long-term context
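As one example of the last bullet, persistent memory doesn't need anything heavier than the stdlib's sqlite3 module. The schema and function names below are illustrative, not a LangChain API:

```python
import sqlite3

# One table of (session, role, content) rows is enough to survive restarts.
conn = sqlite3.connect(":memory:")  # use a file path like "agent_memory.db" for real persistence
conn.execute(
    "CREATE TABLE IF NOT EXISTS chat_history "
    "(session TEXT, role TEXT, content TEXT)"
)

def save_turn(session: str, role: str, content: str) -> None:
    """Append one message to a session's history."""
    conn.execute(
        "INSERT INTO chat_history (session, role, content) VALUES (?, ?, ?)",
        (session, role, content),
    )
    conn.commit()

def load_history(session: str):
    """Return a session's messages in insertion order."""
    rows = conn.execute(
        "SELECT role, content FROM chat_history WHERE session = ? ORDER BY rowid",
        (session,),
    )
    return list(rows)

save_turn("demo", "user", "What's 15*23?")
save_turn("demo", "assistant", "345")
print(load_history("demo"))  # [('user', "What's 15*23?"), ('assistant', '345')]
```

On startup you'd load the saved rows back into the conversation memory before the first `invoke`, so the agent picks up where it left off.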
This setup keeps everything local — no API costs, no data leaves your machine. It's a foundation you can expand into complex automation, research assistants, or domain-specific agents.