Overview

Free and high-quality local LLM tools and agents that run on your own hardware without requiring cloud services or API keys.

Tools & Frameworks

Ollama

  • Description: Easy-to-use local LLM runner supporting multiple models (usage sketch below)
  • Models: Llama 3, Mistral, Phi, CodeLlama, and more
  • Platform: Windows, macOS, Linux
  • License: Open source
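
Ollama is driven from the command line (for example, "ollama run llama3") and also serves a local HTTP API. A minimal sketch of calling it from Python, assuming the server is on its default port (11434) and "ollama pull llama3" has already been run:

    # Query a local Ollama server over its HTTP API (stdlib only).
    import json
    import urllib.request

    payload = json.dumps({
        "model": "llama3",
        "prompt": "Explain a context window in one sentence.",
        "stream": False,  # ask for a single JSON reply instead of a stream
    }).encode("utf-8")

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])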

LM Studio

  • Description: Desktop app for running local LLMs with a user-friendly interface
  • Models: Support for GGUF format models
  • Platform: Windows, macOS, Linux
  • License: Free for personal use

Jan

  • Description: Open-source ChatGPT alternative that runs 100% offline
  • Models: Supports multiple models, including Llama and Mistral
  • Platform: Windows, macOS, Linux
  • License: Open source (AGPLv3)

GPT4All

  • Description: Free-to-use, locally running, privacy-aware chatbot (see the Python sketch below)
  • Models: Multiple open-source models optimized for consumer hardware
  • Platform: Windows, macOS, Linux
  • License: Open source
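
GPT4All also ships official Python bindings. A minimal sketch, assuming the gpt4all package is installed; the model filename is one example from the GPT4All catalogue and is downloaded on first use:

    # Run a model through the gpt4all Python bindings.
    from gpt4all import GPT4All

    # Any model file from the GPT4All catalogue works here; this name is
    # illustrative and is downloaded automatically if missing.
    model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
    with model.chat_session():
        print(model.generate("Name three uses for a local LLM.", max_tokens=128))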

LocalAI

  • Description: OpenAI-compatible API for local inference (see the client sketch below)
  • Models: Support for various model formats (GGML, GGUF)
  • Platform: Cross-platform (via Docker)
  • License: Open source (MIT)
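
Because LocalAI mirrors the OpenAI API, the official openai Python client can be pointed at it. A minimal sketch, assuming LocalAI is serving on its default port (8080); the model name is a placeholder and must match one configured in your LocalAI setup:

    # Use the official OpenAI client against a local LocalAI server.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8080/v1",
        api_key="not-needed",  # LocalAI does not require a real key by default
    )
    reply = client.chat.completions.create(
        model="llama-3-8b-instruct",  # placeholder; use your configured model
        messages=[{"role": "user", "content": "Hello from a local model!"}],
    )
    print(reply.choices[0].message.content)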

llama.cpp

  • Description: C/C++ implementation for running Llama models locally (see the bindings sketch below)
  • Models: Llama family and compatible models
  • Platform: Cross-platform
  • License: Open source (MIT)
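
llama.cpp is usually driven from its command-line tools, but the separate llama-cpp-python project wraps it for scripting. A minimal sketch, where the model path is a placeholder for any local GGUF file:

    # Run a GGUF model via llama-cpp-python (a separate binding project).
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path
        n_ctx=2048,  # context window size
    )
    out = llm("Q: What is quantization? A:", max_tokens=64, stop=["Q:"])
    print(out["choices"][0]["text"])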

Text Generation WebUI (oobabooga)

  • Description: Gradio web UI for running LLMs
  • Models: Wide model support, including GPTQ, AWQ, and GGUF formats
  • Platform: Cross-platform (Python)
  • License: Open source (AGPLv3)

Recommended Models

For Coding

  • CodeLlama (7B, 13B, 34B)
  • DeepSeek Coder (6.7B, 33B)
  • Phind CodeLlama (34B)
  • WizardCoder

For General Purpose

  • Llama 3 (8B, 70B)
  • Mistral (7B)
  • Mixtral (8x7B)
  • Phi-3 (3.8B)

For Chat

  • Llama 3 Instruct
  • Mistral Instruct
  • Zephyr
  • OpenChat

Hardware Requirements

Minimum (7B models)

  • RAM: 8GB+
  • GPU: Optional (CPU inference possible)
  • Storage: 5-10GB per model

Recommended (13B models)

  • RAM: 16GB+
  • GPU: 8GB+ VRAM (NVIDIA recommended)
  • Storage: 10-30GB per model

Optimal (70B+ models)

  • RAM: 32GB+
  • GPU: 24GB VRAM+ or multiple GPUs
  • Storage: 40-80GB per model
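
These figures follow a rough rule of thumb: weight memory is roughly parameter count times bytes per parameter, plus runtime overhead for the KV cache and buffers. The sketch below encodes that heuristic; the 1.2x overhead factor is an assumption, not a measured value:

    # Back-of-envelope memory estimate: params * bytes-per-param, plus overhead.
    def estimate_memory_gb(params_billions: float,
                           bits_per_param: float = 4.0,  # typical 4-bit quantization
                           overhead: float = 1.2) -> float:  # assumed fudge factor
        weights_gb = params_billions * bits_per_param / 8  # 1B params at 8 bits ~= 1 GB
        return weights_gb * overhead

    for name, size_b in [("Phi-3", 3.8), ("Mistral 7B", 7), ("Llama 3 70B", 70)]:
        print(f"{name}: ~{estimate_memory_gb(size_b):.1f} GB at 4-bit quantization")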

Key Features

  • Privacy: All processing happens locally
  • No API Costs: Free to use after initial setup
  • Offline Capable: No internet connection required
  • Customizable: Full control over model selection and parameters
  • Open Source: Most tools are fully open source