Overview

llama.cpp is a C/C++ implementation for running Llama-family models locally, providing high-performance inference on both CPU and GPU.

Key Features

  • High Performance: Optimized C/C++ implementation
  • CPU/GPU Support: Runs on CPU or with GPU acceleration
  • Multiple Platforms: Windows, macOS, Linux, even mobile
  • Model Formats: GGUF format (the successor to the older GGML format)
  • Quantization: Multiple quantization levels (e.g. 4-bit to 8-bit) to fit different hardware; see the example after this list
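
As a hedged example, the bundled quantize tool (built alongside the other binaries) converts a 16-bit GGUF file to a smaller quantized one; the file names below are placeholders:

# produce a 4-bit (q4_0) copy of a 16-bit GGUF model
./quantize models/llama-7b-f16.gguf models/llama-7b-q4_0.gguf q4_0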

Installation

Pre-built Binaries

Pre-built binaries for common platforms are published on the GitHub releases page: https://github.com/ggerganov/llama.cpp/releases

Build from Source

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
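
The default make build targets the CPU only; GPU acceleration is enabled at build time. As a hedged sketch, older Makefile-based builds used the LLAMA_CUBLAS flag for NVIDIA CUDA, while newer releases have moved to CMake, so check the repository's build documentation for the current options:

# enable CUDA acceleration in older make-based builds
make LLAMA_CUBLAS=1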

Usage

Basic Chat

./main -m models/llama-7b.gguf --prompt "Hello, how are you?"
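
Generation can be tuned with a few common flags: -n caps the number of tokens generated, -c sets the context window size, and --temp adjusts sampling temperature. The values below are illustrative, not recommendations:

# generate at most 128 tokens with a 2048-token context
./main -m models/llama-7b.gguf --prompt "Hello, how are you?" -n 128 -c 2048 --temp 0.7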

Interactive Mode

./main -m models/llama-7b.gguf -i
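
In interactive mode, a reverse prompt (-r) hands control back to the user whenever the model emits the given string, and --color highlights user input. A sketch assuming a chat-style transcript where user turns begin with "User:":

# pause for user input each time the model prints "User:"
./main -m models/llama-7b.gguf -i -r "User:" --color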

Server Mode

./server -m models/llama-7b.gguf
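
By default the server listens on localhost port 8080 (configurable with --host and --port) and exposes an HTTP completion endpoint. A minimal request against a running server, assuming default settings:

# request up to 64 tokens of completion over HTTP
curl http://localhost:8080/completion -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, how are you?", "n_predict": 64}'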

Supported Models

  • Llama 2/3 series
  • Code Llama
  • Other Llama-compatible models in GGUF format (see the conversion sketch below)
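
Models distributed as Hugging Face checkpoints must first be converted to GGUF. As a hedged sketch, older versions of the repository shipped a convert.py script for this (the script name and arguments may differ in current releases, and the model path below is a placeholder):

# convert a Hugging Face model directory to a 16-bit GGUF file
python3 convert.py models/llama-7b/ --outtype f16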
