Key features
- C/C++ inference
- GGUF
- Quantization
- Local server
- CPU/GPU backends
llama.cpp enables local and cloud LLM inference with minimal setup, quantization, GPU backends, a CLI, and an OpenAI-compatible server.
C/C++ inference engine for running LLMs locally and in the cloud with broad CPU, GPU, and GGUF support.
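The bundled `llama-server` exposes an OpenAI-compatible HTTP API over whichever GGUF model it was started with (e.g. `llama-server -m model.gguf`). A minimal sketch of querying it with only the Python standard library; the port is an assumption (`llama-server` defaults to 8080) and no model name needs to be sent for a single loaded model:

```python
import json
from urllib import request

# Assumed local endpoint: llama-server listens on http://localhost:8080 by default.
SERVER_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-style chat-completions payload understood by llama-server."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(prompt: str) -> str:
    """POST the prompt to a running llama-server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = request.Request(
        SERVER_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        reply = json.load(resp)
    # OpenAI-compatible response: first choice's message content.
    return reply["choices"][0]["message"]["content"]

# Payload shape only; calling ask() requires a running llama-server instance.
payload = build_chat_request("Summarize GGUF in one sentence.")
```

Because the endpoint follows the OpenAI schema, existing OpenAI client libraries can also be pointed at the local server by overriding their base URL.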
Pricing
Free
Primary category
research
Publisher
ggml.org
Verification
Verified listing
Published by ggml.org
llama.cpp is commonly used for local inference, edge AI, and open-model serving.
llama.cpp is listed as free to use.
Review pricing, feature coverage, ratings, and similar tools on this page before visiting the product site.
Compare close alternatives to llama.cpp and discover the best fit for your workflow.
See all options in Best research AI Tools or browse the full AI Tools Directory.