Key features
- C/C++ inference
- GGUF
- Quantization
- Local server
- CPU/GPU backends
llama.cpp enables local and cloud LLM inference with minimal setup, quantization, GPU backends, a CLI, and an OpenAI-compatible server.
C/C++ inference engine for running LLMs locally and in the cloud with broad CPU, GPU, and GGUF support.
Pricing
Free
Primary category
research
Publisher
ggml.org
Verification
Community listing
Published by ggml.org
llama.cpp is commonly used for Local inference, Edge AI, Open model serving.
llama.cpp is listed as free to use.
Review pricing, feature coverage, ratings, and similar tools on this page before visiting the product site.
Local AI app and developer runtime for running, chatting with, and serving open models privately.
Mistral's conversational AI workspace for chat, search, documents, Canvas, code interpreter, and custom agents.
Andi is a generative AI-powered search engine that provides direct answers instead of just links.
Revolutionize writing with AI-powered paraphrasing and plagiarism detection.
Waabi World is an autonomy-focused AI tool for simulation, training, or intelligent system design from Waabi.
Revolutionize search with AI: intuitive, efficient, customizable, secure.
Compare close alternatives to llama.cpp and discover the best fit for your workflow.
See all options in Best research AI Tools or browse the full AI Tools Directory.