Text & LLMs
Large language models for chat, coding, reasoning and agents. Frontier models run in the cloud; open-weight models run on your own GPU — a 7–8B model fits in 8 GB, a 70B in 2×24 GB or one 48 GB card with quantisation.
Providers
The leading hosted services — sign up and use them via app or API.
| Provider | From | Strengths | Access |
|---|---|---|---|
| GPT-5 / o-series | OpenAI | General reasoning, tools, coding | API · app |
| Claude (Opus / Sonnet) | Anthropic | Coding, long-context, agents | API · app |
| Gemini | Multimodal, huge context | API · app | |
| Llama | Meta | Open weights, broad ecosystem | Open weights |
| Mistral / Magistral | Mistral AI | Efficient open & API models | Open · API |
| DeepSeek V4 | DeepSeek | Strong open reasoning, low cost | Open · API |
| Qwen | Alibaba | Open weights, many sizes | Open weights |
| Grok | xAI | Realtime, reasoning | API · app |
Open-source tools
Run these yourself on a local or rented GPU. Open weights are free to use, private, and finetunable.
Run LLMs in C/C++ on CPU or GPU; GGUF quantisation; the backbone of local inference.
vLLM
High-throughput serving engine with PagedAttention; the standard for production inference.
What you need to run it
See GPU prices to buy a card, hosting to rent one by the hour, and GPU programming to understand the libraries underneath. VRAM is the deciding factor — check each tool's model card for its memory needs.