Text & LLMs

Large language models for chat, coding, reasoning and agents. Frontier models run in the cloud; open-weight models run on your own GPU — a 7–8B model fits in 8 GB, a 70B in 2×24 GB or one 48 GB card with quantisation.

Providers

The leading hosted services — sign up and use them via app or API.

Provider	From	Strengths	Access
GPT-5 / o-series	OpenAI	General reasoning, tools, coding	API · app
Claude (Opus / Sonnet)	Anthropic	Coding, long-context, agents	API · app
Gemini	Google	Multimodal, huge context	API · app
Llama	Meta	Open weights, broad ecosystem	Open weights
Mistral / Magistral	Mistral AI	Efficient open & API models	Open · API
DeepSeek V4	DeepSeek	Strong open reasoning, low cost	Open · API
Qwen	Alibaba	Open weights, many sizes	Open weights
Grok	xAI	Realtime, reasoning	API · app

Open-source tools

Run these yourself on a local or rented GPU. Open weights are free to use, private, and finetunable.

llama.cpp

Run LLMs in C/C++ on CPU or GPU; GGUF quantisation; the backbone of local inference.

inferenceC++

Ollama

One-command local model runner with a clean API; wraps llama.cpp.

inferenceeasy

vLLM

High-throughput serving engine with PagedAttention; the standard for production inference.

servingfast

LM Studio

Desktop app to download and chat with local models, GPU-accelerated.

desktopeasy

Transformers

Hugging Face's library — thousands of models behind one Python API.

librarytraining

Unsloth

2× faster, lower-memory fine-tuning of Llama/Mistral/Qwen with QLoRA.

fine-tune

LLaMA-Factory

Unified fine-tuning UI/CLI for 100+ LLMs and VLMs.

fine-tune

MLX

Apple-silicon array framework for running and training models on Macs.

apple

AirLLM

Run 70B inference on a single 4 GB GPU via layered offloading.

low-VRAM

nanochat

Karpathy's minimal, hackable full-stack ChatGPT clone to learn from.

learn

What you need to run it

See GPU prices to buy a card, hosting to rent one by the hour, and GPU programming to understand the libraries underneath. VRAM is the deciding factor — check each tool's model card for its memory needs.