// about Groq
Groq delivers the fastest publicly available LLM inference on the market, running Llama, Mixtral, Gemma, and other open models at speeds exceeding 500 tokens per second — typically 10–20× faster than GPU-based cloud providers. This is achieved through Groq's custom Language Processing Unit (LPU) chips, designed specifically for sequential token generation. Developers use Groq's API as a drop-in OpenAI replacement wherever response latency matters most, from real-time voice applications to low-latency agentic pipelines.