Groq

4 models

Ultra-low-latency inference for open models on Groq's LPU hardware (OpenAI-compatible API).

API basehttps://api.groq.com/openai/v1

Env keyGROQ_API_KEY

llama-3.3-70b-versatile

Context

131K

Max output

33K

in: textout: texttoolsstreamingjson_mode

llama-3.1-8b-instant

Context

131K

Max output

in: textout: texttoolsstreamingjson_mode

gemma2-9b-it

Context

Max output

—

in: textout: texttoolsstreaming

deepseek-r1-distill-llama-70b

Context

131K

Max output

—

Reasoning-distilled Llama 70B served on Groq.

in: textout: textreasoningstreaming