Hermes Registry

Groq

4 models

Ultra-low-latency inference for open models on Groq's LPU hardware (OpenAI-compatible API).

API basehttps://api.groq.com/openai/v1
Env keyGROQ_API_KEY

Llama 3.3 70B Versatile

llama-3.3-70b-versatile
Context
131K
Max output
33K
in: textout: texttoolsstreamingjson_mode

Llama 3.1 8B Instant

llama-3.1-8b-instant
Context
131K
Max output
8K
in: textout: texttoolsstreamingjson_mode

Gemma 2 9B

gemma2-9b-it
Context
8K
Max output
in: textout: texttoolsstreaming

DeepSeek R1 Distill Llama 70B

deepseek-r1-distill-llama-70b
Context
131K
Max output

Reasoning-distilled Llama 70B served on Groq.

in: textout: textreasoningstreaming