Nexalexica Beta 1.0.0 is released 🎉 Try it now

Groq API Setup Guide

Getting Your Groq API Key

Step 1: Create a Groq Account

Visit console.groq.com
Click “Sign Up” to create an account
Verify your email and complete registration

Step 2: Access API Keys

Log into your Groq console
Navigate to the “API Keys” section
Click “Create API Key”

Step 3: Generate and Store Key

Name your API key for identification
Copy the generated key (starts with gsk_)
Store it securely - you won’t see it again

Available Groq Models

Llama Models (Meta)

llama-3.2-1b-preview - Ultra-fast 1B parameter model
llama-3.2-3b-preview - Balanced 3B parameter model
llama-3.2-11b-text-preview - Text-only 11B model
llama-3.2-90b-text-preview - Large 90B text model
llama-3.1-8b-instant - Fast 8B instruction model
llama-3.1-70b-versatile - Versatile large model

Mixtral Models (Mistral)

mixtral-8x7b-32768 - Mixture of experts model
mistral-7b-instruct-v0.1 - Instruction-tuned Mistral

Gemma Models (Google)

gemma-7b-it - Instruction-tuned Gemma
gemma2-9b-it - Latest Gemma generation

Key Features

Ultra-Fast Inference: Hardware-accelerated LPU™ chips
High Throughput: Up to 750+ tokens per second
Low Latency: Near real-time responses
Cost-Effective: Competitive pricing per token
OpenAI Compatible: Drop-in API replacement

Speed Performance

Groq’s LPU™ (Language Processing Unit) technology delivers:

Llama-3.2-1b: ~750+ tokens/second
Llama-3.1-8b: ~500+ tokens/second
Llama-3.1-70b: ~250+ tokens/second
Mixtral-8x7b: ~350+ tokens/second

Context Windows

Llama 3.2 models: 128K tokens
Llama 3.1 models: 128K tokens
Mixtral models: 32K tokens
Gemma models: Varies by version

Pricing

Competitive rates: Often lower than major providers
Pay per token: Input and output tokens charged separately
Free tier: Available for testing and development
Volume discounts: Available for high usage

Rate Limits

Free Tier

Requests per minute: 30
Requests per day: 14,400
Tokens per minute: 6,000

Paid Tiers

Higher rate limits based on usage plan
Enterprise options available
Custom limits for high-volume users

Use Cases

Real-time chat applications
Interactive AI assistants
Live content generation
Gaming and entertainment
Developer tools and IDEs

Security Tips

Never commit keys to version control
Rotate keys regularly
Monitor usage in Groq console
Use least-privilege access principles

Last updated on July 31, 2025

Google Gemini API Setup Guide LM Studio API Setup Guide