LM Studio API Setup Guide
Setting Up LM Studio
Step 1: Download and Install LM Studio
- Visit lmstudio.ai
- Download LM Studio for your operating system
- Install the application following the setup wizard
Step 2: Download Models
- Open LM Studio
- Go to the “Search” tab
- Browse and download models from Hugging Face
- Popular starting models: Llama 3.2, Mistral, Code Llama
Step 3: Start Local Server
- Go to the “Local Server” tab in LM Studio
- Select your downloaded model
- Click “Start Server”
- Server runs at http://localhost:1234 by default
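Once the server is started, you can confirm it is reachable by querying its OpenAI-compatible model-listing endpoint. A minimal sketch using only the Python standard library (the `models_endpoint` and `list_models` helper names are illustrative, not part of LM Studio):

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234"  # LM Studio's default server address

def models_endpoint(base_url: str = BASE_URL) -> str:
    """Build the OpenAI-compatible model-listing URL."""
    return f"{base_url}/v1/models"

def list_models(base_url: str = BASE_URL) -> list:
    """Return the IDs of the models the running server exposes."""
    with urllib.request.urlopen(models_endpoint(base_url)) as resp:
        data = json.load(resp)
    return [m["id"] for m in data.get("data", [])]

# Usage (with the local server running):
#   print(list_models())
```

If the call raises a connection error, the server is not running or is listening on a different port.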
Available Models (Through LM Studio)
Meta Llama Models
- Llama-3.2-1B-Instruct - Compact instruction-tuned model
- Llama-3.2-3B-Instruct - Balanced performance model
- Llama-3.1-8B-Instruct - Standard 8B instruction model
- Llama-3.1-70B-Instruct - Large high-capability model
Mistral Models
- Mistral-7B-Instruct - Efficient 7B parameter model
- Mixtral-8x7B-Instruct - Mixture of experts model
- Mistral-Small-Instruct - Compact efficient model
Code-Specialized Models
- CodeLlama-7B-Instruct - Code generation and completion
- CodeLlama-13B-Instruct - Larger code model
- StarCoder2-7B - Advanced code understanding
Chat-Optimized Models
- Zephyr-7B-Beta - Fine-tuned for conversations
- OpenChat-7B - Optimized chat model
- Vicuna-7B - Conversational AI model
Quantized Versions
- GGUF Q4_K_M - 4-bit quantization (recommended)
- GGUF Q5_K_M - 5-bit quantization (better quality)
- GGUF Q8_0 - 8-bit quantization (highest quality)
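As a rough rule of thumb, a quantized GGUF file's size is the parameter count times the bits per weight, plus some overhead for embeddings and metadata. The sketch below encodes that estimate; the ~10% overhead factor and the effective bits-per-weight figures are approximations, not exact values for any specific file:

```python
def approx_gguf_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough GGUF file-size estimate: parameters x bits per weight,
    plus ~10% for embeddings, metadata, and non-quantized tensors."""
    raw_bytes = params_billions * 1e9 * bits_per_weight / 8
    return round(raw_bytes * 1.1 / 1e9, 1)

# Example: an 8B model at ~4.5 bits/weight (roughly Q4_K_M) is about 5 GB,
# while the same model at 8 bits/weight (Q8_0) is closer to 9 GB.
```

This helps decide which quantization fits in your RAM or VRAM before downloading.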
Key Features
- User-Friendly GUI: Easy model management and chat interface
- Local Execution: Complete privacy, no data sent externally
- OpenAI-Compatible API: Drop-in replacement for OpenAI API
- Model Discovery: Browse and download from Hugging Face
- Hardware Optimization: Automatic GPU acceleration
API Configuration
- Base URL: http://localhost:1234/v1
- API Key: Not required by the local server; some client libraries insist on one, so pass any placeholder string
- Compatible: Works with OpenAI client libraries
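Putting the configuration above together, a chat-completion request is a plain POST to `/v1/chat/completions` in the OpenAI format. A standard-library sketch (LM Studio uses whichever model is loaded, so the `"local-model"` name and the `lm-studio` dummy key are placeholders):

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's OpenAI-compatible endpoint

def build_chat_request(messages, model="local-model", temperature=0.7):
    """Assemble an OpenAI-style chat-completion payload."""
    return {"model": model, "messages": messages, "temperature": temperature}

def chat(messages, base_url=BASE_URL):
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(messages)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer lm-studio",  # dummy key; the local server ignores it
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage (with the local server running):
#   reply = chat([{"role": "user", "content": "Say hello in one sentence."}])
```

Because the request format matches OpenAI's, the official OpenAI client libraries also work here by pointing `base_url` at the local server.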
System Requirements
Minimum
- RAM: 8GB
- Storage: 10GB free space
- CPU: Modern multi-core processor
Recommended
- RAM: 16GB+ (32GB for 70B models)
- GPU: NVIDIA RTX series or Apple Silicon
- Storage: SSD with 50GB+ free space
Performance Optimization
- GPU Acceleration: Enable in LM Studio settings
- Quantization: Use Q4 or Q5 for speed/memory balance
- Context Length: Adjust based on your use case
- Batch Size: Optimize for your hardware
Advanced Features
- Custom System Prompts: Modify model behavior
- Temperature Control: Adjust response creativity
- Token Streaming: Real-time response generation
- Model Comparison: Test multiple models side-by-side
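Token streaming works by setting `"stream": true` in the request, after which the server sends each token as a `data: {...}` server-sent-events line. A sketch under those assumptions (the helper names are illustrative):

```python
import json
import urllib.request

def parse_sse_chunk(line: str):
    """Extract the token text from one 'data: {...}' SSE line.
    Returns None for keep-alives and the final 'data: [DONE]' sentinel."""
    if not line.startswith("data: ") or line.strip() == "data: [DONE]":
        return None
    chunk = json.loads(line[len("data: "):])
    return chunk["choices"][0]["delta"].get("content", "")

def stream_chat(prompt, base_url="http://localhost:1234/v1"):
    """Print tokens as the model generates them."""
    payload = json.dumps({
        "model": "local-model",  # placeholder; LM Studio uses the loaded model
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode()
    req = urllib.request.Request(f"{base_url}/chat/completions", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            token = parse_sse_chunk(raw.decode())
            if token:
                print(token, end="", flush=True)

# Usage (with the local server running):
#   stream_chat("Tell me a short joke.")
```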