LM Studio API Setup Guide
Setting Up LM Studio
Step 1: Download and Install LM Studio
- Visit lmstudio.ai
- Download LM Studio for your operating system
- Install the application following the setup wizard
Step 2: Download Models
- Open LM Studio
- Go to the “Search” tab
- Browse and download models from Hugging Face
- Popular starting models: Llama 3.2, Mistral, Code Llama
Step 3: Start Local Server
- Go to the “Local Server” tab in LM Studio
- Select your downloaded model
- Click “Start Server”
- Server runs at http://localhost:1234 by default
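Once the server is started, you can confirm it is reachable by querying its OpenAI-compatible model-listing endpoint. A minimal sketch using only the Python standard library (the `models_endpoint` and `list_models` helper names are illustrative, not part of LM Studio):

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234"  # LM Studio's default server address

def models_endpoint(base_url: str = BASE_URL) -> str:
    """Build the OpenAI-compatible model-listing URL."""
    return f"{base_url}/v1/models"

def list_models(base_url: str = BASE_URL) -> list:
    """Return the IDs of the models the running server exposes."""
    with urllib.request.urlopen(models_endpoint(base_url)) as resp:
        data = json.load(resp)
    return [m["id"] for m in data.get("data", [])]

# Usage (with the local server running):
#   print(list_models())
```

If the call raises a connection error, the server is not running or is listening on a different port.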
Available Models (Through LM Studio)
Meta Llama Models
- Llama-3.2-1B-Instruct - Compact instruction-tuned model
- Llama-3.2-3B-Instruct - Balanced performance model
- Llama-3.1-8B-Instruct - Standard 8B instruction model
- Llama-3.1-70B-Instruct - Large high-capability model
Mistral Models
- Mistral-7B-Instruct - Efficient 7B parameter model
- Mixtral-8x7B-Instruct - Mixture of experts model
- Mistral-Small-Instruct - Compact efficient model
Code-Specialized Models
- CodeLlama-7B-Instruct - Code generation and completion
- CodeLlama-13B-Instruct - Larger code model
- StarCoder2-7B - Advanced code understanding
Chat-Optimized Models
- Zephyr-7B-Beta - Fine-tuned for conversations
- OpenChat-7B - Optimized chat model
- Vicuna-7B - Conversational AI model
Quantized Versions
- GGUF Q4_K_M - 4-bit quantization (recommended)
- GGUF Q5_K_M - 5-bit quantization (better quality)
- GGUF Q8_0 - 8-bit quantization (highest quality)
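As a rough rule of thumb, a quantized GGUF file's size is the parameter count times the bits per weight, plus some overhead for embeddings and metadata. The sketch below encodes that estimate; the ~10% overhead factor and the effective bits-per-weight figures are approximations, not exact values for any specific file:

```python
def approx_gguf_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough GGUF file-size estimate: parameters x bits per weight,
    plus ~10% for embeddings, metadata, and non-quantized tensors."""
    raw_bytes = params_billions * 1e9 * bits_per_weight / 8
    return round(raw_bytes * 1.1 / 1e9, 1)

# Example: an 8B model at ~4.5 bits/weight (roughly Q4_K_M) is about 5 GB,
# while the same model at 8 bits/weight (Q8_0) is closer to 9 GB.
```

This helps decide which quantization fits in your RAM or VRAM before downloading.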
Key Features
- User-Friendly GUI: Easy model management and chat interface
- Local Execution: Complete privacy, no data sent externally
- OpenAI-Compatible API: Drop-in replacement for OpenAI API
- Model Discovery: Browse and download from Hugging Face
- Hardware Optimization: Automatic GPU acceleration
API Configuration
- Base URL: http://localhost:1234/v1
- API Key: Not required by the local server; some client libraries insist on one, so pass any placeholder string
- Compatible: Works with OpenAI client libraries
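Putting the configuration above together, a chat-completion request is a plain POST to `/v1/chat/completions` in the OpenAI format. A standard-library sketch (LM Studio uses whichever model is loaded, so the `"local-model"` name and the `lm-studio` dummy key are placeholders):

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's OpenAI-compatible endpoint

def build_chat_request(messages, model="local-model", temperature=0.7):
    """Assemble an OpenAI-style chat-completion payload."""
    return {"model": model, "messages": messages, "temperature": temperature}

def chat(messages, base_url=BASE_URL):
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(messages)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer lm-studio",  # dummy key; the local server ignores it
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage (with the local server running):
#   reply = chat([{"role": "user", "content": "Say hello in one sentence."}])
```

Because the request format matches OpenAI's, the official OpenAI client libraries also work here by pointing `base_url` at the local server.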
System Requirements
Minimum
- RAM: 8GB
- Storage: 10GB free space
- CPU: Modern multi-core processor
Recommended
- RAM: 16GB+ (32GB for 70B models)
- GPU: NVIDIA RTX series or Apple Silicon
- Storage: SSD with 50GB+ free space
Performance Optimization
- GPU Acceleration: Enable in LM Studio settings
- Quantization: Use Q4 or Q5 for speed/memory balance
- Context Length: Adjust based on your use case
- Batch Size: Optimize for your hardware
Advanced Features
- Custom System Prompts: Modify model behavior
- Temperature Control: Adjust response creativity
- Token Streaming: Real-time response generation
- Model Comparison: Test multiple models side-by-side
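Token streaming works by setting `"stream": true` in the request, after which the server sends each token as a `data: {...}` server-sent-events line. A sketch under those assumptions (the helper names are illustrative):

```python
import json
import urllib.request

def parse_sse_chunk(line: str):
    """Extract the token text from one 'data: {...}' SSE line.
    Returns None for keep-alives and the final 'data: [DONE]' sentinel."""
    if not line.startswith("data: ") or line.strip() == "data: [DONE]":
        return None
    chunk = json.loads(line[len("data: "):])
    return chunk["choices"][0]["delta"].get("content", "")

def stream_chat(prompt, base_url="http://localhost:1234/v1"):
    """Print tokens as the model generates them."""
    payload = json.dumps({
        "model": "local-model",  # placeholder; LM Studio uses the loaded model
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode()
    req = urllib.request.Request(f"{base_url}/chat/completions", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            token = parse_sse_chunk(raw.decode())
            if token:
                print(token, end="", flush=True)

# Usage (with the local server running):
#   stream_chat("Tell me a short joke.")
```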