LM Studio API Setup Guide

Setting Up LM Studio

Step 1: Download and Install LM Studio

  1. Visit lmstudio.ai
  2. Download LM Studio for your operating system
  3. Install the application following the setup wizard

Step 2: Download Models

  1. Open LM Studio
  2. Go to the “Search” tab
  3. Browse and download models from Hugging Face
  4. Popular starting models: Llama 3.2, Mistral, Code Llama

Step 3: Start Local Server

  1. Go to the “Local Server” tab in LM Studio
  2. Select your downloaded model
  3. Click “Start Server”
  4. Server runs at http://localhost:1234 by default (a quick check is sketched below)
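
With the server running, you can confirm it is reachable before wiring it into an application. The sketch below lists the models the server exposes through its OpenAI-compatible /v1/models endpoint; it assumes the default address from Step 3.

```python
import requests

# Ask the local LM Studio server which models it currently exposes.
# Assumes the default address from Step 3 (http://localhost:1234).
resp = requests.get("http://localhost:1234/v1/models", timeout=5)
resp.raise_for_status()

for model in resp.json().get("data", []):
    print(model["id"])
```

If the request fails, make sure the server is started in LM Studio and that no other application is using port 1234.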

Available Models (Through LM Studio)

Meta Llama Models

  • Llama-3.2-1B-Instruct - Compact instruction-tuned model
  • Llama-3.2-3B-Instruct - Balanced performance model
  • Llama-3.1-8B-Instruct - Standard 8B instruction model
  • Llama-3.1-70B-Instruct - Large high-capability model

Mistral Models

  • Mistral-7B-Instruct - Efficient 7B parameter model
  • Mixtral-8x7B-Instruct - Mixture of experts model
  • Mistral-Small-Instruct - Compact efficient model

Code-Specialized Models

  • CodeLlama-7B-Instruct - Code generation and completion
  • CodeLlama-13B-Instruct - Larger code model
  • StarCoder2-7B - Advanced code understanding

Chat-Optimized Models

  • Zephyr-7B-Beta - Fine-tuned for conversations
  • OpenChat-7B - Optimized chat model
  • Vicuna-7B - Conversational AI model

Quantized Versions

  • GGUF Q4_K_M - 4-bit quantization (recommended)
  • GGUF Q5_K_M - 5-bit quantization (better quality)
  • GGUF Q8_0 - 8-bit quantization (highest quality)

Key Features

  • User-Friendly GUI: Easy model management and chat interface
  • Local Execution: Complete privacy, no data sent externally
  • OpenAI-Compatible API: Drop-in replacement for OpenAI API
  • Model Discovery: Browse and download from Hugging Face
  • Hardware Optimization: Automatic GPU acceleration

API Configuration

  • Base URL: http://localhost:1234/v1
  • API Key: Not required by the local server; pass any placeholder string, since client libraries expect one
  • Compatibility: Works with official OpenAI client libraries (see the example below)
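
Because the server speaks the OpenAI API, the official openai Python client can be pointed at it directly. The sketch below is a minimal example; the model identifier llama-3.2-3b-instruct is an assumption and should match whatever model you actually loaded in LM Studio.

```python
from openai import OpenAI

# Point the standard OpenAI client at the local LM Studio server.
# The API key is a placeholder; the local server does not validate it.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="llama-3.2-3b-instruct",  # assumed name; use the model you loaded
    messages=[{"role": "user", "content": "Summarize what LM Studio does."}],
)
print(response.choices[0].message.content)
```

The same base_url and placeholder-key pattern works with other OpenAI-compatible client libraries.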

System Requirements

Minimum

  • RAM: 8GB
  • Storage: 10GB free space
  • CPU: Modern multi-core processor

Recommended

  • RAM: 16GB+ (32GB for 70B models)
  • GPU: NVIDIA RTX series or Apple Silicon
  • Storage: SSD with 50GB+ free space

Performance Optimization

  • GPU Acceleration: Enable in LM Studio settings
  • Quantization: Use Q4 or Q5 for speed/memory balance
  • Context Length: Adjust based on your use case
  • Batch Size: Optimize for your hardware

Advanced Features

  • Custom System Prompts: Modify model behavior
  • Temperature Control: Adjust response creativity
  • Token Streaming: Real-time response generation
  • Model Comparison: Test multiple models side-by-side
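
Several of these features can be exercised directly through the API rather than the GUI. The sketch below combines a custom system prompt, a temperature setting, and token streaming in one request; as before, the model name is an assumption and the placeholder API key is not validated by the local server.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Stream a response token by token, with a custom system prompt and
# a moderate temperature for creative but focused output.
stream = client.chat.completions.create(
    model="llama-3.2-3b-instruct",  # assumed name; use the model you loaded
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain 4-bit quantization in two sentences."},
    ],
    temperature=0.7,
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```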