MODELS · ONE SUBSCRIPTION, ALL OF THEM

Every model. One bill.

getvivix runs 228+ frontier AI models for video, image, and audio in a single studio. Pick any model to see what it does, or start free.

Video models · 81

Kling VIDEO 3.0 4K

4K multimodal video generation with native audio and richer visual detail

Kling VIDEO O3 4K

4K variant of Kling O3 with native audio for premium production

High speed cinematic text to video with synced audio

Cinematic LTX-2 Pro text and image to video generator

Multi-shot cinematic video generation with native audio, 20+ camera controls, and character consistency

Seedance 1.5 Pro

Native audio-visual cinematic AI video generation

Premium multimodal video generation with native audio and cinematic motion

Seedance 2.0 Fast

Speed-optimized Seedance 2.0 for rapid iteration with native audio

Seedance 2.0 Mini

The lightest, fastest, cheapest tier of Seedance 2.0 with native audio

High speed Google Veo 3.1 Fast text to video generation

Multimodal video generation with reference consistency, video editing, and native audio

High-quality audio-driven avatar video generation

Fast audio-driven avatar video generation

Gemini Omni Flash

Multimodal video generation and editing with native audio, references, and multi-turn control

Grok Imagine Video

AI video generation with synchronized audio from text and images

Grok Imagine Video 1.5

Higher-tier Grok image-to-video from a single starting frame — now up to 1080p

Alibaba text-to-video and image-to-video at 720p or 1080p with seeded generation and frame conditioning

Alibaba's upgraded multimodal video model — stronger motion, multi-image reference consistency, and better audio-visual sync at 720p or 1080p

HeyGen Avatar IV

AI talking avatar video from a HeyGen avatar or your own photo, driven by a script or audio

HeyGen Avatar V

Talking digital twins with sharper identity and motion coherence

HeyGen Video Agent

AI-powered prompt-to-video production with avatars, B-roll, and motion graphics

Kling VIDEO 2.6 Pro

Full audio-visual video model combining cinematic-quality generation with native audio (dialogue, sound effects, ambience), plus optional Motion Control for precise character movement via the API

Kling VIDEO 3.0 Pro

High-fidelity multimodal video generation with native audio and advanced editing

Kling VIDEO 3.0 Standard

Multimodal video generation with native audio and efficient performance

Kling VIDEO 3.0 Turbo

Speed-optimized multimodal video with native audio, stronger prompt adherence, and improved lip sync

Kling VIDEO O3 Pro

Unified multimodal video generation with native audio and higher-fidelity renders

Kling VIDEO O3 Standard

Cost-efficient multimodal video generation with native audio and editing

KlingAI 1.6 Pro

High fidelity image to video model for dynamic 1080p clips

KlingAI 1.6 Standard

Mid-tier KlingAI 1.6 Standard text to video model

KlingAI 2.0 Master

KlingAI 2.0 Master for high control AI video generation

KlingAI 2.1 Master

Premium KlingAI 2.1 Master for high fidelity video

KlingAI 2.1 Pro

KlingAI 2.1 Pro for cinematic AI video generation

KlingAI 2.1 Standard

KlingAI 2.1 Standard for faster AI video generation

KlingAI 2.5 Turbo Pro

Cinematic text to video and image to video at scale

KlingAI 2.5 Turbo Standard

Fast cinematic image to video generation for creators

KlingAI Avatar 2.0 Pro

High fidelity avatar video generation with smoother motion and quality

KlingAI Avatar 2.0 Standard

Expressive avatar video generation from image and audio

KlingAI Lip-Sync

Accurate AI lip sync for character driven video content

Open-source AI video model with synchronized audio and high-fidelity output

Segmented AI video retakes with precise in-shot control

High-fidelity multimodal video generation with native audio

Fast multimodal video generation optimized for rapid iteration

MiniMax 01 Director

Cinematic text to video with precise camera control

MiniMax 01 Live

Anime video model for expressive character animation

MiniMax Hailuo 02

Cinematic AI video model for viral and commercial clips

MiniMax Hailuo 2.3

High fidelity AI video generation from text or images

MiniMax Hailuo 2.3 Fast

Fast MiniMax Hailuo 2.3 model for short cinematic video

Cognitive avatar video from image, audio, and text

Real-time AI video generation with draft mode and native audio

P-Video-Animate

Reference-image animation driven by the motion, timing, and camera movement of a source video

P-Video-Replace

Swap the on-camera character in a video using a reference image, preserving motion, timing, camera and scene

PixVerse LipSync

Realistic AI lip sync from audio for any video

PixVerse V3.5 early text to video effects model

PixVerse V4 AI text to video with pro camera control

PixVerse V4.5 cinematic text and image to video model

PixVerse V5 cinematic text to video and image to video

PixVerse V5 Fast

Fast text to video and image to video generation for rapid iteration

Enhanced cinematic video generation with improved lip-sync and audio realism

Runway Aleph 2.0

Localized video editing that transforms an existing clip from a text prompt while keeping the rest stable

Runway Gen-4 Turbo

High speed Gen-4 Turbo image to video generation

Advanced multimodal video generation with text and image input

Seedance 1.0 Pro

Seedance 1.0 Pro high fidelity 1080p text and image to video

Seedance 1.0 Pro Fast

Fast Seedance 1.0 Pro video generation for dance content

Multimodal video-audio foundation model with 1080p cinematic output, inpainting, and video extension

Next generation AI video and audio model from OpenAI

Premium Sora 2 Pro model for high fidelity AI video

Full-scene lip synchronization with global face understanding and obstruction handling

High fidelity text to video generation with camera control

Cinematic video generation, now with native audio

Fast Google Veo 3 video generation with native audio

Veo 3.1 cinematic AI video with native audio

Fast 1080p AI video generation with strong consistency

Vidu Q1 high fidelity reference to video generation model

High fidelity Vidu Q2 Pro model for cinematic AI video

Faster Vidu Q2 video generation with advanced motion control

Multimodal video generation with native audio and intelligent shot planning

Low-latency multimodal video generation with native audio

MoE video generation from text or images at 480p to 720p

Wan2.5-Preview AI Text to Video with Native Audio

Multimodal video generation with multi-shot and native sound

Fast distilled image-to-video generation model

Image models · 86

GPT Image 1.5 flagship image model with faster generation and enhanced editing

OpenAI GPT Image 2 — high-fidelity generation and editing with up to 16 reference images

Grok Imagine Image Pro

High fidelity AI image generation and editing with improved prompt control

Kling IMAGE 3.0

2K to 4K image generation with improved realism and practical image-to-image editing

4K Omni image generation with strong consistency and reference control

Gemini 3.1 Flash Image fast high quality AI image generation and editing

Seedream 5.0 Lite

Responsive text-to-image generation with real-time search and precise prompt adherence

Seedream 5.0 Pro

Flagship image model for controlled editing, multi-reference generation, and layer-aware visual workflows

The getvivix signature model — instant images in any style

Unified image generation and editing with avatar customization, color control, and multilingual text rendering

Wan2.7 Image Pro

Premium image generation with enhanced composition stability and precise prompt comprehension

Commercial-safe text to image model for production use

Deterministic JSON native text to image for enterprises

Instruction-driven image editing with mask support

Bria Fibo Edit Tools

Unified image editing foundation for recolor, relight, restore, blend, reseason, and sketch

DALL·E 2 AI image generator for text guided creation

DALL·E 3 high fidelity text to image generation API

Exactly Bold Chromatics

Vibrant, high-contrast illustrative style with bold color palettes

Exactly Bright Pulse

Bright, energetic photographic style with vivid lighting

Exactly Dark Comics

Dark, gritty comic art style with heavy shadows and noir aesthetics

Exactly Distant Reality

Dreamy photographic style with surreal, distant atmosphere

Exactly Earthy Elegance

Warm, organic illustrative style with muted earth tones

Exactly Editorial Line

Clean, editorial-style line illustrations with refined detail

Exactly Extreme Contrast

High-contrast photographic style with dramatic light and shadow

Exactly Grain Film Look

Analog film photography style with natural grain and warm tones

Exactly Graphic Harmony

Balanced, harmonious graphic illustrations with cohesive composition

Exactly Graphic Novel

Comic book and graphic novel style with strong ink lines and dramatic shading

Exactly Graphite Creature

Textured graphite-style illustrations with creature and character focus

Exactly Journey

Travel and adventure photographic style with rich, cinematic tones

Exactly Monochrome Café

Monochromatic illustrative style with warm café-inspired tones

Exactly Muted Modern

Contemporary illustrative style with soft, muted color palettes

Exactly Playful Line Adventures

Whimsical, playful line art with an adventurous character

Exactly Warm Light

Soft, warm-lit photographic style with inviting golden tones

FLUX Virtual Try-On

Low-latency virtual try-on for transferring garments onto a person image with strong identity and garment fidelity

black-forest-labs

Open-weight 12B text to image model for rich visuals

black-forest-labs

FLUX.1 [schnell]

Ultra fast FLUX.1 text to image model for local use

black-forest-labs

FLUX.1 Kontext [dev]

Open image editing model for fast iterative workflows

black-forest-labs

FLUX.1 Kontext [max]

High fidelity FLUX.1 Kontext max for precise image edits

black-forest-labs

FLUX.1 Kontext [pro]

Context aware FLUX.1 image editing and generation model

black-forest-labs

FLUX.1 Krea [dev]

FLUX.1 Krea Dev for photorealistic open-weight generation

black-forest-labs

FLUX.1.1 Pro high fidelity text to image generation

black-forest-labs

FLUX.1.1 [pro] Ultra

High speed 4MP FLUX image generation for production apps

black-forest-labs

FLUX.2 dev for controllable open text to image workflows

black-forest-labs

Configurable FLUX.2 Flex for precise text aligned images

black-forest-labs

FLUX.2 [klein] 4B

Fastest Klein model for real-time image generation and editing

black-forest-labs

FLUX.2 [klein] 4B Base

Compact undistilled model for efficient image generation and editing

black-forest-labs

FLUX.2 [klein] 9B Base

Undistilled foundation model for high-quality image generation and editing

black-forest-labs

FLUX.2 [klein] 9B KV

KV-cache accelerated image generation and editing for real-time multi-reference workflows

black-forest-labs

The latest state-of-the-art model from Black Forest Labs, generating images grounded in live web information.

black-forest-labs

High control FLUX.2 Pro image generation and editing

black-forest-labs

GPT Image 1 high fidelity image generation for GPT-4o

Grok Imagine Image

AI image generation from text and images

Grok Imagine Image Quality

xAI's quality-focused image generation and editing — sharper realism, better text rendering, tighter prompt following

HiDream-I1 Dev fast 17B text to image generation model

HiDream-I1 Fast

HiDream-I1 Fast for low latency text to image generation

HiDream-I1 Full

HiDream-I1 Full high fidelity text to image generator

Ideogram 2.0 text to image model for sharp design work

Ideogram 3.0 text to image model for sharp design visuals

Design-focused text-to-image with strong typography, layout control, transparent backgrounds, and 2K output

High fidelity text to image generation with Imagen 3

High speed Imagen 3 Fast model for rapid image generation

High speed Imagen 4 Fast text to image generation

Imagen 4 Preview

High fidelity 2K text to image generation by Google

High fidelity text to image model with sharp typography

ImagineArt 1.5 Pro

Professional AI image generation with native 4K and refined visual control

Reasoning-based text to image generation with vibrant true-to-life color

Juggernaut Lightning Flux by RunDiffusion

Ultra fast Flux-based model for high volume image generation

Juggernaut Pro Flux by RunDiffusion

Photorealistic Flux based text to image model for pros

Kandinsky 5.0 Image Lite

Efficient text-to-image and image-to-image editing model

Larger Krea 2 variant for rawer, more flexible outputs with stronger photorealism and weighted reference control

Faster Krea 2 variant for stable, consistent generation with controllable prompt strength and weighted reference guidance

High quality multi image generation for complex visuals

Nano Banana 2 Lite

Lighter Nano Banana 2 image model for faster generation and editing workflows

Real-time text-to-image model for production graphics

High precision multi image AI editor for fast workflows

Qwen-Image high fidelity text aware image generation model

Unified image generation and editing with professional text rendering

Qwen-Image-Edit

High fidelity text guided image editing for Qwen

Professional text-to-image model for brand and marketing design

Advanced design-focused image generation with enhanced control and fidelity

High speed 4K AI image generation and editing model

Stable Diffusion 3

Stable Diffusion 3 for sharper text and complex images

Wan2.5-Preview Image

High fidelity Wan2.5 image generation for rich single frames

High fidelity image generation built on the Wan2.6 visual stack

Efficient high-quality image generation foundation model

Fast photorealistic image generator with text control

Audio models · 19

ACE-Step v1.5 Base

Open-source music generation with voice cloning, lyric editing, and multilingual support

ACE-Step v1.5 Turbo

Fast music generation optimized for speed with reduced inference steps

Eleven Flash v2

Low-latency English TTS for real-time voice use-cases

Eleven Flash v2.5

Real-time TTS for voice agents, 32 languages, ~75ms latency

Eleven Monolingual v1

Legacy English-only TTS

Eleven Multilingual v1

Legacy multilingual TTS across 9 languages

Eleven Multilingual v2

High-fidelity multilingual TTS across 29 languages

Eleven Music v1

Generate studio quality music tracks from text prompts

Eleven Turbo v2

Low-latency English TTS for production

Eleven Turbo v2.5

Fast multilingual TTS across 32 languages

Premium expressive TTS across 74 languages with audio tags

Gemini 3.1 Flash TTS

Expressive text-to-speech with audio tags, multi-speaker dialogue, and 70+ languages

Inworld TTS-1.5 Max

High-fidelity expressive text-to-speech with rich prosody and multilingual support

Inworld TTS-1.5 Mini

Low-latency expressive text-to-speech optimized for real-time apps

MiniMax Speech 2.8

High-quality text-to-speech with expressive, natural voice synthesis

Qwen3-TTS 1.7B Base

High-quality multilingual text-to-speech with voice cloning and ultra-low latency

Qwen3-TTS 1.7B CustomVoice

Text-to-speech with preset premium timbres and precise style control

Qwen3-TTS 1.7B VoiceDesign

Text-to-speech with voice creation from natural language descriptions

xAI Text-to-Speech

Expressive text-to-speech with five voices, speech tags, and multilingual support

Text models · 24

Anthropic's flagship Fable 5 — the most capable generally available Claude, 1M-token context, vision, tools, and extended thinking

Claude Haiku 4.5

Anthropic's fastest Claude — latency-optimized for agentic sub-tasks and high-volume work

Claude Opus 4.7

Anthropic's flagship — demanding coding, agent orchestration, multimodal reasoning

Claude Sonnet 4.6

Anthropic's daily-driver Sonnet — coding, agents, long-context reasoning, computer use

DeepSeek V4 Flash

Budget-tier reasoning LLM with 1M context window and 384K max output

Advanced multimodal text and reasoning model

Gemini 3.1 Flash Lite

Advanced multimodal text and reasoning model

Advanced multimodal text and reasoning model

Z.ai's affordable mid-range LLM — 200K context and 73.8% on SWE-bench

Z.ai's flagship LLM — premium reasoning, 200K context, JSON mode, agentic strength

Flagship reasoning LLM with 1M context, native computer use, and high factual accuracy

Efficient reasoning LLM with 400K context for coding assistants and subagent workflows

Ultra-low-latency LLM for high-volume classification, extraction, and lightweight automation

OpenAI's newest flagship LLM — deepest reasoning, computer-use, 1M+ context

Moonshot AI multimodal LLM with native image and video understanding, 262K context

LLaVA-1.6-Mistral-7B

Vision-language model for image understanding and captioning

State-of-the-art agentic coding and office-work model, optimized for speed and cost

Long-context agentic coding and office productivity model for fast, reliable tool use

MiniMax M2.7 Highspeed

Faster throughput for agentic coding and tool-driven automation

Open Age Detection

Facial age estimation model

OpenAI CLIP ViT-L/14

Vision encoder for text-image representation and similarity

Qwen2.5-VL-3B-Instruct

Instruction-tuned vision-language model for image and text understanding

Qwen2.5-VL-7B-Instruct

Instruction-tuned multimodal vision-language model

ViT Age Classifier

Vision transformer model for estimating age from facial images

Utility models · 13

Bria Image Increase Resolution

High quality 2x and 4x AI image upscaling by Bria

Bria Video Background Removal

Real time AI video background removal for clean layers

Bria Video Increase Resolution

Bria video upscaling for sharper high resolution output

Nano Banana Pro

Nano Banana Pro image preview for precise visual control

P-Image Upscale

AI-powered image upscaling up to 8 megapixels with detail and realism enhancement

High precision multi image editing for daily workflows

Riverflow 1.1 Mini

Fast cost efficient model for versatile image editing

Riverflow 1.1 Pro

High precision image editing for production pipelines

Riverflow 2 Preview Fast

Fast lightweight Riverflow 2 for product accurate images

Riverflow 2 Preview Max

High detail Riverflow 2 model for premium product renders

Riverflow 2 Preview Standard

Balanced Riverflow 2 image model for pro workflows

Riverflow 2.0 Fast

Fast production image generation with reference-based super resolution and font control

Seedream 4.5 high fidelity multi reference text to image model

3D models · 5

Hunyuan 3D 3.1 Pro

High-detail image- or text-to-3D — a textured 3D mesh from a photo or prompt

Hunyuan 3D 3.1 Rapid

Fast image-to-3D — turn a photo into a textured 3D mesh in seconds

Image- or text-to-3D — production-grade textured meshes

High-fidelity image-to-3D generative model with compact structured latents

Image- or text-to-3D — fast, clean game-ready meshes