Is FluentAI really free?

Yes! Local models are unlimited and completely free. If you want to use cloud providers like Claude, GPT-4, or Gemini, you just need your own API keys.

Does it work offline?

Yes. Local GGUF models run entirely on your device with no internet connection needed.

Which AI models are supported?

100+ models including Llama 3, Gemma 4 (E2B/E4B), DeepSeek, Mistral, Phi, and Qwen locally via GGUF, LiteRT, or MLX. Plus Claude, GPT-4, Gemini, and 200+ models via OpenRouter in the cloud. You can also connect any OpenAI-compatible server like LM Studio, vLLM, LocalAI, or Jan.

Absolutely. All local conversations stay on your device. No data is ever sent to our servers. We have zero telemetry, zero tracking, and zero data collection.

What platforms are supported?

Android is available now on Google Play, F-Droid, and IzzyOnDroid with NPU/GPU acceleration via LiteRT. macOS and iOS 18+ (A17 Pro+) support real MLX inference on Apple Silicon. Desktop (Windows, Linux) versions are also available.

Privacy-first AI · now with NPU + Apple Silicon acceleration

llama3gemma4deepseekmistral

Your AI, your device,
your silicon.

Chat with Llama, Gemma 4, DeepSeek, Mistral and 100+ more — accelerated by your phone's NPU, your Mac's MLX engine, or any GGUF runtime. Completely offline. Completely free.

Get on Google Play Download for Desktop

100+

AI Models

RuntimesNEW

NPU

+ GPU + CPUNEW

Zero

Data Collection

Free

Forever

What's new in v1.3

Models

Gemma 4 (E2B + E4B)

Apache 2.0, 128K context, bartowski GGUF. New SoTA local class. Gemma 4n with MTP speculative decoding — up to 2× faster on Android GPU.

Hardware

NPU on Snapdragon

QNN delegate via Play Feature Delivery. SoC-aware backend selection: QNN → GPU → CPU.

Platform

MLX on Apple Silicon

Real inference on macOS & iOS 18+ A17 Pro+. 1-bit quantisation — 7B in ~1.75 GB on Metal.

Agents

AI Agent Platform

On-device plan-and-execute agents with skills, schedules, and mobile tools. Your phone now serves /v1/chat/completions.

See it in action

Watch FluentAI

Full product demo and a 30-second quick tour — privacy-first AI running entirely on your device.

Full product demo

NPU · MLX · Agents · OpenAI-compat · HF browser

YouTube ↗

30-second tour

Quick overview · perfect for sharing

YouTube ↗

Why FluentAI?

The privacy-first AI agent platform that puts you in control

Privacy First

Your conversations never leave your device. No data collection, no tracking, no cloud required.

100+ AI Models

Run Llama, Gemma, DeepSeek, Mistral locally or connect to Claude, GPT-4, Gemini via cloud.

Voice Chat

Talk to AI naturally with 5 conversation modes — Normal, Interview, Learning, Storytelling, and Translation.

Completely Free

No $20/month subscriptions. Use powerful local models at zero cost, forever.

Knowledge Bases

Upload PDFs and documents to chat with your own data. On-device RAG with semantic search.

Tool Calling & MCP

Built-in tools for search, math, weather, and memory. Connect to GitHub, Slack, Notion via MCP.

Chat Organization

Folders, tags, pinning, branching, and search. Keep your conversations organized your way.

Export & Share

Export chats as text, Markdown, JSON, or even as audio podcasts. Share conversations anywhere.

Bring Your Own Model

Import any GGUF model or load directly from Hugging Face. Use any model you want — total freedom.

NEW

Multi-Runtime Engine

Same chat, three backends: GGUF, LiteRT, MLX. The app picks the fastest one for your device automatically.

NEW

NPU Acceleration

Snapdragon NPU via QNN delegate. 2–4× faster local inference on supported phones with lower battery drain.

NEW

Apple Silicon MLX

Native Metal-backed inference on M-series Macs and A17 Pro+ iPhones. No Rosetta. No fallback. 1-bit quant unlocks low-RAM devices.

NEW

OpenAI-Compatible Servers

Point at LM Studio, vLLM, LocalAI, Jan, or any /v1/chat/completions endpoint. Models auto-discover.

NEW

On-Device AI Agents

Plan-and-execute agents with task memory run entirely on-device. Schedule agents, use mobile tools — clipboard, calendar, contacts, files.

NEW

Hugging Face Browser

Search and filter 10,000+ GGUF models by runtime. Per-file download with memory-fitness badges so you don't OOM your phone.

NEW

Benchmark + MMLU-50

4-step wizard, MMLU-50 quality score, shareable PNG + Markdown result cards, filterable history. Decode-only tok/s for honest speed reporting.

Inference engines

One app. Three inference engines.

FluentAI automatically picks the fastest runtime for your hardware — GGUF on every device, LiteRT for Snapdragon NPU/GPU, and MLX for Apple Silicon.

FLM

FllamaRuntime

// GGUF · llama.cpp · everywhere

→Gemma 4 architecture backport (ISWA dual-cache, MoE 128 experts)
→KleidiAI v1.23.0 (SME2 + Q4_K paths)
→KV cache TQ4/TQ3 quantization
→16 KB page alignment for Android 15+

LRT

LiteRTRuntime

// Android · GPU / NPU · LiteRT-LM 0.10

→Snapdragon NPU via QNN delegate
→SoC-aware backend selection: QNN → GPU → CPU
→Play Feature Delivery — no bloat at install
→MTP speculative decoding — ~1.5–2× faster generation

MLX

MlxRuntime

// macOS · iOS 18+ A17 Pro+ · Apple Silicon

→Real Apple MLX inference on M-series + A17 Pro+
→1-bit quantisation — 7B models in ~1.75 GB
→Metal-native — no Rosetta, no fallback
→Multi-file parallel download from Hugging Face

Powerful Capabilities

More than just a chat app — FluentAI is a complete AI toolkit

Chat With Your Documents

Upload PDFs, text files, and documents to create knowledge bases. FluentAI uses RAG (Retrieval-Augmented Generation) to search and answer questions from your files — all processed on-device.

PDF SupportSemantic SearchOn-device RAG

Built-in Tools & MCP

FluentAI comes with built-in tools — calculator, web search, weather, date/time, and AI memory. Plus full Model Context Protocol (MCP) support to connect to GitHub, Slack, Notion, and 20+ other services.

Tool CallingMCP ProtocolWeb SearchAI Memory

Rich Content & Code

Beautiful syntax-highlighted code blocks, LaTeX math rendering, HTML/SVG previews, and full Markdown support. Perfect for developers, students, and researchers.

Syntax HighlightingLaTeX MathHTML Preview

Templates & AI Personas

Choose from built-in prompt templates or create your own. Set up custom AI personas with unique system prompts — from a coding assistant to a creative writing partner.

Custom PersonasPrompt TemplatesAuto-fill

Truly Cross-Platform

Available on Android today with iOS, Windows, macOS, Linux, and Web coming soon. Your AI assistant, on every device you own.

AndroidDesktopWebCross-sync

Works with your favourite models — and your favourite server

Run models locally on your device, connect to cloud providers, or point at any OpenAI-compatible server — your choice

Llama 3

On-device

NEW

Gemma 4 E2B / E4B

Google · Apache 2.0

On-device

DeepSeek

On-device

Mistral

On-device

Phi

On-device

Qwen

On-device

Claude

Anthropic

Cloud

GPT-4

OpenAI

Cloud

Gemini

Google

Cloud

OpenRouter

200+ models

Cloud

NEW

LM Studio · vLLM · LocalAI · Jan

Any /v1 endpoint

OpenAI-compat

Ollama

Local server

Infrastructure

See it in action

A beautiful, intuitive interface designed for seamless AI conversations

Your data stays on your device

FluentAI is built from the ground up with privacy as the foundation, not an afterthought

Zero Data Collection

No telemetry, no tracking, no analytics. Your conversations are yours alone.

Offline Capable

Run AI models entirely on your device. No internet connection needed.

Open Source

Audit the code yourself. Full transparency in how your data is handled.

How FluentAI compares

Hardware acceleration, BYO servers, and on-device privacy — the moats the cloud apps can't match

Feature	FluentAI	ChatGPT	Claude	Gemini
Price	Free (local models)	Free / $20/mo	Free / $20/mo	Free / $20/mo
Privacy	On-device, zero collection	Cloud, data used for training	Cloud-based	Cloud, data used for training
Offline Mode	✓	✗	✗	✗
Model Choice	100+ models	GPT-4 only	Claude only	Gemini only
Hardware AccelerationNEW	NPU + GPU + Metal + CPU	Cloud only	Cloud only	Cloud only
BYO Local ServerNEW	LM Studio · vLLM · LocalAI · Jan · Ollama	✗	✗	✗
BYO Model (GGUF / HF)NEW	✓	✗	✗	✗
Voice Chat	✓	Paid	✗	✓
Open Source	✓	✗	✗	✗

Frequently Asked Questions

Everything you need to know about FluentAI

Your AI, your device,your silicon.

Gemma 4 (E2B + E4B)

NPU on Snapdragon

MLX on Apple Silicon

AI Agent Platform

Watch FluentAI

Why FluentAI?

Privacy First

100+ AI Models

Voice Chat

Completely Free

Knowledge Bases

Tool Calling & MCP

Chat Organization

Export & Share

Bring Your Own Model

Multi-Runtime Engine

NPU Acceleration

Apple Silicon MLX

OpenAI-Compatible Servers

On-Device AI Agents

Hugging Face Browser

Benchmark + MMLU-50

One app. Three inference engines.

FllamaRuntime

LiteRTRuntime

MlxRuntime

Powerful Capabilities

Chat With Your Documents

Built-in Tools & MCP

Rich Content & Code

Templates & AI Personas

Truly Cross-Platform

Works with your favourite models — and your favourite server

Llama 3

Gemma 4 E2B / E4B

DeepSeek

Mistral

Phi

Qwen

Claude

GPT-4

Gemini

OpenRouter

LM Studio · vLLM · LocalAI · Jan

Ollama

See it in action

Your data stays on your device

Zero Data Collection

Offline Capable

Open Source

How FluentAI compares

Frequently Asked Questions

Is FluentAI really free?

Does it work offline?

Which AI models are supported?

Is my data private?

What platforms are supported?

How do I get started?

Your AI, your device,
your silicon.