Free-First LLM Outlet
This page is for one specific goal: build a single internal "electrical outlet" that prefers free providers, respects rate limits, and relaxes to weaker models when the best free capacity is exhausted.
The important distinction is not just "free vs paid". It is:
| Type | Good backbone for an outlet? | Meaning |
|---|---|---|
| Recurring free tier | Yes | Quota resets per minute or per day, so it can power ongoing traffic. |
| Trial or evaluation tier | Sometimes | Still useful, but often low-capacity or explicitly non-production. |
| One-time signup credits | No | Fine for testing, bad as a permanent backbone. |
| Local/self-hosted | Yes | Free in API-spend terms, but you pay with your own hardware. |
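
As a rough illustration, the table above can be encoded as data so that a router can filter backbone candidates automatically. This is a minimal Python sketch; `AccessType`, `BACKBONE_FIT`, and `backbone_candidates` are hypothetical names invented for this example and are not part of tryAGI.

```python
from enum import Enum
from typing import Dict, List


class AccessType(Enum):
    RECURRING_FREE = "recurring_free"
    TRIAL = "trial"
    SIGNUP_CREDITS = "signup_credits"
    LOCAL = "local"


# Backbone suitability, transcribed from the table above.
BACKBONE_FIT: Dict[AccessType, str] = {
    AccessType.RECURRING_FREE: "yes",
    AccessType.TRIAL: "sometimes",
    AccessType.SIGNUP_CREDITS: "no",
    AccessType.LOCAL: "yes",
}


def backbone_candidates(providers: Dict[str, AccessType]) -> List[str]:
    """Return the provider names whose access type is rated 'yes' as a backbone."""
    return [name for name, kind in providers.items() if BACKBONE_FIT[kind] == "yes"]


# Labels below are illustrative; classify each provider from its current terms.
example = {
    "gemini": AccessType.RECURRING_FREE,
    "cohere-trial": AccessType.TRIAL,
    "openai": AccessType.SIGNUP_CREDITS,
    "ollama": AccessType.LOCAL,
}
print(backbone_candidates(example))
```

Trial tiers are rated "sometimes", so a stricter or looser filter is a policy choice rather than something the table decides for you.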
Verified providers that can actually help
These are the providers that are currently useful for a free-first text outlet in practice.
| Provider | What is actually free | Fit for the outlet | tryAGI entry point |
|---|---|---|---|
| Google Gemini | Google exposes a Free tier for selected Gemini API models. Exact live quotas are model-specific and shown in AI Studio. | Primary if you accept Google free-tier data terms. | tryAGI.Google.Gemini |
| SambaNova | Free tier without a payment method. Docs publish concrete limits, for example 20 RPM, 20 RPD, and 200K tokens/day on listed free-tier models. | Primary. Strongest published OpenAI-compatible free contract. | CustomProviders.SambaNova(key) |
| Cerebras | Free access to all Cerebras-powered models. Model pages publish free-tier limits; example: 30 RPM, 60K input TPM, 1M daily tokens. | Primary. Very strong reasoning/coding candidate. | CustomProviders.Cerebras(key) |
| OpenRouter | Free :free models and the openrouter/free router. Official limits page currently documents 20 RPM and 50 free-model requests/day by default, or 1,000/day after buying at least $10 of credits. | Secondary. Excellent abstraction/fallback layer, but too capped to be the only backbone. | CustomProviders.OpenRouter(key) |
| GitHub Models | Included, rate-limited access to many models. The catalog API exposes model metadata including rate_limit_tier. | Secondary. Good overflow and experimentation capacity. | CustomProviders.GitHubModels(token) |
| NVIDIA Build / NIM | Many models are available under NVIDIA trial-service terms on build.nvidia.com. This is real free usage for evaluation, but NVIDIA does not publish one universal free quota table across all models. | Opportunistic. Useful extra capacity, not a clean contract. | CustomProviders.Nvidia(key) |
| Groq | Groq still offers a free API key and "get started for free", but the public docs do not currently publish one stable universal free-tier quota table. | Opportunistic. Excellent latency, but discover quotas from the console instead of hard-coding assumptions. | CustomProviders.Groq(key) |
| Cohere Trial | Trial keys are free, and Cohere documents that a trial key is limited to 40 API calls per minute. | Overflow only. Good for dev, weak as a main backbone. | CustomProviders.Cohere(key) |
| Ollama / LM Studio | Always free locally. No remote quota, but bounded by your own machine. | Hard floor. Best last fallback if local inference is acceptable. | CustomProviders.Ollama() / CustomProviders.LmStudio() |
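
To make the published request limits actionable, an outlet can track its own usage and refuse to call a provider that would exceed them. The sketch below is a hedged Python illustration: the figures are the examples quoted in the table above and should be re-checked against live docs, token limits are left out for brevity, and the rolling 24-hour window is an assumption (a provider may reset on a calendar boundary instead). `Quota`, `PUBLISHED`, and `RequestBudget` are hypothetical names.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Quota:
    rpm: Optional[int] = None  # requests per minute, None = not modelled
    rpd: Optional[int] = None  # requests per day, None = not modelled


# Example request limits quoted in the table above; verify before relying on them.
PUBLISHED: Dict[str, Quota] = {
    "sambanova": Quota(rpm=20, rpd=20),
    "cerebras": Quota(rpm=30),
    "openrouter": Quota(rpm=20, rpd=50),
    "cohere-trial": Quota(rpm=40),
}


@dataclass
class RequestBudget:
    quota: Quota
    stamps: List[float] = field(default_factory=list)

    def try_acquire(self, now: Optional[float] = None) -> bool:
        """Record one request if it fits both windows; return False otherwise."""
        now = time.time() if now is None else now
        # Forget requests older than 24 hours (assumed rolling daily window).
        self.stamps = [t for t in self.stamps if now - t < 86400]
        last_minute = sum(1 for t in self.stamps if now - t < 60)
        if self.quota.rpm is not None and last_minute >= self.quota.rpm:
            return False
        if self.quota.rpd is not None and len(self.stamps) >= self.quota.rpd:
            return False
        self.stamps.append(now)
        return True
```

Checking the budget locally before sending a request avoids spending a call just to receive a 429.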
Providers that are not good backbone candidates
These can still be useful, but they should not be treated as the permanent base of a free-first outlet:
| Provider group | Why not a backbone |
|---|---|
| OpenAI / Anthropic | Any free usage is temporary credit-style access, not a recurring public free tier. |
| xAI / Together / Fireworks / Nebius | Usually signup credits, promos, or payment-linked tiers. Good for overflow, not for a stable free outlet. |
| Perplexity API | The consumer product has free usage, but the API is not a real recurring free-tier backbone. |
| DeepSeek | Extremely cheap, but I could not verify a currently published recurring free API tier from official docs today. |
| Mistral Experiment | Attractive for experimentation, but public quota terms are not published cleanly enough to make it your automatic backbone without manual verification in the console. |
Recommended quality ladder
If your goal is "use the smartest thing available, then relax when quotas are gone", this is the practical order:
Tier A: Best free quality first
- Gemini free-tier flagship model if available for your account and region.
- Cerebras flagship reasoning model.
- SambaNova flagship reasoning model.
- NVIDIA Build trial flagship model.
Use this tier for coding, hard reasoning, structured outputs, and agent steps that actually matter.
Tier B: Strong but more available
- Gemini Flash / Flash-Lite.
- Groq 70B-class chat model.
- SambaNova 70B-class Llama.
- GitHub Models high-tier chat model.
Use this tier for most conversational traffic, summarization, and lightweight tool orchestration.
Tier C: Cheap resilient fallback
- OpenRouter openrouter/free.
- Explicit OpenRouter :free model ids.
- Local Ollama / LM Studio 8B to 14B model.
Use this tier when everything else is cooling down, daily limits are gone, or you need the outlet to stay alive at all costs.
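
The three tiers above can be written down as an ordered ladder that is walked top to bottom. This is a small Python sketch under stated assumptions: every entry except openrouter/free is a placeholder label for a bullet above, not a real model id, and `LADDER` and `pick` are hypothetical names.

```python
from typing import List, Optional, Set, Tuple

# Ordered tiers; entries are placeholder labels mirroring the bullets above.
LADDER: List[Tuple[str, List[str]]] = [
    ("A", ["gemini-flagship", "cerebras-flagship",
           "sambanova-flagship", "nvidia-build-flagship"]),
    ("B", ["gemini-flash", "groq-70b",
           "sambanova-llama-70b", "github-models-high-tier"]),
    ("C", ["openrouter/free", "openrouter-explicit-free-model", "local-ollama"]),
]


def pick(unavailable: Set[str]) -> Optional[Tuple[str, str]]:
    """Return (tier, model) for the first entry that is not currently unavailable."""
    for tier, models in LADDER:
        for model in models:
            if model not in unavailable:
                return tier, model
    return None  # nothing left, including the local floor


tier_a = set(LADDER[0][1])
print(pick(set()))   # best free quality first
print(pick(tier_a))  # Tier A exhausted, relax to Tier B
```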
Routing policy that actually works
The workspace already gives you provider factories through CustomProviders. What it does not give you yet is the policy layer. That layer should do the following:
- Filter candidates by capability first. Only compare models that support what the request needs: tool calling, JSON/schema output, vision, long context, or streaming.
- Prefer recurring-free providers over credit-based providers. One-time credits should sit below your recurring free tiers, not above them.
- Degrade inside a provider before leaving it. Example: Gemini Pro -> Gemini Flash -> Gemini Flash-Lite is usually cleaner than immediately jumping to a totally different provider.
- Treat 429 as quota exhaustion, not provider failure. Put the model/provider on cooldown until reset instead of marking it unhealthy.
- Treat 5xx, timeouts, and transport failures as health failures. Trip a short circuit breaker and retry a different provider quickly.
- Persist quota and health state outside process memory. If multiple workers share the same outlet, store cooldowns and quota state in Redis or SQLite so they do not stampede the same provider.
- Keep separate ladders per workload class. Coding/reasoning, general chat, embeddings, and tool-heavy agent turns should not all share the same exact fallback order.
- Reserve some best-tier capacity. Do not let trivial requests burn your entire Tier A budget.
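
The 429 versus 5xx distinction and the shared-state rule above can be sketched together. The Python example below uses the standard-library `sqlite3` module; the cooldown durations, the table layout, and the names `classify` and `CooldownStore` are assumptions made for illustration. A real deployment would pass a file path (or use Redis) so that several workers see the same state, since `:memory:` is private to one connection.

```python
import sqlite3
import time
from typing import Optional

QUOTA_COOLDOWN_S = 60.0    # assumption: wait out a per-minute window
BREAKER_COOLDOWN_S = 15.0  # assumption: short circuit-breaker pause


def classify(status: Optional[int]) -> str:
    """Map an HTTP status (None = timeout or transport error) to an outcome."""
    if status is None or status >= 500:
        return "health"
    if status == 429:
        return "quota"
    if 200 <= status < 300:
        return "ok"
    return "client_error"


class CooldownStore:
    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cooldown ("
            "target TEXT PRIMARY KEY, reason TEXT NOT NULL, until REAL NOT NULL)"
        )
        self.db.commit()

    def record(self, target: str, status: Optional[int],
               now: Optional[float] = None,
               retry_after: Optional[float] = None) -> str:
        """Store a cooldown for quota or health outcomes; return the outcome."""
        now = time.time() if now is None else now
        outcome = classify(status)
        if outcome == "quota":
            # Prefer the server's hint when one was supplied by the caller.
            pause = QUOTA_COOLDOWN_S if retry_after is None else retry_after
        elif outcome == "health":
            pause = BREAKER_COOLDOWN_S
        else:
            return outcome
        self.db.execute(
            "INSERT OR REPLACE INTO cooldown VALUES (?, ?, ?)",
            (target, outcome, now + pause),
        )
        self.db.commit()
        return outcome

    def available(self, target: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT until FROM cooldown WHERE target = ?", (target,)
        ).fetchone()
        return row is None or row[0] <= now
```

Keeping the reason alongside the deadline lets dashboards tell "out of quota" apart from "unhealthy", which is the point of treating the two cases differently.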
Minimal outlet shape
This is intentionally simple. The important part is the policy:
- best model first
- recurring free preferred
- quota cooldowns tracked explicitly
- cross-provider fallback after capability filtering
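
One way the four policy points above might fit together is sketched below in Python. It is an illustration under stated assumptions, not the tryAGI API: `Candidate`, `Outlet`, `QuotaExhausted`, and `BackendDown` are hypothetical names, each backend is reduced to a plain callable, and the cooldown durations are arbitrary defaults.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple


class QuotaExhausted(Exception):
    """Assumed to be raised by a backend wrapper on HTTP 429."""


class BackendDown(Exception):
    """Assumed to be raised on 5xx, timeouts, or transport failures."""


@dataclass
class Candidate:
    name: str
    quality: int                  # higher means a stronger model
    recurring_free: bool
    capabilities: FrozenSet[str]
    call: Callable[[str], str]    # prompt in, completion text out


class Outlet:
    def __init__(self, candidates: List[Candidate],
                 quota_cooldown_s: float = 60.0, breaker_s: float = 15.0) -> None:
        # Recurring-free candidates first, then best quality first.
        self.candidates = sorted(
            candidates, key=lambda c: (not c.recurring_free, -c.quality))
        self.quota_cooldown_s = quota_cooldown_s
        self.breaker_s = breaker_s
        self.blocked_until: Dict[str, float] = {}

    def complete(self, prompt: str, needs: FrozenSet[str] = frozenset(),
                 now: Optional[float] = None) -> Tuple[str, str]:
        """Return (candidate name, completion) from the first usable candidate."""
        now = time.monotonic() if now is None else now
        for c in self.candidates:
            if not needs <= c.capabilities:
                continue  # capability filter comes before any fallback
            if self.blocked_until.get(c.name, float("-inf")) > now:
                continue  # still cooling down
            try:
                return c.name, c.call(prompt)
            except QuotaExhausted:
                self.blocked_until[c.name] = now + self.quota_cooldown_s
            except BackendDown:
                self.blocked_until[c.name] = now + self.breaker_s
        raise RuntimeError("no candidate available for this request")
```

The in-memory `blocked_until` dictionary keeps the sketch short; with several workers, that state would need to live in a shared store instead.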
A practical default ladder for tryAGI
If you want a starting point today, use this:
1. tryAGI.Google.Gemini flagship free-tier model
2. CustomProviders.Cerebras(key) flagship reasoning model
3. CustomProviders.SambaNova(key) flagship reasoning model
4. CustomProviders.Groq(key) 70B-class model
5. CustomProviders.OpenRouter(key) with openrouter/free
6. CustomProviders.Ollama() local 14B/8B model
That gives you:
- one high-quality non-OpenAI-compatible primary path
- two strong OpenAI-compatible remote backups
- one very fast opportunistic provider
- one universal free-model router
- one local last-resort floor
Official references
- Gemini API pricing
- Gemini API rate limits
- SambaNova model rate limits
- Cerebras pricing
- Cerebras example model page with free-tier limits
- OpenRouter Free Models Router
- OpenRouter limits
- GitHub Models marketplace
- GitHub Models catalog API
- NVIDIA Build model card example
- Groq pricing
- Cohere error docs with trial-key limit note