Free-First LLM Outlet

This page is for one specific goal: build a single internal "electrical outlet" that prefers free providers, respects rate limits, and relaxes to weaker models when the best free capacity is exhausted.

The important distinction is not just "free vs paid". It is what kind of free access a provider offers:

Type	Good backbone for an outlet?	Meaning
Recurring free tier	Yes	Quota resets per minute or per day, so it can power ongoing traffic.
Trial or evaluation tier	Sometimes	Still useful, but often low-capacity or explicitly non-production.
One-time signup credits	No	Fine for testing, bad as a permanent backbone.
Local/self-hosted	Yes	Free in API-spend terms, but you pay with your own hardware.

Verified providers that can actually help

These are the providers that are currently useful for a free-first text outlet in practice.

Provider	What is actually free	Fit for the outlet	tryAGI entry point
Google Gemini	Google exposes a Free tier for selected Gemini API models. Exact live quotas are model-specific and shown in AI Studio.	Primary if you accept Google free-tier data terms.	`tryAGI.Google.Gemini`
SambaNova	Free tier without a payment method. Docs publish concrete limits, for example 20 RPM, 20 RPD, and 200K tokens/day on listed free-tier models.	Primary. Strongest published OpenAI-compatible free contract.	`CustomProviders.SambaNova(key)`
Cerebras	Free access to all Cerebras-powered models. Model pages publish free-tier limits; example: 30 RPM, 60K input TPM, 1M daily tokens.	Primary. Very strong reasoning/coding candidate.	`CustomProviders.Cerebras(key)`
OpenRouter	Free `:free` models and the `openrouter/free` router. Official limits page currently documents 20 RPM and 50 free-model requests/day by default, or 1,000/day after buying at least $10 of credits.	Secondary. Excellent abstraction/fallback layer, but too capped to be the only backbone.	`CustomProviders.OpenRouter(key)`
GitHub Models	Included, rate-limited access to many models. The catalog API exposes model metadata including `rate_limit_tier`.	Secondary. Good overflow and experimentation capacity.	`CustomProviders.GitHubModels(token)`
NVIDIA Build / NIM	Many models are available under NVIDIA trial-service terms on `build.nvidia.com`. This is real free usage for evaluation, but NVIDIA does not publish one universal free quota table across all models.	Opportunistic. Useful extra capacity, not a clean contract.	`CustomProviders.Nvidia(key)`
Groq	Groq still offers a free API key and "get started for free", but the public docs do not currently publish one stable universal free-tier quota table.	Opportunistic. Excellent latency, but discover quotas from the console instead of hard-coding assumptions.	`CustomProviders.Groq(key)`
Cohere Trial	Trial keys are free, and Cohere documents that a trial key is limited to 40 API calls per minute.	Overflow only. Good for dev, weak as a main backbone.	`CustomProviders.Cohere(key)`
Ollama / LM Studio	Always free locally. No remote quota, but bounded by your own machine.	Hard floor. Best last fallback if local inference is acceptable.	`CustomProviders.Ollama()` / `CustomProviders.LmStudio()`
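The published numbers in the table can be kept as data instead of being scattered through routing code. The sketch below is only an illustration: `PublishedQuota`, `PublishedQuotas`, the provider keys, and `HasHeadroom` are hypothetical names, and the figures are just the examples quoted above, so re-verify them against each provider's docs or console before relying on them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record: null means "no limit published for this dimension".
public sealed record PublishedQuota(int? RequestsPerMinute, int? RequestsPerDay, long? TokensPerDay);

public static class PublishedQuotas
{
    // Keys are illustrative labels, not tryAGI identifiers.
    // Values are the example limits quoted in the table above.
    public static readonly IReadOnlyDictionary<string, PublishedQuota> Examples =
        new Dictionary<string, PublishedQuota>
        {
            ["sambanova"] = new(20, 20, 200_000),
            ["cerebras"] = new(30, null, 1_000_000),
            ["openrouter"] = new(20, 50, null),
            ["cohere-trial"] = new(40, null, null),
        };

    // True when the current usage counters still fit under every published limit.
    public static bool HasHeadroom(PublishedQuota quota, int usedThisMinute, int usedToday, long tokensToday) =>
        (quota.RequestsPerMinute is not int rpm || usedThisMinute < rpm) &&
        (quota.RequestsPerDay is not int rpd || usedToday < rpd) &&
        (quota.TokensPerDay is not long tpd || tokensToday < tpd);
}
```

Providers without a published table (NVIDIA Build, Groq) are deliberately left out: for those, discover limits from the console or from live 429 responses rather than hard-coding guesses.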

Providers that are not good backbone candidates

These can still be useful, but they should not be treated as the permanent base of a free-first outlet:

Provider group	Why not a backbone
OpenAI / Anthropic	Any free usage is temporary credit-style access, not a recurring public free tier.
xAI / Together / Fireworks / Nebius	Usually signup credits, promos, or payment-linked tiers. Good for overflow, not for a stable free outlet.
Perplexity API	The consumer product has free usage, but the API is not a real recurring free-tier backbone.
DeepSeek	Extremely cheap, but I could not verify a published recurring free API tier in the official docs at the time of writing.
Mistral Experiment	Attractive for experimentation, but public quota terms are not published cleanly enough to make it your automatic backbone without manual verification in the console.

Recommended quality ladder

If your goal is "use the smartest thing available, then relax when quotas are gone", this is the practical order:

Tier A: Best free quality first

Gemini free-tier flagship model if available for your account and region.
Cerebras flagship reasoning model.
SambaNova flagship reasoning model.
NVIDIA Build trial flagship model.

Use this tier for coding, hard reasoning, structured outputs, and agent steps that actually matter.

Tier B: Strong but more available

Gemini Flash / Flash-Lite.
Groq 70B-class chat model.
SambaNova 70B-class Llama.
GitHub Models high-tier chat model.

Use this tier for most conversational traffic, summarization, and lightweight tool orchestration.

Tier C: Cheap resilient fallback

OpenRouter `openrouter/free`.
Explicit OpenRouter `:free` model ids.
Local Ollama / LM Studio 8B to 14B model.

Use this tier when everything else is cooling down, daily limits are gone, or you need the outlet to stay alive at all costs.
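The three tiers above can be written down as an ordered table, with each workload class entering the ladder at a different rung. This is a minimal sketch under assumptions: the `Tier`, `Workload`, `LadderEntry`, and `QualityLadder` names are hypothetical, and the model hints are descriptive labels rather than real model ids.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Tier { A, B, C }
public enum Workload { CodingOrReasoning, GeneralChat, LastResort }

public sealed record LadderEntry(string Provider, string ModelHint, Tier Tier);

public static class QualityLadder
{
    // Entries are listed in the same order as the tiers above.
    public static readonly IReadOnlyList<LadderEntry> Entries = new List<LadderEntry>
    {
        new("Gemini", "flagship", Tier.A),
        new("Cerebras", "flagship reasoning", Tier.A),
        new("SambaNova", "flagship reasoning", Tier.A),
        new("NVIDIA Build", "trial flagship", Tier.A),
        new("Gemini", "Flash / Flash-Lite", Tier.B),
        new("Groq", "70B-class chat", Tier.B),
        new("SambaNova", "70B-class Llama", Tier.B),
        new("GitHub Models", "high-tier chat", Tier.B),
        new("OpenRouter", "openrouter/free", Tier.C),
        new("OpenRouter", "explicit :free model", Tier.C),
        new("Local", "Ollama / LM Studio 8B to 14B", Tier.C),
    };

    // Each workload starts at its own tier and can only relax downward from there.
    public static IReadOnlyList<LadderEntry> For(Workload workload)
    {
        var start = workload switch
        {
            Workload.CodingOrReasoning => Tier.A,
            Workload.GeneralChat => Tier.B,
            _ => Tier.C,
        };
        return Entries.Where(e => e.Tier >= start).ToList();
    }
}
```

With this shape, `QualityLadder.For(Workload.GeneralChat)` skips Tier A entirely, which keeps ordinary conversational traffic from consuming flagship quota.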

Routing policy that actually works

The workspace already gives you provider factories through `CustomProviders`. What it does not give you yet is the policy layer. That layer should do the following:

Filter candidates by capability first. Only compare models that support what the request needs: tool calling, JSON/schema output, vision, long context, or streaming.
Prefer recurring-free providers over credit-based providers. One-time credits should sit below your recurring free tiers, not above them.
Degrade inside a provider before leaving it. Example: Gemini Pro -> Gemini Flash -> Gemini Flash-Lite is usually cleaner than immediately jumping to a totally different provider.
Treat 429 as quota exhaustion, not provider failure. Put the model/provider on cooldown until reset instead of marking it unhealthy.
Treat 5xx, timeouts, and transport failures as health failures. Trip a short circuit breaker and retry a different provider quickly.
Persist quota and health state outside process memory. If multiple workers share the same outlet, store cooldowns and quota state in Redis or SQLite so they do not stampede the same provider.
Keep separate ladders per workload class. Coding/reasoning, general chat, embeddings, and tool-heavy agent turns should not all share the same exact fallback order.
Reserve some best-tier capacity. Do not let trivial requests burn your entire Tier A budget.
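The split between quota exhaustion and health failure in the rules above can be sketched as a small classifier. Everything here is illustrative: `FailureKind`, `FailurePolicy`, and the `RequestError` bucket are hypothetical names, the one-minute and thirty-second defaults are placeholder values, and honoring a `Retry-After` hint assumes the provider actually sends one.

```csharp
using System;

public enum FailureKind { QuotaExhausted, HealthFailure, RequestError }

public static class FailurePolicy
{
    // httpStatus is null when no response arrived at all (timeout or transport failure).
    public static FailureKind Classify(int? httpStatus)
    {
        if (httpStatus is null) return FailureKind.HealthFailure;
        if (httpStatus == 429) return FailureKind.QuotaExhausted;
        return httpStatus >= 500 ? FailureKind.HealthFailure : FailureKind.RequestError;
    }

    // Quota exhaustion waits for the provider's reset hint when one is known;
    // health failures trip a short breaker so another provider is tried quickly.
    public static DateTimeOffset CooldownUntil(FailureKind kind, DateTimeOffset now, TimeSpan? retryAfter = null) =>
        kind switch
        {
            FailureKind.QuotaExhausted => now + (retryAfter ?? TimeSpan.FromMinutes(1)),
            FailureKind.HealthFailure => now + TimeSpan.FromSeconds(30),
            // Assumption: other 4xx responses are a problem with the request, so no cooldown.
            _ => now,
        };
}
```

Keeping the two kinds separate matters for metrics as well: a provider that returns 429 all afternoon is healthy but empty, and should not page anyone.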

Minimal outlet shape

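using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Extensions.AI; // Assumption: IChatClient and ChatResponse are the Microsoft.Extensions.AI types.

// One routable model on one provider.
public sealed record OutletCandidate(
    string Id,
    string ModelId,
    int QualityScore,
    bool SupportsTools,
    bool SupportsVision,
    bool IsRecurringFree,
    Func<IChatClient> CreateClient);

// Quota and health state. Back this with Redis or SQLite when several workers share the outlet.
public interface IOutletStateStore
{
    bool IsCoolingDown(string candidateId, DateTimeOffset now);
    void Cooldown(string candidateId, DateTimeOffset until);
    void NoteSuccess(string candidateId, TimeSpan latency);
    void NoteFailure(string candidateId, Exception exception);
}

public sealed class FreeFirstOutlet
{
    public async Task<ChatResponse> SendAsync(
        IReadOnlyList<OutletCandidate> candidates,
        Func<OutletCandidate, Task<ChatResponse>> invoke,
        DateTimeOffset now,
        IOutletStateStore state)
    {
        // Best model first; recurring-free candidates win ties at the same quality score.
        foreach (var candidate in candidates
            .Where(x => !state.IsCoolingDown(x.Id, now))
            .OrderByDescending(x => x.QualityScore)
            .ThenByDescending(x => x.IsRecurringFree))
        {
            var stopwatch = Stopwatch.StartNew();
            try
            {
                var response = await invoke(candidate).ConfigureAwait(false);
                // Record the measured latency rather than a constant zero.
                state.NoteSuccess(candidate.Id, stopwatch.Elapsed);
                return response;
            }
            catch (Exception ex) when (IsRateLimit(ex))
            {
                // Quota exhaustion: cool down, but do not count it as a health failure.
                state.Cooldown(candidate.Id, now.AddMinutes(1));
            }
            catch (Exception ex)
            {
                // 5xx, timeouts, transport errors: record the failure and trip a short breaker.
                state.NoteFailure(candidate.Id, ex);
                state.Cooldown(candidate.Id, now.AddSeconds(30));
            }
        }

        throw new InvalidOperationException("No provider is currently available.");
    }

    // Prefer the typed status code when the client surfaces one; message matching is a fallback heuristic.
    private static bool IsRateLimit(Exception ex) =>
        ex is HttpRequestException { StatusCode: HttpStatusCode.TooManyRequests } ||
        ex.Message.Contains("429", StringComparison.OrdinalIgnoreCase) ||
        ex.Message.Contains("rate limit", StringComparison.OrdinalIgnoreCase);
}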

This is intentionally simple. The important part is the policy:

best model first
recurring free preferred
quota cooldowns tracked explicitly
cross-provider fallback after capability filtering
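Capability filtering and the Tier A reserve happen before the candidate list reaches the fallback loop. The following is a hedged sketch of that pre-filter, not part of tryAGI: `Candidate` is a trimmed-down stand-in for the candidate record (no client factory), and `RequestNeeds`, `CandidateFilter`, and the `TierAThreshold` value of 90 are hypothetical choices made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-ins so the example compiles on its own.
public sealed record Candidate(string Id, int QualityScore, bool SupportsTools, bool SupportsVision, bool IsRecurringFree);
public sealed record RequestNeeds(bool Tools, bool Vision, bool IsTrivial);

public static class CandidateFilter
{
    // Assumption: quality scores at or above this value count as Tier A.
    public const int TierAThreshold = 90;

    public static IReadOnlyList<Candidate> Eligible(IEnumerable<Candidate> all, RequestNeeds needs) =>
        all.Where(c => !needs.Tools || c.SupportsTools)      // capability filter first
           .Where(c => !needs.Vision || c.SupportsVision)
           // Reserve Tier A: trivial requests never see the top-scoring candidates.
           .Where(c => !needs.IsTrivial || c.QualityScore < TierAThreshold)
           .OrderByDescending(c => c.QualityScore)            // best model first
           .ThenByDescending(c => c.IsRecurringFree)          // recurring free wins ties
           .ToList();
}
```

The result can be passed straight to the outlet as its candidate list, so the fallback loop only ever walks models that can actually serve the request.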

A practical default ladder for tryAGI

If you want a starting point today, use this:

`tryAGI.Google.Gemini` flagship free-tier model
`CustomProviders.Cerebras(key)` flagship reasoning model
`CustomProviders.SambaNova(key)` flagship reasoning model
`CustomProviders.Groq(key)` 70B-class model
`CustomProviders.OpenRouter(key)` with `openrouter/free`
`CustomProviders.Ollama()` local 14B/8B model

That gives you:

one high-quality non-OpenAI-compatible primary path
two strong OpenAI-compatible remote backups
one very fast opportunistic provider
one universal free-model router
one local last-resort floor
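To dry-run this ladder without calling any provider, a small in-memory cooldown table is enough. The sketch below is an assumption-laden illustration: `InMemoryCooldowns` and `DefaultLadder` are hypothetical names, the ids are descriptive labels rather than tryAGI identifiers, and an in-process dictionary only suits a single worker.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// In-memory cooldown table. Shared workers would need Redis or SQLite instead.
public sealed class InMemoryCooldowns
{
    private readonly ConcurrentDictionary<string, DateTimeOffset> _until = new();

    public bool IsCoolingDown(string candidateId, DateTimeOffset now) =>
        _until.TryGetValue(candidateId, out var until) && now < until;

    // Keeps the later deadline if the candidate is already cooling down.
    public void Cooldown(string candidateId, DateTimeOffset until) =>
        _until.AddOrUpdate(candidateId, until, (_, existing) => existing > until ? existing : until);
}

public static class DefaultLadder
{
    // Illustrative labels, in the order of the default ladder above.
    public static readonly IReadOnlyList<string> Order = new[]
    {
        "gemini-flagship", "cerebras-flagship", "sambanova-flagship",
        "groq-70b", "openrouter-free", "ollama-local",
    };

    // Returns the first rung that is not cooling down, or null if every rung is.
    public static string? FirstAvailable(InMemoryCooldowns cooldowns, DateTimeOffset now)
    {
        foreach (var id in Order)
        {
            if (!cooldowns.IsCoolingDown(id, now)) return id;
        }
        return null;
    }
}
```

Because the clock is passed in rather than read from the system, the same walk can be replayed in tests at any simulated time.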