⏰ Limited-Time Free Models

Three models are free for a limited time on OpenRouter. Z.ai GLM 4.5 (free until June 19; list price $0.60/$2.20 per M tokens; 131K context) and Meta Llama 3 70B Instruct (free until June 19; list price $0.51/$0.74 per M tokens) each have 8 days remaining. Anthropic Claude Opus 4.6 Fast (free until June 29; list price $30/$150 per M tokens; 1M context) leads the pack for long-context workloads with 16 days left.

🆕 New on OpenRouter

Anthropic continues to push long context: Claude Fable Latest and Claude Fable 5 both landed with 1M context windows at $10 per M prompt tokens, a price point that makes million-token experiments viable for individual developers. Nex AGI also released Nex-N2-Pro (free) with 262K context for prompt input, the first 200K-class free model to appear in weeks.

The hub skews toward research and fine-tunes today. Highlights:

  • OpenTransformer/AGILLM-4.3: A mixture-of-experts transformer tagged diffusion-block and PyTorch, the only top-10 upload combining MoE and diffusion architectures in a single model.
  • rrvaswin/qwen3_4b_instruct_icrl_run5_ckpt1320: An instruction-tuning checkpoint from an in-context RL (ICRL) run on Qwen3 4B, useful for researchers studying ICRL.
  • Occupying-Mars/glm42-bfcl-native-36pct-artifacts: Native-function-calling artifacts for the Berkeley Function-Calling Leaderboard (BFCL), a release that improves tool-use evaluation.
  • gstojanovski/esm2_t6_8M-finetuned-AMP-classifier: A small ESM2 protein-language model fine-tuned for antimicrobial-peptide classification, one of the few biology-adjacent uploads of the day.

The remaining entries are scattered experiments: chess RL pre-to-post ablation runs from chess-pre-to-post (50M, 200M, 680M parameter sweeps), a bark-cpp GGML proof-of-concept, and a Qwen3 quantize-and-deploy demo. Most have zero downloads: research noise rather than viral hits.

  • GordenSun/GordenSuperPPTSkills ★775: AI PPT generator that produces image-format slides via GPT and converts them to fully editable PPTX. Billed as the “PPT track terminator.”
  • JimLiu/baoyu-design ★759: Run Claude Design locally as an Agent Skill (works with Cursor, Claude Code, and more) for polished UI mockups.
  • apple/coreai-models ★694: Apple’s first-party repo with model export recipes, Python primitives, and Swift runtime utilities for on-device AI.
  • amElnagdy/guard-skills ★565: Quality gates for AI coding agents that catch common AI-generated failure modes in code, tests, and PRs before merge.
  • xiaohuailabs/xiaohu-video-translate ★461: Local AI video translator in which one sentence triggers download, transcribe, translate, polish, and subtitle burn-in, with zero API fees.

Long context goes mainstream. With Claude Fable offering 1M context at $10 per M prompt tokens and Claude Opus 4.6 Fast offering the same window free for a limited time (list price $30 per M prompt tokens), the cost barrier for million-token experiments has collapsed. Combined with the steady stream of free models with 100K+ context (Nex-N2-Pro at 262K, GLM 4.5 at 131K), developers now have more long-context runway than ever.
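To put those prices in concrete terms, here is a minimal Python sketch of the arithmetic. The `Pricing` type and `estimate_cost` helper are hypothetical names, not part of any OpenRouter SDK. The sketch assumes the paired prices quoted above are prompt/completion list prices per million tokens, and the token counts are illustrative. No completion price is quoted for Claude Fable, so only its prompt cost is computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    """Per-million-token list prices in USD (hypothetical helper type)."""
    prompt_per_m: float
    # None when no completion price is quoted for the model.
    completion_per_m: Optional[float] = None


def estimate_cost(pricing: Pricing, prompt_tokens: int, completion_tokens: int = 0) -> float:
    """Return the estimated USD cost of one request at list price."""
    cost = prompt_tokens * pricing.prompt_per_m / 1_000_000
    if completion_tokens:
        if pricing.completion_per_m is None:
            raise ValueError("no completion price available for this model")
        cost += completion_tokens * pricing.completion_per_m / 1_000_000
    return cost


# Prices as quoted in this digest; prompt/completion split is an assumption.
CLAUDE_FABLE = Pricing(prompt_per_m=10.0)
OPUS_46_FAST = Pricing(prompt_per_m=30.0, completion_per_m=150.0)
GLM_45 = Pricing(prompt_per_m=0.60, completion_per_m=2.20)

if __name__ == "__main__":
    # A full 1M-token prompt on Claude Fable (prompt side only).
    print(f"Fable, 1M prompt: ${estimate_cost(CLAUDE_FABLE, 1_000_000):.2f}")
    # The same prompt plus a 10K-token reply on Opus 4.6 Fast at list price.
    print(f"Opus 4.6 Fast, 1M + 10K: ${estimate_cost(OPUS_46_FAST, 1_000_000, 10_000):.2f}")
    # A 100K-token prompt plus a 2K-token reply on GLM 4.5 at list price.
    print(f"GLM 4.5, 100K + 2K: ${estimate_cost(GLM_45, 100_000, 2_000):.4f}")
```

Under these assumptions a single full-window prompt costs about $10 on Claude Fable and about $30 on Opus 4.6 Fast once its free period ends, before any completion tokens are counted.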

On-device AI gets first-party tooling. Apple’s release of coreai-models (export recipes + Swift runtime) is the first major signal that a tier-1 hyperscaler is treating the on-device model deployment stack as a product surface, not a research demo. Expect a wave of consumer apps to follow once the Swift runtime stabilizes.

The agent quality-gate gap is being filled. guard-skills and baoyu-design represent two sides of the same trend: developers no longer trust vanilla AI agent output, so they’re building skill layers that enforce quality. Guard skills catch failure modes; design skills produce better raw output. Both are signs that the agent ecosystem is maturing from “ask and hope” to “ask and verify.”

API-free local AI is the new meme. xiaohu-video-translate joins a growing cohort of repos promising “zero API fees” by routing everything through local models. As commercial inference prices drop, the value of “free and local” is shifting from cost to privacy and offline capability.