Wednesday, February 04, 2026

OpenClaw Smart AI Model routing to reduce cost

Stop Wasting API Calls:
A Practical Guide to Multi-Tier AI Model Systems

How to run 24/7 AI automation without burning through your quota or your budget
⚡ Want the detailed technical implementation?
This post covers the concepts and benefits. For the full technical guide with code examples, configuration files, and monitoring setup, scroll down to the Complete Implementation Guide section below.

The Problem: Using Premium Models for Everything

If you're running AI automation—whether it's a personal assistant, business monitoring, or development tools—you've probably noticed your API usage climbing fast.

Here's what most people do wrong: they pick their favorite model and use it for everything.

  • "I like Claude Sonnet, so that's what I use"
  • "GPT-4 is the best, I'll just use that"
  • "Gemini Pro is good enough for most things"
This approach has two big problems:
  1. You burn through API quotas unnecessarily
  2. You pay more than you need to

The reality is that most AI tasks don't require premium models. You're using a sledgehammer to crack walnuts.

The Solution: Match Model Cost to Task Complexity

Think of AI models like tools in a workshop. You wouldn't use a precision laser cutter to cut plywood. You use:

  • Cheap tools for simple, repetitive tasks
  • Balanced tools for everyday work
  • Premium tools only when you actually need the capability

The same logic applies to AI models.

The 3-Tier System

💵 Tier 1: Background Workers
(Cheap & Fast)

Models: Gemini Flash, Claude Haiku, DeepSeek V3
Cost: $0.10 - $0.50 per million tokens
Speed: Very fast responses

Use for:

  • Scheduled tasks and cron jobs
  • File operations (move, copy, rename, organize)
  • Simple monitoring (is X up? did Y complete?)
  • Data extraction from logs or files
  • Basic yes/no questions
  • Heartbeat checks
Example:
Task: Check if server 192.168.1.100 responds to ping
Tier 1 Response: "Server is up. Response: 12ms"
Cost: ~$0.0001

Why it works: These tasks don't require reasoning, creativity, or complex understanding. They're simple data retrieval or yes/no answers. Cheap models handle them perfectly.

🧠 Tier 2: Daily Driver
(Balanced)

Models: Claude Sonnet, Gemini Pro, GPT-4o
Cost: $3 - $15 per million tokens
Speed: Good balance

Use for:

  • All normal chat conversations (your default)
  • Code writing and review
  • Research and analysis
  • Documentation
  • Email composition
  • Most technical troubleshooting
  • Content creation
Example:
Task: Summarize 50 emails and prioritize urgent ones
Tier 2 Response: Detailed summary with context and priorities
Cost: ~$0.05-0.10

Why it works: Tier 2 models are smart enough for 90% of what you'll throw at them. They understand context, can reason through problems, and write quality content. This should be your default for anything interactive.

🚀 Tier 3: The Heavy Hitters
(Premium)

Models: Claude Opus, GPT-4 (full)
Cost: $15 - $75 per million tokens
Speed: Slower, but most capable

Use for:

  • Complex architecture decisions
  • Multi-step reasoning with many variables
  • Novel problem-solving (no clear solution path)
  • When Tier 2 has tried and failed multiple times
  • High-stakes content (legal, financial, critical business decisions)
Example:
Task: Debug a subtle async race condition in distributed system
After: Tier 2 tried 3 approaches and failed
Tier 3 Response: Identified timing issue with detailed trace
Cost: ~$2-5
Worth it: Saved 4-6 hours of manual debugging

Why it works: Tier 3 models have the best reasoning capabilities. But you only need that extra power occasionally. Use it strategically, not by default.
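
The three tiers above can be turned into a small selection function. This is only a minimal sketch under stated assumptions: the `Task` fields, the `pick_tier` name, and the escalation threshold of three failed Tier 2 attempts (borrowed from the race-condition example) are illustrative and not part of any real library.

```python
from dataclasses import dataclass


# Hypothetical task descriptor; the field names are illustrative assumptions.
@dataclass
class Task:
    interactive: bool            # True for chat-style work, False for cron/background jobs
    tier2_failures: int = 0      # how many times Tier 2 has already failed on this task
    high_stakes: bool = False    # legal, financial, or critical business content


ESCALATE_AFTER = 3  # assumed threshold, taken from the "tried 3 approaches" example


def pick_tier(task: Task) -> int:
    """Return 1, 2, or 3 following the tier descriptions above."""
    if task.high_stakes or task.tier2_failures >= ESCALATE_AFTER:
        return 3
    if task.interactive:
        return 2
    return 1


print(pick_tier(Task(interactive=False)))                   # heartbeat or cron job
print(pick_tier(Task(interactive=True)))                    # normal chat
print(pick_tier(Task(interactive=True, tier2_failures=3)))  # repeated Tier 2 failures
```

The sketch only picks a tier; asking for confirmation before a Tier 3 call is left to the caller.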

Real Example: Daily Automation Workflow

Here's how a typical day breaks down:

6:00 AM - Morning Checks (Tier 1)

  • Check server status: 5 servers
  • Count unread emails
  • Review calendar
  • Check backup completion
Cost: ~$0.002 total

Throughout Day - Interactive Work (Tier 2)

  • 10 chat conversations
  • 3 code reviews
  • 2 email summaries
  • 1 documentation update
Cost: ~$0.40 total

Occasionally - Complex Problem (Tier 3)

  • Maybe once a week
  • Usually after Tier 2 can't solve it
Cost: ~$2-5 per use
Monthly Total: $15-25 depending on heavy problem-solving needs

Common Mistakes to Avoid

Mistake | Problem | Solution
Using Premium Models by Default | Burns through quota and budget | Set Tier 2 as your default, escalate only when needed
Using Cheap Models for Complex Tasks | Wastes time with poor results | If unsure, start with Tier 2. Downgrade later if it's overkill.
Not Tracking Usage | Can't identify what's expensive | Log every call. Review weekly.
Manual Model Switching | You'll forget or choose wrong | Automate tier selection based on task type

Quick Reference Template

Copy this into your system prompt:

You are a cost-efficient AI assistant using a tiered model system:

TIER 1 (Background): Gemini Flash, Claude Haiku
- Cost: $0.10-0.50 per 1M tokens
- Use for: Scheduled tasks, monitoring, file ops, simple queries
- Auto-select for all background work

TIER 2 (Default): Claude Sonnet, Gemini Pro
- Cost: $3-15 per 1M tokens  
- Use for: Chat, code, research, analysis
- Your default for all interactive work

TIER 3 (Premium): Claude Opus, GPT-4
- Cost: $15-75 per 1M tokens
- Use for: Complex reasoning, after Tier 2 fails
- ALWAYS ask permission before using

RULES:
1. Background/automated = Tier 1 (automatic)
2. Interactive/chat = Tier 2 (default)
3. Complex/failed attempts = Tier 3 (ask first)
4. Log all usage
5. Alert at 80% monthly budget
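
Rules 4 and 5 (log all usage, alert at 80% of the monthly budget) need bookkeeping outside the prompt itself. One way this might look, as a rough sketch: the `BudgetTracker` class, its field names, and the $25 budget are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetTracker:
    monthly_budget_usd: float
    alert_fraction: float = 0.80          # rule 5: alert at 80% of the monthly budget
    spent_usd: float = 0.0
    log: list = field(default_factory=list)

    def record(self, model: str, cost_usd: float) -> bool:
        """Log one call (rule 4) and return True once the alert threshold is reached."""
        self.log.append((model, cost_usd))
        self.spent_usd += cost_usd
        return self.spent_usd >= self.alert_fraction * self.monthly_budget_usd


tracker = BudgetTracker(monthly_budget_usd=25.0)   # hypothetical budget
print(tracker.record("tier2", 10.0))   # 40% of the budget used so far
print(tracker.record("tier3", 12.0))   # 88% of the budget used, so the alert fires
```

Resetting the tracker at the start of each month and delivering the alert are left out of this sketch.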

Conclusion

The goal isn't to use the cheapest model possible. It's to match model capability to task complexity.

  • Simple tasks → Simple models
  • Normal work → Balanced models
  • Hard problems → Premium models
This approach:
  • Reduces API quota usage by 60-80%
  • Lowers costs significantly
  • Maintains quality where it matters
  • Reserves premium models for when you actually need them

Start simple:

  1. Move background tasks to Tier 1
  2. Keep Tier 2 as your default
  3. Use Tier 3 strategically

Ready to implement this yourself? Keep reading for the complete technical guide.

Complete Implementation Guide:
Building Your Tiered AI System

The technical details, configuration examples, and monitoring setup that powers my always-on AI assistant

The Four-Tier Architecture

My production system actually uses four tiers, not three. The fourth tier adds fallback redundancy that prevents failures when primary services have issues.

┌─────────────────────────────────────────────────────────┐
│  TIER 1: Gemini Flash (Primary)                         │
│  Cost: $0.10/M input, $0.40/M output                    │
│  Use: 95% of all requests                               │
│  Handles: Summaries, Q&A, simple code, background tasks │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼ (rate limited or down)
┌─────────────────────────────────────────────────────────┐
│  TIER 2: OpenRouter Bridge                              │
│  Cost: $0.075-$0.30/M (varies by model)                 │
│  Use: Fallback when Gemini fails                        │
│  Models: Gemini Flash Lite, DeepSeek V3                 │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼ (still failing)
┌─────────────────────────────────────────────────────────┐
│  TIER 3: Claude Haiku                                   │
│  Cost: $0.80/M input, $4/M output                       │
│  Use: Emergency fallback only                           │
│  When: Both Gemini and OpenRouter unavailable           │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼ (explicit user request)
┌─────────────────────────────────────────────────────────┐
│  TIER 4: Claude Sonnet/Opus                             │
│  Cost: $3-15/M input, $15-75/M output                   │
│  Use: Complex reasoning, architecture, code review      │
│  When: User explicitly invokes or task requires it      │
└─────────────────────────────────────────────────────────┘

Tier 1: Gemini Flash (The Workhorse)

Model: gemini-2.0-flash
Cost: $0.10/M input, $0.40/M output
Context Window: 1,000,000 tokens
Rate Limits (Paid Tier 1): 2,000 RPM, 4M tokens/minute

Handles 95% of all requests:

  • Cron jobs - Daily summaries, health checks, weather alerts
  • Heartbeat tasks - Keeping the assistant "warm" every 2 hours
  • Simple queries - "What time is it in Tokyo?" doesn't need Opus
  • Text processing - Summarization, formatting, extraction
  • Background workers - Tasks that run while you sleep

Why Gemini Flash?

  1. Massive context window - 1M tokens means it can ingest entire codebases
  2. Speed - Flash is fast, with responses typically arriving in under a second
  3. Cost - At $0.10/M input, you can process 10 million tokens for a dollar
  4. Google's free tier - 15 requests/minute free, 1,500/day (but paid tier recommended for reliability)
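
To make the per-million-token pricing concrete, here is a small worked calculation. The `call_cost_usd` helper and the 2,000-in / 500-out request size are assumptions for illustration; the prices are the Gemini Flash figures quoted above.

```python
def call_cost_usd(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one call, given prices in dollars per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Gemini Flash prices quoted above: $0.10/M input, $0.40/M output.
ten_million_in = call_cost_usd(10_000_000, 0, 0.10, 0.40)
small_request = call_cost_usd(2_000, 500, 0.10, 0.40)   # hypothetical request size

print(f"10M input tokens: ${ten_million_in:.2f}")
print(f"2,000 in / 500 out: ${small_request:.4f}")
```

Output tokens cost four times as much as input tokens at these prices, so long generated responses dominate the bill faster than long prompts do.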

Configuration Example

{
  "google": {
    "baseUrl": "https://generativelanguage.googleapis.com/v1beta",
    "apiKey": "YOUR_API_KEY",
    "models": [
      {
        "id": "gemini-2.0-flash",
        "name": "Gemini 2.0 Flash",
        "cost": { "input": 0.1, "output": 0.4 },
        "contextWindow": 1000000,
        "maxTokens": 8192
      }
    ]
  }
}

Tier 2: OpenRouter (The Safety Net)

What is OpenRouter? A unified API gateway that routes to 100+ models from different providers. One API key, access to everything.

Why use it as a fallback?
When Gemini hits rate limits or goes down (it happens), OpenRouter provides instant failover to alternative models.

The Fallback Chain

{
  "model": {
    "primary": "google/gemini-2.0-flash",
    "fallbacks": [
      "openrouter/google/gemini-2.5-flash-lite",
      "openrouter/deepseek/deepseek-chat-v3-0324",
      "anthropic/claude-haiku-4"
    ]
  }
}

When a request fails:

  1. Try Gemini Flash directly → 429 rate limit
  2. Try Gemini Flash Lite via OpenRouter → works, costs $0.075/M
  3. If that fails, try DeepSeek V3 → works, costs $0.14/M
  4. Last resort: Claude Haiku → works, costs $0.80/M

The system automatically retries down the chain. User never sees an error.
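
The retry-down-the-chain behavior can be sketched in a few lines. This is not the routing code of any particular tool: `ModelError`, `complete_with_fallback`, and the `fake_call` stand-in client are hypothetical names, and a real client would raise its own exception types.

```python
from typing import Callable


class ModelError(Exception):
    """Raised by the (hypothetical) client when a model call fails, e.g. on HTTP 429."""


# Same order as the fallback chain shown above.
CHAIN = [
    "google/gemini-2.0-flash",
    "openrouter/google/gemini-2.5-flash-lite",
    "openrouter/deepseek/deepseek-chat-v3-0324",
    "anthropic/claude-haiku-4",
]


def complete_with_fallback(prompt: str, call_model: Callable[[str, str], str],
                           chain: list[str] = CHAIN) -> tuple[str, str]:
    """Try each model in order; return (model_used, response) from the first success."""
    errors = []
    for model in chain:
        try:
            return model, call_model(model, prompt)
        except ModelError as exc:
            errors.append(f"{model}: {exc}")
    raise ModelError("all models failed: " + "; ".join(errors))


# Stand-in client that simulates the primary model being rate limited.
def fake_call(model: str, prompt: str) -> str:
    if model == "google/gemini-2.0-flash":
        raise ModelError("429 rate limit")
    return f"ok from {model}"


print(complete_with_fallback("ping", fake_call))
```

Returning the model name alongside the response makes it easy to log which tier actually served each request, which the monitoring section relies on.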

Tier 3: Claude Haiku (Emergency Fallback)

Model: claude-haiku-4
Cost: $0.80/M input, $4/M output
When used: Only when Tiers 1 and 2 both fail

Haiku is the "never fail" option. Anthropic's infrastructure is rock solid. If Gemini is down AND OpenRouter is having issues, Haiku catches everything.

At roughly 8-10x the cost of Gemini Flash ($0.80 vs. $0.10 per million input tokens, $4 vs. $0.40 per million output tokens), you don't want this firing constantly. That's where monitoring comes in.

Tier 4: Claude Sonnet/Opus (The Heavy Artillery)

Models: claude-sonnet-4-5, claude-opus-4-5
Cost: $3-15/M input, $15-75/M output
When used: Complex reasoning, code architecture, explicit requests

Reserved for tasks that actually need them:

  • Multi-file code refactoring
  • System architecture decisions
  • Complex debugging requiring deep reasoning
  • When the user explicitly asks for the "big brain"

Model Alias System

{
  "models": {
    "anthropic/claude-sonnet-4-5": { "alias": "sonnet" },
    "anthropic/claude-opus-4-5": { "alias": "opus" },
    "google/gemini-2.0-flash": { "alias": "gemini-flash" }
  }
}

Users can type /use opus to switch explicitly, but the defaults stay cheap.
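
Resolving an alias from a /use command could look roughly like this. The `handle_message` function and its parsing rules are assumptions for illustration, not the actual command handler; the aliases and model IDs are the ones from the alias table above.

```python
ALIASES = {
    "sonnet": "anthropic/claude-sonnet-4-5",
    "opus": "anthropic/claude-opus-4-5",
    "gemini-flash": "google/gemini-2.0-flash",
}
DEFAULT_MODEL = "google/gemini-2.0-flash"


def handle_message(text: str, current_model: str = DEFAULT_MODEL) -> str:
    """Return the model to use after processing a possible '/use <alias>' command."""
    parts = text.strip().split()
    if len(parts) == 2 and parts[0] == "/use":
        # Unknown aliases leave the current model unchanged.
        return ALIASES.get(parts[1].lower(), current_model)
    return current_model


print(handle_message("/use opus"))
print(handle_message("what's the weather?"))
```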

Monitoring: Catching Runaway Costs

The tier system only works if you monitor it. I run a Windows Task Scheduler job every 30 minutes that:

  1. Parses logs for model usage
  2. Counts by model - How many Haiku? Sonnet? Opus?
  3. Checks thresholds - Opus should be 0 for background tasks
  4. Sends Telegram alerts if something's wrong

Alert Thresholds

$maxOpusPerHour = 1      # Opus should NEVER be used by cron jobs
$maxHaikuPerHour = 5     # Haiku means Gemini is failing
$maxSonnetPerHour = 20   # Runaway conversation detection
$max429PerHour = 10      # Rate limit problems

What Triggers Alerts

Condition | Alert
Opus used at all | 🚨 OPUS USED - Check fallback config!
Haiku > 5/hour | ⚠️ Haiku fallback triggered - Gemini may be failing
402 errors | 🚨 PAYMENT REQUIRED - Credits depleted!
429 > 10/hour | ⚠️ Rate limit errors - API quota issues

If I wake up to a Telegram message, something's wrong. No message = system healthy.
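
The threshold checks can be sketched in Python as well. This is not the scheduled job described above: the log line shape (`model=<id> status=<code>`), the `check_last_hour` name, and the Sonnet alert wording are all assumptions, and sending the Telegram message is left out.

```python
import re
from collections import Counter

# Thresholds mirror the per-hour values listed above.
MAX_OPUS, MAX_HAIKU, MAX_SONNET, MAX_429 = 1, 5, 20, 10

# Assumed log line shape: "... model=<id> status=<http code>"
LINE_RE = re.compile(r"model=(\S+)\s+status=(\d{3})")


def check_last_hour(lines: list[str]) -> list[str]:
    """Return alert messages for one hour of log lines."""
    models, statuses = Counter(), Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        model, status = match.groups()
        statuses[status] += 1
        for family in ("opus", "haiku", "sonnet"):
            if family in model:
                models[family] += 1

    alerts = []
    if models["opus"] >= MAX_OPUS:
        alerts.append("OPUS USED - Check fallback config!")
    if models["haiku"] > MAX_HAIKU:
        alerts.append("Haiku fallback triggered - Gemini may be failing")
    if models["sonnet"] > MAX_SONNET:
        # Wording assumed; the alert table gives no Sonnet message.
        alerts.append("Runaway Sonnet usage")
    if statuses["402"] > 0:
        alerts.append("PAYMENT REQUIRED - Credits depleted!")
    if statuses["429"] > MAX_429:
        alerts.append("Rate limit errors - API quota issues")
    return alerts


sample = ["t model=google/gemini-2.0-flash status=200"] * 3
sample.append("t model=anthropic/claude-opus-4-5 status=200")
print(check_last_hour(sample))
```

An empty list means nothing to report, which matches the "no message = system healthy" convention.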

Real-World Cost Comparison

Before (Everything on Claude)

Task | Model | Daily Calls | Cost/Day
Cron jobs | Sonnet | 50 | $2.25
Heartbeats | Haiku | 12 | $0.10
User chat | Sonnet | 200 | $9.00
Background | Sonnet | 100 | $4.50
TOTAL | | | $15.85/day

After (Tiered System)

Task | Model | Daily Calls | Cost/Day
Cron jobs | Gemini Flash | 50 | $0.02
Heartbeats | Gemini Flash | 12 | $0.005
User chat | Gemini Flash | 180 | $0.07
User chat (complex) | Sonnet | 20 | $0.90
Background | Gemini Flash | 100 | $0.04
TOTAL | | | $1.04/day
Savings: 93%
From $15.85/day to $1.04/day
Monthly: $475 → $31
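
The totals can be re-derived from the per-task figures in the two tables. The snippet below is just that arithmetic, assuming a 30-day month; `Decimal` is used so the cents add up exactly.

```python
from decimal import Decimal

# Daily cost per task, copied from the "before" and "after" tables.
before = {"Cron jobs": "2.25", "Heartbeats": "0.10",
          "User chat": "9.00", "Background": "4.50"}
after = {"Cron jobs": "0.02", "Heartbeats": "0.005", "User chat": "0.07",
         "User chat (complex)": "0.90", "Background": "0.04"}

before_total = sum(Decimal(v) for v in before.values())
after_total = sum(Decimal(v) for v in after.values())
savings = 1 - after_total / before_total

DAYS_PER_MONTH = 30  # assumption used for the monthly figures

print(f"Before: ${before_total}/day, ${before_total * DAYS_PER_MONTH}/month")
print(f"After:  ${after_total}/day, ${after_total * DAYS_PER_MONTH}/month")
print(f"Savings: {round(float(savings) * 100)}%")
```

Nearly all of the remaining daily cost comes from the 20 complex chats routed to Sonnet, so that row is the one to watch.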

Implementation Tips

1. Start with logging before switching

Track what models are being used and why before changing anything. You might find 80% of your expensive calls are for simple tasks.

2. Use model aliases

Make it easy to switch: gemini-flash, sonnet, opus. Users shouldn't memorize model IDs.

3. Set up alerting immediately

The moment you deploy a fallback system, monitor it. A misconfigured fallback chain can burn through credits overnight.

4. Test your fallbacks

Deliberately rate-limit yourself and verify the chain works:

# Simulate Gemini failure
curl -X POST your-api -H "X-Force-Fallback: true"

5. Consider task-specific routing

Some tasks should always use a specific tier:

  • Summarization → Always Tier 1
  • Code review → Always Tier 4
  • Health checks → Always Tier 1
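
Pinned routes like these fit naturally in a lookup table. A minimal sketch, assuming hypothetical task-type keys and a default of Tier 1 for anything unpinned:

```python
# Pinned routes from the list above; the task-type keys are illustrative.
PINNED_TIER = {
    "summarization": 1,
    "health_check": 1,
    "code_review": 4,
}
DEFAULT_TIER = 1  # assumption: unpinned tasks start on the cheapest tier

# Model IDs taken from the tier descriptions earlier in this guide.
TIER_MODEL = {
    1: "google/gemini-2.0-flash",
    4: "anthropic/claude-sonnet-4-5",
}


def model_for(task_type: str) -> str:
    """Look up the pinned tier for a task type and return that tier's model."""
    tier = PINNED_TIER.get(task_type, DEFAULT_TIER)
    return TIER_MODEL[tier]


print(model_for("code_review"))
print(model_for("health_check"))
```

Keeping the mapping in data rather than in branching code means a route can be changed without touching the dispatch logic.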

The Configuration That Runs My System

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "google/gemini-2.0-flash",
        "fallbacks": [
          "openrouter/google/gemini-2.5-flash-lite",
          "openrouter/deepseek/deepseek-chat-v3-0324",
          "anthropic/claude-haiku-4"
        ]
      },
      "heartbeat": {
        "model": "gemini-flash",
        "every": "2h"
      }
    }
  }
}
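
Since a misconfigured fallback chain is the main way costs run away, a quick sanity check on the configuration may be worthwhile. The sketch below parses the JSON shown above and flags any premium model in the default chain. The `premium_in_defaults` helper and the idea of detecting premium models by the substrings "sonnet" and "opus" are assumptions for illustration.

```python
import json

CONFIG_TEXT = """
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "google/gemini-2.0-flash",
        "fallbacks": [
          "openrouter/google/gemini-2.5-flash-lite",
          "openrouter/deepseek/deepseek-chat-v3-0324",
          "anthropic/claude-haiku-4"
        ]
      },
      "heartbeat": { "model": "gemini-flash", "every": "2h" }
    }
  }
}
"""

PREMIUM_MARKERS = ("sonnet", "opus")  # assumed naming convention for premium models


def premium_in_defaults(config: dict) -> list[str]:
    """Return any premium model IDs found in the default primary/fallback chain."""
    model_cfg = config["agents"]["defaults"]["model"]
    chain = [model_cfg["primary"], *model_cfg["fallbacks"]]
    return [m for m in chain if any(marker in m for marker in PREMIUM_MARKERS)]


config = json.loads(CONFIG_TEXT)
print(premium_in_defaults(config))  # empty list for the configuration above
```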

Final Thoughts

The "just use GPT-4/Claude for everything" approach is dead. Modern AI infrastructure requires the same thinking we apply to any distributed system:

  • Use the cheapest resource that works
  • Have fallbacks for reliability
  • Monitor everything
  • Reserve expensive resources for when they're needed
My bot now handles thousands of daily interactions for about a dollar a day. The expensive models are still there when I need them, but they're not wasting money on "what's the weather?" queries.

Build your tiers. Set your fallbacks. Sleep peacefully while your AI runs on pennies.


Questions? Running an always-on AI assistant?
Drop a comment. I'd love to hear how you're handling costs.

Author: PuebloKC
Running OpenClaw AI automation system
February 2026

Wednesday, October 22, 2025

Google Drive and OneDrive ARE NOT backups

Google Drive and OneDrive are sync tools, not backups.
If ransomware hits or files get deleted, they sync that too—straight into the void.

☁️ Backup Rule 101: Always keep at least one offline or immutable copy of your data.
💾 True backup = a separate system that keeps previous versions safe from sync errors or hacks.

#TechTips #BackupStrategy #CyberSecurity #DataProtection

Monday, September 15, 2025

My digital brain developed with AI

Trying to improve my awful systems and memory with automation and AI-developed solutions.

Brainscan eats data from multiple apps and sources, then displays it and lets items be marked for billing, etc.

BrainDump is a simple web app for quickly entering a note or idea about anything. This gets saved and categorized for later use. 

Worklog is a simple way to enter work for clients, which feeds into Brainscan as well. 

The final piece is finishing the callthenerd.com front end; then we will funnel that into the same system, along with alerts.

Now I have a server that processes every random thought or idea I have for later use, auto creates tasks and events, auto creates invoices and billing info.

All hosted on a $35 virtual cloud server

Tuesday, July 08, 2025

Online security gut check. Secure yourself now.



🔐 Have You Enabled 2FA Everywhere? Changed Your Password This Decade?

Let’s start with a gut check.

Do you reuse the same password across multiple sites?

Is your email password the same one you used in 2013?

Have you enabled two-factor authentication (2FA) on your most important accounts?

Or... are you still getting text codes (yikes) instead of using an authenticator app?


If any of the above made you uncomfortable — good. That’s the point.


---

🧠 Why This Still Matters

Every week we clean up the mess after someone gets locked out of an account, or worse, loses access due to a breach. And 99% of the time? It could’ve been prevented with a simple change:

✅ Stronger, unique passwords
✅ Two-Factor Authentication (2FA)

Still think you’re safe because “nobody would hack me”? Newsflash: they don’t target you. They target everyone, and your password is probably already floating around out there. Just check https://haveibeenpwned.com if you don’t believe me.


---

🛡️ Do This Now:

1. Enable 2FA (Everywhere)

Start with:
  • Email
  • Bank
  • Cloud storage
  • Social media
  • Any sites with your personal data or billing info

Use an authenticator app (like Authy, Microsoft Authenticator, or Google Authenticator). Not SMS. Text-based 2FA can be hijacked.

2. Use a Password Manager

Bitwarden, 1Password, or even the built-in tools from Apple or Google are far better than your sticky note or Excel sheet.

3. Change Old Passwords

If your password is:

Over 5 years old

Shared across more than one site

Or contains your pet's name and birth year...


Change. It. Now.

Use 12+ characters, mix it up, and stop using "!" at the end to feel secure. Hackers are wise to that game.


---

⚙️ Bonus Tip: Turn on Login Alerts

Most sites let you enable notifications when someone logs in from a new device. Do that. It’s like a smoke alarm for your accounts.

Final tip: listen to your IT people, ensure you have working backups, and someone monitoring the security of your devices, servers, and other online assets.

And update your end-of-life networking devices and computers!


---

🔚 Final Thoughts

Cybersecurity isn’t just for IT nerds. It’s for everyone. The smallest effort (like enabling 2FA) goes a long way in preventing a total digital meltdown.

And if it’s too much? That’s what we’re here for.

Need help securing your accounts or deploying 2FA for your business? Contact us at mypueblopc.com

Friday, June 20, 2025

Ransomware: It’s Like an Escape Room… But With Your Business.

There's been a massive spike in real-world malware attacks lately on average businesses—not just attempts, but full-blown breaches where threat actors gain complete remote control of systems.

These attacks almost always end in ransomware, data theft, or total system compromise.

Where I used to see occasional phishing or intrusion attempts, I’m now seeing successful attacks almost daily. Just this week, one careless click on a fake file by a single employee could have given attackers full access to an entire company's network, backups, and data—completely wiping everything out.

If proper security tools and procedures hadn’t been in place, that business would have been destroyed.

This stuff is as serious—and terrifying—as it gets. And yet, far too many business owners and individuals still believe it “won’t happen to them.”

Wednesday, June 18, 2025

Explosive home remedies


When nasal spray just won’t cut it, light the fuse of freedom.

EyeBong

Eyebong™: The first ophthalmic cannabinoid serum with mind-expanding vision technology. See the truth. Or go blind trying.

Monday, May 19, 2025

No Fires Today (Because I Put Them Out at 4AM)


Today’s biggest crisis? Logging into Google.
(Yep. That was it. The login screen. The pinnacle of chaos.)

So I’m officially calling it a success.
Never mind the fact that I spent the weekend elbows-deep in outdated machines, broken updates, stubborn printers, and a firewall that decided it had free will.

Everything looked smooth because I spent hours making sure it was.
No one notices when things don’t go wrong—
because I already fought all the gremlins before sunrise.

You're welcome.