How Claude Code Rate Limits Work
Claude Code doesn't have an obvious rate limit dashboard. When you hit the limit, you get a "please wait" message and your session pauses. There's no countdown, no percentage, no warning before it happens. This guide explains how the limits actually work so you can plan around them.
Two rate limit windows
Anthropic enforces two separate rate limit windows on Claude Code usage:
- 5-hour burst window — limits how much you can use in a short period. This is the one most people hit. It resets on a rolling basis: usage from 5 hours ago drops off as time passes.
- 7-day sustained window — limits total usage over a week. Harder to hit, but if you're a heavy user on a lower tier, this can cap you before the week is up.
Both windows are rolling, not fixed. There's no "reset at midnight" — instead, your oldest usage continuously falls out of the window as time passes. If you used a lot of tokens 4 hours and 50 minutes ago, those will drop off in 10 minutes.
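To make the rolling behaviour concrete, here's a minimal sketch that sums usage inside a sliding window. It assumes you already have a list of `(timestamp, tokens)` pairs. The function name `tokens_in_window` is ours and purely illustrative. The example reproduces the "4 hours and 50 minutes ago" case: that burst stops counting ten minutes later.

```python
from datetime import datetime, timedelta

# Width of the burst window described above; use timedelta(days=7) for the weekly one.
WINDOW = timedelta(hours=5)


def tokens_in_window(events, now, window=WINDOW):
    """Sum the tokens of every (timestamp, tokens) event inside the rolling window.

    An event counts while now - window < timestamp <= now, so it stops
    counting exactly `window` after it happened.
    """
    start = now - window
    return sum(tokens for ts, tokens in events if start < ts <= now)


now = datetime(2025, 1, 1, 12, 0)
events = [
    (now - timedelta(hours=4, minutes=50), 30_000),  # heavy burst from 4h50m ago
    (now - timedelta(hours=1), 10_000),
]
print(tokens_in_window(events, now))                          # 40000
print(tokens_in_window(events, now + timedelta(minutes=10)))  # 10000
```

There is no reset event anywhere in this model. The total only changes when new usage is added or old usage ages past the window edge.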
What counts toward the limit
Rate limits are based on token throughput — the total number of tokens processed in your requests and responses. This includes:
- Input tokens — your messages, file contents Claude reads, tool results
- Output tokens — Claude's responses, code it writes, tool calls it makes
- Cache tokens — tokens read from or written to the prompt cache
A single complex coding task can burn through a large chunk of your limit. For example, if Claude reads several large files, reasons about them, then writes a multi-file refactor — that could be 50,000+ tokens in one turn.
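To make the categories above concrete, here's a small sketch that adds up one turn's token counts. It assumes the field names of the Anthropic Messages API `usage` object, which appears to be what Claude Code's logs record. It treats every category equally and uses made-up numbers for a hypothetical refactor turn in the 50,000+ range.

```python
# Field names follow the Anthropic Messages API `usage` object. Whether every
# category is weighted equally for Claude Code's limits is not published.
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def turn_total(usage: dict) -> int:
    """Raw token count for one turn: every category, unweighted."""
    # Missing or null fields count as zero.
    return sum(usage.get(field) or 0 for field in USAGE_FIELDS)


# Hypothetical multi-file refactor turn: large file reads plus a sizeable edit.
refactor_turn = {
    "input_tokens": 38_000,
    "output_tokens": 9_000,
    "cache_creation_input_tokens": 4_000,
}
print(turn_total(refactor_turn))  # 51000
```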
Token usage is also tied to how you manage your context window. A bloated context means more input tokens per turn, which burns through your rate limit faster. Keeping context lean is one of the best ways to stretch your limit.
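Here's a back-of-the-envelope model of why a bloated context costs more. If the whole conversation is re-sent as input on every turn, total input tokens grow with both the starting context size and the number of turns. The function name and the example numbers are hypothetical, and prompt caching is deliberately ignored.

```python
def session_input_tokens(base_context: int, growth_per_turn: int, turns: int) -> int:
    """Total input tokens over a session where the full context is re-sent each turn.

    Turn k (counting from 0) sends base_context + k * growth_per_turn tokens.
    Prompt caching is ignored in this model.
    """
    return sum(base_context + k * growth_per_turn for k in range(turns))


# Hypothetical numbers: 20 turns, each adding ~2,000 tokens of new context.
lean = session_input_tokens(base_context=10_000, growth_per_turn=2_000, turns=20)
bloated = session_input_tokens(base_context=60_000, growth_per_turn=2_000, turns=20)
print(lean, bloated)  # 580000 1580000
```

The extra 50,000 tokens of starting context are paid again on every one of the 20 turns. That repeated cost is why trimming context early pays off more than trimming it late.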
Limits by plan tier
The exact token limits aren't published by Anthropic, but the relative capacity differs significantly between plans:
| Plan | Price | Relative capacity | Best for |
|---|---|---|---|
| Pro | $20/mo | Base (1x) | Light to moderate daily usage |
| Max 5x | $100/mo | ~5x Pro | Full-time development, multiple projects |
| Max 20x | $200/mo | ~20x Pro | Heavy usage, teams, all-day sessions |
Pro users typically hit the 5-hour window after 2-4 hours of active coding, depending on task complexity. Max 5x users can usually work a full day without issues. Max 20x is designed for users who'd otherwise be on the API.
What happens when you hit the limit
When you reach either limit, Claude Code pauses your session. You'll see a message indicating you've been rate limited. The behavior:
- Your current session stays alive — you don't lose context
- You can't send new messages until capacity frees up
- The wait depends on how much of your window is consumed and how soon your oldest usage rolls off. It could be minutes or hours
- There's no countdown or ETA shown by Claude Code itself
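If you know your usage history, you can estimate the wait yourself. Because old usage expires in order, you drop the oldest events one by one until the total falls below the limit. This is a sketch under two assumptions: `limit` is a number you estimate (Anthropic doesn't publish the real figures), and usage at or above `limit` counts as blocked. The function name `wait_until_under_limit` is hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)


def wait_until_under_limit(events, now, limit, window=WINDOW):
    """Time until rolling usage falls below `limit`, assuming no new requests.

    `events` are (timestamp, tokens) pairs. `limit` is an estimate you supply,
    because the real per-plan numbers are not published.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Only usage still inside the window matters, oldest first.
    active = sorted((ts, tok) for ts, tok in events if now - window < ts <= now)
    used = sum(tok for _, tok in active)
    if used < limit:
        return timedelta(0)
    # Each event stops counting exactly `window` after it happened.
    for ts, tok in active:
        used -= tok
        if used < limit:
            return ts + window - now
    # Not reached: with limit > 0, usage hits 0 once every event has expired.
    return timedelta(0)


now = datetime(2025, 1, 1, 12, 0)
events = [
    (now - timedelta(hours=4, minutes=50), 30_000),
    (now - timedelta(hours=1), 10_000),
]
print(wait_until_under_limit(events, now, limit=35_000))  # 0:10:00
```

This is also why the wait varies so much. If your oldest in-window usage is small, several events may need to expire before you're unblocked.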
Why the lack of visibility is a problem
Without knowing how close you are to the limit, you can't make informed decisions. Should you start a large refactoring task at 4pm, or will you get blocked halfway through? Is it worth switching to a simpler task to conserve capacity? Claude Code doesn't give you the information to answer these questions.
How to pace yourself
Until you have tooling to track your usage, here are practical strategies:
Be intentional about what Claude reads
Every file read costs tokens. Instead of letting Claude explore your codebase freely, point it at specific files. Use @file references deliberately rather than asking broad questions that trigger many file reads.
Batch related tasks
Switching between unrelated tasks means Claude needs to re-read context for each one. Grouping related changes together is more token-efficient.
Start fresh sessions for new topics
Long sessions accumulate stale context that gets sent as input again on every turn. A fresh session with a focused prompt is cheaper than continuing a bloated one.
Use /compact before you're forced to
Running /compact manually when your context is getting large lets you control what gets summarized. Waiting for auto-compaction means Claude decides what to keep. See our context health guide for more on managing your context window.
Monitor your usage
This is exactly why AI Battery exists. It parses Claude Code's local log files to show you real-time utilization percentages, reset countdowns, and time-to-limit predictions for both the 5-hour and 7-day windows.
Track your rate limits in real time
AI Battery shows your 5-hour and 7-day usage, reset countdowns, and time-to-limit predictions — right in your macOS menu bar.
How AI Battery tracks rate limits
AI Battery reads Claude Code's local JSONL conversation logs — the same files stored on your machine at ~/.claude/projects/. It extracts token counts and timestamps from each conversation turn, then calculates:
- Current utilization — what percentage of your 5-hour and 7-day windows you've used
- Reset countdown — when the oldest usage in each window will expire
- Time-to-limit — at your current pace, when you'll hit the limit
- Per-model breakdown — token usage split by model (Opus, Sonnet, Haiku)
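Here's a minimal sketch of how those four numbers could be computed for one window from a list of turns. It's our own illustration, not AI Battery's source. The `limit` argument is an estimate you supply, because Anthropic doesn't publish the real figures. "Current pace" is assumed to mean the average burn rate over the last 30 minutes. `Turn`, `WindowStats`, `window_stats` and `pace_lookback` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class Turn(NamedTuple):
    timestamp: datetime
    model: str
    tokens: int


@dataclass
class WindowStats:
    utilization: float                  # share of the assumed limit used (1.0 = 100%)
    reset_in: Optional[timedelta]       # until the oldest in-window usage expires
    time_to_limit: Optional[timedelta]  # None when there is no recent activity
    by_model: dict                      # tokens per model inside the window


def window_stats(turns, now, window, limit, pace_lookback=timedelta(minutes=30)):
    """Compute utilization, reset countdown, time-to-limit and per-model totals.

    `limit` is a caller-supplied estimate; `pace_lookback` (no longer than
    `window`) sets how much recent history defines the current pace.
    """
    active = [t for t in turns if now - window < t.timestamp <= now]
    used = sum(t.tokens for t in active)

    by_model = defaultdict(int)
    for t in active:
        by_model[t.model] += t.tokens

    # Reset countdown: the oldest usage drops out exactly `window` after it happened.
    reset_in = min(t.timestamp for t in active) + window - now if active else None

    # Pace in tokens per second over the lookback, extrapolated linearly.
    recent = sum(t.tokens for t in active if t.timestamp > now - pace_lookback)
    rate = recent / pace_lookback.total_seconds()
    remaining = limit - used
    if remaining <= 0:
        time_to_limit = timedelta(0)
    elif rate > 0:
        time_to_limit = timedelta(seconds=remaining / rate)
    else:
        time_to_limit = None

    return WindowStats(used / limit, reset_in, time_to_limit, dict(by_model))


now = datetime(2025, 1, 1, 12, 0)
turns = [
    Turn(now - timedelta(hours=4), "opus", 20_000),
    Turn(now - timedelta(minutes=20), "sonnet", 6_000),
    Turn(now - timedelta(minutes=10), "sonnet", 3_000),
]
stats = window_stats(turns, now, timedelta(hours=5), limit=100_000)
print(stats.utilization, stats.reset_in, stats.time_to_limit)  # 0.29 1:00:00 3:56:40
```

The time-to-limit figure ignores that older usage will expire in the meantime, so it errs on the side of warning you early. To cover both windows, call it once with `timedelta(hours=5)` and once with `timedelta(days=7)`, each with its own estimated limit.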
Everything runs locally. AI Battery never makes network requests — it just reads log files on your machine. No account required, no telemetry, no data leaves your computer.
Understanding the indicators
AI Battery uses a traffic-light system for both rate limits and context health:
- Green (under 60%) — plenty of headroom. Work freely.
- Orange (60-80%) — approaching the limit. Consider pacing or wrapping up large tasks.
- Red (over 80%) — close to being rate limited. Finish what you're doing and let usage decay.
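The bands above translate into a tiny classifier. One detail is our assumption: the ranges don't say which side exactly 60% and exactly 80% fall on, so this sketch treats both as orange. The function name `indicator` is illustrative.

```python
def indicator(utilization_pct: float) -> str:
    """Map a utilization percentage to the green/orange/red bands.

    Treating exactly 60% and exactly 80% as orange is an assumption; the
    bands are given as "under 60%", "60-80%", and "over 80%".
    """
    if utilization_pct < 60:
        return "green"
    if utilization_pct <= 80:
        return "orange"
    return "red"


for pct in (25, 60, 80, 92):
    print(pct, indicator(pct))
```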
Common questions
Do rate limits reset at a specific time?
No. Both windows are rolling. Your usage from exactly 5 hours ago (or 7 days ago) continuously drops off. There's no midnight reset or fixed schedule.
Does the model I use matter?
Yes. Different models have different token costs. Opus uses more capacity per token than Sonnet or Haiku. Claude Code automatically selects models, but heavier tasks tend to use more capable (and more expensive) models.
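One way to model "more capacity per token" is to scale each turn by a per-model multiplier. The multipliers below are placeholders we picked for illustration only. Anthropic doesn't publish how much more capacity an Opus token consumes than a Sonnet or Haiku token. `MODEL_WEIGHTS` and `weighted_usage` are hypothetical names.

```python
# Placeholder multipliers relative to Sonnet: illustrative only, NOT published values.
MODEL_WEIGHTS = {"opus": 5.0, "sonnet": 1.0, "haiku": 0.25}


def weighted_usage(turns, weights=MODEL_WEIGHTS, default=1.0):
    """Sum (model_id, tokens) pairs, scaling each by its model family's weight."""
    total = 0.0
    for model_id, tokens in turns:
        # Match the family name embedded in ids such as "claude-opus-...";
        # unknown models fall back to `default`.
        family = next((name for name in weights if name in model_id.lower()), None)
        total += tokens * weights.get(family, default)
    return total


print(weighted_usage([("claude-opus-x", 1_000), ("claude-sonnet-x", 1_000)]))  # 6000.0
```

With any multiplier above 1 for Opus, the same raw token count uses up the window faster when Opus handles more of the work.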
Can I check my limits without AI Battery?
Not easily. Claude Code doesn't expose usage metrics. You could manually parse the JSONL logs, but that's what AI Battery automates for you.
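If you want to try the manual route, here's a minimal parser sketch for one JSONL file. The log format isn't officially documented, so the schema is an assumption based on how the files typically look. It assumes one JSON object per line, with assistant entries carrying a top-level `timestamp` and a `message` object holding `model` and a Messages-API-style `usage` object. `Turn` and `parse_turns` are hypothetical names, and AI Battery's real parser may work differently.

```python
import json
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple

# Assumed (undocumented) schema: one JSON object per line; assistant entries
# carry "timestamp" and a "message" with "model" and "usage".
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


class Turn(NamedTuple):
    timestamp: datetime
    model: str
    tokens: int


def parse_turns(lines: Iterable[str]) -> Iterator[Turn]:
    """Yield one Turn per log line that carries token usage."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # blank, partially written, or corrupt line
        if not isinstance(entry, dict):
            continue
        message = entry.get("message")
        ts = entry.get("timestamp")
        if not isinstance(message, dict) or not isinstance(ts, str):
            continue
        usage = message.get("usage")
        if not isinstance(usage, dict):
            continue  # entries without usage (e.g. user messages) are skipped
        try:
            # Older Pythons' fromisoformat rejects a trailing "Z", so normalise it.
            when = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        except ValueError:
            continue
        tokens = sum(int(usage.get(f) or 0) for f in USAGE_FIELDS)
        yield Turn(when, message.get("model", "unknown"), tokens)
```

Usage: `with open(path, encoding="utf-8") as f: turns = list(parse_turns(f))`, where `path` points at one of the session files under `~/.claude/projects/`.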
Do cached tokens count toward rate limits?
Cache read tokens count at a reduced rate. Cache write tokens count at full rate. This means subsequent turns in the same conversation are somewhat cheaper than the first turn, because prior context gets cached.
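Here's a sketch of what "reduced rate" means for a two-turn conversation. Cache writes are weighted at full rate, as described above. The 0.1 discount for cache reads is a placeholder assumption, not a published figure. `CATEGORY_WEIGHTS` and `weighted_turn` are hypothetical names, and the token counts are made up.

```python
# Placeholder weights: cache writes at full rate, cache reads discounted.
# The 0.1 read discount is an assumption, not a published number.
CATEGORY_WEIGHTS = {
    "input_tokens": 1.0,
    "output_tokens": 1.0,
    "cache_creation_input_tokens": 1.0,
    "cache_read_input_tokens": 0.1,
}


def weighted_turn(usage: dict, weights=CATEGORY_WEIGHTS) -> float:
    """Weighted token cost of one turn; missing fields count as zero."""
    return sum((usage.get(field) or 0) * w for field, w in weights.items())


# Hypothetical conversation: turn 1 writes 40k tokens of context to the cache,
# turn 2 reads the same 40k back.
first = {"input_tokens": 2_000, "cache_creation_input_tokens": 40_000, "output_tokens": 1_000}
second = {"input_tokens": 2_000, "cache_read_input_tokens": 40_000, "output_tokens": 1_000}
print(round(weighted_turn(first)), round(weighted_turn(second)))  # 43000 7000
```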
What if I have multiple Claude Code sessions open?
All sessions share the same rate limit pool for your account. Opening multiple sessions doesn't give you more capacity — it just splits your limit across them. AI Battery tracks all sessions and shows your aggregate usage.
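Because the pool is shared, a tracker has to sum usage across every session file, not just the one you have open. Here's a minimal sketch that walks every `.jsonl` file under a root directory, meant to be `~/.claude/projects/`. As with any hand-rolled parser, the per-line schema (`timestamp` plus `message.usage`) is an assumption, and `aggregate_window_tokens` is a hypothetical name.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def aggregate_window_tokens(root: Path, now: datetime, window: timedelta) -> int:
    """Sum token usage inside the window across every session log under `root`.

    `now` must be timezone-aware. Assumes each line is a JSON object with
    "timestamp" and "message.usage" (an undocumented schema).
    """
    total = 0
    # Every project directory and every session file feed one shared pool.
    for path in root.rglob("*.jsonl"):
        with path.open(encoding="utf-8") as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue
                if not isinstance(entry, dict):
                    continue
                message, ts = entry.get("message"), entry.get("timestamp")
                if not isinstance(message, dict) or not isinstance(ts, str):
                    continue
                usage = message.get("usage")
                if not isinstance(usage, dict):
                    continue
                try:
                    when = datetime.fromisoformat(ts.replace("Z", "+00:00"))
                except ValueError:
                    continue
                if when.tzinfo is None:
                    when = when.replace(tzinfo=timezone.utc)  # assume UTC if unlabeled
                if now - window < when <= now:
                    total += sum(int(usage.get(f) or 0) for f in USAGE_FIELDS)
    return total


# Example call against the real log directory:
# aggregate_window_tokens(Path.home() / ".claude" / "projects",
#                         datetime.now(timezone.utc), timedelta(hours=5))
```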