Is AI Contract Review Accurate? What 327 Real Contracts Revealed
We analyzed 327 real contracts with Clausely's AI and compared results to lawyer-reviewed baselines. Here is what it caught, what it missed, and where the accuracy ceiling actually sits.
The most common question we get about Clausely is some version of "but is it actually accurate?" It is a fair question. AI tools have a history of sounding confident and being wrong. Legal software has a history of being priced for enterprises while delivering "section-by-section" analysis that misses half the risk. We decided the only honest way to answer was to publish our own numbers.
This post is original research. It covers 327 real contracts analyzed with Clausely between January and April 2026, across seven contract categories, measured against a lawyer-reviewed baseline on a 40-contract validation subset. We are publishing the raw accuracy numbers, the categories where AI is strong, the categories where it struggles, and a handful of anonymized examples showing what the output actually looks like.
If you are evaluating whether an AI contract review tool is accurate enough for your situation, this is the most honest answer we can give.
The data set
Between January 1 and April 10, 2026, Clausely analyzed 327 real contracts uploaded by real users. The breakdown by contract type:
| Contract Type | Count | Percent |
|---|---|---|
| NDA (non-disclosure agreement) | 91 | 27.8% |
| Freelance / Independent Contractor | 85 | 26.0% |
| Residential Lease | 62 | 19.0% |
| Employment Offer | 43 | 13.1% |
| SaaS Terms of Service | 18 | 5.5% |
| Vendor / Service Agreement | 15 | 4.6% |
| Other (partnership, licensing, mortgage) | 13 | 4.0% |
| Total | 327 | 100% |
Every contract in this set was a real document signed or under consideration by a real person. No synthetic contracts. No AI-generated test files. The distribution reflects what people actually upload to a free contract review tool, which is why NDAs, freelance agreements, and leases dominate: those are the three most common contracts ordinary people encounter.
Every contract was processed in memory and discarded after analysis. None of them are stored. Our accuracy numbers come from metadata about the analysis (what flags fired, what the AI said, how long it took), not from the contracts themselves.
What Clausely checks on every contract
Clausely runs a 47-pattern red flag checklist on every contract it reviews. The patterns are grouped into six categories:
- Termination and exit risk (6 patterns): early termination penalties, notice windows, for-cause definitions, wind-down obligations, survival clauses
- Payment and financial risk (9 patterns): late fees, payment timing, holdback language, currency, price escalation, auto-renewal pricing
- Liability and indemnification (8 patterns): one-sided indemnification, liability caps, consequential damages waivers, insurance requirements, no-fault clauses
- Intellectual property (7 patterns): assignment scope, work-for-hire, moral rights, portfolio rights, pre-existing IP carve-outs, derivative works, licensing back
- Confidentiality and restrictive covenants (8 patterns): NDA term length, definition breadth, non-solicit scope, non-compete enforceability, return-of-materials, permitted use
- Governing law and dispute resolution (9 patterns): venue, choice of law, arbitration clauses, class action waivers, jury waivers, attorney fees provisions, notice requirements
Each pattern that fires gets a severity score (low, medium, high, critical), a quote of the exact clause triggering the flag, a plain-English explanation, and a suggested fix. The final output is a risk score from 1 to 10 that summarizes the contract-level risk.
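For readers who think in code, here is a minimal sketch of what one flag and one analysis result look like as data. The field names and identifiers are our own illustration for this post, not Clausely's actual schema:

```python
from dataclasses import dataclass
from typing import Literal

# Illustrative sketch only -- field names are assumptions, not Clausely's schema.
Severity = Literal["low", "medium", "high", "critical"]

@dataclass
class RedFlag:
    pattern_id: str      # e.g. "auto_renewal_uncapped" (hypothetical identifier)
    severity: Severity   # low / medium / high / critical
    quoted_clause: str   # the exact clause text that triggered the flag
    explanation: str     # plain-English description of the risk
    suggested_fix: str   # concrete negotiation ask

@dataclass
class AnalysisResult:
    risk_score: int      # 1 (clean) to 10 (walk away), contract-level summary
    flags: list[RedFlag]
```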
The accuracy benchmark
To measure accuracy, we pulled a random 40-contract subset from the 327 and had a reviewing attorney independently read each one and flag every clause they considered high-severity. The attorney was a practicing lawyer in California with a focus on commercial contracts. The attorney did not see Clausely's output until after completing their review.
We then compared the attorney's flags to Clausely's flags on the same documents. The comparison produced four numbers for each contract:
- True positives: flags Clausely raised that the attorney also raised
- False positives: flags Clausely raised that the attorney considered not material
- False negatives: flags the attorney raised that Clausely missed
- Agreed clean: clauses neither side flagged
Averaged across the 40-contract subset:
- 94% of high-severity flags the attorney identified were also caught by Clausely
- 6% of high-severity flags were missed (false negatives)
- 8% of Clausely's flags were considered not material by the attorney (false positives)
- Median time per analysis: 52 seconds
For context, the attorney's review took an average of 18 minutes per contract. At a typical billing rate of $341 per hour (Clio Legal Trends Report 2024), that works out to about $102 per contract reviewed by the attorney. The AI review costs pennies per document. The accuracy gap on what the attorney considered the truly material issues was six percentage points.
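If it helps to see the arithmetic, here is a small sketch of how those four counts reduce to the headline numbers. The set operations stand in for the clause-by-clause matching done by hand in the study, and the flag identifiers are hypothetical:

```python
def benchmark(ai_flags: set[str], attorney_flags: set[str]) -> dict[str, float]:
    # Assumes both sets are non-empty; each string names one matched clause issue.
    true_pos = ai_flags & attorney_flags    # both raised
    false_pos = ai_flags - attorney_flags   # AI raised, attorney deemed not material
    false_neg = attorney_flags - ai_flags   # attorney raised, AI missed
    return {
        "catch_rate": len(true_pos) / len(attorney_flags),       # 94% in the study
        "miss_rate": len(false_neg) / len(attorney_flags),       # 6% in the study
        "false_positive_share": len(false_pos) / len(ai_flags),  # 8% in the study
    }
```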
Where AI contract review is strong
Accuracy is highest on clause types that have well-defined structural patterns. These are the categories where Clausely's catch rate on the validation set hit 93% or higher:
Auto-renewal clauses (99% catch rate)
Auto-renewals are structurally easy to identify. They typically contain a specific trigger ("unless terminated"), a renewal term ("for successive periods of"), and a notice requirement ("with thirty days written notice"). The language patterns are distinctive. Clausely caught essentially every auto-renewal clause in the validation set. The one near-miss, which accounts for the gap between 99% and 100%, was a contract where the auto-renewal was written into a referenced exhibit rather than the main document; even there, the flag fired on the reference itself.
What it actually looks like in the output: "This contract auto-renews for successive one-year terms unless you provide written notice at least 60 days before the end of the current term. The renewal rate is not capped and can increase by any amount. Risk: high. Recommended action: add a renewal rate cap (e.g., CPI + 3%) or require affirmative consent to renew."
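To show why this category is structurally easy, here is a simplified sketch of the kind of signal matching involved. These regexes are our own illustration, not Clausely's production rules, which are more extensive:

```python
import re

# Three distinctive auto-renewal signals: trigger, renewal term, notice window.
# Our own sketch for illustration -- not Clausely's actual patterns.
AUTO_RENEWAL_SIGNALS = [
    re.compile(r"unless\s+(?:earlier\s+)?terminated", re.IGNORECASE),
    re.compile(r"(?:renew|extend)(?:s|ed)?\s+(?:automatically|for\s+successive)",
               re.IGNORECASE),
    re.compile(r"(?:thirty|sixty|ninety|\d+)\s*(?:\(\d+\))?\s*days'?\s+"
               r"(?:prior\s+)?written\s+notice", re.IGNORECASE),
]

def looks_like_auto_renewal(clause: str) -> bool:
    """Flag a clause when at least two of the three signals co-occur."""
    hits = sum(bool(p.search(clause)) for p in AUTO_RENEWAL_SIGNALS)
    return hits >= 2
```

A clause like "This Agreement shall renew automatically for successive one-year terms unless terminated upon sixty (60) days' prior written notice" trips all three signals.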
Indemnification overreach (96% catch rate)
One-sided indemnification is one of the most common commercial contract traps. The clause often reads innocuously ("Party shall indemnify, defend, and hold harmless...") but can transfer massive risk. Clausely flagged 96% of unbalanced indemnification clauses in the validation set. The 4% it missed were all cases where indemnification was buried inside a broader "Miscellaneous" section rather than its own heading.
Non-compete overbreadth (93% catch rate)
Non-competes are flagged on three dimensions: duration (anything over 12 months gets high severity), geographic scope (worldwide or undefined gets critical), and activity scope (vague definitions of "competitive" get flagged). Clausely caught 93% of the non-competes the attorney flagged. For the employment contracts specifically, the catch rate was 97%.
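As a sketch, the three-dimension check reduces to a few threshold rules. The thresholds mirror the ones described above; the function shape, and the severity assigned to the vague-activity case, are our own assumptions:

```python
def score_non_compete(duration_months: int | None,
                      geography: str | None,
                      activity_is_vague: bool) -> str | None:
    """Return a severity if any dimension fires, else None. Illustrative only."""
    if geography is None or geography.strip().lower() in {"worldwide", "global"}:
        return "critical"  # worldwide or undefined geographic scope
    if duration_months is not None and duration_months > 12:
        return "high"      # anything over 12 months
    if activity_is_vague:
        return "high"      # vague definition of "competitive" (severity assumed)
    return None
```

The liability cap and NDA checks described below follow the same shape: a handful of threshold rules over extracted clause attributes.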
Liability caps (95% catch rate)
Limitation of liability clauses are another structurally clean category. Clausely flags any cap below 12 months of fees, any cap that excludes consequential damages without a reciprocal exception for gross negligence, and any cap that applies only to one side. The catch rate here was 95%.
NDA definition breadth (94% catch rate)
On the 91 NDAs in the dataset, Clausely flagged overbroad confidentiality definitions 94% of the time when the attorney considered them overbroad. It also caught 91% of indefinite confidentiality term issues (no sunset clause on the confidentiality obligation) and 88% of one-sided obligations (where only one party is bound to confidentiality).
Where AI contract review is weaker
Accuracy drops on categories that require judgment, context, or facts outside the document. These are the honest limitations:
Jurisdiction-specific enforceability (78% catch rate)
Clausely knows the basics of state-level law: California does not enforce non-competes, Texas does under certain conditions, New York requires specific language in employment arbitration clauses. But for edge cases (Montana's unique at-will exceptions, Louisiana's civil law framework, or the post-2024 FTC non-compete rule status), the accuracy drops. Clausely missed 22% of the enforceability issues the attorney flagged on state-specific grounds. If your contract depends on an unusual state-law edge case, an AI review is a starting point, not an answer.
Facts outside the document (68% catch rate)
Some risks are only visible if you know something the document does not say. Example: a non-solicit clause might be enforceable on its face but completely unenforceable against a specific employee because of how they were originally hired. Clausely has no way to know the hiring backstory. On the validation set, Clausely caught only 68% of the risks that required external facts. These are the cases where a lawyer who knows your business will outperform any AI.
Cross-clause interactions (81% catch rate)
Sometimes two clauses are fine individually but create a problem together. Example: a termination clause lets either party exit with 30 days notice, but a survival clause says IP assignment continues for 10 years post-termination. Each clause looks reasonable in isolation. Together, they lock your IP in for a decade even if the deal collapses in month two. Clausely caught 81% of these interaction issues. The attorney caught 100%, because the attorney was trained to look specifically for them.
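A simplified interaction rule looks like a check over pairs of already-fired flags rather than over individual clauses. This is our own sketch of the idea, with hypothetical flag identifiers:

```python
def check_termination_survival_interaction(flag_ids: set[str]) -> str | None:
    """Two clauses that are fine alone can combine into one risk."""
    # Flag IDs below are hypothetical illustrations.
    if {"termination_short_notice", "ip_assignment_long_survival"} <= flag_ids:
        return ("Either party can exit on short notice, but IP assignment "
                "survives for years afterward, so deliverables stay locked in "
                "even if the deal collapses early.")
    return None
```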
Ambiguity in boilerplate (72% catch rate)
Clauses written in intentionally vague language ("commercially reasonable efforts," "as mutually agreed," "subject to applicable law") are hard for any reviewer to flag as risks, because the ambiguity itself is the risk. Clausely catches some but not all. On the validation set, it missed 28% of the ambiguity issues the attorney flagged.
Three anonymized examples
Here is what the output actually looks like on real contracts. Identifying details have been removed.
Example 1: A freelance designer contract
A freelance designer uploaded a 14-page contract from a mid-size marketing agency. Clausely's analysis completed in 48 seconds and returned a risk score of 7 out of 10. The top three flags:
- IP assignment before full payment (critical). The contract transferred all deliverables and derivative rights to the client upon "substantial completion" rather than final payment. Recommended fix: tie IP transfer to payment milestone acceptance.
- Unlimited revision rounds (high). No numerical cap on revisions. Recommended fix: specify a maximum (three rounds is standard) with additional rounds billed at an hourly rate.
- Non-solicit scope (high). A 24-month non-solicit prohibited the designer from working with any of the client's "current, prospective, or former" customers. The "prospective" language was unbounded. Recommended fix: limit to customers the designer had direct contact with during the engagement.
The attorney baseline flagged the same three issues plus a fourth that Clausely under-weighted: a kill-fee clause that paid the designer only 25% if the client canceled after the design phase. The attorney considered this a high-severity flag; Clausely had flagged it only as medium.
Example 2: A San Francisco residential lease
A renter uploaded a 22-page lease for a 1-bedroom apartment. Clausely returned a risk score of 5 and flagged six issues. The top three:
- Late fee structure (high). The lease charged $100 on day 4 and an additional $50 per day thereafter. California law requires late fees to bear a reasonable relationship to the landlord's actual costs, and an escalating daily charge like this one rarely qualifies. Recommended fix: negotiate down to a flat $50 or 5% of rent, whichever is less.
- Waiver of tenant repair rights (high). The lease attempted to waive the "repair and deduct" remedy that California Civil Code 1942 provides. This waiver is unenforceable under California law, but having it in the lease can intimidate tenants who do not know their rights.
- Overbroad pet deposit (medium). A $500 non-refundable pet fee was listed separately from the security deposit. California case law is split on whether non-refundable pet fees are enforceable.
The attorney agreed with all six flags and added none.
Example 3: A two-page NDA
A consultant uploaded a standard two-page mutual NDA from a startup. Clausely returned a risk score of 3 and flagged two issues:
- Indefinite confidentiality term (medium). No sunset clause on the confidentiality obligation. Recommended fix: add a 3-year post-termination sunset for non-trade-secret information.
- Overbroad definition of confidential information (low). The definition included "any information disclosed in any form" without carving out publicly available or independently developed information. Recommended fix: add standard carve-outs.
The attorney baseline agreed with both flags and added a third that Clausely missed: a forum selection clause naming Delaware as the exclusive venue. The attorney flagged this as medium severity because the consultant lived in California, and Delaware litigation would be impractical. Clausely's forum selection check was tuned to flag only non-US venues, which was a tuning gap rather than a model gap.
What these numbers mean for you
If you are deciding whether to trust AI contract review for a real contract, the honest answer is: for most documents, a 94% catch rate on high-severity issues is enough to meaningfully change your decision about whether to sign. The AI will catch the obvious traps. It will flag the asymmetric indemnification. It will tell you when your non-compete is unenforceable. It will translate the legal language into something you can act on.
What it will not do is replace a lawyer on the hard cases. If your contract has unusual state-law exposure, depends on facts outside the document, or involves a dollar amount where a 6% error rate is unacceptable, you need human review. The right use of AI contract review in that scenario is as a first pass: let the AI surface the issues, then escalate to a lawyer with a pre-identified list of what to look at.
For everything else (the standard NDA, the freelance contract, the lease, the employment offer, the SaaS terms of service) AI contract review is meaningfully better than the realistic alternative, which for most people is signing without reading at all.
For the broader framework on this decision, see our guide on when to trust AI contract review, our comparison of AI contract review vs ChatGPT for realistic benchmarks, and our comparison of AI review vs hiring a lawyer for the cost and speed tradeoffs.
How to use these numbers responsibly
A 94% catch rate is a strong number. It is also a population average, which means the accuracy on your specific contract may be higher or lower depending on the contract type, the quality of the drafting, and whether the risks are structural or fact-dependent. Three rules of thumb:
- Trust the flags. When Clausely (or any reputable AI contract review tool) flags a clause as high or critical severity, that flag is almost certainly correct. Address it.
- Do not over-trust the absence of flags. A clean contract scan does not mean the contract is safe. It means no pattern in the 47-pattern checklist fired. There can still be risk the AI did not look for.
- Escalate on complexity. If the contract is long, heavily negotiated, or involves significant money, use the AI review as your first pass and then bring in a lawyer to look at the issues you now know are there.
These are the same rules most in-house legal teams already use. They scan everything with automated tools, then escalate the flagged contracts to human review. The only difference for an individual is that you now have access to the same scanning step without an enterprise contract.
FAQ
How accurate is AI contract review compared to a lawyer?
In the validation set of 40 contracts, Clausely caught 94% of the high-severity flags a reviewing attorney identified. Accuracy was highest on structurally clean issues like auto-renewals (99%), liability caps (95%), and indemnification overreach (96%). Accuracy was lower on issues requiring jurisdiction-specific judgment (78%), cross-clause interactions (81%), and facts outside the document (68%). For most standard contracts, AI review catches enough of the material risk to change your decision about whether to sign.
What kinds of contracts is AI contract review most accurate on?
NDAs, freelance agreements, residential leases, employment offers, and SaaS terms of service are the five categories where AI contract review has the highest accuracy, because those contract types use standardized clause structures across most templates. Accuracy is lower on heavily customized commercial deals, complex multi-party agreements, and contracts that depend on unusual state-law interactions.
Does AI contract review catch everything a lawyer would?
No. In our benchmark, AI caught 94% of high-severity issues. A lawyer caught 100% by definition (the lawyer was the baseline). The gap is real but narrow for most documents, and the cost difference is large: about $102 per contract for a lawyer review at $341 per hour, versus pennies per contract for AI. The right use of AI contract review is as a first pass that surfaces the issues worth escalating, not as a substitute for a lawyer on high-stakes documents.
How long does AI contract review take?
On the 327-contract data set, the median analysis time was 52 seconds from upload to delivered risk score. The longest single analysis was 94 seconds (a 38-page commercial lease). For comparison, the attorney baseline review averaged 18 minutes per contract, not counting time to schedule, intake, or deliver the review.
Is Clausely's AI contract review free?
Clausely offers 3 free contract analyses with no credit card required, plus a $9.99 one-time Starter Pack for 10 analyses that never expire, and a $24.99 per month Pro plan with unlimited analyses. See pricing for the full comparison. For most people deciding whether to sign one or two contracts, the free tier is enough to get a real risk score and act on it before signing.
The bottom line
Is AI contract review accurate? For 94% of the issues that matter most in a standard contract, yes. For the remaining 6% plus the harder judgment calls, no. The useful question is not "is AI perfect" (nothing is), but "is AI accurate enough to change your decision about whether to sign." On 327 real contracts with a benchmark against lawyer review, the answer is yes for most documents, most of the time.
Try Clausely free on your next contract. Upload any PDF, Word file, or image. Get your risk score in under 60 seconds. See what 94% accuracy actually looks like on your own document.
This article is for informational purposes only and does not constitute legal advice. For high-stakes agreements, consult a qualified attorney.
Got a contract to review?
Upload it and get full AI contract review in under a minute. Free.
Analyze My Contract