The 10-Category Scoring Framework
Every tool is scored 0–100 across ten weighted categories. The weights reflect what actually drives buying decisions; a worked example of the weighted calculation follows the table.
| Category | Weight | What We Measure |
|---|---|---|
| Core AI Performance | 25% | Accuracy, reasoning quality, task completion rate, hallucination frequency |
| Data Privacy & Security | 20% | Data handling policies, encryption, retention, third-party sharing |
| Transparency & Explainability | 15% | Model disclosure, training data clarity, decision transparency |
| Reliability & Uptime | 10% | API reliability, documented SLA, historical uptime, error handling |
| Compliance & Regulatory Fit | 10% | GDPR, HIPAA, SOC 2, EU AI Act readiness, audit logging |
| Pricing Fairness | 8% | Price-to-value ratio, hidden fees, contract lock-in, free tier honesty |
| Integration & Usability | 4% | API quality, documentation completeness, onboarding experience |
| Human Override Capability | 4% | Ability of humans to override AI decisions, clarity of escalation paths |
| Bias & Fairness | 3% | Documented bias testing, demographic fairness, known failure modes |
| Vendor Accountability | 1% | Bug reporting responsiveness, public incident history, SLA enforcement |
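To make the arithmetic concrete, here is a minimal sketch of how the weighted composite could be computed from per-category scores on the 0–100 scale. The weight table mirrors the one above; the category keys, function name, and example scores are illustrative, not our production tooling.

```python
# Minimal sketch: combine per-category scores (each 0-100) into one
# weighted composite. Weights mirror the table above and sum to 1.0.
WEIGHTS = {
    "core_ai_performance": 0.25,
    "data_privacy_security": 0.20,
    "transparency_explainability": 0.15,
    "reliability_uptime": 0.10,
    "compliance_regulatory_fit": 0.10,
    "pricing_fairness": 0.08,
    "integration_usability": 0.04,
    "human_override_capability": 0.04,
    "bias_fairness": 0.03,
    "vendor_accountability": 0.01,
}

def composite_score(category_scores: dict[str, float]) -> float:
    """Weighted sum of per-category scores; every category must be scored."""
    missing = WEIGHTS.keys() - category_scores.keys()
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    return round(sum(w * category_scores[c] for c, w in WEIGHTS.items()), 1)

# Hypothetical tool: strong core performance, weak transparency.
print(composite_score({
    "core_ai_performance": 90,
    "data_privacy_security": 75,
    "transparency_explainability": 40,
    "reliability_uptime": 85,
    "compliance_regulatory_fit": 70,
    "pricing_fairness": 60,
    "integration_usability": 80,
    "human_override_capability": 50,
    "bias_fairness": 55,
    "vendor_accountability": 65,
}))  # -> 71.3
```

Because the weights sum to 100%, the composite stays on the same 0–100 scale as the individual category scores.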
The 5-Step Review Process
Every review follows these five steps in order. No steps are skipped. No exceptions.
Step 1: Independent Testing (Minimum 2 Weeks)
We use the tool for its stated purpose across realistic scenarios. We pay for every tool we test at the tier being reviewed. Testing occurs at different times to capture reliability variance.
Step 2: Documentation Deep-Dive
Privacy policy reviewed line by line. Terms of service flagged for unusual clauses. API documentation assessed. Security certifications verified through primary sources.
Step 3: Developer Interview
Questions about model architecture, data handling, breach protocols, and roadmap. If a vendor declines, this is noted publicly. Interview summaries are published.
Step 4: Independent Researcher Cross-Check
Draft reviews are verified by independent researchers. Their inputs are published verbatim. Researchers are never anonymous—full name and affiliation required.
Step 5: Publication & Score Assignment
Final score calculated. Rationale published for every category. The company is notified and may submit a response, but cannot request edits to scores.
Vendor Response Policy
Companies have 30 days from publication to submit a formal written response. Responses are published in full, unedited, as a dedicated section within the review.
Important: This is not an appeal. Scores do not change based on vendor objection. Scores change only when verifiable new information is provided or when the product materially changes.
Methodology Versioning
Every review is tagged with the methodology version under which it was conducted (see the sketch after this list). When the methodology is updated:
- All affected reviews are flagged for re-evaluation
- Each version update is logged with what changed and why
- Reviews conducted under old versions are clearly labeled
- Major version changes trigger mandatory re-review
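As a minimal sketch, assuming major.minor version tags, the code below shows one way a review could be stamped with a methodology version and flagged when that version changes. The `Review` fields, the helper names, and the `ExampleAI` entry are hypothetical, not a description of our internal systems.

```python
from dataclasses import dataclass

@dataclass
class Review:
    tool: str
    score: float
    methodology_version: str  # set at publication time, e.g. "1.0"

def needs_reevaluation(review: Review, current_version: str) -> bool:
    """Any review published under an older methodology version gets flagged."""
    return review.methodology_version != current_version

def requires_full_rereview(review: Review, current_version: str) -> bool:
    """A major-version bump (1.x -> 2.x) triggers a mandatory re-review."""
    old_major = int(review.methodology_version.split(".")[0])
    new_major = int(current_version.split(".")[0])
    return new_major > old_major

# Hypothetical example: a v1.0-era review after the methodology moves to v2.0.
r = Review(tool="ExampleAI", score=71.3, methodology_version="1.0")
print(needs_reevaluation(r, "2.0"))      # True -> flag for re-evaluation
print(requires_full_rereview(r, "2.0"))  # True -> mandatory re-review
```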
What We Will Never Do
- Accept payment to improve, change, or remove a score
- Offer priority review scheduling for paying companies
- Publish anonymous reviews or researcher contributions
- Quietly edit reviews—all changes are logged publicly
- Review tools in which we hold any financial interest
- Assign a review to anyone who holds equity, an advisory role, or a commercial relationship with the tool's vendor
Methodology Changelog
- v1.0 — February 2026 — Initial framework published.