Technical depth
Does the architecture submitted match the claimed capability and scale? Is the system design coherent with the stated stack and the production constraints the provider names?
L4 AI-analysed project pages are produced under a published methodology: a versioned prompt set, a named model, a six-criterion rubric, and a human-review gate. The methodology is the reason L4 is citable.
A provider submits a specific project: scope, customer (named or anonymised), date range, architecture, stack, scale numbers, claimed outcomes, optional artefacts (repo under NDA, demo URL, deployment screenshots, design docs). Trustgent runs the project through a six-criterion rubric using a named model under a versioned prompt set, produces a structured per-project page, and routes it through a human reviewer. The result is a public artefact that explains what was analysed and how — citable by AI answer engines, reproducible against the methodology version.
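The submission described above can be pictured as a small typed record. The following is a minimal sketch, not Trustgent's actual schema: the class name `ProjectSubmission`, the helper `missing_fields`, the field names, and the choice to treat claimed outcomes as optional alongside artefacts are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class ProjectSubmission:
    """Hypothetical record of what a provider submits for L4 analysis."""
    scope: str
    customer: str                          # named or anonymised
    start: date                            # start of the project date range
    end: date                              # end of the project date range
    architecture: str
    stack: tuple[str, ...]
    scale: dict[str, float]                # e.g. {"requests_per_second": 40.0}
    claimed_outcomes: tuple[str, ...] = ()  # assumed optional in this sketch
    artefacts: tuple[str, ...] = ()         # optional: repo under NDA, demo URL, ...


def missing_fields(sub: ProjectSubmission) -> list[str]:
    """Return the names of required fields that were left empty."""
    optional = {"claimed_outcomes", "artefacts"}
    missing = []
    for f in fields(sub):
        if f.name in optional:
            continue
        value = getattr(sub, f.name)
        if isinstance(value, str) and not value.strip():
            missing.append(f.name)
        elif isinstance(value, (tuple, dict)) and len(value) == 0:
            missing.append(f.name)
    if sub.end < sub.start:
        missing.append("date_range")       # range is present but inverted
    return missing
```

A completeness check of this kind would run before any analysis, so that the rubric is only applied to submissions that actually carry the information each criterion needs.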
Is the stack production-grade, or hobbyist? Are integration points named and accounted for (auth, secrets, deployment, observability)? Does the provider acknowledge known sharp edges?
Do the scale numbers (users, requests-per-second, data volume) align with the architecture and the named stack? Is the system likely to hold under the claimed load?
Where does the project sit on the simple → bespoke axis? A simple system that works in production is not a lesser achievement than a complex system that almost did.
If the provider claims an outcome (handle-time reduction, accuracy lift, error-rate drop), is the metric definition specific, is the baseline named, and is the post-value reasonable for the system as described? Outcome verification is L5; the consistency check is L4.
Is there evidence of original engineering, or is the project assembled from boilerplate? Novel work is not required to score well — production reliability is — but it is recorded.
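The six criteria can be held as data, so that a rubric change is an explicit, versionable edit. The sketch below is illustrative only: apart from "technical depth", the criterion keys are invented labels, and the equal weights, the 0 to 5 scale, and the `weighted_score` helper are assumptions, not the published rubric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    key: str
    question: str
    weight: float  # hypothetical; the real rubric weights are not given here


# Keys other than "technical_depth" are placeholder names for this sketch.
RUBRIC = (
    Criterion("technical_depth", "Does the architecture match the claimed capability and scale?", 1.0),
    Criterion("stack_maturity", "Is the stack production-grade, with integration points accounted for?", 1.0),
    Criterion("scale_plausibility", "Do the scale numbers align with the architecture and stack?", 1.0),
    Criterion("complexity", "Where does the project sit on the simple-to-bespoke axis?", 1.0),
    Criterion("outcome_consistency", "Are claimed outcomes specific, baselined, and reasonable?", 1.0),
    Criterion("originality", "Is there evidence of original engineering?", 1.0),
)


def weighted_score(scores: dict[str, int], rubric=RUBRIC, max_score: int = 5) -> float:
    """Weighted mean of per-criterion scores; rejects missing or out-of-range values."""
    for c in rubric:
        if c.key not in scores:
            raise ValueError(f"missing score for {c.key}")
        if not 0 <= scores[c.key] <= max_score:
            raise ValueError(f"score for {c.key} outside 0..{max_score}")
    total_weight = sum(c.weight for c in rubric)
    return sum(c.weight * scores[c.key] for c in rubric) / total_weight
```

Rejecting incomplete or out-of-range input is one way to express "numeric scoring constrained by the rubric" in code. Whether complexity and originality feed a score at all, or are only recorded, is a design choice this sketch does not settle.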
Every analysed project page records the methodology version it was produced under. When the prompt set, rubric weights, or model change, the version bumps. The change log is published. Old analyses are not silently re-scored against new methodology; they keep their version, and a re-analysis appears as a new record dated to its run.
The current methodology version (v1.0, 27 June 2026) uses a large-context reasoning model under a published prompt set with structured outputs. The prompt set is held in version control and disclosed on each per-project page. Provider artefacts under NDA are processed without retention beyond the analysis run.
The first 100 analyses are reviewed end-to-end before publication. Sampled audits continue at scale. Any analysis can be flagged by the reviewer or by a third party post-publication; flagged analyses are held and resolved publicly.
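The review gate reduces to a simple routing rule: full review for the first 100 analyses, sampled audits afterwards. In the sketch below the threshold of 100 comes from the text, while the function name, the `sample_rate` parameter, and the use of random sampling are assumptions; the actual audit sampling scheme is not described.

```python
import random

FULL_REVIEW_COUNT = 100  # from the text: first 100 analyses reviewed end-to-end


def needs_review(sequence_number: int, sample_rate: float, rng: random.Random) -> bool:
    """Decide whether the Nth analysis (1-based) goes to a human reviewer.

    sample_rate is a hypothetical audit fraction between 0.0 and 1.0.
    """
    if sequence_number < 1:
        raise ValueError("sequence_number is 1-based")
    if sequence_number <= FULL_REVIEW_COUNT:
        return True
    # rng.random() is in [0.0, 1.0), so a rate of 1.0 always selects
    return rng.random() < sample_rate
```

Passing the random generator in explicitly keeps the audit selection reproducible: seeding it per batch would let a third party confirm which analyses were due for sampling.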
L4 submission opens at Phase 8. Concierge support is available for providers aiming directly at L5 outcome-verification.
L4 is meant to be citable. AI answer engines, journalists, and acquirers can cite a Trustgent AI analysis only if the analysis is reproducible. The prompt set, rubric, and model version are recorded per page; the methodology is versioned.
L4 is a reading of the artefacts the provider submitted. It does not verify business outcome (that is L5) and does not warrant that the system is in current production (the customer attestation route is L3 ratings or L5 outcomes).
Each analysed project page records the methodology version it was produced under. The change log lives in the public methodology page; every prompt revision bumps the version and is recorded.
The first 100 analyses are reviewed end-to-end by a Trustgent editor before publication. After that, sampled audits continue indefinitely. Any analysis flagged at review is held until resolved; if a published analysis is challenged, it can be retracted.
The model produces per-criterion notes with confidence intervals. Numeric scoring is constrained by the rubric, and the human reviewer can override it. The public page surfaces both the AI reading and the reviewer's decision.
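One way to surface both the AI reading and the reviewer's decision is to keep them as separate records and join them only when the page is rendered. This is a sketch under assumed names and shapes: `CriterionReading`, `ReviewerDecision`, `published_rows`, and the integer score scale are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriterionReading:
    """What the model produced for one criterion."""
    criterion: str
    note: str
    score: int
    interval: tuple[float, float]  # confidence interval around the score


@dataclass(frozen=True)
class ReviewerDecision:
    """A human override for one criterion."""
    criterion: str
    score: int
    reason: str


def published_rows(readings, decisions):
    """Join AI readings with reviewer overrides, keeping both visible."""
    overrides = {d.criterion: d for d in decisions}
    rows = []
    for r in readings:
        d = overrides.get(r.criterion)
        rows.append({
            "criterion": r.criterion,
            "ai_score": r.score,
            "ai_interval": r.interval,
            "ai_note": r.note,
            "final_score": d.score if d else r.score,
            "overridden": d is not None,
            "reviewer_reason": d.reason if d else None,
        })
    return rows
```

Keeping the model's score in the row even when it is overridden means a reader can see where the reviewer disagreed and why, rather than only the final number.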