
One of the most powerful prompt engineering techniques is using AI to evaluate its own outputs. This creates a quality loop: generate output → evaluate it → improve it → evaluate again.
After generating any output, follow up with:
Review what you just wrote. Identify:
- The 3 weakest points or claims
- Any statements that might be inaccurate
- Where the reasoning could be stronger
- What is missing that should be included
Then rewrite the output addressing these issues.
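If you want to automate this critique-and-revise step, a minimal Python sketch might look like this (the `ask` helper, model name, and exact prompt wording are illustrative assumptions, using the OpenAI Python SDK; any chat-capable model would do). Later sketches in this piece reuse the same `ask` helper to stay short.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    """Send one prompt to the model and return its reply as plain text."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder: any capable chat model works
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

CRITIQUE = """Review the draft above. Identify:
- The 3 weakest points or claims
- Any statements that might be inaccurate
- Where the reasoning could be stronger
- What is missing that should be included
Then rewrite the draft addressing these issues. Return only the rewrite."""

def critique_and_revise(task: str) -> str:
    draft = ask(task)                     # step 1: generate
    return ask(f"{draft}\n\n{CRITIQUE}")  # step 2: critique and rewrite
```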
Now review this output as if you were a [specific expert]:
- A sceptical CFO reviewing a business case
- An employment lawyer reviewing an HR policy
- A customer reading a sales proposal
What would they find unconvincing, unclear, or missing?
Act as a critic who strongly disagrees with this analysis. What are the 5 strongest counter-arguments? Which claims are most vulnerable to challenge? Where is the evidence weakest?
Score this output on a 1-5 scale for each criterion:
- Accuracy — Are all facts and claims correct?
- Completeness — Does it address all aspects of the original request?
- Clarity — Is it easy to understand for the target audience?
- Actionability — Can the reader act on this immediately?
- Professionalism — Is the tone and format business-appropriate?
For each score below 4, explain what would need to change to earn a 5.
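To act on the scores programmatically, you can request the rubric as JSON and flag anything below 4. A sketch, reusing the `ask` helper from the first example; the JSON field names and prompt wording are assumptions, and real model output may need cleanup before parsing.

```python
import json

RUBRIC = """Score this output on a 1-5 scale for accuracy, completeness,
clarity, actionability, and professionalism. Reply with JSON only, e.g.
{"accuracy": 4, "completeness": 3, "clarity": 5, "actionability": 4,
 "professionalism": 5, "fixes": "what would earn a 5 on any score below 4"}"""

def score_output(output: str) -> dict:
    reply = ask(f"{output}\n\n{RUBRIC}")
    scores = json.loads(reply)  # may raise if the model wraps the JSON in prose
    needs_work = {k: v for k, v in scores.items() if isinstance(v, int) and v < 4}
    return {"scores": scores, "needs_work": needs_work}
```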
Compare these two versions of [document type]. For each criterion below, indicate which version is better and why:
- Clarity of main message
- Strength of supporting evidence
- Appropriateness of tone
- Logical structure
- Actionability of recommendations
Overall recommendation: which version to use and what improvements to make.
Write two versions of this [email/proposal/report]:
- Version A: Formal, data-driven, conservative
- Version B: Conversational, story-driven, bold
Then evaluate both against these criteria: [list] and recommend which to use for [specific audience].
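The same A/B pattern can be scripted: generate both versions, then have the model judge them against your criteria. A sketch reusing `ask`; the style instructions, criteria, and audience are placeholders.

```python
def ab_test(brief: str, criteria: list[str], audience: str) -> str:
    version_a = ask(f"{brief}\n\nWrite it formal, data-driven, and conservative.")
    version_b = ask(f"{brief}\n\nWrite it conversational, story-driven, and bold.")
    return ask(
        "Compare these two versions. For each criterion, say which version is "
        f"better and why, then recommend which to use for {audience}.\n"
        f"Criteria: {', '.join(criteria)}\n\n"
        f"VERSION A:\n{version_a}\n\nVERSION B:\n{version_b}"
    )
```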
Read this [communication] from the perspective of each audience:
- A CEO (cares about: strategy, ROI, risk)
- A department manager (cares about: implementation, resources, timeline)
- A frontline employee (cares about: job impact, training, support)
For each perspective: what works well, what concerns would they have, and what changes would make it more effective for them.
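This audience review is easy to loop over personas in code. A sketch reusing `ask`; the persona descriptions come straight from the list above.

```python
PERSONAS = {
    "CEO": "strategy, ROI, risk",
    "department manager": "implementation, resources, timeline",
    "frontline employee": "job impact, training, support",
}

def audience_review(draft: str) -> dict:
    reviews = {}
    for role, cares in PERSONAS.items():
        reviews[role] = ask(
            f"Read this from the perspective of a {role} (cares about: {cares}). "
            "What works well, what concerns would they have, and what changes "
            f"would make it more effective for them?\n\n{draft}"
        )
    return reviews
```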
Beyond reviewing communication quality, check evidence quality: AI outputs can include fabricated or unverifiable claims. Two prompts help.
Review this output and identify any claims that might be fabricated or inaccurate. For each claim:
- Quote the specific text
- Assess confidence: definitely true / probably true / uncertain / probably false / definitely false
- Explain your reasoning
- Suggest how to verify
List every statistic, study, or source mentioned in this output. For each:
- Quote the reference
- Can this be verified through a real source?
- If uncertain, flag it with [NEEDS VERIFICATION]
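When this check runs inside a content pipeline, the flags can be collected automatically for human review. A sketch reusing `ask`; the flag text matches the prompt above.

```python
VERIFY = """List every statistic, study, or source mentioned in this output.
For each: quote the reference, say whether it can be verified through a real
source, and if uncertain, flag it with [NEEDS VERIFICATION]."""

def unverified_claims(output: str) -> list[str]:
    report = ask(f"{output}\n\n{VERIFY}")
    # Return only the flagged lines so a human can verify them before publishing
    return [line for line in report.splitlines() if "[NEEDS VERIFICATION]" in line]
```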
The most effective evaluation process combines these techniques in sequence:
1. Generate the initial output
2. Ask for a self-critique and revise
3. Review from the relevant expert or audience perspectives
4. Apply a final polish
This loop typically produces publication-quality output in 3-4 rounds.
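Put together, the loop can run until every rubric score reaches 4 or a round limit is hit. A sketch combining the `ask` and `score_output` helpers above; the default of four rounds mirrors the guidance here and is adjustable.

```python
def quality_loop(task: str, max_rounds: int = 4) -> str:
    draft = ask(task)  # round 1: initial generation
    for _ in range(max_rounds - 1):
        result = score_output(draft)   # rubric scoring, as sketched earlier
        if not result["needs_work"]:   # every criterion already at 4 or 5
            break
        draft = ask(
            f"{draft}\n\nRewrite this to fix the weak criteria "
            f"{list(result['needs_work'])}. {result['scores'].get('fixes', '')}"
        )
    return draft
```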
| Situation | Best Technique |
|---|---|
| First draft of anything | Self-critique + revise |
| Important external document | Full rubric scoring + expert critique |
| Comparing options | A/B testing + audience perspective |
| Research or analysis | Fact-check + source verification |
| Ongoing content production | Quality rubric as standard check |
Can you ask AI to critique its own work? Yes. Self-critique prompting is one of the most effective prompt engineering techniques. Ask AI to identify weaknesses, score against a rubric, critique from an expert perspective, and suggest improvements. This creates an iterative quality loop that significantly improves output quality.
Use multiple techniques: self-critique (identify weaknesses), quality scoring rubrics (rate 1-5 on accuracy, completeness, clarity), A/B comparison (generate two versions and evaluate), audience testing (review from different perspectives), and hallucination checks (verify facts and sources).
Most business content reaches publication quality in 3-4 rounds: (1) initial generation, (2) self-critique and revision, (3) expert perspective review, (4) final polish. High-stakes documents (board papers, client proposals) may need 5-6 rounds including human expert review.