What Prompt Engineering Is
Prompt engineering is essentially the design of input structure (instructions, context, examples, and output constraints) to improve the quality, stability, and usability of model output.
In its early stage, this was mainly a “single-call optimization” problem:
- How to reduce model drift for the same question
- How to force structured output for programmatic integration
- How to make the model focus on the most relevant information under limited context
One-line view:
Prompt engineering = translating natural-language requirements into stable, executable model input contracts
What Early Prompt Engineering Tried to Solve
In early LLM usage, the main pain points were direct:
- Unstable outputs
  - Same input, varying output quality across runs
- Inconsistent instruction following
  - Missing constraints, skipped steps, or task boundary drift
- Uncontrolled output format
  - Hard to reliably produce JSON/table/structured fields
- Hallucination and fabrication
  - Models tend to fill gaps with invented facts
- High engineering integration cost
  - Hard to plug responses into automated pipelines (parse/store/invoke)
The real value of prompt engineering was turning “probabilistic conversation behavior” into “repeatable invocation behavior.”
Typical Methods in Prompt Engineering
1. Instruction Clarification
Break tasks into explicit actions and avoid vague intent.
You are a backend code review assistant.
Goal: identify concurrency safety issues.
Scope: only check src/service/*.java.
Output: return a Markdown table with columns risk_level/file_path/fix_suggestion.
2. Structured Constraints
Define a fixed output schema to reduce “looks good but unusable” responses.
{
"risk_level": "high|medium|low",
"file": "string",
"issue": "string",
"fix": "string"
}
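A schema only helps if something enforces it on the receiving side. The following is a minimal sketch, assuming the model returns a single JSON object with exactly the four fields above; the function name `validate_finding` and the error messages are illustrative, not part of any library.

```python
import json

# Allowed values and required fields, taken from the schema above.
ALLOWED_RISK_LEVELS = ("high", "medium", "low")
REQUIRED_STRING_FIELDS = ("file", "issue", "fix")


def validate_finding(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the response fits the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    problems = []
    if data.get("risk_level") not in ALLOWED_RISK_LEVELS:
        problems.append("risk_level must be one of high, medium, low")
    for field in REQUIRED_STRING_FIELDS:
        if not isinstance(data.get(field), str):
            problems.append(f"{field} must be a string")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log every violation for a given response.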
3. Few-shot Examples
Provide 1-3 high-quality examples to improve style consistency and task alignment.
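One possible way to assemble such a prompt is sketched here. The `Input:` / `Output:` labels and the function name `build_few_shot_prompt` are assumptions chosen for illustration; any consistent labeling should work.

```python
def build_few_shot_prompt(
    instruction: str,
    examples: list[tuple[str, str]],
    query: str,
) -> str:
    """Place worked examples between the instruction and the real query."""
    parts = [instruction.strip(), ""]
    for example_input, example_output in examples:
        parts += [f"Input: {example_input}", f"Output: {example_output}", ""]
    # End with an open "Output:" so the model continues in the demonstrated style.
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)
```

Keeping the examples in a list makes it straightforward to swap or reorder them when testing which ones actually help.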
4. Role and Boundary Control
State what the model may and may not do, and in particular that it must not guess.
If evidence is insufficient, return "insufficient information" and do not fabricate.
5. Iterative Tuning
Treat prompts like code: version, test, and refine.
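A rough sketch of what "version and test" could look like in practice, assuming prompts are stored as plain strings. `PromptVersion` and `check_required_phrases` are hypothetical names; the phrase check is only the cheapest possible regression test, not a substitute for evaluating real outputs.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    """One immutable revision of a prompt (hypothetical structure)."""
    name: str
    version: str
    text: str

    @property
    def fingerprint(self) -> str:
        """Short content hash, useful for spotting unreviewed edits in logs."""
        return hashlib.sha256(self.text.encode("utf-8")).hexdigest()[:12]


def check_required_phrases(prompt: PromptVersion, phrases: list[str]) -> list[str]:
    """Return the phrases missing from the prompt text."""
    return [phrase for phrase in phrases if phrase not in prompt.text]
```

Logging the fingerprint next to every model call makes it possible to tell later which prompt revision produced a given output.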
How to Use It in Real Development (Executable Workflow)
Step 0: Define the Task Interface First
Define clearly:
- What the input is
- Who consumes the output (human/program)
- What qualifies as acceptable output
This is essentially defining an API contract for prompts.
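If it helps to make the contract explicit, the three questions can be written down as a small record. This is only a sketch; `PromptContract` and its field names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class PromptContract:
    """Hypothetical record of the three questions above for one prompt."""
    input_description: str
    consumer: Literal["human", "program"]
    acceptance_criteria: tuple[str, ...]

    def __post_init__(self) -> None:
        # Type hints are not enforced at runtime, so check the values explicitly.
        if self.consumer not in ("human", "program"):
            raise ValueError("consumer must be 'human' or 'program'")
        if not self.acceptance_criteria:
            raise ValueError("at least one acceptance criterion is required")
```

Refusing to create a contract without acceptance criteria forces the "what qualifies as acceptable output" question to be answered before any prompt text is written.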
Step 1: Use Prompt Templates, Not One-off Writing
Use a stable template:
- Role
- Goal
- Input
- Constraints
- Output format
- Failure handling rules
Example:
[Role]
You are a senior frontend reviewer.
[Goal]
Check whether the following PR diff contains accessibility issues.
[Input]
{{DIFF_CONTENT}}
[Constraints]
- Judge only based on the provided diff
- Do not make assumptions about code that is not in the diff
[Output Format]
JSON object: {"findings":[{"severity":"","file":"","issue":"","fix":""}],"reason":""}
[Failure Handling]
If evidence is insufficient, return an empty findings array and explain why in the reason field.
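Filling a template like this one can be done with a few lines of code. The sketch below assumes placeholders are written as `{{UPPER_CASE_NAME}}`, as in the example; `render_prompt` is a hypothetical helper, not a library function.

```python
import re

# Matches placeholders such as {{DIFF_CONTENT}}.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Fill {{NAME}} placeholders; raise if any placeholder has no value."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise KeyError(f"missing template values: {', '.join(missing)}")
    # A function replacement inserts the value literally, so backslashes
    # inside a diff are not interpreted as regex escapes.
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)
```

Failing loudly on a missing value is deliberate: a prompt sent with a literal `{{DIFF_CONTENT}}` in it tends to produce confident output about nothing.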
Step 2: Add Automatic Evaluation to Prompts
Do not rely on manual review alone. At a minimum, run:
- Format checks: JSON parsable, required fields present
- Quality checks: key constraints satisfied (e.g. `file` and `fix` must exist)
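Both kinds of check can be automated in a few lines. This sketch assumes the response is either a bare JSON array of findings or an object that wraps the array under a `findings` key (an assumed name), and it treats non-empty `file` and `fix` values as the quality constraint. `evaluate_response` is a hypothetical helper.

```python
import json

REQUIRED_FIELDS = ("severity", "file", "issue", "fix")
MUST_BE_NON_EMPTY = ("file", "fix")


def evaluate_response(raw: str) -> dict[str, list[str]]:
    """Return format and quality problems for one response; empty lists mean pass."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {"format": ["not parsable JSON"], "quality": []}

    # Accept a bare array, or an object wrapping it under "findings" (assumed key).
    findings = parsed.get("findings") if isinstance(parsed, dict) else parsed
    if not isinstance(findings, list):
        return {"format": ["no findings array"], "quality": []}

    report: dict[str, list[str]] = {"format": [], "quality": []}
    for index, item in enumerate(findings):
        if not isinstance(item, dict):
            report["format"].append(f"item {index}: not an object")
            continue
        for field in REQUIRED_FIELDS:
            if field not in item:
                report["format"].append(f"item {index}: missing {field}")
        for field in MUST_BE_NON_EMPTY:
            value = item.get(field)
            if field in item and (not isinstance(value, str) or not value.strip()):
                report["quality"].append(f"item {index}: empty {field}")
    return report
```

Run over a fixed set of sample inputs, the two lists give a pass rate per prompt version that can be compared across revisions.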
Step 3: Feed Failure Samples Back into Prompt Design
Convert typical failures into:
- New constraints
- New examples
- New counter-examples
This is the core learning loop in prompt engineering.
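One way to keep that loop honest is to store each failure as a regression case and re-run all of them whenever the prompt changes. A minimal sketch follows; `FailureCase` and `run_regression` are hypothetical names, and `generate` stands in for whatever function wraps the current prompt and the model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailureCase:
    """A past failure, kept so that it can be re-checked after each prompt change."""
    input_text: str
    problem: str                   # what went wrong, in one line
    check: Callable[[str], bool]   # True when the output no longer shows the problem


def run_regression(
    cases: list[FailureCase],
    generate: Callable[[str], str],
) -> list[str]:
    """Return the problems that still reproduce with the current prompt."""
    return [case.problem for case in cases if not case.check(generate(case.input_text))]
```

An empty result means every recorded failure is fixed under the current prompt; a non-empty one names exactly which old problems came back.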
Step 4: Split Prompts by Scenario
Do not expect one mega-prompt to cover all tasks. Split by function:
- Information extraction prompt
- Code review prompt
- Planning prompt
- Generation prompt
This improves stability and testability.
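In code, the split can be as simple as a lookup table keyed by scenario. The sketch below uses placeholder prompt texts and a hypothetical `select_prompt` helper; the point is only that each scenario gets its own prompt that can be tested separately.

```python
from enum import Enum


class Task(Enum):
    EXTRACTION = "extraction"
    CODE_REVIEW = "code_review"
    PLANNING = "planning"
    GENERATION = "generation"


# One dedicated prompt per scenario (texts are placeholders).
PROMPTS: dict[Task, str] = {
    Task.EXTRACTION: "Extract the requested fields from the input.",
    Task.CODE_REVIEW: "Review the provided diff and list concrete issues.",
    Task.PLANNING: "Break the goal into ordered, verifiable steps.",
    Task.GENERATION: "Write the requested content in the given style.",
}


def select_prompt(task_name: str) -> str:
    """Look up the prompt for a scenario by its name."""
    task = Task(task_name)  # raises ValueError for an unknown scenario
    return PROMPTS[task]
```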
Limits of Prompt Engineering Alone
Prompt engineering is effective, but has natural boundaries, especially in agent/long-running development:
- Limited memory management
  - Prompt tuning optimizes “how to ask now,” not “how to manage multi-turn state”
- Long-context degradation
  - As history grows, prompt constraints alone cannot solve token/attention dilution
- Weak state continuity
  - After interruption, a single prompt cannot reliably restore full task state
- No execution loop by itself
  - A prompt can say “run tests,” but that does not guarantee tests are executed, logs collected, and state updated
- No system-level governance
  - It cannot alone solve tool orchestration, failure recovery, observability, and quality gates
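The execution-loop point is easiest to see in code: something outside the model has to actually run the command and record what happened. A minimal sketch using Python's standard `subprocess` module is given here; `StepResult` and `run_step` are hypothetical names, and the 60-second default timeout is an arbitrary choice.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class StepResult:
    command: list[str]
    passed: bool
    log: str


def run_step(command: list[str], timeout_seconds: float = 60.0) -> StepResult:
    """Execute a command, capture its output, and record whether it passed."""
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        return StepResult(command, False, "timed out")
    except OSError as exc:  # e.g. the executable does not exist
        return StepResult(command, False, f"could not start: {exc}")
    return StepResult(command, completed.returncode == 0, completed.stdout + completed.stderr)


if __name__ == "__main__":
    # Stand-in for a real test command; replace with the project's own runner.
    result = run_step([sys.executable, "-c", "print('tests passed')"])
    print(result.passed, result.log.strip())
```

The `StepResult` is the piece a prompt cannot supply on its own: a recorded fact about what ran and how it ended, which can then be fed back into the next model call.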
Why It Evolved into Context Engineering
Once tasks evolved from Q&A to continuous development, the key problems became:
- What history to keep
- When to compress history
- How to retrieve and refill old information
- How to hand off state without loss across context windows
That is the scope of context engineering:
Prompt engineering focuses on: how to express tasks
Context engineering focuses on: how to manage task history and state
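To make "what to keep" and "when to compress" concrete, here is a rough sketch of one simple policy: keep the newest messages that fit a budget and fold everything older into a single summary. The word-count token estimate is a crude stand-in for a real tokenizer, `summarize` is a hypothetical callable (for example, another model call), and the summary's own size is not counted against the budget in this simplified version.

```python
from typing import Callable


def estimate_tokens(text: str) -> int:
    """Crude proxy: whitespace-separated words. Use a real tokenizer in practice."""
    return len(text.split())


def compact_history(
    messages: list[str],
    budget: int,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    """Keep the newest messages within budget; replace older ones with one summary."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order

    older = messages[: len(messages) - len(kept)]
    if not older:
        return kept
    return [summarize(older)] + kept
```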
Why It Further Evolved into Harness Engineering
Even with prompt + context engineering, a larger challenge remains:
How to make agents reliably deliver in real engineering workflows.
That requires system capabilities:
- Toolchain orchestration (lint/test/build/deploy)
- Quality gates and automatic verification
- Failure recovery and retry strategies
- Task scheduling and state tracking
- Rule accumulation and observability
That is the scope of harness engineering:
Harness engineering = assembling prompt, context, tools, checks, and workflow into a sustainable delivery system
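A very small piece of such a system, a quality gate with bounded retries, might look like the sketch below. `attempt` stands in for a model or agent call and `gate` for any automatic check (lint, tests, schema validation); both are hypothetical callables, as are the function name `run_with_gate` and the feedback wording.

```python
from typing import Callable, Optional


def run_with_gate(
    attempt: Callable[[str], str],
    gate: Callable[[str], list[str]],
    task: str,
    max_attempts: int = 3,
) -> tuple[Optional[str], list[dict]]:
    """Call attempt until gate reports no problems or attempts run out."""
    feedback = ""
    history: list[dict] = []  # one record per attempt, for observability
    for number in range(1, max_attempts + 1):
        output = attempt(task + feedback)
        problems = gate(output)
        history.append({"attempt": number, "problems": problems})
        if not problems:
            return output, history
        # Feed the gate's findings into the next attempt.
        feedback = "\nFix these problems: " + "; ".join(problems)
    return None, history
```

Returning `None` after the last failed attempt, rather than the last bad output, keeps unverified results from leaking downstream; the history shows what each attempt got wrong.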
Relationship Among the Three
| Dimension | Prompt Engineering | Context Engineering | Harness Engineering |
|---|---|---|---|
| Core question | How to improve single-call output | How to manage multi-turn memory and state | How to make end-to-end delivery stable |
| Main object | Single input text | History, summaries, retrieval, state | Toolchains, rules, validation, orchestration |
| Typical artifact | Prompt templates | State snapshots, compression summaries, memory layers | Agent workflows, check loops, runtime policies |
| Main limitation | Drift in long tasks | Lacks execution/governance | Highest implementation cost, in exchange for the highest stability |
My Practical Conclusion
Prompt engineering is not outdated. It is the foundational layer.
In real development, a practical sequence is:
- Stabilize prompt engineering first (stable input/output)
- Add context engineering next (handle long-running memory)
- Build harness engineering last (close the system loop for stable delivery)
If you jump directly to harness while prompt quality is unstable, complexity rises quickly and failures become harder to debug. If you only do prompt engineering, long-running development remains fragile.