Agent Prompt Engineering

What prompt engineering is, what problems it solves, and why it evolved into context engineering and harness engineering

What Prompt Engineering Is

Prompt engineering is essentially:

Designing input structure (instructions, context, examples, and output constraints) to improve model output quality, stability, and usability.

In its early stage, this was mainly a “single-call optimization” problem:

  • How to reduce output variation across runs for the same question
  • How to force structured output for programmatic integration
  • How to make the model focus on the most relevant information under limited context

One-line view:

Prompt engineering = translating natural-language requirements into stable, executable model input contracts

What Early Prompt Engineering Tried to Solve

In early LLM usage, the main pain points were direct:

  1. Unstable outputs
     • Same input, varying output quality across runs
  2. Inconsistent instruction following
     • Missing constraints, skipped steps, or task boundary drift
  3. Uncontrolled output format
     • Hard to reliably produce JSON/table/structured fields
  4. Hallucination and fabrication
     • Models tend to fill gaps with invented facts
  5. High engineering integration cost
     • Hard to plug responses into automated pipelines (parse/store/invoke)

The real value of prompt engineering was turning “probabilistic conversation behavior” into “repeatable invocation behavior.”

Typical Methods in Prompt Engineering

1. Instruction Clarification

Break tasks into explicit actions and avoid vague intent.

You are a backend code review assistant.
Goal: identify concurrency safety issues.
Scope: only check src/service/*.java.
Output: return a Markdown table with columns risk_level/file_path/fix_suggestion.

2. Structured Constraints

Define a fixed output schema to reduce “looks good but unusable” responses.

{
  "risk_level": "high|medium|low",
  "file": "string",
  "issue": "string",
  "fix": "string"
}
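A schema only helps if the program consuming the response enforces it. The following is a minimal sketch of such a check for the schema above, using only the standard library; the function name validate_finding and the wording of the error messages are illustrative assumptions.

```python
import json

# Allowed values and required string fields, taken from the schema above.
ALLOWED_RISK_LEVELS = ("high", "medium", "low")
REQUIRED_STRING_FIELDS = ("file", "issue", "fix")


def validate_finding(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the response fits the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    problems = []
    if data.get("risk_level") not in ALLOWED_RISK_LEVELS:
        problems.append("risk_level must be one of high, medium, low")
    for field in REQUIRED_STRING_FIELDS:
        if not isinstance(data.get(field), str):
            problems.append(f"{field} must be a string")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to feed every violation back into the next prompt revision.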

3. Few-shot Examples

Provide 1-3 high-quality examples to improve style consistency and task alignment.
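One possible way to assemble such a prompt is sketched below. The layout ("Example N input:" / "Example N output:") and the function name build_few_shot_prompt are assumptions chosen for illustration, not a required format.

```python
def build_few_shot_prompt(
    instruction: str,
    examples: list[tuple[str, str]],
    query: str,
) -> str:
    """Combine an instruction, 1-3 (input, output) examples, and the real query."""
    if not 1 <= len(examples) <= 3:
        raise ValueError("expected 1 to 3 examples")

    parts = [instruction.strip(), ""]
    for i, (example_input, example_output) in enumerate(examples, start=1):
        parts += [
            f"Example {i} input:",
            example_input,
            f"Example {i} output:",
            example_output,
            "",
        ]
    # End with an open "Output:" slot so the model continues in the same pattern.
    parts += ["Input:", query, "Output:"]
    return "\n".join(parts)
```

Keeping the examples as data rather than hard-coding them into the prompt text makes it straightforward to swap them when a failure case shows that an example is misleading.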

4. Role and Boundary Control

State what the model can and cannot do, and in particular that it must not guess.

If evidence is insufficient, return "insufficient information" and do not fabricate.

5. Iterative Tuning

Treat prompts like code: version, test, and refine.
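A small sketch of the "version" part, assuming prompts are stored as plain strings: deriving an id from the exact prompt text means every test result can be tied to the wording that produced it. The class name PromptVersion and the 12-character id length are illustrative choices.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    name: str
    text: str

    @property
    def version_id(self) -> str:
        # Hash of the exact prompt text, so any edit yields a new id.
        return hashlib.sha256(self.text.encode("utf-8")).hexdigest()[:12]


v1 = PromptVersion("code_review", "Return a Markdown table.")
v2 = PromptVersion("code_review", "Return a JSON array.")
print(v1.version_id != v2.version_id)  # the two wordings get different ids
```

In practice the prompt files can simply live in version control; the content hash is a convenience for labeling evaluation runs and logs.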

How to Use It in Real Development (Executable Workflow)

Step 0: Define the Task Interface First

Define clearly:

  • What the input is
  • Who consumes the output (human/program)
  • What qualifies as acceptable output

This is essentially defining an API contract for prompts.
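The three questions above can be written down as a small data structure before any prompt text exists. This is only a sketch: PromptContract, its field names, and the example acceptance rule are hypothetical and would differ per task.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PromptContract:
    input_fields: tuple[str, ...]         # what the input is
    consumer: str                         # who consumes the output: "human" or "program"
    is_acceptable: Callable[[str], bool]  # what qualifies as acceptable output

    def missing_inputs(self, values: dict[str, str]) -> list[str]:
        """Names of required inputs that the caller did not supply."""
        return [name for name in self.input_fields if name not in values]


# Example contract for a review task whose output is parsed by a program.
review_contract = PromptContract(
    input_fields=("diff",),
    consumer="program",
    # Deliberately crude acceptance rule for illustration: output must look like a JSON array.
    is_acceptable=lambda output: output.strip().startswith("["),
)
```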

Step 1: Use Prompt Templates, Not One-off Writing

Use a stable template:

  • Role
  • Goal
  • Input
  • Constraints
  • Output format
  • Failure handling rules

Example:

[Role]
You are a senior frontend reviewer.

[Goal]
Check whether the following PR diff contains accessibility issues.

[Input]
{{DIFF_CONTENT}}

[Constraints]
- Judge only based on the provided diff
- Do not infer unprovided code

[Output Format]
JSON object: {"findings":[{"severity":"","file":"","issue":"","fix":""}],"reason":""}

[Failure Handling]
If evidence is insufficient, return an empty findings array and explain why in the reason field.
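Filling a template like this is worth doing in code rather than by hand, so that a forgotten placeholder fails loudly instead of reaching the model. A minimal sketch follows, assuming the {{NAME}} placeholder style used above; the render function and the shortened template are illustrative.

```python
import re

# Shortened version of the template above, for illustration only.
TEMPLATE = """[Role]
You are a senior frontend reviewer.

[Input]
{{DIFF_CONTENT}}
"""

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Replace every {{NAME}} placeholder; fail if any value is missing."""
    missing = set(PLACEHOLDER.findall(template)) - values.keys()
    if missing:
        raise KeyError(f"missing template values: {sorted(missing)}")
    # A function replacement avoids backslash escapes in the diff being interpreted.
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


prompt = render(TEMPLATE, {"DIFF_CONTENT": "+ <img src='logo.png'>"})
```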

Step 2: Add Automatic Evaluation to Prompts

Do not rely only on manual reading. At minimum, run:

  • Format checks: JSON parsable, required fields present
  • Quality checks: key constraints satisfied (e.g. file and fix must be non-empty)
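Both kinds of check can be sketched in a few lines. The version below assumes findings carry the severity/file/issue/fix fields from the template example, and it accepts either a bare JSON array or an object with a "findings" key; the function name and report layout are assumptions.

```python
import json

REQUIRED_FIELDS = ("severity", "file", "issue", "fix")


def evaluate_response(raw: str) -> dict:
    """Run format checks first, then quality checks, and report both results."""
    report = {"format_ok": False, "quality_ok": False, "errors": []}

    # Format check 1: the response must be parsable JSON.
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        report["errors"].append("response is not parsable JSON")
        return report

    findings = parsed.get("findings") if isinstance(parsed, dict) else parsed
    if not isinstance(findings, list) or not all(isinstance(f, dict) for f in findings):
        report["errors"].append("expected an array of finding objects")
        return report

    # Format check 2: every finding must contain all required fields.
    for i, finding in enumerate(findings):
        for field in REQUIRED_FIELDS:
            if field not in finding:
                report["errors"].append(f"finding {i}: missing field {field}")
    report["format_ok"] = not report["errors"]
    if not report["format_ok"]:
        return report

    # Quality check: file and fix must carry actual content.
    for i, finding in enumerate(findings):
        for field in ("file", "fix"):
            if not str(finding[field]).strip():
                report["errors"].append(f"finding {i}: {field} is empty")
    report["quality_ok"] = not report["errors"]
    return report
```

Running this over a fixed set of inputs after every prompt change gives a pass rate that can be compared across prompt versions.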

Step 3: Feed Failure Samples Back into Prompt Design

Convert typical failures into:

  • New constraints
  • New examples
  • New counter-examples

This is the core learning loop in prompt engineering.
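One way to keep this loop honest is to store each typical failure as a regression case and rerun all of them whenever the prompt changes. The sketch below assumes the model call is passed in as a plain function (call_model is a stand-in, not a real API); FailureCase and run_regression are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailureCase:
    description: str                # what went wrong originally
    model_input: str                # the input that triggered the failure
    passes: Callable[[str], bool]   # check that the failure no longer occurs


def run_regression(
    cases: list[FailureCase],
    call_model: Callable[[str], str],
) -> list[str]:
    """Return the descriptions of all cases that still fail."""
    return [
        case.description
        for case in cases
        if not case.passes(call_model(case.model_input))
    ]


cases = [
    FailureCase(
        description="wrapped JSON in prose",
        model_input="diff with one issue",
        passes=lambda output: output.strip().startswith("["),
    ),
]
```

A case that keeps failing after a new constraint was added is a signal to try an example or counter-example instead.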

Step 4: Split Prompts by Scenario

Do not expect one mega-prompt to cover all tasks. Split by function:

  • Information extraction prompt
  • Code review prompt
  • Planning prompt
  • Generation prompt

This improves stability and testability.
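In code, the split can be as simple as a registry keyed by task type, so each prompt is selected, tested, and versioned on its own. The keys and prompt texts below are placeholders, not recommended wording.

```python
# One prompt per function; texts are placeholders for illustration.
PROMPTS = {
    "extraction": "Extract the requested fields from the input text.",
    "code_review": "Review the provided diff and list concrete issues.",
    "planning": "Break the goal into ordered, verifiable steps.",
    "generation": "Write the requested artifact in the given style.",
}


def select_prompt(task_type: str) -> str:
    """Look up the prompt for a task type; unknown types fail explicitly."""
    try:
        return PROMPTS[task_type]
    except KeyError:
        raise ValueError(f"no prompt registered for task type {task_type!r}") from None
```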

Limits of Prompt Engineering Alone

Prompt engineering is effective, but it has natural boundaries, especially in agent-based or long-running development:

  1. Limited memory management
     • Prompt tuning optimizes “how to ask now,” not “how to manage multi-turn state”
  2. Long-context degradation
     • As history grows, prompt constraints alone cannot solve token/attention dilution
  3. Weak state continuity
     • After interruption, a single prompt cannot reliably restore full task state
  4. No execution loop by itself
     • A prompt can say “run tests,” but that does not guarantee tests are executed, logs collected, and state updated
  5. No system-level governance
     • By itself, it cannot solve tool orchestration, failure recovery, observability, and quality gates

Why It Evolved into Context Engineering

Once tasks evolved from Q&A to continuous development, the key problems became:

  • What history to keep
  • When to compress history
  • How to retrieve and refill old information
  • How to hand off state without loss across context windows

That is the scope of context engineering:

Prompt engineering focuses on: how to express tasks
Context engineering focuses on: how to manage task history and state
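To make the "keep" and "compress" questions concrete, here is a deliberately simple sketch: recent turns are kept verbatim and older ones are replaced by a summary once a size budget is exceeded. Counting words as a stand-in for tokens and passing the summarizer in as a function are both assumptions; a real system would use the model's tokenizer and typically a model-generated summary.

```python
from typing import Callable


def compact_history(
    turns: list[str],
    budget: int,
    keep_recent: int,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    """Keep the last keep_recent turns; summarize the rest when over budget."""
    # Rough size estimate: word count stands in for token count.
    size = sum(len(turn.split()) for turn in turns)
    if size <= budget or len(turns) <= keep_recent:
        return list(turns)

    split = len(turns) - keep_recent
    older, recent = turns[:split], turns[split:]
    return [summarize(older)] + recent


history = ["plan the refactor", "rename the service class", "run tests"]
compacted = compact_history(
    history,
    budget=5,
    keep_recent=1,
    summarize=lambda old: f"[summary of {len(old)} earlier turns]",
)
print(compacted)  # ['[summary of 2 earlier turns]', 'run tests']
```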

Why It Further Evolved into Harness Engineering

Even with prompt + context engineering, a larger challenge remains:

How to make agents reliably deliver in real engineering workflows.

That requires system capabilities:

  • Toolchain orchestration (lint/test/build/deploy)
  • Quality gates and automatic verification
  • Failure recovery and retry strategies
  • Task scheduling and state tracking
  • Rule accumulation and observability

That is the scope of harness engineering:

Harness engineering = assembling prompt, context, tools, checks, and workflow into a sustainable delivery system
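The core of such a system is a loop that produces an artifact, runs checks on it, and feeds failures back for another attempt. The sketch below shows only that skeleton: produce stands in for the agent call and each check stands in for a real gate such as lint or tests; all names are hypothetical and no real tool is invoked.

```python
from dataclasses import dataclass, field
from typing import Callable

# A check takes the artifact and returns (passed, detail).
Check = Callable[[str], tuple[bool, str]]


@dataclass
class RunResult:
    succeeded: bool
    attempts: int
    log: list[str] = field(default_factory=list)


def run_with_gates(
    produce: Callable[[str], str],
    checks: dict[str, Check],
    max_attempts: int = 3,
) -> RunResult:
    """Produce, verify, and retry with feedback until all gates pass."""
    feedback = ""
    log: list[str] = []
    for attempt in range(1, max_attempts + 1):
        artifact = produce(feedback)
        failures = []
        for name, check in checks.items():
            ok, detail = check(artifact)
            # The log is the observability record of every gate result.
            log.append(f"attempt {attempt}: {name} {'passed' if ok else 'failed'}")
            if not ok:
                failures.append(f"{name}: {detail}")
        if not failures:
            return RunResult(True, attempt, log)
        # Failed gates become the feedback for the next attempt.
        feedback = "\n".join(failures)
    return RunResult(False, max_attempts, log)
```

The point of the design is that verification happens outside the prompt: the model can be asked to run tests, but only the harness can confirm that they ran and passed.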

Relationship Among the Three

Dimension | Prompt Engineering | Context Engineering | Harness Engineering
----------|--------------------|---------------------|--------------------
Core question | How to improve single-call output | How to manage multi-turn memory and state | How to make end-to-end delivery stable
Main object | Single input text | History, summaries, retrieval, state | Toolchains, rules, validation, orchestration
Typical artifact | Prompt templates | State snapshots, compression summaries, memory layers | Agent workflows, check loops, runtime policies
Main failure point | Drift in long tasks | Lacks execution/governance | Higher implementation cost, but highest stability

My Practical Conclusion

Prompt engineering is not outdated. It is the foundational layer.

In real development, a practical sequence is:

  1. Stabilize prompt engineering first (stable input/output)
  2. Add context engineering next (handle long-running memory)
  3. Build harness engineering last (close the system loop for stable delivery)

If you jump directly to harness while prompt quality is unstable, complexity rises quickly and failures become harder to debug. If you only do prompt engineering, long-running development remains fragile.
