What Prompt Engineering Is

Prompt engineering is essentially:

Designing input structure (instructions, context, examples, and output constraints) to improve model output quality, stability, and usability.

At an early stage, this was mainly a “single-call optimization” problem:

How to reduce model drift for the same question
How to force structured output for programmatic integration
How to make the model focus on the most relevant information under limited context

One-line view:

Prompt engineering = translating natural-language requirements into stable, executable model input contracts

What Early Prompt Engineering Tried to Solve

In early LLM usage, the main pain points were direct:

Unstable outputs

Same input, varying output quality across runs

Inconsistent instruction following

Missing constraints, skipped steps, or task boundary drift

Uncontrolled output format

Hard to reliably produce JSON/table/structured fields

Hallucination and fabrication

Models tend to fill gaps with invented facts

High engineering integration cost

Hard to plug responses into automated pipelines (parse/store/invoke)

The real value of prompt engineering was turning “probabilistic conversation behavior” into “repeatable invocation behavior.”

Typical Methods in Prompt Engineering

1. Instruction Clarification

Break tasks into explicit actions and avoid vague intent.

You are a backend code review assistant.
Goal: identify concurrency safety issues.
Scope: only check src/service/*.java.
Output: return a Markdown table with columns risk_level/file_path/fix_suggestion.

2. Structured Constraints

Define a fixed output schema to reduce “looks good but unusable” responses.

{
  "risk_level": "high|medium|low",
  "file": "string",
  "issue": "string",
  "fix": "string"
}

3. Few-shot Examples

Provide 1-3 high-quality examples to improve style consistency and task alignment.

4. Role and Boundary Control

State what the model can and cannot do, especially no guessing.

If evidence is insufficient, return "insufficient information" and do not fabricate.

5. Iterative Tuning

Treat prompts like code: version, test, and refine.

How to Use It in Real Development (Executable Workflow)

Step 0: Define the Task Interface First

Define clearly:

What the input is
Who consumes the output (human/program)
What qualifies as acceptable output

This is essentially defining an API contract for prompts.

Step 1: Use Prompt Templates, Not One-off Writing

Use a stable template:

Role
Goal
Input
Constraints
Output format
Failure handling rules

Example:

[Role]
You are a senior frontend reviewer.

[Goal]
Check whether the following PR diff contains accessibility issues.

[Input]
{{DIFF_CONTENT}}

[Constraints]
- Judge only based on the provided diff
- Do not infer unprovided code

[Output Format]
JSON array: [{"severity":"","file":"","issue":"","fix":""}]

[Failure Handling]
If evidence is insufficient, return an empty array and include a reason field.

Step 2: Add Automatic Evaluation to Prompts

Do not rely only on manual reading. At least run:

Format checks: JSON parsable, required fields present
Quality checks: key constraints satisfied (e.g. file and fix must exist)

Step 3: Feed Failure Samples Back into Prompt Design

Convert typical failures into:

New constraints
New examples
New counter-examples

This is the core learning loop in prompt engineering.

Step 4: Split Prompts by Scenario

Do not expect one mega-prompt to cover all tasks. Split by function:

Information extraction prompt
Code review prompt
Planning prompt
Generation prompt

This improves stability and testability.

Limits of Prompt Engineering Alone

Prompt engineering is effective, but has natural boundaries, especially in agent/long-running development:

Limited memory management

Prompt tuning optimizes “how to ask now,” not “how to manage multi-turn state”

Long-context degradation

As history grows, prompt constraints alone cannot solve token/attention dilution

Weak state continuity

After interruption, a single prompt cannot reliably restore full task state

No execution loop by itself

A prompt can say “run tests,” but that does not guarantee tests are executed, logs collected, and state updated

No system-level governance

It cannot alone solve tool orchestration, failure recovery, observability, and quality gates

Why It Evolved into Context Engineering

Once tasks evolved from Q&A to continuous development, the key problems became:

What history to keep
When to compress history
How to retrieve and refill old information
How to hand off state without loss across context windows

That is the scope of context engineering:

Prompt engineering focuses on: how to express tasks
Context engineering focuses on: how to manage task history and state

Why It Further Evolved into Harness Engineering

Even with prompt + context engineering, a larger challenge remains:

How to make agents reliably deliver in real engineering workflows.

That requires system capabilities:

Toolchain orchestration (lint/test/build/deploy)
Quality gates and automatic verification
Failure recovery and retry strategies
Task scheduling and state tracking
Rule accumulation and observability

That is the scope of harness engineering:

Harness engineering = assembling prompt, context, tools, checks, and workflow into a sustainable delivery system

Relationship Among the Three

Dimension	Prompt Engineering	Context Engineering	Harness Engineering
Core question	How to improve single-call output	How to manage multi-turn memory and state	How to make end-to-end delivery stable
Main object	Single input text	History, summaries, retrieval, state	Toolchains, rules, validation, orchestration
Typical artifact	Prompt templates	State snapshots, compression summaries, memory layers	Agent workflows, check loops, runtime policies
Main failure point	Drift in long tasks	Lacks execution/governance	Higher implementation cost, but highest stability

My Practical Conclusion

Prompt engineering is not outdated. It is the foundational layer.

In real development, a practical sequence is:

Stabilize prompt engineering first (stable input/output)
Add context engineering next (handle long-running memory)
Build harness engineering last (close the system loop for stable delivery)

If you jump directly to harness while prompt quality is unstable, complexity rises quickly and failures become harder to debug. If you only do prompt engineering, long-running development remains fragile.

References

OpenAI: Prompt Engineering Guide
OpenAI: Best practices for prompt engineering
Anthropic: Prompt engineering overview
Anthropic: Use XML tags to structure prompts

Agent_Prompt Engineering