02

Defining when AI product images are trusted to ship

Company: IKEA Digital Team
Year: 2026
Type: AI Product · Trust Boundary Design
Role: Product Intern / AIGC Quality & Scenario Analysis

The work was not about making AI images look impressive. It was about defining the quality boundary at which generated assets become safe enough for e-commerce workflows, so that each one is shipped, reviewed, or rejected under clear rules.

AI images can look realistic and still be unsafe to ship.

I evaluated 93 product replacement tests and 150+ AIGC outputs, turning AI image review from a question of “does it look good?” into reusable quality criteria for shipping decisions.

A generated scene may look polished while quietly changing the product’s color, proportion, material, structure, or usage context. In e-commerce, that is not just a visual flaw. It is a product-truth risk.

93 product replacement tests
150+ AIGC outputs reviewed
3 shipping states defined
Risk Comparison: Original SKU → AI Scene → Risk Annotation
Original: Original SKU
Generated: AI Generated Scene
Review: Color drift, Shape changed, Material mismatch, Review required
The problem is not that AI images look bad. It is that they can quietly stop representing the same SKU.

The real risk is subtle rewriting of product truth.

The riskiest outputs were often not obviously broken images. They were the images that looked acceptable at first glance while product information had already changed.

So I stopped treating these failures as visual defects and started treating them as product-truth risks: color drift, material mismatch, structural repainting, scale distortion, lighting inconsistency, and unstable edge blending.

Failure Pattern Board: Six failure types worth reviewing
Color Drift: Product color changed
Scale Drift: Product proportion shifted
Structural Distortion: Geometry no longer matches SKU
Material Mismatch: Surface texture was rewritten
Lighting Inconsistency: Scene lighting breaks realism
Edge Blending Issue: Product boundary looks unstable
Each card isolates one failure type so review language becomes reusable instead of subjective.
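One way to make this vocabulary reusable is to store the six failure types as fixed tags that reviewers attach to each output. The sketch below is illustrative, not the team's actual tooling: the `FailureType` enum, the `tally_failures` helper, and the sample annotations are all hypothetical, and only the six type names and their descriptions come from the board above.

```python
from collections import Counter
from enum import Enum


class FailureType(Enum):
    """The six failure types from the board, with their one-line descriptions."""
    COLOR_DRIFT = "Product color changed"
    SCALE_DRIFT = "Product proportion shifted"
    STRUCTURAL_DISTORTION = "Geometry no longer matches SKU"
    MATERIAL_MISMATCH = "Surface texture was rewritten"
    LIGHTING_INCONSISTENCY = "Scene lighting breaks realism"
    EDGE_BLENDING_ISSUE = "Product boundary looks unstable"


def tally_failures(annotations):
    """Count how many outputs carry each failure type.

    `annotations` maps an output ID to the list of FailureType tags
    a reviewer assigned to that output.
    """
    counts = Counter()
    for tags in annotations.values():
        counts.update(set(tags))  # count each type at most once per output
    return counts


# Hypothetical annotations, invented for illustration only.
example = {
    "output-a": [FailureType.COLOR_DRIFT, FailureType.EDGE_BLENDING_ISSUE],
    "output-b": [FailureType.COLOR_DRIFT],
    "output-c": [],
}
counts = tally_failures(example)
print(counts[FailureType.COLOR_DRIFT])  # 2
```

Because the tags are a closed set, two reviewers describing the same problem end up using the same label, which is the point of isolating one failure type per card.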

I was not comparing model capability. I was comparing production paths.

Prompt + LLM, LoRA + LLM, and Depth / 3D + LLM were not useful to compare as isolated technical stacks. The product question was which path could actually enter a real content production workflow.

What mattered was not technical novelty, but whether each path was better suited for inspiration, controlled replacement, or future high-fidelity production.

AI Path Matrix: Three paths, three product decisions

Prompt + LLM
Best For: Inspiration / scene exploration
Main Risk: Product repainting, hallucination, scale drift
Product Decision: Reference Only

LoRA + LLM
Best For: Controlled replacement within similar categories
Main Risk: Depends on source quality, samples, and scene similarity
Product Decision: Human Review

Depth / 3D + LLM
Best For: High-fidelity product image production
Main Risk: Higher cost, asset dependency, and process complexity
Product Decision: Future Investment
This matrix expresses product trade-offs, not technical evangelism.
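The matrix could also be kept as a small lookup table, so the default decision for a path is read from one place rather than remembered. This is a minimal sketch under that assumption: the `ProductionPath` type, the `PATH_MATRIX` name, and the `decision_for` helper are hypothetical, while the cell contents are copied from the matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductionPath:
    """One row of the path matrix."""
    best_for: str
    main_risk: str
    decision: str


PATH_MATRIX = {
    "Prompt + LLM": ProductionPath(
        best_for="Inspiration / scene exploration",
        main_risk="Product repainting, hallucination, scale drift",
        decision="Reference Only",
    ),
    "LoRA + LLM": ProductionPath(
        best_for="Controlled replacement within similar categories",
        main_risk="Depends on source quality, samples, and scene similarity",
        decision="Human Review",
    ),
    "Depth / 3D + LLM": ProductionPath(
        best_for="High-fidelity product image production",
        main_risk="Higher cost, asset dependency, and process complexity",
        decision="Future Investment",
    ),
}


def decision_for(path_name):
    """Return the product decision recorded for a production path."""
    return PATH_MATRIX[path_name].decision


print(decision_for("LoRA + LLM"))  # Human Review
```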

Turning “it feels wrong” into reusable quality criteria.

The hardest part of image review is that everyone can say an image feels off, but it is much harder to explain why it cannot be used.

I broke subjective review into reusable dimensions: color, proportion, structure, material, lighting, edge blending, and e-commerce usability.

Quality Checklist: From taste-based review to review language
AI Generated Image: Review Surface
Color Fidelity: Pass / Risk / Fail
Size Accuracy: Pass / Risk / Fail
Structure Consistency: Pass / Risk / Fail
Material Match: Pass / Risk / Fail
Lighting Naturalness: Pass / Risk / Fail
Edge Blending: Pass / Risk / Fail
E-commerce Usability: Pass / Risk / Fail
The point was not to build a scoring system, but to establish a stable review vocabulary.
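A checklist like this can be captured as a plain review record: one verdict per dimension and no numeric score. The sketch below assumes a hypothetical `ReviewRecord` type and a made-up output ID. The seven dimension names and the Pass / Risk / Fail values come from the checklist; everything else is illustrative.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "Pass"
    RISK = "Risk"
    FAIL = "Fail"


# The seven review dimensions, in checklist order.
DIMENSIONS = (
    "Color Fidelity",
    "Size Accuracy",
    "Structure Consistency",
    "Material Match",
    "Lighting Naturalness",
    "Edge Blending",
    "E-commerce Usability",
)


@dataclass
class ReviewRecord:
    """One reviewer's verdicts for one generated image (hypothetical type)."""
    output_id: str
    verdicts: dict  # maps a dimension name to a Verdict

    def __post_init__(self):
        # Require exactly the seven checklist dimensions, so every review
        # uses the same vocabulary.
        missing = [d for d in DIMENSIONS if d not in self.verdicts]
        unknown = [d for d in self.verdicts if d not in DIMENSIONS]
        if missing or unknown:
            raise ValueError(f"missing={missing}, unknown={unknown}")

    def flagged(self):
        """Return the dimensions that did not pass, in checklist order."""
        return [d for d in DIMENSIONS if self.verdicts[d] is not Verdict.PASS]


# Hypothetical review: everything passes except two dimensions.
verdicts = {d: Verdict.PASS for d in DIMENSIONS}
verdicts["Color Fidelity"] = Verdict.RISK
verdicts["Edge Blending"] = Verdict.FAIL
record = ReviewRecord(output_id="example-output", verdicts=verdicts)
print(record.flagged())  # ['Color Fidelity', 'Edge Blending']
```

The record deliberately has no total or weighted score. It only says which named dimensions a reviewer flagged, which keeps the output a vocabulary rather than a scoring system.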

The key judgment: Ship, Human Review, or Reference Only.

I grouped generated outputs into three states: ready to ship, requiring human review, or reference-only. This was the product boundary that turned AI from a generation tool into a controlled workflow.

Trust Boundary: Three routing decisions for every generated output
Ship: Stable color, Stable structure, Accurate proportion
Human Review: Minor color risk, Edge uncertainty, Scene relevance risk
Reference Only: Product repainted, Structure changed, Misleading SKU
This is the strongest product-principle graphic on the page.
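The three routing decisions can be sketched as a small function over per-dimension verdicts. The mapping rule here is an assumption for illustration, not a rule stated in the project: any Fail routes to Reference Only, any Risk routes to Human Review, and only an all-Pass review ships. The `Verdict` and `Route` enums and the `route` function are hypothetical names.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "Pass"
    RISK = "Risk"
    FAIL = "Fail"


class Route(Enum):
    SHIP = "Ship"
    HUMAN_REVIEW = "Human Review"
    REFERENCE_ONLY = "Reference Only"


def route(verdicts):
    """Map per-dimension verdicts to one of the three routing decisions.

    Assumed rule (not taken from the source): the worst verdict decides.
    """
    values = list(verdicts.values())
    if not values:
        # Nothing was reviewed, so the output must never ship automatically.
        return Route.HUMAN_REVIEW
    if Verdict.FAIL in values:
        return Route.REFERENCE_ONLY
    if Verdict.RISK in values:
        return Route.HUMAN_REVIEW
    return Route.SHIP


# Hypothetical examples mirroring the three columns of the graphic.
print(route({"Color Fidelity": Verdict.PASS, "Structure Consistency": Verdict.PASS}).value)
# Ship
print(route({"Color Fidelity": Verdict.RISK, "Structure Consistency": Verdict.PASS}).value)
# Human Review
print(route({"Color Fidelity": Verdict.PASS, "Structure Consistency": Verdict.FAIL}).value)
# Reference Only
```

Letting the worst verdict decide is a conservative choice: a single changed structure is enough to keep an image out of production, however good the rest of it looks.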

An AI PM’s value does not lie in believing what the model can do. It lies in defining when the model should not be trusted.

For AIGC content production, model capability only becomes product value when it enters a workflow that is reviewable, explainable, and accountable.

The core of the workflow is not reducing how many images humans have to inspect. It is helping the team know which image can be shipped, which needs review, and which must never enter production.

AI capability becomes product capability only after its trust boundary is explicitly designed.
