Which hundred SKUs to fix first — and what to do with each

When a catalog business tells me "we have about a hundred bad SKUs but we don't know why," this is the protocol that gets to a defensible action plan in five to seven hours. It catches the analysis traps that make the data lie, and it ships only recommendations that have evidence behind them.

What this could do for your organization

If you run a catalog business with more than a hundred SKUs and the tail is weak, this protocol is built for you. You know the tail is weak, your merchandising team knows it, and your margin numbers hint at it, but nobody can point at the specific hundred items that are dragging the book down and say why. The protocol takes five to seven hours of work to get from "we think we have about a hundred bad SKUs" to a delivered action matrix: every one of those hundred SKUs assigned a concrete recommendation (reprice, optimize listing, bundle, fix catalog data, reposition, or discontinue), with a confidence level and the evidence behind it.

The protocol exists because the alternative is either (a) spending a month on analysis that nobody trusts by the time it lands, or (b) shipping recommendations based on what someone on the team "feels" should go — which is the same reason the tail got weak in the first place. Seven quality gates in the workflow reject any recommendation that isn't defensible: no repricing without verified unit cost, no discontinuation without a two-year margin check, no generic "investigate this" without a concrete next step.

Two things worth knowing from the live engagement. First, sales data lies about cause. The numbers looked like dire SKU-level demand problems — until a fifteen-minute client call revealed a quiet ten-percent across-the-board price increase between the two comparison periods. The protocol has a client-interview phase early, deliberately, so the analysis doesn't chase ghosts. Second, concentration matters: the top hundred SKUs accounted for fifty percent of declining-SKU revenue loss. Fixing the hundred actually moves the needle.

What your team gets back

Within a day of getting your sales data, you get a ranked top-hundred list with severity flags: DEAD (operational issues like stock-outs that your ops team should investigate before anyone does market research), STEEP (price or listing shocks, same initial response), and SLOW (the real research target). This is usually enough to confirm the problem is where you thought it was — or to surface that it's somewhere else.

In the full five-to-seven-hour run, you get the action matrix: each of the top hundred assigned one of six recommendations — reprice, optimize listing, bundle with complementary items, fix catalog data, reposition, or discontinue — with a confidence level graded against a rubric (not just a label) and the evidence underneath each one. Your category team can argue with specific SKUs, but they're arguing with evidence, not "I think this one should stay."

The first engagement is the run: I apply the protocol to your catalog data and return the action matrix. You also get a documented version of the protocol itself — phase structure, clustering rules, quality gates, the adversarial checks — so your merchandising team can either rerun it themselves on next quarter's data, or re-engage me to run it when the next batch drops. The protocol is the durable asset.

How I did it

Most catalog businesses have a tail of product SKUs they suspect are underperforming — slow sellers, weak margins, catalog clutter, "we're carrying dead wood but we don't know which items." The question isn't whether underperformers exist. The question is which ones, why, and what to do about each one — without spending a month on analysis or shipping recommendations with no evidence behind them. I built a protocol that takes a catalog business from "we think we have about a hundred bad SKUs" to a delivered action matrix — every SKU assigned a concrete recommendation with a confidence level — in five to seven hours of work.

The protocol runs in six phases. Pre-engagement research and a CX audit happen before the first client call. A client session validates the data. A ranking phase produces the top-hundred list. A clustering phase groups them into patterns. Deep research runs per category and per hero SKU, with adversarial fracture at each step. A deliverables phase packages the action matrix. Every hand-off has a quality gate.

Phase 00 · Pre-engagement · Company + market research, CX audit, adversarial fracture · out: state-of-client memo
Phase 01 · Client session · Must-ask interview, CX delta, data request · out: validated definition + data
Phase 01.5 · Underperformer ID · Like-for-like windows, ranked top 100 with flags · out: DEAD · STEEP · SLOW
Phase 02 · Cluster + triage · 5–8 categories, 3–5 hero SKUs, internal-fix flags · out: cluster map + heroes
Phase 03 · Research + fracture · Per-category + per-hero deep dives, fracture on every hero · out: findings with sources
Phase 04 · Deliverables · Exec summary, hero deep dives, action matrix · out: delivered action matrix

7 quality gates enforced between phases · reject any recommendation that doesn’t meet the rubric

The first step is digging into the sales data to tease out the actual underperformers. That means forcing like-for-like time windows before any ranking is trusted — matching periods on both sides, accounting for seasonality.

Trap · what the platform exports by default (✕ broken)
YTD 2026 · Jan–mid-Apr · 3.5 months · £ 412 k
Full 2025 · Jan–Dec · 12 months · £ 1.42 M
3.5 months compared against 12 months. Growing SKUs look like decliners. Seasonal SKUs look catastrophic. Every subsequent decision is downstream of this error.

Fix · force like-for-like: same 3.5-month window, year over year (✓ trustworthy)
YTD 2026 · Jan–mid-Apr 2026 · £ 412 k
Same period 2025 · Jan–mid-Apr 2025 · £ 521 k
Now the decline is a real signal: −21% units, −14% revenue, £ 109 k lighter on the top line. Ranking from this comparison can be trusted.
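
The window fix above can be sketched in a few lines of Python. This is a minimal illustration rather than the tooling used on the engagement: the row layout (a `date` and a `revenue` field per sales row) and all function names are assumptions, and the figures in the usage lines are toy values.

```python
from datetime import date
from typing import Iterable, Optional, Tuple


def same_window_last_year(start: date, end: date) -> Tuple[date, date]:
    """Shift a date window back one calendar year (29 Feb falls back to 28 Feb)."""
    def back(d: date) -> date:
        try:
            return d.replace(year=d.year - 1)
        except ValueError:  # 29 Feb has no counterpart in a non-leap year
            return d.replace(year=d.year - 1, day=28)
    return back(start), back(end)


def window_revenue(rows: Iterable[dict], start: date, end: date) -> float:
    """Sum revenue for rows dated inside the inclusive window [start, end]."""
    return sum(r["revenue"] for r in rows if start <= r["date"] <= end)


def like_for_like_change(rows: Iterable[dict], start: date, end: date) -> Optional[float]:
    """Fractional change of the current window against the same window a year earlier."""
    rows = list(rows)
    prior_start, prior_end = same_window_last_year(start, end)
    current = window_revenue(rows, start, end)
    prior = window_revenue(rows, prior_start, prior_end)
    if prior == 0:
        return None  # no baseline, so no meaningful comparison
    return (current - prior) / prior


# Toy data (hypothetical): one sale inside each window, one outside.
toy_rows = [
    {"date": date(2025, 2, 1), "revenue": 100.0},
    {"date": date(2025, 8, 1), "revenue": 300.0},  # outside the prior window
    {"date": date(2026, 2, 1), "revenue": 80.0},
]
change = like_for_like_change(toy_rows, date(2026, 1, 1), date(2026, 4, 15))
```

The August row is deliberately ignored: comparing the 2026 window against all of 2025 would include it and overstate the decline, which is the trap described above.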

With the windows fixed, every declining SKU gets a severity flag. DEAD: went from real volume to zero, usually an operational issue (stock-out, delisting, supplier change) that should be investigated internally before market research touches it. STEEP: dropped more than eighty percent without reaching zero, usually a listing or price shock, with the same initial response. SLOW: dropped twenty to eighty percent, a genuine demand or competition shift and the deepest research target. The flagged SKUs are then sorted by revenue impact to give the top hundred, followed by a concentration check. On a recent engagement, the top hundred accounted for fifty percent of total declining-SKU revenue loss. Fixing those hundred moves the needle.

Top 100 underperformers · ranked by revenue impact · YoY, like-for-like

| # | Flag | YoY loss |
|---|---|---|
| 01 | DEAD | −£ 6 420 |
| 02 | STEEP | −£ 5 910 |
| 03 | SLOW | −£ 5 530 |
| 04 | SLOW | −£ 5 210 |
| 05 | STEEP | −£ 5 040 |
| 06 | SLOW | −£ 4 620 |
| 07 | DEAD | −£ 4 340 |
| 08 | SLOW | −£ 4 050 |
| 09 | STEEP | −£ 3 870 |
| 10 | SLOW | −£ 3 660 |
| 11 | SLOW | −£ 3 430 |
| … | rows 12–99 not shown | … |
| 100 | SLOW | −£ 510 |

Flag distribution
DEAD · went to zero · 30 SKUs (30%) · Stock-out, delisting, supplier change. Investigate internally first.
STEEP · > 80% drop · 8 SKUs (8%) · Listing or price shock. Same initial response as DEAD.
SLOW · 20–80% drop · 62 SKUs (62%) · Genuine demand or competition shift. Deepest research target.

Concentration check
50% of total declining-SKU revenue loss sits in the top 100 (top 100 · £ 149 k of full tail · £ 296 k).
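
The flagging and concentration check can be expressed as a short Python sketch. The thresholds come from the flag definitions above; everything else is an assumption made for illustration: that the drop is measured in units, that any positive prior volume counts as "real volume", and the field names `prior_units`, `current_units`, `prior_revenue`, and `current_revenue`.

```python
from typing import List, Optional, Tuple


def severity_flag(prior_units: float, current_units: float) -> Optional[str]:
    """Classify a SKU's year-over-year unit drop as DEAD, STEEP, or SLOW."""
    if prior_units <= 0:
        return None  # no baseline volume to decline from (assumed rule)
    if current_units <= 0:
        return "DEAD"  # went to zero
    drop = (prior_units - current_units) / prior_units
    if drop > 0.80:
        return "STEEP"  # more than eighty percent
    if drop >= 0.20:
        return "SLOW"  # twenty to eighty percent
    return None  # below the twenty percent floor, so not flagged


def revenue_loss(sku: dict) -> float:
    return sku["prior_revenue"] - sku["current_revenue"]


def top_underperformers(skus: List[dict], n: int = 100) -> Tuple[List[dict], float]:
    """Rank declining SKUs by revenue loss; return the top n and their share of total loss."""
    decliners = sorted((s for s in skus if revenue_loss(s) > 0), key=revenue_loss, reverse=True)
    top = decliners[:n]
    total_loss = sum(revenue_loss(s) for s in decliners)
    top_loss = sum(revenue_loss(s) for s in top)
    concentration = top_loss / total_loss if total_loss else 0.0
    return top, concentration


# Toy catalog (hypothetical values): three decliners and one grower.
toy_skus = [
    {"sku": "A", "prior_revenue": 100.0, "current_revenue": 40.0},
    {"sku": "B", "prior_revenue": 50.0, "current_revenue": 20.0},
    {"sku": "C", "prior_revenue": 30.0, "current_revenue": 20.0},
    {"sku": "D", "prior_revenue": 10.0, "current_revenue": 25.0},
]
top_one, share = top_underperformers(toy_skus, n=1)
```

With the figures reported above, the concentration value would be 149 / 296, or roughly 0.50.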

Naive approach to the research phase: one deep market-research query per SKU. A hundred SKUs, a hundred cold-start queries, twenty hours of AI time, and shallow results because each query has no category context and no peer comparisons. Protocol: cluster the hundred into five-to-eight categories, run one deep query per cluster, pick three-to-five hero SKUs for per-unit deep dives. Same ground covered, a fraction of the time, higher quality — each query has room to actually think.

Naive · one query per SKU (shallow, cold starts)
100 SKUs × 1 deep query each. Each query starts with no category context, no peer comparisons.
Effort: 100 cold queries · 20+ hours

Protocol · cluster + heroes (room to actually think)
Cluster 100 SKUs into 5–8 categories. One deep query per category. Then 3–5 hero-SKU deep dives.
5–8 category clusters · one query each:
Category A · 27 SKUs
Category B · 18 SKUs
Category C · 16 SKUs
Category D · 14 SKUs
Category E · 12 SKUs
Category F · 13 SKUs
3–5 hero SKUs · per-unit deep dive + fracture: Hero 01, Hero 02, Hero 03, Hero 04
Effort: ~12 deep queries · 5–7 hours
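
To make the query arithmetic concrete, here is a small Python sketch of the cluster-and-heroes step. It assumes each SKU already carries a `category` label and a `loss` figure (both hypothetical field names), and it picks heroes simply as the largest losses; the actual hero-selection rule is not specified here, so treat that part as a placeholder.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cluster_and_pick_heroes(
    skus: List[dict], n_heroes: int = 4
) -> Tuple[Dict[str, List[dict]], List[dict], int]:
    """Group SKUs by category, pick hero SKUs, and count the deep queries needed."""
    clusters: Dict[str, List[dict]] = defaultdict(list)
    for sku in skus:
        clusters[sku["category"]].append(sku)
    # Assumed rule: heroes are the SKUs with the largest revenue loss.
    heroes = sorted(skus, key=lambda s: s["loss"], reverse=True)[:n_heroes]
    # One deep query per cluster plus one deep dive per hero.
    n_queries = len(clusters) + len(heroes)
    return dict(clusters), heroes, n_queries


# Toy input (hypothetical): six SKUs across three categories.
toy = [
    {"sku": "S1", "category": "A", "loss": 900.0},
    {"sku": "S2", "category": "A", "loss": 300.0},
    {"sku": "S3", "category": "B", "loss": 700.0},
    {"sku": "S4", "category": "B", "loss": 100.0},
    {"sku": "S5", "category": "C", "loss": 500.0},
    {"sku": "S6", "category": "C", "loss": 50.0},
]
clusters, heroes, n_queries = cluster_and_pick_heroes(toy, n_heroes=2)
```

With 5–8 clusters and 3–5 heroes the count lands between 8 and 13 deep queries, against 100 for the one-query-per-SKU approach.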

On a recent engagement the top-line numbers looked dire: a twenty-one percent drop in units, fourteen percent in revenue, over a hundred thousand pounds sterling lighter year over year. The analysis could have spent a week building a case for SKU-specific demand problems. Fifteen minutes into the Phase 01 client interview, the real cause surfaced: the client had quietly raised prices ten percent across the board between the two periods. The data was showing price elasticity, not SKU weakness. The protocol is built around this kind of moment. The data lies about cause until you talk to the client.

What the data showed · YoY, like-for-like
Units: −21%
Revenue: −14%
Top-line gap: −£ 109 k
A week of SKU-level demand analysis looks like the obvious next step here. Build the case, present it, ship.

Phase 01 · client interview · minute 14
“Oh yeah, we raised prices ten percent across the board between the two periods. Did I mention that?”
The decline isn’t independent SKU weakness. It’s a price-elasticity response to the across-the-board increase, showing up as a distributed revenue loss across hundreds of items.

Reframe: SKU-specific demand weakness → price elasticity + whatever’s left
Phase 02 disentangles the elasticity component from genuine per-SKU issues before any recommendation ships.
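
A quick sanity check shows why the price increase explains the shape of the numbers. Since revenue is price times units, the unit and revenue changes together imply an average price change. The Python sketch below works at the aggregate level only and ignores mix shift between SKUs; it is an illustration of the identity, not the method Phase 02 uses to separate the components, and the function names are assumptions.

```python
def implied_price_change(units_change: float, revenue_change: float) -> float:
    """Average price change implied by revenue = price x units (changes as fractions)."""
    return (1 + revenue_change) / (1 + units_change) - 1


def expected_revenue_change(units_change: float, price_change: float) -> float:
    """Revenue change that follows from a given unit change and price change."""
    return (1 + units_change) * (1 + price_change) - 1


# Figures reported above: units down 21%, revenue down 14%, prices raised 10%.
implied = implied_price_change(-0.21, -0.14)
expected = expected_revenue_change(-0.21, 0.10)
```

Here `implied` comes out near +8.9% and `expected` near −13.1%, both close to the reported ten percent increase and fourteen percent revenue drop once rounding is allowed for. A unit drop that outpaces the revenue drop is the signature of a price rise, and it is the kind of pattern worth raising in the client interview.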

The deliverable is an action matrix. Every SKU in the top hundred gets one of six recommendations — reprice, optimize the listing, bundle with complementary items, fix catalog data, reposition, or discontinue — plus a confidence level graded against a rubric, not just a label. Seven quality gates reject any recommendation that fails: no repricing without verified unit cost, no discontinuation without a two-year margin and catalog-completeness check, no generic "investigate" without a concrete next step. What ships is defensible.

Deliverable · action matrix · every SKU in top 100 · one recommendation + confidence

| # | SKU | Category | Recommendation | Conf. | Rationale |
|---|---|---|---|---|---|
| 2 | SKU-014 | A | REPRICE | HIGH | 12% above market comparables · unit cost verified |
| 3 | SKU-028 | A | OPTIMIZE | HIGH | Title missing key fitment spec · image is stock render |
| 4 | SKU-033 | B | BUNDLE | MED | Pairs naturally with SKU-009 in 62% of baskets |
| 5 | SKU-041 | C | FIX CATALOG | HIGH | Mis-categorized · competes against its own parent line |
| 6 | SKU-055 | D | REPOSITION | MED | Positioned for segment X · actual buyers are segment Y |
| 7 | SKU-067 | B | DISCONTINUE | HIGH | Negative 2-yr margin · drop-ship available from supplier |
| 8 | SKU-082 | E | OPTIMIZE | MED | Cross-reference data missing · invisible to search |
| … | 93 more rows | | | | |

Recommendation types
REPRICE: market-data driven
OPTIMIZE: title, image, fitment
BUNDLE: raise AOV
FIX CATALOG: data errors
REPOSITION: wrong buyer
DISCONTINUE: only if gated
7 quality gates
Hard gates. Any recommendation that fails any gate is rejected before it ships.

1. No repricing without verified unit cost
2. No discontinuation without 2-year margin check + catalog-completeness check + drop-ship alternative considered
3. No “investigate” without a concrete next step named
4. Confidence level graded against a rubric, not just a label
5. Every hero recommendation passes Level-2 adversarial fracture
6. No cannibalization with another recommendation in the matrix
7. Every claim has a verifiable source or is flagged as judgement
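
Some of the gates are mechanical enough to check in code. The Python sketch below covers gates 1, 2, 3, and 7 only; gates 4, 5, and 6 depend on the rubric, the fracture process, and cross-row comparison, none of which are specified here. The `Recommendation` class, its field names, and the `INVESTIGATE` action label are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Recommendation:
    sku: str
    action: str  # e.g. REPRICE, OPTIMIZE, BUNDLE, FIX CATALOG, REPOSITION, DISCONTINUE
    confidence: str  # e.g. HIGH or MED
    unit_cost_verified: bool = False
    two_year_margin_checked: bool = False
    catalog_completeness_checked: bool = False
    drop_ship_considered: bool = False
    next_step: Optional[str] = None
    sources: List[str] = field(default_factory=list)
    flagged_as_judgement: bool = False


def failed_gates(rec: Recommendation) -> List[int]:
    """Return the numbers of the mechanically checkable gates that this row fails."""
    failed = []
    if rec.action == "REPRICE" and not rec.unit_cost_verified:
        failed.append(1)
    if rec.action == "DISCONTINUE" and not (
        rec.two_year_margin_checked
        and rec.catalog_completeness_checked
        and rec.drop_ship_considered
    ):
        failed.append(2)
    if rec.action == "INVESTIGATE" and not rec.next_step:
        failed.append(3)
    if not rec.sources and not rec.flagged_as_judgement:
        failed.append(7)
    return failed


# Hypothetical rows: one that passes, one that would be rejected.
ok = Recommendation("SKU-014", "REPRICE", "HIGH",
                    unit_cost_verified=True, sources=["market comparables"])
rejected = Recommendation("SKU-067", "DISCONTINUE", "HIGH",
                          two_year_margin_checked=True)
```

A row ships only when `failed_gates` returns an empty list and the remaining gates have been cleared by review.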

The reusable part is the protocol itself. Any catalog business (hardware, niche apparel, specialty food, specialty auto parts, industrial distribution, home-and-garden) with a web presence and two comparable time windows of sales data can be run through it. The output is the same shape every time: a ranked top-hundred, a clustered research pass, a hero-level deep dive, and an action matrix where every recommendation is backed by evidence.

If you run a catalog business and you know the tail is weak but can't pinpoint which SKUs or why, drop me a line. I can point the protocol at your catalog and come back with a ranked list and flags within a day.
