Anonymized Case Study - Automotive Aftermarket Parts

How an Automotive Parts Brand Validated a Managed AI Visual Product Matching Workflow

From noisy marketplace listings to structured visual matching outputs.

A defined-scope paid POC tested whether AI-driven competitor retrieval, pre-vision filtering, visual product matching, and structured output delivery could become a stable managed workflow across 200 agreed products.

  • 200 agreed products
  • ~2-week managed POC
  • AI retrieval + filtering + visual matching
  • Structured outputs for evaluation

Client identity is anonymized. This page describes a commercial-grade validation workflow, not a full production deployment and not a self-serve software rollout.

  • 200: agreed products in the defined-scope paid POC
  • ~2 weeks: managed timeline after kickoff and receipt of usable inputs
  • 90%+: targeted output accuracy under the agreed evaluation framework
  • 1,000: rows in the public sanitized workflow preview dataset

What this case study shows

An automotive aftermarket parts brand used Octoparse for a defined-scope paid POC to test a managed AI visual product matching workflow across 200 agreed products. Octoparse handled candidate retrieval, pre-vision filtering, visual comparison, QA, and structured output delivery. A representative workstream narrowed noisy candidate pools into higher-quality visual matching inputs. The case shows why visual AI matching requires workflow design, not just a model. This case study supports the AI Visual Product Matching Service within Octoparse Managed Data Service.

Business challenge

The challenge: retrieval was easy. Reliable visual matching was not.

Candidate pools were noisy

Public marketplaces can return many candidate rows, but the raw pool often includes duplicates, irrelevant listings, wrong fitment, wrong part types, and low-quality listing structure.

Wrong part type created false positives

Headlights, lamps, grilles, lips, splitters, brackets, reinforcement bars, and accessories can appear in search results even when they are not valid matches for target body parts.

Reference images had to be usable

Mounted or installed images are weaker baselines. Part-only product images provide cleaner geometry for reliable visual comparison.

Fitment alone could not decide visual match

Correct make, model, and year are useful context, but they cannot prove a match if the candidate is the wrong part type or wrong style.

AI cost and noise had to be controlled

Not every candidate should be sent to visual AI. Candidates need retrieval logic, first-layer filtering, and final gating before image comparison.

Collection quality depended on source behavior

Incomplete titles, inconsistent images, anti-bot restrictions, unstable layouts, and low-quality listing structure all affected how candidate data had to be collected and prepared.

Why a managed POC

Why a managed POC made more sense than a self-serve tool

The client needed to validate commercial readiness, not operate a tool. Octoparse managed collection, filtering, tuning, matching, QA, and final output delivery while the scope stayed clearly bounded around 200 agreed products.

  • Designed to reduce uncertainty: The engagement tested whether the workflow could support a larger next-stage program, without presenting the POC as a production rollout.
  • Managed by Octoparse: Client inputs were converted into retrieval rules, pre-vision gates, matching logic, and structured outputs.
  • Commercial-grade validation: The objective was 90%+ output accuracy under an agreed evaluation framework; this page does not claim a final verified accuracy figure.
POC boundaries

What the engagement was, and was not

Defined-scope paid POC

Focused on 200 agreed products with a managed workflow and structured evaluation outputs.

Not a production deployment

The project validated workflow readiness and delivery shape before a broader next-phase engagement.

Not a self-serve software rollout

Octoparse owned the operational work: retrieval, filtering, tuning, matching, QA, and final delivery.

Managed workflow

What the managed workflow included

The POC was scoped as a repeatable workflow: input preparation, public web candidate retrieval, filtering, final gating, visual comparison, and structured output delivery.

Step 1

Input structuring

Product identifiers, titles, part type, fitment, URLs, reference images, and search guidance were structured before collection began.
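As a rough illustration of what structuring inputs can mean, the record below gathers the fields this step lists into one typed object and reports what is still missing before retrieval. It is a minimal Python sketch: the `ProductInput` class, its field names, and the choice to treat a missing part-only image as a gap are assumptions for illustration, not the schema used in the POC.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProductInput:
    """One agreed product, structured before collection begins (hypothetical schema)."""

    product_id: str
    title: str
    part_type: str            # e.g. "front bumper"
    make: str
    model: str
    years: str                # e.g. "2018-2021"
    url: str = ""
    part_only_images: list[str] = field(default_factory=list)
    installed_images: list[str] = field(default_factory=list)
    search_terms: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """List the inputs that still need fixing before retrieval starts."""
        required = ("product_id", "title", "part_type", "make", "model", "years")
        missing = [name for name in required if not getattr(self, name).strip()]
        # Part-only images are the preferred visual baseline, so flag their absence.
        if not self.part_only_images:
            missing.append("part_only_images")
        return missing


product = ProductInput(
    product_id="CLIENT_PRODUCT_0026",
    title="Front bumper cover",
    part_type="front bumper",
    make="ExampleMake",
    model="ExampleModel",
    years="2018-2021",
)
print(product.missing_fields())  # ['part_only_images']
```

Checking inputs this way up front means a product without a usable reference image is caught before any collection or AI cost is spent on it.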

Step 2

Candidate retrieval

Octoparse collected public competitor listing candidates using RPA-driven and search-guided retrieval.

Step 3

First-layer filtering

Duplicates, wrong part types, wrong fitment, irrelevant listings, generic rows, and unusable images were removed before deeper matching.
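A first-layer filter of this kind can be approximated with simple rules that attach a reason to every rejected row. The sketch assumes a front bumper target, candidate rows shaped as dicts with hypothetical `seller`, `title`, `fitment`, and `image_urls` keys, and a keyword list drawn from the wrong-part examples on this page; the reason codes reuse the `decision_reason_category` values from the public preview dataset. Real fitment matching (year ranges, trims) would need more than the exact tuple comparison used here.

```python
import re

# Terms for parts that are NOT a front bumper (hypothetical list for that target).
WRONG_PART = re.compile(
    r"\b(headlights?|lamps?|fog lights?|grilles?|lips?|splitters?|brackets?|reinforcement bars?)\b",
    re.IGNORECASE,
)


def first_layer_filter(candidates, target_fitment):
    """Return (kept, rejected); each rejected item is a (candidate, reason) pair."""
    target = tuple(s.lower() for s in target_fitment)
    seen = set()
    kept, rejected = [], []
    for c in candidates:
        # Same seller + same normalized title is treated as a duplicate listing.
        key = (c["seller"], c["title"].strip().lower())
        if key in seen:
            rejected.append((c, "duplicate"))
            continue
        seen.add(key)
        fitment = c.get("fitment")
        if WRONG_PART.search(c["title"]):
            rejected.append((c, "wrong_part_or_accessory"))
        elif fitment and tuple(s.lower() for s in fitment) != target:
            rejected.append((c, "fitment_or_application_issue"))
        elif not c.get("image_urls"):
            rejected.append((c, "image_access_or_quality_issue"))
        else:
            kept.append(c)
    return kept, rejected


rows = [
    {"seller": "s1", "title": "Front Bumper Cover", "fitment": ("MakeX", "ModelY"), "image_urls": ["a.jpg"]},
    {"seller": "s1", "title": "front bumper cover ", "fitment": ("MakeX", "ModelY"), "image_urls": ["a.jpg"]},
    {"seller": "s2", "title": "Front Bumper Lip Splitter", "fitment": ("MakeX", "ModelY"), "image_urls": ["b.jpg"]},
    {"seller": "s3", "title": "Front Bumper Cover", "fitment": ("MakeZ", "ModelQ"), "image_urls": ["c.jpg"]},
    {"seller": "s4", "title": "Front Bumper Cover", "fitment": ("MakeX", "ModelY"), "image_urls": []},
]
kept, rejected = first_layer_filter(rows, ("MakeX", "ModelY"))
print(len(kept), [reason for _, reason in rejected])
```

Only the first row survives; the other four are removed with one reason each, so every rejection stays explainable in the delivered outputs.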

Step 4

Final pre-vision gate

High-priority candidates were selected, reserve candidates were preserved, and rows without usable part-only images were blocked.
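The final gate can be read as a per-product split into four groups. In this hypothetical sketch, a product with no part-only reference image has all of its rows blocked; otherwise rows below a priority floor are rejected, and the rest are ranked, capped for the first visual pass, with the overflow kept as reserve. The `priority` score, `cap`, and `min_priority` values are illustrative assumptions, not the POC's settings.

```python
def final_gate(product, candidates, cap=8, min_priority=0.5):
    """Split one product's filtered candidates into gate buckets.

    `priority` is a hypothetical 0-1 pre-vision relevance score; `cap` and
    `min_priority` are illustrative values.
    """
    gate = {"vision_input": [], "reserve": [], "gate_reject": [], "blocked": []}
    # Without a part-only reference image there is no reliable baseline,
    # so every candidate for this product is blocked rather than scored.
    if not product["part_only_images"]:
        gate["blocked"] = list(candidates)
        return gate
    eligible = []
    for c in candidates:
        if c["priority"] >= min_priority:
            eligible.append(c)
        else:
            gate["gate_reject"].append(c)
    eligible.sort(key=lambda c: c["priority"], reverse=True)
    gate["vision_input"] = eligible[:cap]   # sent to visual AI in the first pass
    gate["reserve"] = eligible[cap:]        # kept for a possible second pass
    return gate


product = {"product_id": "P1", "part_only_images": ["ref.jpg"]}
cands = [
    {"id": "a", "priority": 0.9},
    {"id": "b", "priority": 0.2},
    {"id": "c", "priority": 0.7},
    {"id": "d", "priority": 0.6},
]
gate = final_gate(product, cands, cap=2)
print([c["id"] for c in gate["vision_input"]])  # ['a', 'c']
print([c["id"] for c in gate["reserve"]])       # ['d']
```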

Step 5

AI visual matching and structured output

Surviving candidates were compared by physical geometry and delivered as matches, rejects, low-confidence items, and structured summaries.

Before visual AI

What had to be fixed before visual AI could work reliably

Workflow correction

Reference image correction

Part-only product images were treated as stronger visual baselines. Mounted or installed photos were excluded or marked weaker unless geometry was clearly visible.

Workflow correction

Image rule correction

Reference image rules had to be adjusted so valid front-facing part-only images were not incorrectly excluded before visual comparison.

Workflow correction

Wrong-part cleanup

Lighting, lamp, accessory, and wrong-part filters were strengthened before visual matching to reduce false positives.

Workflow correction

Candidate cap and reserve strategy

The first visual pass focused on high-priority candidates while preserving reserve candidates for second-pass review when needed.
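One way to encode a cap-and-reserve strategy is to score the first-pass rows and touch the reserve only when none of them clears the acceptance threshold, drawing reserve rows in small batches so each extra visual-AI call is spent only when needed. `score_with_reserve`, the 80-point threshold, and the batch size are assumptions; `score_fn` stands in for whatever visual model is actually used.

```python
def score_with_reserve(vision_input, reserve, score_fn, accept=80, batch=5):
    """Score first-pass candidates; open the reserve only if nothing is accepted.

    `score_fn` stands in for a visual-AI call returning a 0-100 score (assumed).
    Returns (list of (candidate, score) pairs, pass label).
    """
    scored = [(c, score_fn(c)) for c in vision_input]
    if any(s >= accept for _, s in scored):
        return scored, "first_pass"
    # Second pass: draw reserve candidates in small batches to limit AI calls.
    for start in range(0, len(reserve), batch):
        more = [(c, score_fn(c)) for c in reserve[start:start + batch]]
        scored.extend(more)
        if any(s >= accept for _, s in more):
            return scored, "second_pass"
    return scored, "no_match"


fake_scores = {"a": 50, "b": 62, "c": 70, "d": 85, "e": 90}
calls = []


def fake_score(cid):
    calls.append(cid)
    return fake_scores[cid]


scored, outcome = score_with_reserve(["a", "b"], ["c", "d", "e"], fake_score, batch=2)
print(outcome, calls)  # second_pass ['a', 'b', 'c', 'd']
```

In the example, candidate `e` is never sent for scoring, which is the cost saving a reserve strategy is meant to provide.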

Operational proof

How Octoparse narrowed noisy candidates before visual AI

This narrowing process shows why managed AI visual matching is not simply "send all images to AI." The quality of pre-vision gating directly affects cost, review burden, and match quality.

Representative front bumper workstream

Raw retrieval was narrowed before visual AI.

This representative workstream shows how retrieval volume was reduced into final vision inputs and reserve candidates before image matching.

  • 2,500 raw candidate rows
  • 50 front bumper products
  • 528 final vision input rows
  • 660 secondary reserve rows
  • 1,312 pre-vision rejects

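The front bumper figures are internally consistent: 528 + 660 + 1,312 = 2,500, so the three groups add up exactly to the raw total, and about 21% of raw rows reached visual AI. The short calculation below checks that partition and derives per-product averages, assuming the 2,500 rows are spread over the 50 products; it uses only numbers reported on this page.

```python
def funnel_shares(total, stages):
    """Check that stage counts add up to the total, then return each stage's share."""
    if sum(stages.values()) != total:
        raise ValueError("stage counts do not add up to the total")
    return {name: count / total for name, count in stages.items()}


raw_rows = 2_500
products = 50
front_bumper = {
    "final vision input": 528,
    "secondary reserve": 660,
    "pre-vision reject": 1_312,
}
shares = funnel_shares(raw_rows, front_bumper)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"raw rows per product: {raw_rows / products:.0f}")
print(f"vision input rows per product: {front_bumper['final vision input'] / products:.1f}")
```

On these figures, roughly 52.5% of raw rows were rejected before any image comparison, and each product sent about 10.6 rows to visual AI out of about 50 retrieved.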
Final Pre-Vision Gate

Only candidates with usable visual inputs moved forward.

The final gate separated candidates ready for visual matching from reserve rows, rejects, and records blocked by missing part-only images.

  • 725 candidates started with part-only references
  • 391 final vision input candidates
  • 178 secondary reserve candidates
  • 156 final gate rejects
  • 180 blocked for missing part-only images
  • 30 product groups ready for visual match

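The final gate counts also add up: 391 + 178 + 156 = 725, so the candidates that started with part-only references are fully accounted for. The 180 rows blocked for missing part-only images appear to be a separate group outside those 725. This small check uses only figures from this page.

```python
def gate_shares(started, counts):
    """Confirm the gate outcomes add up to the starting pool; return each share."""
    if sum(counts.values()) != started:
        raise ValueError("gate outcomes do not add up to the starting pool")
    return {name: n / started for name, n in counts.items()}


started_with_part_only_refs = 725
outcomes = {
    "final vision input": 391,
    "secondary reserve": 178,
    "final gate reject": 156,
}
shares = gate_shares(started_with_part_only_refs, outcomes)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Roughly 54% of gated candidates went forward to visual matching, about 25% were held in reserve, and about 22% were rejected at the gate.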
Public dataset preview

Preview the anonymized visual matching workflow dataset

Octoparse prepared a public-facing sanitized workflow preview showing how candidate retrieval, visual matching output buckets, and product-level summaries can be structured in a managed AI visual product matching engagement. The preview is not raw client data and is not a model benchmark.

1,000-row candidate output sample

  • high-confidence matches
  • needs-review candidates
  • declined candidates
  • masked product, seller, candidate, and brand identifiers

Product-level summary

  • match, review, and decline counts
  • score buckets
  • best-candidate indicators

Field dictionary

  • public-safe field definitions
  • sanitization notes
  • workflow interpretation for technical buyers
| sample_id | client_product_id_masked | source_type | visual_match_status | output_bucket | decision_reason_category |
|---|---|---|---|---|---|
| VPM_SAMPLE_0001 | CLIENT_PRODUCT_0026 | ebay_like_marketplace | matched_80_plus | gold_match | accepted_high_confidence |
| VPM_SAMPLE_0099 | CLIENT_PRODUCT_0001 | ebay_like_marketplace | needs_review_60_79 | needs_review | ambiguous_or_partial_visual_match |
| VPM_SAMPLE_0493 | CLIENT_PRODUCT_0003 | ebay_like_marketplace | declined_below_60 | declined | wrong_part_or_accessory |
| VPM_SAMPLE_0840 | CLIENT_PRODUCT_0057 | ebay_like_marketplace | declined_below_60 | declined | fitment_or_application_issue |
| VPM_SAMPLE_0998 | CLIENT_PRODUCT_0020 | ebay_like_marketplace | declined_below_60 | declined | image_access_or_quality_issue |

Dataset note: This dataset is a public-facing sanitized workflow preview. It is not raw client data, not a complete marketplace crawl, and not a model benchmark. View the anonymized AI visual matching workflow dataset on Hugging Face.
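The `visual_match_status` values in the preview suggest a three-band score scale. The function below is a hypothetical mapping from a similarity score to the status and `output_bucket` pairs shown in the sample rows; the thresholds are inferred from the labels (`matched_80_plus`, `needs_review_60_79`, `declined_below_60`) rather than taken from the dataset's field dictionary, and the 0-100 scale is an assumption.

```python
def status_for_score(score: float) -> tuple[str, str]:
    """Map an assumed 0-100 visual similarity score to (visual_match_status, output_bucket).

    Thresholds are inferred from the status labels in the preview table.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "matched_80_plus", "gold_match"
    if score >= 60:
        return "needs_review_60_79", "needs_review"
    return "declined_below_60", "declined"


print(status_for_score(85))  # ('matched_80_plus', 'gold_match')
```

Declines in the sample also carry a `decision_reason_category` (wrong part, fitment, or image issue), so in practice the reason would be assigned by the rule that caused the decline, not by the score alone.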

Visual rules

The visual rules behind the workflow

Visual AI is only useful when the workflow tells it what to compare, what to ignore, and when to stop scoring.

Geometry first

Use product images to compare physical shape. Text is context only and cannot override the visual baseline.

Full part type required

A valid candidate must be the same target part type, not a nearby accessory or adjacent vehicle component.

Fitment is context, not proof

Correct make, model, and year can support a match, but they do not prove visual equivalence by themselves.

Ignore material and color

Primer, carbon fiber, FRP, ABS, gloss, matte, watermarks, backgrounds, and lighting should not decide match status.

Reject accessories

Headlights, fog lights, grille inserts, lips, splitters, brackets, side markers, mounts, covers, and accessories are rejected when not the target part.

Do not use price as a match signal

Price can be delivered as metadata, but it should not be used to accept or reject visual product similarity.

Flag image access issues

If images are inaccessible or unusable, the workflow flags the issue instead of inventing a visual score.

Installed photos lower confidence

Mounted vehicle photos are weaker unless the relevant part geometry is clearly visible and comparable.

Text mismatch requires review

OEM or generic text mismatches should be rejected or reviewed unless the geometry is truly similar.
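Taken together, these rules behave like an ordered decision procedure. The sketch below is one hypothetical encoding: it rejects the wrong part type outright, flags image problems instead of scoring them, lowers confidence for installed photos, and demands stronger geometry when listing text conflicts with the target. The 15-point penalty, the 80/90/60 thresholds, and the `Candidate` fields are illustrative assumptions. Price and material are carried on the record but never read when deciding, mirroring the rules above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    part_type: str                 # normalized part type of the listing
    image_accessible: bool
    installed_photo: bool          # mounted on a vehicle rather than part-only
    text_mismatch: bool            # OEM/generic wording conflicts with the target
    geometry_score: float | None   # visual-AI similarity, assumed 0-100
    price: float | None = None     # delivered as metadata only
    material: str = ""             # e.g. "primer", "carbon fiber"; never decisive


INSTALLED_PENALTY = 15  # hypothetical confidence penalty for mounted photos


def apply_visual_rules(c: Candidate, target_part_type: str) -> str:
    """Return 'match', 'review', 'reject', or 'image_access_issue'.

    Price and material are intentionally not consulted.
    """
    # A different part type is rejected regardless of fitment or title.
    if c.part_type != target_part_type:
        return "reject"
    # Flag unusable images instead of inventing a visual score.
    if not c.image_accessible or c.geometry_score is None:
        return "image_access_issue"
    score = c.geometry_score
    if c.installed_photo:
        score -= INSTALLED_PENALTY
    # A text mismatch needs stronger geometry evidence before acceptance.
    accept_at = 90 if c.text_mismatch else 80
    if score >= accept_at:
        return "match"
    if score >= 60:
        return "review"
    return "reject"


base = dict(part_type="front bumper", image_accessible=True, installed_photo=False,
            text_mismatch=False, geometry_score=85)
print(apply_visual_rules(Candidate(**base), "front bumper"))  # match
```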

Delivery and outputs

What the client received

  • matched candidate results
  • rejected candidate results with reasons
  • low-confidence review items
  • visual similarity or style-clustering output where applicable
  • edge-case observations
  • structured spreadsheet or tabular outputs
Output buckets

How results were organized for review

| Bucket | Purpose |
|---|---|
| Golden Matches | High-confidence candidates accepted for evaluation and downstream use |
| Vision Reject Pile | Candidates rejected after filtering, visual comparison, or decision-rule checks |
| Low Confidence Review | Ambiguous candidates surfaced for human review or second-pass analysis |
| Image Access Issues | Rows where source images were missing, blocked, inaccessible, or unusable |
| Product Summary | Product-level status, bucket counts, best candidates, and review indicators |

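A product summary like the one described in the last bucket can be built by grouping candidate rows by product. In this sketch, the snake_case bucket names, the row layout, and the rule that any review item or image issue keeps a product flagged are assumptions; the best candidate is simply the highest-scoring golden match.

```python
from collections import Counter, defaultdict

BUCKETS = ("golden_match", "vision_reject", "low_confidence_review", "image_access_issue")


def product_summary(rows):
    """Aggregate candidate rows into one summary per product.

    Each row is a dict with product_id, candidate_id, bucket (one of BUCKETS)
    and score (0-100, or None when no visual score exists).
    """
    by_product = defaultdict(list)
    for row in rows:
        by_product[row["product_id"]].append(row)
    summary = {}
    for product_id, items in by_product.items():
        counts = Counter(row["bucket"] for row in items)
        golden = [row for row in items if row["bucket"] == "golden_match"]
        best = max(golden, key=lambda row: row["score"], default=None)
        summary[product_id] = {
            "counts": {b: counts[b] for b in BUCKETS},
            "best_candidate": best["candidate_id"] if best else None,
            # Anything ambiguous or unscored keeps the product on the review list.
            "needs_review": counts["low_confidence_review"] + counts["image_access_issue"] > 0,
        }
    return summary


rows = [
    {"product_id": "P1", "candidate_id": "c1", "bucket": "golden_match", "score": 88},
    {"product_id": "P1", "candidate_id": "c2", "bucket": "golden_match", "score": 93},
    {"product_id": "P1", "candidate_id": "c3", "bucket": "vision_reject", "score": 41},
    {"product_id": "P2", "candidate_id": "c4", "bucket": "image_access_issue", "score": None},
]
summary = product_summary(rows)
print(summary["P1"]["best_candidate"], summary["P2"]["needs_review"])  # c2 True
```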
What the POC validated

The project validated the workflow, not a shortcut.

AI visual product matching can be operationalized as a managed workflow.

Candidate retrieval and pre-vision filtering matter as much as the model itself.

Part-only reference images materially improve visual baseline quality.

Wrong-part filtering reduces false positives before AI scoring.

Structured outputs make results easier to evaluate and improve.

The project created a practical basis for broader next-stage execution.

FAQ

Questions teams ask before scoping a visual matching POC

What is AI visual product matching?
AI visual product matching compares product images and product context to identify whether two marketplace listings represent the same or comparable item. In a managed workflow, image comparison is combined with candidate retrieval, filtering, image quality checks, output buckets, and structured review logic.
Why is pre-vision filtering necessary before AI matching?
Pre-vision filtering removes duplicates, wrong part types, wrong fitment, low-quality images, and irrelevant listings before visual AI is used. This reduces cost, lowers review burden, and gives the model cleaner inputs for physical geometry comparison.
How does Octoparse reduce wrong-part matches?
Octoparse applies rule-based gates before visual comparison. For automotive parts, this includes rejecting accessories, lamps, brackets, splitters, grilles, and other items that are not the target part type, even when the title or fitment looks related.
Why do part-only reference images matter?
Part-only images create a stronger visual baseline because the target geometry is easier to compare. Mounted vehicle photos, cropped images, or cluttered listing images can still be reviewed, but they usually lower confidence unless the part shape is clearly visible.
Can AI visual matching work when listing titles are incomplete?
Yes, but title text should be treated as context, not proof. A managed workflow can use incomplete titles to retrieve candidates, then rely on filtering, part-type logic, reference image quality, and visual comparison to decide match status.
What does Octoparse deliver in a visual matching POC?
A visual matching POC can include structured candidate outputs, matched results, rejected results with reasons, low-confidence review items, visual matching or style-clustering outputs where applicable, and summary observations on recurring edge cases.
Is this a self-serve tool or a managed service?
This case study describes a managed service workflow, not a self-serve software rollout. The client provides inputs and requirements, while Octoparse manages retrieval, filtering, tuning, matching, QA, and structured output delivery.
Can this workflow apply outside automotive parts?
Yes. The same managed workflow pattern can support furniture, appliances, apparel, beauty, industrial products, and other categories where exact identifiers are missing and visual similarity must be evaluated with business rules.
What is included in the public Hugging Face sample dataset?
The Hugging Face dataset is a sanitized workflow preview with a 1,000-row candidate output sample, product-level summaries, and a field dictionary. It shows how output buckets and decision reasons can be organized for review and downstream use.
Is the Hugging Face dataset raw client data?
No. The dataset is not raw client data and not a benchmark dataset. It is a public-facing sanitized workflow preview with masked product, seller, candidate, and brand identifiers, direct URLs removed, prices bucketed, and decision reasons generalized.

Validate visual product matching before building it in-house.

If your team needs to compare products visually across marketplaces, competitor sites, or large catalogs, Octoparse can scope a managed POC around your inputs, sources, match criteria, and delivery requirements.
