Freight & 3PL

Replacing slow warehouse label entry without leaking shipment data

A 3PL freight operation · 100+ inbound boxes/day · multilingual lane (US ↔ China and Mexico) · Shipped February 2026.

A 3PL freight operation was spending roughly $30k a year retyping information already printed on shipping labels. The obvious shortcut, sending label images to a cloud OCR service, was off the table: shipment data had to stay inside the warehouse.

The situation

Every inbound box — 100+ per day — had a shipping label with sender, recipient, hazmat flags, Incoterms, weight, and dimensions. A warehouse worker typed each field into the WMS. Two to five minutes per box. Mistakes bled into customs disputes.

At 100 boxes a day, the math was ugly: roughly $30,000 a year in labor spent on retyping information that was already printed on the label.
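For scale, this is the daily time cost implied by the volume and per-box times above. Turning those hours into the roughly $30,000 figure depends on labor rates and working days that aren't broken out here, so only the time side is shown.

```latex
100\ \frac{\text{boxes}}{\text{day}} \times (2\text{--}5)\ \frac{\text{min}}{\text{box}}
  = 200\text{--}500\ \frac{\text{min}}{\text{day}}
  \approx 3.3\text{--}8.3\ \text{labor hours per day}
```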

What we built

A local AI pipeline running on commodity hardware in the warehouse:

Qwen 3.5-VL for multilingual label OCR (Chinese, Spanish, and English labels are all common in this lane).
Hybrid barcode + vision: if the barcode scans cleanly and its data checks out, use it; otherwise fall back to reading the printed fields.
Weight and photo capture through an iOS companion app on the picking station.
Bubble.io WMS sync so the data lands where the rest of the ops tooling expects it.
Correction learning loop — when a human edits a field, the model's mistake is logged and factored into the next fine-tune.
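The hybrid barcode + vision step and the correction loop can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not the production pipeline: the field names, the 0.85 low-confidence threshold, and the `resolve_fields` / `CorrectionLog` helpers are all hypothetical, and the vision model's output is stubbed as a dict of `(value, confidence)` pairs rather than a real Qwen call.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical cutoff: vision reads below this confidence get a "low-confidence" flag.
LOW_CONFIDENCE = 0.85


@dataclass
class FieldRead:
    value: str
    source: str          # "barcode" or "vision"
    confidence: float    # validated barcode values are treated as 1.0
    flagged: bool = False


def resolve_fields(barcode: Optional[dict], vision: dict) -> dict:
    """Prefer barcode values; fall back to the vision model's reads.

    barcode: field name -> value, or None if the barcode didn't decode/validate.
    vision:  field name -> (value, confidence) from the label OCR model (stubbed here).
    """
    barcode = barcode or {}
    resolved = {}
    for name in sorted(set(vision) | set(barcode)):
        if name in barcode:
            resolved[name] = FieldRead(barcode[name], "barcode", 1.0)
        else:
            value, conf = vision[name]
            resolved[name] = FieldRead(value, "vision", conf, flagged=conf < LOW_CONFIDENCE)
    return resolved


@dataclass
class CorrectionLog:
    """Collects human edits as training examples for the next fine-tune."""
    records: list = field(default_factory=list)

    def record(self, box_id: str, name: str, read: FieldRead, corrected: str) -> None:
        # Only edits that actually change the value count as model mistakes.
        if corrected != read.value:
            self.records.append({
                "box_id": box_id,
                "field": name,
                "source": read.source,
                "predicted": read.value,
                "corrected": corrected,
                "confidence": read.confidence,
            })


# Usage: the barcode covers the tracking number; vision covers the printed fields.
fields = resolve_fields(
    barcode={"tracking_no": "TRK123456"},
    vision={"recipient": ("Zhang Wei", 0.97), "incoterms": ("DAP", 0.62)},
)
log = CorrectionLog()
log.record("BOX-001", "incoterms", fields["incoterms"], "DDP")       # picker fixes a flagged field
log.record("BOX-001", "recipient", fields["recipient"], "Zhang Wei")  # unchanged, so not logged
```

Preferring a validated barcode value means the vision model only has to carry the fields the barcode doesn't cover; logging only edits that change the prediction keeps the fine-tuning data focused on actual mistakes.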

On-premise. No per-box API fees.

The engagement

Phase 1 was a fixed-price, six-week build: $54k, scoped against a clear before/after baseline. Hardware specs, competitive analysis, and an ROI worksheet went to the client before they signed.

Outcome

About 85% less time per box: under 30 seconds end-to-end, down from two to five minutes.
About $25k a year in labor savings, paying back the $54k build in a little over two years.
100% on-prem: shipment data never leaves the warehouse, so compliance for international shipments stays tight.
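A quick consistency check on the outcome figures, using only numbers from this case study and taking 30 seconds as the new per-box time against the two-to-five-minute baseline:

```latex
\begin{aligned}
\text{time reduction} &= 1 - \frac{30\,\text{s}}{t_{\text{before}}},
  \quad t_{\text{before}} \in [120,\ 300]\,\text{s}
  \;\Rightarrow\; 75\% \text{ to } 90\% \\
\text{annual savings} &\approx 0.85 \times \$30{,}000 \approx \$25{,}500 \\
\text{payback} &\approx \frac{\$54{,}000}{\$25{,}000/\text{yr}} \approx 2.2\ \text{years}
\end{aligned}
```

The roughly 85% figure sits inside that range, and applying it to the $30k baseline lands close to the stated $25k in annual savings.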

What surprised us

For the first three days after rollout, picker speed actually went down. The warehouse team had been blamed for typos for years, and the model's "low-confidence" flags felt like one more system second-guessing them. We sat on the floor for an afternoon and watched. The fix wasn't the model; it was the UI. We added a "verify mode" toggle so each picker could decide when to trust the auto-fill on their own timeline. By week two, every picker had switched verify mode off. The lesson: the model was ready before the trust was. We had been measuring the wrong thing.

The warehouse doesn't care if the model is state-of-the-art. It cares that the picker can keep moving. We measured success in boxes per hour, not benchmarks.

Names and identifying details anonymized. Metrics reflect projections or realized outcomes at the time the engagement was scoped.