EDS AI Copilot: Turning Manual QA Into a 30-Second Scan
$3.1M annual savings | 19× ROI with 10-month payback | 90% manual QA elimination | 15× team capacity scaling

Overview
Estée Lauder Companies' (ELC) Enterprise Design System (EDS) team was burning $600K a year on manual QA with zero strategic return. I built the business case, architecture, and prototype that turned that cost center into a $3.1M savings model, validated bottom-up against 200+ Jira tickets.
What the Evidence Supported
This project didn't ship to production during my tenure. In Q3 2026, ELC approved a $200K pilot budget based on the deliverables below.
1. The business case reframed Design Systems (DS) as a capital investment. The $3.1M model gave the VP of Experience Design a framework to present DS investment to the CTO using the same language as engineering infrastructure proposals. This was the first time a DS initiative at ELC was positioned as a capital proposal rather than a headcount request.
2. A Model Context Protocol (MCP) Server architecture that passed engineering review. Front-end engineers confirmed that the three-source integration pattern (Storybook, Confluence, Token Studio) was technically sound and that the canonical data layer was the right constraint model for enterprise governance.
3. A Jira-validated methodology structured for reuse. Extract operational data, categorize by task type, build the ROI bottom-up, present with an audit trail.
Background
- My Role
- Staff Product Designer, DS Lead: end-to-end ownership of business case, prototype design, ROI validation, and stakeholder documentation
- Duration
- Q3–Q4 2025 (6 weeks)
- Company
- Estée Lauder Companies (ELC)
- Team
- DS team of 3 designers supporting 20 product designers across 17 brand teams. Validated with the Design System Director, Design Excellence leadership, UX/UI designers, front-end engineers, and PM/Analytics.
- Context
- Expanded from an AI research assignment into a full business case with functional prototypes and stakeholder validation.
- Project Status
- $200K pilot approved in Q3 2026. Proposal advanced to CTO via VP of Experience Design executive review.
- Tools
- Figma Design, Figma Make, Figma MCP Server, Jira, Confluence
Personas
Three roles, each losing hours to the same root cause: every quality decision required a human in the DS team's queue.
- DS Designer: 60%+ of the day on manual Figma layer review, no time for architecture or governance. → Lint Scan, Generate Documentation
- Front-End Engineer: Hours lost to back-and-forth when handoff docs are missing or inconsistent. → Generate Documentation, Ask EDS
- E-Commerce UX/UI Designer: 3-5 day wait for DS validation on every design cycle. → Ask EDS, Lint Scan
The Challenge
ELC's EDS served hundreds of digital commerce and brand websites across 25+ global brands in 150+ markets. It was built as a component library, not an execution system. Every quality decision required a human in the loop: manual Figma layer review, Slack threads for questions, Jira tickets for resolution, and 3-5 day turnaround per cycle.
This overhead consumed 60% of DS capacity: $600K+ annually in billable hours with no strategic return (50 releases × 160h manual QA × $75/hr). The VP of Experience Design needed AI efficiency metrics to demonstrate the team's strategic value to senior leadership.
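The baseline figure is straightforward arithmetic; a minimal sketch using the values stated above (the variable names are mine, not ELC's model):

```typescript
// Baseline manual-QA cost from the case study:
// 50 releases/year × 160 hours of manual QA per release × $75/hr blended rate.
const releasesPerYear = 50;
const qaHoursPerRelease = 160;
const blendedRate = 75; // USD/hr, later re-derived from contractor billing data

const annualQaCost = releasesPerYear * qaHoursPerRelease * blendedRate;
console.log(`Annual manual QA cost: $${annualQaCost.toLocaleString("en-US")}`); // $600,000
```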
Process
Methodology
Every number in the impact model traces to one source: Q3 2025 Jira analysis across 200+ tickets, categorized by QA, documentation, support, and prototyping task types, with time allocation validated against DS Director estimates. This bottom-up approach replaced the industry benchmarks I had initially proposed, which executives had questioned on credibility.
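The aggregation step is simple enough to sketch. This is an illustration of the methodology, not the actual Q3 export: ticket data and the interface shape are placeholders, and only the four task-type categories and the $75/hr rate come from the case study.

```typescript
// Bottom-up costing: categorize tickets by task type, sum logged hours per
// category, and price at the blended rate. Sample data is illustrative.
interface Ticket {
  id: string;
  taskType: "qa" | "documentation" | "support" | "prototyping";
  hours: number;
}

const BLENDED_RATE = 75; // USD/hr

function costByTaskType(tickets: Ticket[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const t of tickets) {
    totals.set(t.taskType, (totals.get(t.taskType) ?? 0) + t.hours * BLENDED_RATE);
  }
  return totals;
}

// Illustrative sample:
const sample: Ticket[] = [
  { id: "EDS-101", taskType: "qa", hours: 6 },
  { id: "EDS-102", taskType: "qa", hours: 4 },
  { id: "EDS-103", taskType: "support", hours: 2 },
];
console.log(costByTaskType(sample)); // qa → 750, support → 150
```

The same aggregation over 200+ real tickets produced the per-segment figures in the impact model, each with an audit trail back to ticket IDs.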
Feature Design and Validation
Jira categorization determined feature scope. The three highest-volume task types became three features: Lint Scan (QA validation), Ask EDS (support questions), and Generate Documentation. I wrote a PRD for each. Lower-volume categories like brand onboarding automation were scoped out; they required org-level process changes beyond the pilot's mandate.
I built a high-fidelity Figma Make prototype with pre-populated ELC component data. Across three structured validation sessions, designers engaged with live prototype flows against real EDS components. One consistent observation: designers ignored the advisory-level results until all blocking violations were resolved, which validated the severity-tiered hierarchy and confirmed that collapsing advisory notes by default was the right interaction pattern. Front-end engineers flagged that the MCP Server's Confluence chunking strategy needed a versioning layer; I added a version-anchor requirement to the architecture spec before the executive review.
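The version-anchor requirement can be captured as a small schema. This is a hypothetical shape I'm sketching to show the idea; the field names and staleness check are my own, not the spec the engineers reviewed.

```typescript
// Hypothetical shape for a versioned Confluence chunk served by the MCP Server.
// The version anchor lets the backend detect (and refuse to ground answers in)
// stale documentation content. Field names are illustrative.
interface DocChunk {
  sourceUrl: string;   // canonical Confluence page
  pageVersion: number; // Confluence page version at index time
  indexedAt: string;   // ISO timestamp of the last sync
  content: string;
}

function isStale(chunk: DocChunk, livePageVersion: number): boolean {
  return chunk.pageVersion < livePageVersion;
}

const chunk: DocChunk = {
  sourceUrl: "https://confluence.example/eds/buttons",
  pageVersion: 12,
  indexedAt: "2025-09-01T00:00:00Z",
  content: "Button usage guidance…",
};
console.log(isStale(chunk, 13)); // true → re-index before answering
```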
Cross-Functional Collaboration
The most consequential design decision came from a challenge, not from my own analysis. The Design System Director rejected my initial results hierarchy because it mirrored Figma's data model, not the team's triage workflow. I redesigned the entire hierarchy around severity tiers: blocking violations surface first with inline fix guidance, advisory notes collapse below. That conversation changed how I approach governance UX: start with the team's workflow, not the tool's data model (see Expanded: Lint Scan Results Hierarchy below).
I replaced ad-hoc Slack feedback with structured weekly reviews against three criteria: governance compliance, interaction completeness, and edge-case coverage. Finance challenged the blended hourly rate; I re-derived it from contractor billing data and it held within 3%. The VP of Experience Design pushed to reframe the narrative around capital investment rather than cost savings; I restructured the executive presentation around that lens. By presentation day, no number in the deck was unvetted. The VP of Experience Design approved the $200K pilot and advanced the proposal to the CTO.
Solution
System Architecture: Governance as the Foundation
Every feature expresses one architectural principle: EDS governance is enforced at the data layer. Not in prompts, not in the UI, but in the mechanism that controls what the AI model can see and return. The MCP Server is the load-bearing element. Without it, every AI response is a hallucination risk. With it, non-EDS outputs are structurally impossible.
Design Decisions
Three core UX decisions shaped the product. Each resolved a tension between simplicity and trust.
| Decision | Options Considered | Tradeoff | Final Direction |
|---|---|---|---|
| Lint Scan Results Hierarchy | Flat list by component · Flat list by violation type · Severity-tiered hierarchy | Flat lists transfer triage to the designer: the cognitive overhead the tool was supposed to eliminate | Two-tier "Blocking vs. Advisory" hierarchy based on DS Director review of governance criteria |
| Ask EDS Response Format | Single answer, no attribution · Answer with source link · Answer with source + version anchor + confidence indicator | Simpler formats reduce friction but transfer the trust problem; an unverifiable answer undermines the premise | Answer + source attribution + version anchor. Low-confidence state surfaced when MCP can't ground the response |
| Generate Docs Scope | Comprehensive spec (all states, tokens) · Minimal spec (name + defaults) · Decision-point spec (default + divergences) | Comprehensive specs move filtering from generation to reading. Minimal specs shift clarification back to Slack | Three-tier decision-point scope: component identity, deviating states, brand overrides. Only what requires an engineering decision |
Expanded: Lint Scan Results Hierarchy
The central UX challenge wasn't the scan; it was what happens after. A complex Figma file can surface dozens of violations simultaneously, and the wrong hierarchy makes the tool unusable.
My initial design organized results by Figma component: each component listed its violations underneath, mirroring the layer panel's structure. I chose this because it matched the mental model of navigating a Figma file. The Design System Director rejected it in the first review. The problem: when a designer runs a scan, they need to know what to fix first, not which component to look at first. A component-based hierarchy transfers the triage decision to the designer, which is exactly the cognitive overhead the tool was supposed to eliminate.
The redesigned hierarchy organizes by severity. Blocking violations (wrong token applied, non-EDS component used, accessibility failure) surface at the top with inline fix guidance. Advisory notes (spacing within tolerance, style preferences) collapse below. In subsequent validation sessions, designers completed triage tasks faster because the hierarchy matched their actual workflow: fix what's broken, then address what's optional.
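The triage split itself is a small piece of logic. A minimal sketch of the severity-tiered grouping, with rule names and IDs invented for illustration (the blocking/advisory categories mirror the examples above):

```typescript
// Severity-tiered results: blocking violations surface first (expanded, with
// fix guidance), advisory notes collapse below. Rule names are illustrative.
type Severity = "blocking" | "advisory";

interface Violation {
  nodeId: string;
  rule: string;
  severity: Severity;
  fixHint?: string; // inline guidance, shown only for blocking violations
}

function triage(violations: Violation[]): { blocking: Violation[]; advisory: Violation[] } {
  return {
    blocking: violations.filter(v => v.severity === "blocking"),
    advisory: violations.filter(v => v.severity === "advisory"),
  };
}

const results = triage([
  { nodeId: "1:2", rule: "non-eds-component", severity: "blocking", fixHint: "Swap for EDS/Button" },
  { nodeId: "1:5", rule: "spacing-within-tolerance", severity: "advisory" },
  { nodeId: "1:9", rule: "wrong-token", severity: "blocking", fixHint: "Use the EDS color token" },
]);
console.log(`${results.blocking.length} blocking, ${results.advisory.length} advisory`); // 2 blocking, 1 advisory
```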
EDS Copilot
Three features, each targeting a specific bottleneck, each governed by the MCP Server:
- Lint Scan: Automated component validation in 30 seconds, replacing 3-5 day manual review cycles. Includes Accessibility Scan for automated contrast ratio checks and WCAG compliance with auto-fix suggestions.
- Ask EDS: Instant AI-powered answers grounded in canonical component and token data via the MCP Server. Replaces synchronous Slack threads with asynchronous self-service.
- Generate Documentation: 10-second spec generation, replacing the 4-6 hour manual process of screenshots, token copying, Confluence formatting, and notifications.
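The contrast check inside the Accessibility Scan is well defined by WCAG 2.x, so it can be sketched directly. This follows the standard relative-luminance formula; it is my illustration of the check, not the shipped implementation:

```typescript
// WCAG 2.x contrast check: relative luminance per sRGB channel, then
// ratio = (L_lighter + 0.05) / (L_darker + 0.05). AA normal text needs >= 4.5.
function relativeLuminance([r, g, b]: [number, number, number]): number {
  const lin = (c: number) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b);
}

function contrastRatio(fg: [number, number, number], bg: [number, number, number]): number {
  const [lighter, darker] = [relativeLuminance(fg), relativeLuminance(bg)].sort((a, b) => b - a);
  return (lighter + 0.05) / (darker + 0.05);
}

const ratio = contrastRatio([0, 0, 0], [255, 255, 255]); // black on white
console.log(ratio.toFixed(1), ratio >= 4.5 ? "passes AA" : "fails AA"); // 21.0 passes AA
```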
Plugin UX: Governance Feedback States
The system enforces governance at the data and validation layers, but what the designer actually sees is where those decisions surface. Each feature resolves to one of three Plugin UI states: a validated result, a blocked result caught by Backend API validation, or a low-confidence state when the MCP Server can't ground the request in canonical EDS data.
The three states use progressive visual weight to match the required user response. Validated results render in a compact, low-emphasis format so the designer's attention stays on their work. Blocked results use high-contrast error styling with an expanded detail panel, because a governance violation requires the designer to stop and act. The low-confidence state uses an amber treatment with a collapsible source-inspection panel, giving the designer the choice to proceed with caution or escalate to the DS team.
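The three states map naturally onto a discriminated union, which forces the rendering layer to handle each governance outcome explicitly. The type and field names below are illustrative, not the shipped API:

```typescript
// Three Plugin UI states as a discriminated union: the compiler guarantees
// no governance outcome is silently dropped. Names are illustrative.
type CopilotResult =
  | { kind: "validated"; payload: string }                        // compact, low-emphasis
  | { kind: "blocked"; violation: string; guidance: string }      // high-contrast error, expanded detail
  | { kind: "low-confidence"; answer: string; sources: string[] }; // amber, collapsible source panel

function visualWeight(result: CopilotResult): "low" | "high" | "caution" {
  switch (result.kind) {
    case "validated":
      return "low";
    case "blocked":
      return "high";
    case "low-confidence":
      return "caution";
  }
}

console.log(visualWeight({ kind: "blocked", violation: "unknown token", guidance: "Use an EDS token" })); // "high"
```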
Technical Architecture
- Figma Plugin: Designer-facing interface for all three features, built inside Figma with no context switching
- MCP Server: Canonical data layer bridging Storybook, Confluence, and Token Studio. Full component manifest injected at request time
- Backend API: Brand-isolated authentication and output validation. Schema-checks all token IDs and component keys before results reach the Plugin
- Claude 3.5 Sonnet: Powers Q&A, auto-fix, and documentation generation. $15/day cost ceiling at pilot-scale
- GitHub + CI/CD: Auto-syncs design tokens and triggers Storybook deployments. No net-new tooling required
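The Backend API's schema check is the enforcement point that makes non-EDS outputs structurally impossible. A minimal sketch, assuming the manifest is a set of known token IDs and component keys (the manifest contents and function names here are placeholders):

```typescript
// Output validation: every token ID and component key in a model response
// must exist in the canonical manifest assembled from Storybook, Confluence,
// and Token Studio. Manifest contents below are illustrative.
const manifest = {
  tokens: new Set(["color.action.primary", "space.inset.200"]),
  components: new Set(["EDS/Button", "EDS/Card"]),
};

interface ModelOutput {
  tokenIds: string[];
  componentKeys: string[];
}

function validateOutput(out: ModelOutput): { ok: boolean; unknown: string[] } {
  const unknown = [
    ...out.tokenIds.filter(t => !manifest.tokens.has(t)),
    ...out.componentKeys.filter(c => !manifest.components.has(c)),
  ];
  // Unknown references are blocked before results ever reach the Plugin UI.
  return { ok: unknown.length === 0, unknown };
}

console.log(validateOutput({ tokenIds: ["color.action.primary"], componentKeys: ["EDS/Hero"] }));
// → { ok: false, unknown: ["EDS/Hero"] }
```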
Impact
All outcomes represent modeled results validated against Q3 Jira analysis. Five segments, each traced to a specific Jira task type, build the annual recurring figure. The largest segment (prototype acceleration, $2.05M) depends on adoption rate, which I couldn't validate pre-launch. I modeled three scenarios against the $165K investment: 25% adoption ($1.1M, 5.7× ROI), 50% adoption ($1.9M, 10× ROI), and 75% adoption ($2.5M, 14× ROI). Even the conservative floor clears the investment by 5×. The 50% tier was the target I presented, based on ELC's Figma migration hitting roughly 60% active usage within 90 days of structured onboarding.
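The scenario math is reproducible from the figures above, assuming the model's multiples are net ROI, (savings − investment) / investment, rounded for presentation. A quick check:

```typescript
// Adoption-scenario ROI: net multiple = (annual savings − investment) / investment.
// Savings figures are the modeled values from the case study; investment is $165K.
const investment = 165_000;
const scenarios = [
  { adoption: 0.25, savings: 1_100_000 },
  { adoption: 0.5, savings: 1_900_000 },
  { adoption: 0.75, savings: 2_500_000 },
];

const rois = scenarios.map(s => (s.savings - investment) / investment);
scenarios.forEach((s, i) =>
  console.log(`${s.adoption * 100}% adoption → ~${rois[i].toFixed(1)}× net ROI`));
// 25% → ~5.7×, 50% → ~10.5×, 75% → ~14.2×
```

These land within rounding of the presented 5.7×, 10×, and 14× multiples.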
> "A chatbot that can answer questions and tell people how to use a component? That is scalable to a thousand designers across the world. That's a much bigger thing."
Planned Measurement Framework
For unshipped work, the measurement plan is the credibility proof. I designed a 2-brand pilot instrumented on three signals:
- Lint scan adoption rate: Scans per active Figma file per week, targeting 3+ scans per file within 30 days of rollout.
- Ask EDS query volume vs. Slack DS support threads: Target 50% reduction in synchronous support requests within 60 days.
- Support ticket deflection rate: Percentage of Jira DS-support tickets avoided, measured against the Q3 baseline.
These three signals map directly to the three largest savings segments in the ROI model. If the pilot validates the 50% adoption tier, the business case supports full 25-brand rollout.
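The pilot scorecard reduces to measured-versus-target comparisons. A sketch under stated assumptions: the first two targets come from the framework above, the deflection target and all measured values are placeholders I've invented for illustration.

```typescript
// Pilot signal scorecard: each signal compares a measured value against its
// target. Measured values (and the deflection target) are placeholders.
interface Signal {
  name: string;
  measured: number;
  target: number;
}

const signals: Signal[] = [
  { name: "scans per active file per week", measured: 3.4, target: 3 },     // target from the plan
  { name: "reduction in sync support threads", measured: 0.46, target: 0.5 }, // target from the plan
  { name: "ticket deflection vs Q3 baseline", measured: 0.38, target: 0.35 }, // placeholder target
];

const passes = (s: Signal) => s.measured >= s.target;
signals.forEach(s =>
  console.log(`${s.name}: ${passes(s) ? "on track" : "below target"}`));
```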
Learnings
When I initially presented industry benchmarks, the room pushed back on credibility. When I rebuilt the model from 200+ internal Jira tickets categorized by task type, the same executives approved a $200K pilot. The lesson: how you prove the number matters as much as the number itself.
When the Design System Director and the VP of Experience Design had conflicting concerns about governance, the architecture that satisfied both was the right one. I now design governance as a user-facing feature, not a background policy. On this project, that meant the MCP Server's data layer enforced compliance automatically, so designers never had to think about it and leadership never had to police it.
