FMEA (Failure Modes & Effects Analysis)

What is FMEA?

Failure Modes & Effects Analysis (FMEA) is a structured, proactive method to identify how a product, process, or service can fail (failure modes), what those failures can cause (effects), why they might happen (causes), and how we can detect or prevent them (controls). It converts engineering intuition into a living risk register with clear actions and owners.

Why FMEA still matters today

Products are smarter, supply chains are global, tolerances are tighter, and customers are less forgiving. FMEA ties risk thinking directly to design and process decisions so teams prevent defects and incidents rather than chasing them later. It’s a cornerstone in quality, reliability, and safety programs across manufacturing, services, healthcare, and software.


Key Concepts You Must Know

Failure mode, effect, cause, and controls

  • Failure mode: How something can fail (e.g., bolt loosens).
  • Effect: What happens if it fails (e.g., bracket detaches; noise; safety hazard).
  • Cause: Why it fails (e.g., insufficient torque; no thread-locker).
  • Current controls: What’s in place today to prevent or detect it (e.g., torque spec + verification; Poka-Yoke).

Severity, Occurrence, Detection (S/O/D)

  • Severity (S): Impact if the failure reaches the user/process (1 = negligible, 10 = hazardous).
  • Occurrence (O): Likelihood of the cause happening (1 = remote, 10 = frequent).
  • Detection (D): Likelihood current controls will catch the issue before impact (1 = almost certain detection, 10 = impossible to detect).

RPN vs Action Priority (AP)

  • RPN = S × O × D has been widely used, but it can mask high-Severity risks.
  • Action Priority (AP) classifies combinations of S/O/D into High (H), Medium (M), Low (L) priority, elevating attention to severe items even if O or D are low.
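As a quick illustration (a minimal Python sketch, not part of any standard), the multiplication behind RPN can make a severe but rare, well-detected risk score lower than a moderate everyday one:

```python
def rpn(s: int, o: int, d: int) -> int:
    """Classic Risk Priority Number: Severity x Occurrence x Detection."""
    return s * o * d

# A severe but rare, well-detected failure...
severe_but_rare = rpn(10, 2, 2)     # 40
# ...scores far lower than a moderate, common, hard-to-detect one.
moderate_but_common = rpn(5, 6, 6)  # 180

print(severe_but_rare, moderate_but_common)
```

This is exactly the gap Action Priority closes: the S=10 item would still be flagged for attention.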

Types of FMEA

System FMEA (SFMEA)

Looks at interactions between subsystems. Useful when complex interfaces or dependencies can create emergent failures.

Design FMEA (DFMEA)

Analyzes product design at component/feature level. Targets material selection, geometry, tolerances, interfaces, and environmental stresses.

Process FMEA (PFMEA)

Focuses on manufacturing/operational steps—setup, machining, assembly, packaging, service delivery—so defects are prevented at the source.

Service/Software FMEA

Applies the same logic to service journeys and software modules (e.g., API call fails, timeout, stale cache). For software, pair with robust testing strategies.

FMECA (with Criticality)

Adds criticality analysis (e.g., failure rate × severity) to quantify mission/safety impacts—common in aerospace, defense, and medical devices.


When to Use FMEA

  • New design or process: Start in concept; refine as details mature.
  • Change management: New suppliers, materials, equipment, or specs.
  • After field issues or near misses: Learn once, fix forever.
  • Periodic review: Treat FMEA as a living document, updated with data.

Building the Right Team and Inputs

Cross-functional roles

  • Facilitator/Quality: Guides method, maintains consistency.
  • Design/Process Engineer: Technical details and feasibility.
  • Production/Service: Real-world practicality and controls.
  • Supplier/Procurement: Material/process capability insights.
  • Maintenance/Reliability: Failure data and preventive strategies.
  • HSE/Regulatory: Safety, compliance, and legal implications.

Data sources and evidence

  • Past NCs, field returns, FRACAS logs
  • Process capability (Cp/Cpk), SPC charts
  • Warranty/complaint analytics, Pareto charts
  • Standards, test reports, design calculations
  • Control plans, work instructions, checklists

The AIAG–VDA 7-Step FMEA Method

Step 1: Planning & Preparation

Define scope, boundaries, assumptions, and interfaces. Agree on team, timeline, document controls, and rating scales.

Step 2: Structure Analysis

Break the system/design/process into a structure tree: system → subsystem → components or process → operations → tasks.
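A structure tree can be captured as simply as a nested mapping before it ever reaches a dedicated tool; all element names below are hypothetical examples, not from the handbook:

```python
# Illustrative process structure tree (process -> operations -> tasks);
# every name here is a made-up example for demonstration only.
structure = {
    "Final assembly": {
        "Fastening station": ["Pick M8 fastener", "Torque to spec", "Verify torque"],
        "Bonding station": ["Dispense thread-locker", "Check cure"],
    }
}

for process, operations in structure.items():
    print(process)
    for operation, tasks in operations.items():
        print("  " + operation)
        for task in tasks:
            print("    " + task)
```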

Step 3: Function Analysis

List intended functions and requirements for each element (including regulatory and safety). Clarify performance ranges and environmental conditions.

Step 4: Failure Analysis

Identify failure modes (how function can fail), effects (what happens), and causes (why it happens). Tie each to the structure and function steps.

Step 5: Risk Analysis (S/O/D + AP)

Rate S, O, D using agreed criteria. Use Action Priority (H/M/L) to decide which items require action now.

Step 6: Optimization (Actions & Ownership)

Define recommended actions (preventive first, then detection), assign owners and due dates, and track to closure. Re-rate S/O/D after implementation.

Step 7: Results Documentation & Handover

Summarize residual risks, update control plans, work instructions, training, and PPAP (if automotive). Archive decisions with version control.


Rating Scales (S/O/D) — Practical Guidance

How to define 1–10 for S/O/D

Customize scales to your industry but keep consistency. Example snippets:

  • Severity (S):
    • 10: Safety hazard; regulatory violation; system failure without warning
    • 7–9: Major function loss; scrap; critical downtime
    • 4–6: Performance degradation; rework; customer annoyance
    • 1–3: Minor inconvenience; negligible effect
  • Occurrence (O):
    • 10: Failure cause is almost inevitable with current controls
    • 7–9: Frequent under normal conditions
    • 4–6: Occasional; seen in some lots/runs
    • 1–3: Remote; strong capability and controls
  • Detection (D):
    • 10: No detection method; will escape to user
    • 7–9: Weak checks; manual inspection; low detection power
    • 4–6: Effective in-process checks; automated measurement
    • 1–3: Poka-Yoke/100% automatic detection; proven to catch escapes

Consistent rating with examples

Use evidence: measured PPM, Cpk, gage R&R, test coverage, audit results. Document why the team chose a number.
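One way to keep Occurrence ratings evidence-based is to anchor them to a measured defect rate. The PPM thresholds below are purely illustrative assumptions and must be calibrated to your own agreed scale:

```python
def occurrence_from_ppm(ppm: float) -> int:
    """Map a measured defect rate (parts per million) to an Occurrence
    rating. Thresholds are hypothetical examples, not a standard scale."""
    if ppm >= 100_000:  # 10% or more defective: almost inevitable
        return 10
    if ppm >= 10_000:
        return 8
    if ppm >= 1_000:
        return 6
    if ppm >= 100:
        return 4
    if ppm >= 10:
        return 2
    return 1

print(occurrence_from_ppm(50_000))  # frequent cause -> 8
print(occurrence_from_ppm(5))       # remote cause -> 1
```

Writing the mapping down (whatever thresholds you choose) makes ratings repeatable across teams and audits.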


Action Priority (AP) vs RPN

Why AP improves decision-making

RPN can undervalue severe risks (e.g., S=10, O=2, D=2 gives RPN=40 and might look “low”). AP ensures S drives action even when O or D are low.

How to read H/M/L

  • H (High): Take action or strong justification required.
  • M (Medium): Action beneficial; prioritize based on resources.
  • L (Low): Action optional; monitor with data.
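The official AIAG–VDA AP table is a detailed lookup over all S/O/D combinations; the sketch below only illustrates its guiding principle, that Severity dominates, and its rules are simplified assumptions rather than the real table:

```python
def action_priority(s: int, o: int, d: int) -> str:
    """Simplified Action Priority sketch. The real AIAG-VDA table is a
    granular lookup; these rules merely illustrate Severity-first logic."""
    if s >= 9 and (o >= 2 or d >= 2):
        return "H"  # severe risks get attention even if rare
    if s >= 7 and o >= 4:
        return "H"
    if s >= 4 and (o >= 4 or d >= 5):
        return "M"
    return "L"

# Severity-driven: RPN would be only 40 here, yet AP flags it High.
print(action_priority(10, 2, 2))  # H
print(action_priority(3, 3, 3))   # L
```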

Worked Example: Mini PFMEA

Scenario

Operation: Torque a critical M8 fastener on an assembly line.

Process Step | Function/Req. | Failure Mode | Effect | Cause | Current Controls (Prev/Det) | S | O | D | AP
Torque M8 fastener | Achieve 22±2 Nm | Under-torque | Bracket loosens; vibration; safety risk | Worn tool; operator skips re-hit; no thread-locker | PM on tools (Prev); final visual check (Det) | 9 | 4 | 7 | H
Torque M8 fastener | Achieve 22±2 Nm | Over-torque | Stud yield; crack over time | Wrong torque program | Program selection checklist (Prev) | 7 | 3 | 6 | M
Apply thread-locker | Bond threads | Missing thread-locker | Loosening in field | Empty dispenser; untrained temp | Kanban refill (Prev); UV check (Det) | 8 | 5 | 6 | H

Recommended actions (examples):

  • Add Poka-Yoke torque gun with auto program selection and OK/NOK lockout.
  • Introduce in-line torque monitoring with curve signature.
  • UV camera to auto-detect thread-locker.
  • Increase tool PM frequency; gage torque analyzers monthly.

Post-action re-ratings: Expect O and D to drop significantly, moving AP from H → M/L.
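The rows of the mini PFMEA can be tracked as plain records; the re-rating below is a hypothetical sketch in which prevention actions lower O and detection actions lower D, while Severity stays put because the effect itself has not changed:

```python
# The mini PFMEA rows above as plain records (ratings taken from the table).
rows = [
    {"mode": "Under-torque",          "S": 9, "O": 4, "D": 7, "AP": "H"},
    {"mode": "Over-torque",           "S": 7, "O": 3, "D": 6, "AP": "M"},
    {"mode": "Missing thread-locker", "S": 8, "O": 5, "D": 6, "AP": "H"},
]

# Hypothetical post-action re-rating: assume prevention cuts O by 2
# and the new automatic detection cuts D by 4 (illustrative numbers).
for row in rows:
    row["O_after"] = max(1, row["O"] - 2)
    row["D_after"] = max(1, row["D"] - 4)
    print(row["mode"], row["S"], row["O_after"], row["D_after"])
```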


Linking FMEA to Control Plans & PPAP

High-risk items become special characteristics in the control plan with specific reaction plans, sampling, and capability targets. For automotive suppliers, the FMEA underpins PPAP submissions, proving prevention and detection are robust before start of production (SOP).


Standards & Best-Practice Alignment

  • ISO 9001 / ISO 13485: Risk-based thinking and documented evidence.
  • IATF 16949 (Automotive): Requires DFMEA/PFMEA alignment with AIAG–VDA handbook.
  • IEC 60812: International guidance on FMEA methodology.
  • Functional Safety (e.g., ISO 26262): Use FMEA with FTA, FMEDA, and safety goals.

Tools, Templates & Digitalization

Spreadsheet vs specialized tools

  • Spreadsheet (Excel/Sheets): Fast to start, easy to share; risk of version chaos.
  • Specialized software: Structure trees, libraries of failures/controls, AP logic built-in, change tracking, dashboards.

Data hygiene tips

  • Use a single source of truth with version control.
  • Create libraries for common components, failure modes, and controls.
  • Link FMEA rows to work instructions, test plans, and control plans.

Common Mistakes & How to Avoid Them

  1. Treating FMEA as paperwork → Tie actions to owners/dates, review monthly.
  2. Overreliance on RPN → Use Action Priority and emphasize Severity.
  3. Vague failure modes → Make them function-specific and measurable.
  4. Copy-pasting old FMEAs → Start from history, but tailor to the new context.
  5. Inflated Detection ratings → Validate detection with evidence and MSA.
  6. No field feedback loop → Feed returns/complaints back into re-ratings.
  7. Missing cross-functional voices → Include production, supplier, service, HSE.
  8. Actions not closed → Track closure rates; escalate overdue actions.
  9. Ignoring human factors → Add training, ergonomics, UI/UX controls.
  10. One-time exercise → FMEA must be living—update after changes and findings.

Advanced FMEA Approaches

  • Fuzzy FMEA: Uses fuzzy logic for uncertain ratings; helpful with sparse data.
  • FTA + FMEA combo: Use Fault Tree Analysis top-down with FMEA bottom-up.
  • FMECA: Adds criticality calculations for mission and safety critical systems.
  • FMEDA (for electronics): Quantifies diagnostic coverage and failure rates to support functional safety analyses.

Implementation Roadmap

  1. Kickoff & Train: Align on method, scales, AP logic.
  2. Pilot: Choose one design and one process; prove value quickly.
  3. Standardize: Templates, libraries, naming conventions, revision control.
  4. Integrate: Connect FMEA with APQP, NPI gates, ECR/ECN, CAPA.
  5. Govern: Establish review cadence, risk boards, and management oversight.
  6. Scale: Roll to suppliers; audit for consistency and effectiveness.

KPIs & Continuous Improvement

  • Risk reduction velocity: # of H-AP items moved to M/L per quarter.
  • Action closure rate: % of actions closed on time.
  • Escape rate / PPM: Trend of customer and in-process defects.
  • Field reliability: MTBF/MTBR improvements linked to FMEA actions.
  • Audit scores: Conformance to AIAG–VDA structure and evidence depth.
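Two of these KPIs are simple ratios; a sketch with made-up quarterly numbers:

```python
def closure_rate(closed_on_time: int, total_due: int) -> float:
    """Action closure rate: % of actions due in the period closed on time."""
    return 100.0 * closed_on_time / total_due if total_due else 100.0

def risk_reduction_velocity(h_items_start: int, h_items_end: int) -> int:
    """Number of High-AP items moved down to M/L during the period."""
    return h_items_start - h_items_end

# Hypothetical quarter: 18 of 24 actions closed on time; H items 12 -> 7.
print(closure_rate(18, 24))            # 75.0
print(risk_reduction_velocity(12, 7))  # 5
```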

Conclusion

FMEA turns scattered worries into a clear, prioritized action plan. By rigorously defining failure modes, effects, causes, and controls—and by rating S/O/D with Action Priority—your team can focus energy where it matters most. Keep the document alive, link it to control plans and change management, and close actions fast. Done right, FMEA doesn’t just reduce defects and downtime; it builds confidence across engineering, operations, and your customers.

External Resource: ASQ – What is FMEA?



FAQs

1) What’s the difference between DFMEA and PFMEA?

DFMEA focuses on the product design (materials, dimensions, interfaces), while PFMEA targets the manufacturing/operational process (machines, methods, measurements, environment, people).

2) Should I still calculate RPN if we use Action Priority?

You can, but AP should drive decisions because it elevates Severity-driven risks. Many organizations keep RPN for trending while using AP for action triggers.

3) How often should we update our FMEAs?

At every significant change, after major defects/returns, and at planned intervals (e.g., quarterly reviews). Treat it as a living document.

4) What evidence do auditors look for in FMEA?

Clear traceability: why ratings were chosen, what controls exist, the actions taken, who owned them, and how residual risk was re-rated and integrated into control plans or work instructions.

5) Can FMEA work for services and software?

Absolutely. Map the service journey or software architecture, define functions and potential failures (timeouts, data loss, usability), and apply the same S/O/D logic with appropriate controls (tests, monitoring, canary releases).
