MTBF and MTTR Explained: Maintenance KPIs Every SA Operation Should Track
If you run a mine, a factory, or a facility in South Africa, you already know that unplanned downtime costs more than labour and parts — it costs production, safety, and compliance. The question is how to measure whether your maintenance is actually improving. MTBF and MTTR are two of the most widely used maintenance KPIs for exactly that: they tell you how often equipment fails and how long it takes to fix. Used together with OEE and a few other metrics, they give you a clear picture of equipment reliability and where to focus next.
This guide explains how to calculate MTBF and MTTR, what good looks like, and how maintenance impacts broader metrics like OEE. We use concrete South African examples — a mining hoist, a packaging line, and commercial HVAC — and show how a CMMS can track these KPIs automatically so you spend less time on spreadsheets and more time on improvement.
Why Maintenance KPIs Matter
Without numbers, maintenance stays subjective. One shift says the plant ran well; another says everything broke. Auditors and management want evidence: are failures going down? Is repair time improving? Are we doing enough preventive work? Maintenance KPIs turn that evidence into a shared language. They help you:
- Prioritise — Focus on assets or failure types that hurt the most.
- Benchmark — Compare your operation to industry norms and to your own past performance.
- Justify investment — Show the cost of reactive work and the payoff of preventive programs.
- Prove compliance — Demonstrate to MHSA and OHS Act inspectors that you maintain equipment in a safe, planned way.
MTBF and MTTR sit at the centre of that picture. They are simple to define and, with consistent record-keeping, straightforward to calculate. The rest of this article shows you how.
MTBF Explained: Mean Time Between Failures
MTBF (mean time between failures) is the average operating time between one failure and the next. It answers: How long does this asset typically run before it fails? Higher MTBF means more reliable equipment or better maintenance; lower MTBF means failures are happening too often.
The MTBF formula
\[ \text{MTBF} = \frac{\text{Total operating time}}{\text{Number of failures}} \]
Total operating time is the sum of the asset's running hours in the period (or kilometres, cycles, or production units, if that is how you meter the asset), excluding time spent stopped for repair. The number of failures counts each distinct failure in that period: if the same asset fails three times in a month, that is three failures.
Example: conveyor belt at a South African packaging plant
A packaging line in KwaZulu-Natal runs one main conveyor that feeds the filler. Over a quarter:
- The conveyor runs 1,920 hours (three shifts, five days, minus planned stoppages).
- It fails four times: two bearing seizures, one belt slip that damaged a splice, and one drive motor overload.
\[ \text{MTBF} = \frac{1{,}920 \text{ hours}}{4 \text{ failures}} = 480 \text{ hours} \]
So the conveyor runs an average of 480 hours between failures. On a three-shift, five-day schedule that is roughly three to four weeks of running, which matches four failures spread over one quarter. If the plant wants to reduce unplanned stoppages, the next step is to look at why those four failures happened (e.g. lubrication schedule, alignment, or load) and tighten preventive maintenance or condition checks so MTBF improves.
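The arithmetic above can be sketched in a few lines of Python. The function name `mtbf` and its arguments are illustrative only and do not come from any particular CMMS.

```python
def mtbf(total_operating_hours: float, failure_count: int) -> float:
    """Mean time between failures: operating time divided by failure count."""
    if failure_count <= 0:
        # With no failures in the period, MTBF is undefined for that period.
        raise ValueError("MTBF needs at least one failure in the period")
    return total_operating_hours / failure_count


# Conveyor example from the text: 1,920 running hours, 4 failures in the quarter.
conveyor_mtbf = mtbf(1920, 4)
print(f"Conveyor MTBF: {conveyor_mtbf:.0f} hours")  # prints "Conveyor MTBF: 480 hours"
```

Raising an error when there are no failures is one possible design choice; some teams prefer to report "no failures in period" alongside the total run time instead.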
Note: MTBF is most meaningful for repairable assets. For items that are replaced on failure (e.g. a light bulb), the equivalent metric is often MTTF (mean time to failure). In maintenance discussions, MTBF is used so often that it has become the default term for “average run time between failures.”
MTTR Explained: Mean Time To Repair
MTTR (mean time to repair) is the average time from when a failure occurs until the asset is back in operation. It answers: How long does it usually take us to fix it? Lower MTTR means faster recovery and less downtime per failure; it reflects your repair capability, spare parts availability, and technician skills.
The MTTR formula
\[ \text{MTTR} = \frac{\text{Total repair time}}{\text{Number of repairs}} \]
Total repair time is the sum of the time spent on each repair, measured either as clock time (downtime) or as labour time, depending on your definition. Number of repairs is the count of repairs in the period; for unplanned failures on the same asset over the same period, it should match the failure count used for MTBF. Consistency is important: decide whether you measure from “failure reported” to “back in production” or from “work started” to “work completed,” and stick to it.
Example: mining hoist brake fault
A mine in Limpopo tracks hoist downtime. In one month the hoist had three unplanned stoppages for brake-related faults. Repair times (from fault to back in service) were 2.5 hours, 4 hours, and 3 hours.
\[ \text{MTTR} = \frac{2.5 + 4 + 3}{3} = \frac{9.5}{3} \approx 3.17 \text{ hours} \]
So the average time to repair this type of fault is about 3.2 hours. If the mine wants to reduce MTTR, it can look at: faster diagnosis (checklists, fault codes), critical spares on site, and technician training so brake adjustments and part swaps are done correctly the first time. For MHSA-critical equipment like a hoist, reducing MTTR also reduces exposure time when the asset is in a failed state.
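The hoist calculation can be reproduced with a short Python sketch. The function name `mttr` is illustrative, and the durations are the three fault-to-back-in-service times from the example.

```python
from typing import Sequence


def mttr(repair_hours: Sequence[float]) -> float:
    """Mean time to repair: total repair time divided by number of repairs."""
    if not repair_hours:
        raise ValueError("MTTR needs at least one repair in the period")
    return sum(repair_hours) / len(repair_hours)


# Hoist brake faults from the text, measured from fault to back in service.
hoist_repairs = [2.5, 4.0, 3.0]
hoist_mttr = mttr(hoist_repairs)
print(f"Hoist MTTR: {hoist_mttr:.2f} hours")  # prints "Hoist MTTR: 3.17 hours"
```

Whatever start and end points you choose for each duration, the same definition should be used for every entry in the list.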
The Relationship Between MTBF and MTTR
MTBF and MTTR work together. MTBF tells you how often you are in a repair situation; MTTR tells you how long each repair takes. Both feed into availability:
\[ \text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}} \]
So improving either MTBF (fewer failures) or MTTR (faster repairs) improves availability. In practice:
- High MTBF, high MTTR — Equipment fails rarely but when it does, downtime is long. Focus on repair speed, spares, and procedures.
- Low MTBF, low MTTR — Failures are frequent but fixes are quick. Focus on root cause and preventive work to increase MTBF.
- Low MTBF, high MTTR — The worst case: frequent failures and slow repairs. You need both better prevention and better repair capability.
For South African operations dealing with load-shedding, skills shortages, or remote sites, MTTR can spike when spares are not on site or when the right technician is not available. Tracking MTTR by failure type and by site helps you see where to stock parts and where to invest in training or preventive vs reactive strategies.
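To see how MTBF and MTTR combine into availability, here is a minimal Python sketch. The figures are hypothetical: an MTBF of 480 hours paired with an assumed MTTR of 4 hours, then two "what if" scenarios.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability as a fraction between 0 and 1: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical scenarios for one asset.
baseline = availability(480, 4)        # current state
faster_repairs = availability(480, 2)  # MTTR halved
fewer_failures = availability(960, 4)  # MTBF doubled

for label, value in [
    ("Baseline", baseline),
    ("MTTR halved", faster_repairs),
    ("MTBF doubled", fewer_failures),
]:
    print(f"{label}: {value:.2%}")
```

In this sketch, halving MTTR and doubling MTBF give the same availability, because the formula depends only on the ratio of the two. Which lever is cheaper to pull depends on the asset and the site.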
OEE Explained: How Maintenance Fits In
OEE (Overall Equipment Effectiveness) is a single percentage that combines availability, performance, and quality. It is common in manufacturing and mining to measure how well an asset or line is used.
\[ \text{OEE} = \text{Availability} \times \text{Performance} \times \text{Quality} \]
- Availability — Uptime as a share of planned production time. Downtime (planned and unplanned) reduces it. Maintenance directly affects availability: fewer failures (higher MTBF) and shorter repairs (lower MTTR) both increase it.
- Performance — Actual output vs theoretical maximum when the asset is running. Slow cycles, minor stoppages, and running below design speed reduce performance. Maintenance affects this when poor condition (e.g. worn belts, dirty filters) forces the line to run slow or when setup and changeover are inefficient.
- Quality — Good output as a share of total output. Rework and scrap reduce quality. Maintenance can affect quality when equipment drift (e.g. misalignment, worn tools) causes defects before a formal failure occurs.
So maintenance impacts all three legs of OEE. Improving MTBF and MTTR raises availability; condition-based and preventive work can protect performance and quality. When you report OEE to management, breaking it down into availability (and thus MTBF/MTTR), performance, and quality makes it clear where maintenance contributes and where production or process changes are needed.
For example, a commercial building in Gauteng tracking its HVAC plant might see availability drop when chillers trip or filter blockages force shutdowns, performance drop when dirty coils or low refrigerant reduce cooling capacity, and comfort complaints (a proxy for quality) rise when maintenance is deferred. In that context, MTBF and MTTR on critical air handling units (AHUs) and chillers feed directly into how well the facility performs.
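As a rough illustration of the OEE formula, the sketch below multiplies the three components. The input values (90% availability, 95% performance, 98% quality) are assumed for the example and are not taken from any real line.

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """Overall Equipment Effectiveness as a fraction between 0 and 1."""
    for name, value in [
        ("availability", availability),
        ("performance", performance),
        ("quality", quality),
    ]:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a fraction between 0 and 1")
    return availability * performance * quality


# Hypothetical component values for one production line.
line_oee = oee(availability=0.90, performance=0.95, quality=0.98)
print(f"OEE: {line_oee:.1%}")  # prints "OEE: 83.8%"
```

Because the components multiply, three figures that each look healthy on their own can still produce a noticeably lower OEE.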
Other Important Maintenance KPIs
Beyond MTBF and MTTR, these metrics help you run a disciplined maintenance operation.
PM compliance (preventive maintenance compliance %)
\[ \text{PM compliance} = \frac{\text{PM tasks completed on time}}{\text{PM tasks due}} \times 100 \]
This is the percentage of scheduled preventive tasks that are done by their due date. Low PM compliance usually means more reactive work and lower MTBF. World-class operations often target above 90%.
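A minimal sketch of the PM compliance calculation, assuming each PM task carries a due date and an optional completion date. The `PMTask` structure and the sample tasks are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass
class PMTask:
    name: str
    due: date
    completed: Optional[date] = None  # None means the task is still open


def pm_compliance(tasks: Sequence[PMTask]) -> float:
    """Percentage of due PM tasks that were completed on or before the due date."""
    if not tasks:
        raise ValueError("No PM tasks were due in the period")
    on_time = sum(
        1 for t in tasks if t.completed is not None and t.completed <= t.due
    )
    return 100 * on_time / len(tasks)


# Hypothetical month of PM tasks.
tasks = [
    PMTask("Lubricate conveyor bearings", date(2024, 3, 1), date(2024, 2, 28)),
    PMTask("Inspect belt splice", date(2024, 3, 8), date(2024, 3, 8)),
    PMTask("Check drive motor current", date(2024, 3, 15), date(2024, 3, 20)),  # late
    PMTask("Clean HVAC filters", date(2024, 3, 22)),  # not done
]
print(f"PM compliance: {pm_compliance(tasks):.0f}%")  # prints "PM compliance: 50%"
```

Here a late task counts the same as a missed one. Some sites allow a grace window around the due date; if you do, apply it consistently.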
Planned vs unplanned ratio
\[ \text{Planned ratio} = \frac{\text{Planned work hours}}{\text{Total maintenance work hours}} \times 100 \]
A high planned ratio (e.g. above 80%) means most work is scheduled and controlled rather than firefighting. It correlates with better MTBF and lower cost per repair.
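The planned ratio can be sketched the same way. The hour totals below are invented for illustration.

```python
def planned_ratio(planned_hours: float, unplanned_hours: float) -> float:
    """Planned work hours as a percentage of total maintenance work hours."""
    total = planned_hours + unplanned_hours
    if total <= 0:
        raise ValueError("No maintenance hours recorded in the period")
    return 100 * planned_hours / total


# Hypothetical month: 320 planned hours and 80 unplanned (reactive) hours.
ratio = planned_ratio(320, 80)
print(f"Planned ratio: {ratio:.0f}%")  # prints "Planned ratio: 80%"
```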
Cost per asset
Total maintenance cost (labour + parts + contractors) for a period, divided by the number of assets, or broken down per asset and grouped by asset criticality. It helps identify which assets consume the most budget and whether that spend is preventive or reactive.
Wrench time
The share of technician time spent on actual repair or PM work vs travel, admin, waiting for parts, and searching for information. Low wrench time is a signal to improve planning, spares availability, and how work is managed in a CMMS.
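Wrench time is just a share of logged hours, as the sketch below shows. The activity names and the eight-hour breakdown are assumptions made for the example.

```python
from typing import Iterable, Mapping


def wrench_time(
    hours_by_activity: Mapping[str, float],
    productive: Iterable[str] = ("repair_and_pm",),
) -> float:
    """Percentage of logged technician hours spent on hands-on repair or PM work."""
    total = sum(hours_by_activity.values())
    if total <= 0:
        raise ValueError("No technician hours recorded")
    productive_hours = sum(hours_by_activity.get(a, 0.0) for a in productive)
    return 100 * productive_hours / total


# Hypothetical eight-hour shift for one technician.
shift_hours = {
    "repair_and_pm": 3.5,
    "travel": 1.0,
    "admin": 1.0,
    "waiting_for_parts": 1.5,
    "searching_for_information": 1.0,
}
print(f"Wrench time: {wrench_time(shift_hours):.2f}%")  # prints "Wrench time: 43.75%"
```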
Benchmarks: What Good Looks Like
Benchmarks vary by industry and asset type, but these ranges are often cited:
| KPI | Good | World-class |
|---|---|---|
| PM compliance | 80–85% | > 90% |
| Planned work ratio | 70–80% | > 80% |
| MTBF | Trend improving; compare to similar assets | Best-in-class plants track by asset class |
| MTTR | Trend decreasing; compare to history | Depends on complexity and spares strategy |
For South African mining, manufacturing, and facilities, the first step is to start measuring consistently. Even if your numbers are below “world-class,” having a baseline lets you set realistic targets and show year-on-year improvement to management and auditors.
How to Improve These KPIs
Stronger PM programs
More and better preventive maintenance is the main lever for higher MTBF. Define PMs from OEM guidance, regulation (e.g. MHSA, OHS Act), and failure history. Schedule them in a CMMS so work orders are generated automatically and compliance is tracked.
Root cause analysis
When the same asset keeps failing, or the same failure type keeps recurring, do a simple root cause analysis. Fix the underlying design, procedure, or training issue so that MTBF improves and you stop repeating the same repair.
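One simple way to spot candidates for root cause analysis is to count repeat failures per asset and failure type. The sketch below uses the four conveyor failures from the earlier example; the record layout and the threshold of two are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

FailureKey = Tuple[str, str]  # (asset, failure type)


def repeat_failures(
    records: Iterable[FailureKey], threshold: int = 2
) -> List[Tuple[FailureKey, int]]:
    """Return (asset, failure type) pairs seen at least `threshold` times."""
    counts = Counter(records)
    return [(key, n) for key, n in counts.most_common() if n >= threshold]


# Conveyor failures from the packaging-line example.
failures = [
    ("conveyor", "bearing seizure"),
    ("conveyor", "bearing seizure"),
    ("conveyor", "belt slip"),
    ("conveyor", "drive motor overload"),
]
for (asset, failure_type), n in repeat_failures(failures):
    print(f"{asset}: {failure_type} x{n}")  # prints "conveyor: bearing seizure x2"
```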
Spare parts availability
Critical spares on site (or under a clear replenishment process) reduce MTTR. Use work order and failure data to identify which parts are needed most and hold stock or agreements for those.
Training and procedures
Trained technicians with clear procedures complete repairs faster and more consistently. Document critical repair steps and fault-finding so MTTR stays low even when key people are absent.
How a CMMS Tracks KPIs Automatically
Calculating MTBF, MTTR, and PM compliance by hand from paper job cards or scattered spreadsheets is slow and error-prone. A CMMS changes that. When every work order is logged — with asset, type (preventive vs reactive), start and end time, and parts used — the system can:
- Compute MTBF — Sum operating time between failures per asset and divide by failure count.
- Compute MTTR — Sum repair time per repair and divide by number of repairs; can be broken down by asset, failure type, or site.
- Report PM compliance — Compare completed vs due PMs by period, asset, or department.
- Show planned vs unplanned ratio — Classify work orders and sum hours.
- Feed OEE — Export availability (and thus MTBF/MTTR) into OEE calculations.
You get dashboards and reports that update as work is completed, so you can see trends and exceptions without rebuilding spreadsheets. For South African operations that must prove maintenance to regulators and improve reliability under pressure, that automation is a direct enabler of better MTBF, MTTR, and overall equipment effectiveness.
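The kind of calculation a CMMS performs can be sketched from a list of work orders. This is not how any specific product works: the `WorkOrder` fields, the `kpis_by_asset` function, and the sample data are all hypothetical. The sketch also assumes that scheduled hours already exclude planned stoppages, so operating time is scheduled time minus reactive repair time.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Mapping


@dataclass
class WorkOrder:
    asset: str
    kind: str  # "reactive" or "preventive"
    started: datetime
    finished: datetime

    @property
    def hours(self) -> float:
        return (self.finished - self.started).total_seconds() / 3600


def kpis_by_asset(
    orders: Iterable[WorkOrder], scheduled_hours: Mapping[str, float]
) -> Dict[str, Dict[str, float]]:
    """Failure count, MTBF and MTTR per asset, based on reactive work orders only."""
    repair_hours: Dict[str, List[float]] = defaultdict(list)
    for wo in orders:
        if wo.kind == "reactive":
            repair_hours[wo.asset].append(wo.hours)

    result = {}
    for asset, repairs in repair_hours.items():
        downtime = sum(repairs)
        # Assumption: operating time = scheduled time minus repair downtime.
        operating = scheduled_hours[asset] - downtime
        result[asset] = {
            "failures": len(repairs),
            "mtbf": operating / len(repairs),
            "mttr": downtime / len(repairs),
        }
    return result


# Hypothetical month of hoist work orders (repair times match the MTTR example).
orders = [
    WorkOrder("hoist", "reactive", datetime(2024, 3, 4, 8, 0), datetime(2024, 3, 4, 10, 30)),
    WorkOrder("hoist", "reactive", datetime(2024, 3, 12, 14, 0), datetime(2024, 3, 12, 18, 0)),
    WorkOrder("hoist", "preventive", datetime(2024, 3, 16, 6, 0), datetime(2024, 3, 16, 12, 0)),
    WorkOrder("hoist", "reactive", datetime(2024, 3, 25, 9, 0), datetime(2024, 3, 25, 12, 0)),
]
kpis = kpis_by_asset(orders, scheduled_hours={"hoist": 600.0})
for asset, k in kpis.items():
    print(f"{asset}: {k['failures']} failures, MTBF {k['mtbf']:.1f} h, MTTR {k['mttr']:.2f} h")
```

The preventive work order is ignored for MTBF and MTTR, which is why consistent classification of work orders matters so much for the resulting numbers.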
MTBF and MTTR are two of the most important maintenance KPIs every South African operation should track. Once you know how to calculate them and how they relate to availability and OEE, you can set targets, benchmark against your past performance, and focus improvement on PM programs, root cause analysis, spares, and training. A CMMS that records every work order and asset history makes it possible to track these metrics automatically and act on the data. Lungisa includes analytics and dashboards built for South African mining, manufacturing, and facilities — so you can see MTBF, MTTR, PM compliance, and planned work ratio in one place. If you want to move from spreadsheets to live maintenance KPIs, explore Lungisa or contact the Skynode team to see how the dashboards can support your operation.
Written by
Lungisa Team