Back in the days of the Great
Depression, President Roosevelt introduced the "New Deal," during which
the Hoover Dam was completed. President Eisenhower funded the interstate highway
system through the Federal-Aid Highway Act of 1956. The results of both efforts are
amazing pieces of engineering and critical infrastructure in the US. But even
state-of-the-art engineering requires ongoing maintenance. Today, over
65,000 bridges across the US are in disrepair, and the failure of one of
these critical pieces of infrastructure often has catastrophic results even
when it's not precipitated by flood or earthquake.
In the case of the Hoover Dam, an
entire cement works was built on site to provide materials in a timely fashion,
so a production line of sorts was implemented to drive efficiencies into the
construction. For the interstate highway system, mile after mile of land had to
be procured and graded, and bridges and tunnels constructed. Both projects
would have benefited from a Total Productive Maintenance (TPM) solution, which
would have provided a framework to build and maintain the construction
infrastructure.
Total Productive Maintenance (TPM) is
a broad-reaching methodology focused on manufacturing and maintenance
engineering processes. Holistically, the primary objective of TPM is to
increase the productivity of plant and equipment while making appropriate
(small) investments to maintain productivity and health.
Similarly, our IT infrastructure needs
constant maintenance to keep it running, current and patched to prevent
nefarious entry. Regrettably, this is often a place where that appropriate
investment is lacking, leaving teams to play catch-up to support systems that
have long outlived their most productive years. Updating these systems is
considered too expensive, too complex, or not important enough,
resulting in an "if it ain't broke, don't fix it" mentality. In
some cases there are zombie systems that just never got turned off and the
tribal knowledge of why they existed in the first place has left the building.
Software development lifecycles also have similar needs to keep tools and
skills current in order to deliver quality and value to their customers.
Investment in TPM supports the
philosophy of making appropriate investments to stay current with
infrastructure and applications. Almost weekly we hear of some government
agency or Fortune 500 company being hacked or an outage impacting a broad range
of customers. Getting current and staying current is also an investment in
managing operational risk around the complex systems we manage.
TPM is a very broad subject, and this
article can only scratch the surface. I strongly encourage you to review the
"Further Reading" section, where there are links to more exhaustive
articles on this topic that are specific to manufacturing.
The many years I have spent working
in IT using project management and Six Sigma methods have shown me that there
are many direct and strong analogs between manufacturing applications and IT,
especially when dealing with operations or maintenance of legacy systems for
core business customers. Many of these concepts also apply to the
development of products for a consumer market where technology is the product,
and to the tools and development platforms used to create and ship these products.
TPM and TQM (Total Quality
Management) are core operational components of an overall quality management
system. This system is made up of Eight Pillars that we will explore in this
article.
Fundamentally,
these eight pillars are the foundation of proactive planning and preventive
maintenance to provide a baseline of stability, capability and performance for
manufacturing processes.
Five S Foundation
At the
foundation of the Pillars of TPM there are the Five S's:
- Sort -- Sort out and determine what is needed in the manufacturing
area.
- Straighten -- Place items in a logical arrangement so they are easy to find and
ready to use. Clearly note where they are to be stored when not in use so
they can be returned.
- Shine -- Make sure that the workplace is clean and the equipment is in
good working order to perform the task.
- Standardize -- Make sure that the first three S's are practiced frequently to
remove any special cause variance in the manufacturing process.
- Sustain -- Maintain the rules and standards with a focus on continuous
improvement.
Below are
some analogs to IT. (Nothing here should be surprising for any mature IT
organization.)
- Sort -- Determine what is important to the customer and what tools are
needed to develop and maintain functional value.
- Straighten -- Make sure that the development or maintenance environment is
sustainable and there are controls around how product capability and
support is delivered/provided to the customer.
- Shine -- Manage defects and possible health issues for the application
(reliability, performance, data quality and capability).
- Standardize -- Use sound software engineering practices, including naming
conventions, development tools, test harnesses, and error handling within
the applications.
- Sustain -- Manage the error log of defects during and after deployment as
input into an enhancement list for the next release.
Focused Improvement
The core
tenet behind Focused Improvement is to maximize the overall effectiveness of
equipment, systems and processes by elimination of losses and continuous
improvements in performance. An analog in IT can be how we manage technology in
a data center through load balancing or identification of critical failure
points in our business applications such as interfaces and feeds. Another
example could be making applications more fault-tolerant through error handling
or algorithms which deal with potential show-stopping exceptions.
Within
this pillar there are six zero-breakdown measures (using an IT analog):
- Establish basic equipment conditions -- What are the standard operating conditions
for usage of hardware or software?
- Comply with conditions of use -- Set up service level agreements (SLAs).
- Restore deterioration -- Analyze breakdowns or failure points and
restore to working order within SLAs and operational norms.
- Abolish environments causing accelerated deterioration -- Identify special
cause issues that are causing chronic breakdowns or outages. How often has
an outage happened? Was it fixed with a workaround or fixed to prevent it
from happening again? Was a true root cause established and validated?
- Correct design weaknesses -- Identify and correct known exceptions. How
much of the design can be pushed back into the business process vs. fixing
it through complex algorithms that may be difficult to maintain in the
future? Does the design work post-implementation?
- Improve operating skills -- Ensure that the actors in the process or
application users know how to work within the conditions of use. Is
appropriate training on new features provided not only to existing
actors, but also to actors new to the process?
Autonomous Maintenance
This
pillar primarily focuses on routine maintenance of the environment such as
lubrication and cleaning of equipment by the operators rather than more
in-depth maintenance performed by dedicated staff. In IT, an example could be
users reporting error messages, or taking corrective action when data may cause
an abnormal end to a batch program. Another example could be involvement of the
Product Owner overseeing the implementation of a new feature and tracking its
first few uses to provide feedback to the development team. This can also be
looked upon as a part of preventive or predictive maintenance; both concepts
support the first pillar of Focused Improvement.
Planned Maintenance
This
pillar is self-explanatory. Most equipment and many business applications
require some downtime for maintenance to be performed. It could take the shape
of an enhancement or version upgrade, replacement of a power supply that is
creating error messages on a server, an upgrade of firmware on a network
appliance to patch a security hole, a mainframe IPL during a maintenance window
to make needed updates to an environment, or even a time change.
This
maintenance is typically carried out by skilled or trained professionals who
perform it during a planned outage and restore the environment
to prior or improved performance levels. The aim is also to increase
mean time between failures (MTBF), although planned maintenance can itself
cause failures if it is performed incorrectly or introduces an unforeseen defect.
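Mean time between failures can be estimated from an outage log. The following is a minimal sketch, assuming MTBF is taken as total uptime divided by the number of failures in an observation window. The function name `mean_time_between_failures` and the sample outage data are hypothetical and are not drawn from any particular monitoring tool.

```python
from datetime import datetime, timedelta


def mean_time_between_failures(period_start, period_end, outages):
    """Return MTBF as a timedelta: total uptime divided by the number of failures.

    `outages` is a list of (failure_start, service_restored) datetime pairs
    that fall inside the observation period.
    """
    if not outages:
        raise ValueError("MTBF is undefined without at least one failure")
    # Add up the time spent down, starting the sum from a zero timedelta.
    downtime = sum((restored - failed for failed, restored in outages), timedelta())
    uptime = (period_end - period_start) - downtime
    return uptime / len(outages)


# Hypothetical 30-day window with two four-hour outages (illustrative data only).
start = datetime(2024, 1, 1)
end = datetime(2024, 1, 31)
outages = [
    (datetime(2024, 1, 10, 2, 0), datetime(2024, 1, 10, 6, 0)),
    (datetime(2024, 1, 20, 22, 0), datetime(2024, 1, 21, 2, 0)),
]
mtbf = mean_time_between_failures(start, end, outages)
print(mtbf.total_seconds() / 3600)  # MTBF in hours
```

Tracking this figure before and after each maintenance window is one way to check whether planned maintenance is actually lengthening the interval between failures.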
Quality Maintenance
Quality
maintenance deals with a concept I covered in a prior article, Poka-Yoke, by
targeting quality issues with products and systems in the pursuit of reducing
future defects. It focuses on fixing a problem before it
becomes more expensive to fix. The team looks for failure points using
Failure Mode and Effects Analysis (FMEA) -- "what can go wrong and what
would happen if it did?" -- to determine what preventive maintenance
should be performed before an event happens. A significant tool in Quality
Maintenance is inspection to seek out potential failure candidates. An example
could be looking at error logs for warnings or errors that have occurred,
followed by root cause analysis to determine the failure point and
implement a remediation path.
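One common way to rank FMEA findings is the Risk Priority Number (RPN): the product of severity, occurrence and detection ratings, each on a 1 to 10 scale. The article does not prescribe a scoring method, so the sketch below is an assumption about how a team might prioritize. The `FailureMode` class, the `rank_by_rpn` helper and the sample failure modes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (minor) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (easily detected) to 10 (hard to detect)

    @property
    def rpn(self) -> int:
        # Risk Priority Number: higher means the failure mode deserves attention sooner.
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Illustrative failure modes for a business application (made-up ratings).
modes = [
    FailureMode("Nightly feed arrives late", severity=6, occurrence=5, detection=3),
    FailureMode("Disk fills on log volume", severity=8, occurrence=4, detection=2),
    FailureMode("Silent data truncation in interface", severity=9, occurrence=3, detection=8),
]
for mode in rank_by_rpn(modes):
    print(mode.rpn, mode.name)
```

In this made-up data the silent truncation ranks first even though it occurs least often, because it is both severe and hard to detect. That is the kind of candidate that inspection of error logs is meant to surface.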
Cost Deployment
Cost
deployment is a component of World Class Manufacturing (WCM). One example of
its usage is by the Fiat Group Automobile Production System (FAPS). Fiat uses a
financial model to reduce waste and optimize efficiencies in the manufacturing
process. It holistically looks across the Eight Pillars from a financial
perspective. Some core components of Cost Deployment include:
- Baseline the total cost of processing (ownership).
- Identify losses or waste -- use a matrix to identify the sub-processes they
occur within in order to identify elimination methods within each sub-process.
- Identify the relationship between the type of loss or waste and a
qualitative T-shirt sizing of its cost.
- Transform the qualitative waste measures into quantitative costs
(capitalized and operational components).
- Identify which TPM pillar each loss belongs to, that is, which pillar will be
able to control the elimination of that waste or cost.
- Create a portfolio of projects to address the highest waste/cost
candidates. Pareto and payoff matrix tools can be used effectively to
prioritize the remediation projects.
- Implement continuous improvement to prevent the costs and waste from
creeping back into the process.
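The Pareto prioritization mentioned in the list above can be sketched in a few lines. This is a minimal illustration, assuming each loss has already been converted to a quantitative cost. The `pareto_candidates` function, the 80% threshold and the loss categories are hypothetical choices, not part of any Cost Deployment standard.

```python
def pareto_candidates(loss_costs, threshold=0.8):
    """Return the largest losses that together reach `threshold` of total cost.

    Each result is (name, cost, cumulative_share_of_total).
    """
    total = sum(loss_costs.values())
    if total == 0:
        return []
    ranked = sorted(loss_costs.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0
    for name, cost in ranked:
        # Stop once the losses already selected cover the threshold share.
        if running / total >= threshold:
            break
        running += cost
        selected.append((name, cost, running / total))
    return selected


# Illustrative annual loss costs (made-up figures).
losses = {
    "Manual reruns of failed batch jobs": 50_000,
    "Unplanned outages": 30_000,
    "Duplicate data entry": 10_000,
    "Idle licenses": 6_000,
    "Report reformatting": 4_000,
}
for name, cost, share in pareto_candidates(losses):
    print(f"{name}: {cost} ({share:.0%} cumulative)")
```

The short list that comes out is a starting point for the project portfolio. A payoff matrix would then weigh each candidate's benefit against the effort to fix it.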
Fundamentally,
Cost Deployment is a tool to identify the return on the investment in TPM
improvement projects. That return is quantified through a retrospective ROI and
the use of control charts to monitor defects after the improvement. Cost Deployment can consist
of soft components, when only a T-shirt size can be estimated, and hard cost
components, when they can be tied to actual savings or benefits. An IT example
could be in a call center or in big data processing, where even small efficiency
gains can have significant payoffs when scaled up.
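Monitoring defects with a control chart can be sketched as follows. The article does not say which chart type to use, so this example assumes a c-chart, a common choice for defect counts per period. It places the control limits three standard deviations from the mean count, with the standard deviation taken as the square root of the mean. The function names and the weekly defect counts are hypothetical.

```python
import math


def c_chart_limits(baseline_counts):
    """Return (lower, center, upper) control limits for defect counts per period."""
    center = sum(baseline_counts) / len(baseline_counts)
    spread = 3 * math.sqrt(center)
    # A defect count cannot be negative, so the lower limit is floored at zero.
    return max(0.0, center - spread), center, center + spread


def out_of_control(counts, limits):
    """Return the indexes of periods whose defect count falls outside the limits."""
    lower, _, upper = limits
    return [i for i, count in enumerate(counts) if count < lower or count > upper]


# Illustrative weekly defect counts from a stable baseline period.
baseline = [4, 6, 3, 5, 4, 2, 4, 4]
limits = c_chart_limits(baseline)

# Counts observed after the improvement project; the third week spikes.
post_improvement = [3, 5, 12, 4]
print(limits)
print(out_of_control(post_improvement, limits))
```

A flagged period suggests a special cause worth investigating, while counts that stay inside the limits indicate the improvement is holding.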
Early Equipment Management
Building
on the earlier pillars, the focus of Early Equipment Management (EEM) is
reducing lead time for new product development by reusing best
practices from equipment, tool and engineering designs already in use. Early
Management also incorporates Early Product Management, where the emphasis is on
the product design and delivery approach rather than the equipment or processes
used to deliver the product. Overall, the focus of EEM is to address potential
failure modes that could be exposed by a new vertical process or product. Think
of it as proactive risk identification and response from a product and
equipment standpoint.
An IT
analog could be a development team proactively applying the
lessons learned from a prior release retrospective. The next step would be to
conduct a look-forward by analyzing the potential for risks and issues to
recur in the next development lifecycle. The team would then implement a plan
to address those risks or failure modes based upon their likelihood of
occurrence and impact, with a continuous improvement goal in mind. As with all
continuous improvement methods, each risk should be given an owner, and its
response strategy should follow the guidelines in the team working
agreement. These activities would be on top of a more traditional project risk
management lifecycle.
A more
specific example could be a development team that had difficulties with its bug
glide toward the end of its last release, which caused it to miss its
ship date. The team might analyze the types of defects that caused the
slippage and tighten up the engineering disciplines to address those specific
root causes, such as staffing, skillsets, code walkthroughs and practices, test
case coverage, and more specific functional and technical
requirements.
Training and Education
The focus
of this pillar is to reinforce the knowledge of the actors in the manufacturing
processes, such as machine operators and maintenance personnel, to use the best
practices of the TPM pillars across their roles in support of a TPM holistic
environment. This approach also encourages management to provide coaching and
mentoring of team resources as well as facilitating the drive towards ongoing
maturity of the manufacturing processes.
From an IT
perspective this pillar could consist of a training plan or access to education
in new industry tools and trends, for example building skillsets around cloud
computing or hardening strategic business applications against cyber threats.
Another example of this pillar is being a "student of the business,"
where the IT development team learns the business processes and how best to
implement the functional requirements in the business application.
Safety, Health and Environment
This pillar
may not have a clear analog when working with software development, but the
analog is clearer when dealing with hardware in a datacenter or data closet
where electrical and other hazards exist. Another example could be in highly
regulated medical equipment or avionics software development, where software
defects could create life-or-death situations. At a company that
delivers global chemistry solutions, health, safety and environment are of
paramount importance and should always be a laser focus of everyone in the
company. Group meetings in these organizations often start with a "safety
share." Many of the safety shares are related to everyday life situations.
In Summary
Whether
you are building a bridge or a dam or maintaining legacy code on a production
system, there are elements of TPM that can benefit the
environment in which your product is built and maintained.
Getting
current and staying current with tools, training/knowledge and infrastructure,
and understanding overall costs to maintain a safe and productive environment
are just as necessary in IT as they are in the manufacturing arena. TPM,
through its eight pillars and the 5S foundation, covers a broad range of
topics, many of which may seem to be just common sense on the surface. The
strength of TPM is creating a framework where those concepts are formalized
into an executable structure. ITIL performs a similar role in the IT industry.
TPM is yet another example of a holistic methodology that has numerous
analogs in the IT industry.