…Due to Maintenance

by Melissa Palacios | February 27, 2012

"[IT system] will be unavailable due to maintenance" is the message usually posted when W&M Information Technology plans an interruption in some kind of service. Whether the message refers to Banner, Blackboard, myWM, or another system, it is rarely a welcome event. But it happens, and fairly frequently. So what's all this maintenance about anyway?

Maintenance, what?

A good person to ask is Unix engineer Dan Ewart. He routinely performs maintenance on Unix-run systems like Banner. Ewart has been working for the College since 1973, after graduating from W&M in 1971 with a degree in chemistry. He worked on the College's first mainframe computer system and has helped the campus develop increasingly complex computer systems since that time.

Ewart explains that there are essentially four types of maintenance:

Database application maintenance. This is usually an upgrade in code, in which new scripts are applied to the database.
Operating system maintenance. This is a procedure executed on the computer's operating system.
Hardware maintenance. This is when some physical piece of equipment is being worked on or replaced.
Patches and minor updates. These updates usually just keep the system up and running, or they are efforts to avoid vulnerabilities. The changes are usually noticeable only on the back end of the system where they are applied.

There are also daily and/or weekly system refreshes that just keep everything running properly. It's kind of like rebooting your computer. Systems that need daily refreshes get them in the wee hours of the morning, and those that need weekly refreshes get them on the weekends. Ewart routinely comes into the office at 5:00 am on Mondays to make sure all the Banner systems are running properly after the weekend refreshes.

The Tip of the Iceberg

Maintenance is not as straightforward as it seems. There is a rigorous development process that happens long before an announcement is ever made. Take, for example, a security patch that needs to be applied to a database system. First, the patch is put through a test database (called TEST) where initial testing is performed. If all goes well, it is then run through a pre-production (abbreviated as PPRD or Pre-PROD) database. This is the real proving ground as the system, with patch applied, gets poked and prodded by IT engineers and programmers.

When all the wrinkles have been smoothed out, the patch, still living in the pre-production database, is tested and vetted by selected groups around campus. For instance, if the patch affects Banner, it will be vetted by departments that are highly dependent on Banner, like the Registrar's office. This testing and vetting process can take anywhere from a couple of weeks to a couple of months to complete.

Once everything is ready to go, the final step is to put the patch into the production (aka PROD) database to make it "live".
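For readers who think in code, the path described above (TEST, then PPRD, then PROD) can be sketched as a simple gated promotion. This is only an illustration, not W&M IT's actual tooling: the environment names come from the article, while the `promote` function and the idea of a pass/fail "gate" per environment are assumptions made for the example.

```python
from typing import Callable, Dict, List

# Environment names are from the article; the rest is a hypothetical sketch.
ENVIRONMENTS = ["TEST", "PPRD", "PROD"]


def promote(patch: str, gates: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Return the environments a patch reaches, in order.

    The patch is applied to each environment in turn. It only moves on
    to the next one if that environment's gate (testing, vetting by
    campus departments, and so on) reports success.
    """
    reached: List[str] = []
    for env in ENVIRONMENTS:
        reached.append(env)  # the patch is applied in this environment
        if env == "PROD":
            break  # PROD is the last stop; the patch is now "live"
        gate = gates.get(env)
        # A missing gate counts as a failure, so nothing moves forward
        # without an explicit sign-off.
        if gate is None or not gate(patch):
            break
    return reached


# A patch that passes initial testing but fails departmental vetting
# stays in pre-production and never reaches PROD.
stuck = promote("security-patch", {"TEST": lambda p: True, "PPRD": lambda p: False})
print(stuck)  # ['TEST', 'PPRD']
```

Treating a missing gate as a failure is a deliberate choice in this sketch: it mirrors the article's point that nothing goes into production without first being tested and vetted.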

But not so fast!

Putting things into production systems is always high-stakes. There is always a chance that things may not go as planned (despite the previous testing) and ALL users could be affected. Furthermore, the system is going to have to come off-line for a period of time for the installation. As systems become more abundant and interdependent, it gets increasingly hard to perform maintenance without bringing down a technology eco-system. As Ewart says, IT tries as best it can to "minimize impact." This means that, in addition to rigorous testing, careful planning is a must.

Minimizing Impact

So when is the best time to take a system (or technology eco-system) off-line? Many people would respond with "never", especially when it comes to heavy-hitting systems like Banner and Blackboard. Unfortunately, that's not a viable option.

So IT tries to conduct maintenance at times with the least traffic. For a system like Banner Admin, which is heavily used on weekdays, maintenance is usually conducted in a window between the hours of 4 am and 8 am (before the workday starts). If the maintenance is expected to take more than a couple of hours, it is scheduled for a weekend instead. Heavy-use time periods like registration, Add/Drop, time sheet deadlines, and finance-reporting periods are avoided altogether.
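The scheduling rules above are simple enough to sketch in a few lines. The example below is a rough illustration only, not how IT actually picks dates: the function name `pick_maintenance_day`, the 60-day search limit, and the reading of "a couple of hours" as exactly two hours are all assumptions. It also assumes short jobs always go on weekdays and says nothing about the consultation with departments, which no script can replace.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

# Assumption: "a couple of hours" is read as 2 hours for this sketch.
SHORT_JOB_HOURS = 2.0

# A blackout is an inclusive (start, end) pair of dates, for example a
# registration or Add/Drop period.
Blackout = Tuple[date, date]


def in_blackout(day: date, blackouts: List[Blackout]) -> bool:
    """True if the day falls inside any heavy-use period."""
    return any(start <= day <= end for start, end in blackouts)


def pick_maintenance_day(
    earliest: date,
    duration_hours: float,
    blackouts: List[Blackout],
    search_days: int = 60,
) -> Optional[date]:
    """Return the first acceptable day on or after `earliest`, or None.

    Short jobs go on a weekday (for the early-morning window); longer
    jobs go on a weekend. Blackout periods are skipped entirely.
    """
    needs_weekend = duration_hours > SHORT_JOB_HOURS
    for offset in range(search_days):
        day = earliest + timedelta(days=offset)
        is_weekend = day.weekday() >= 5  # Monday is 0, so 5 and 6 are Sat/Sun
        if is_weekend != needs_weekend:
            continue
        if in_blackout(day, blackouts):
            continue
        return day
    return None


# A five-hour job considered from Monday, February 27, 2012 lands on the
# following Saturday.
print(pick_maintenance_day(date(2012, 2, 27), 5.0, []))  # 2012-03-03
```

To reflect a week of advance notice, `earliest` could be set to the announcement date plus `timedelta(days=7)`.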

IT also consults with departments about the proposed dates for an outage. Based on the needs of the departments, a date is chosen. Inevitably, somebody or some group will be inconvenienced by the chosen time, but IT tries to weigh the needs of the departments and do the best they can to minimize the impact.

It is at this point you will finally see the infamous message saying "[IT system] will be unavailable due to maintenance," usually with at least a week of advance notice.

Available Because of Maintenance

Trying to keep the campus systems up and running 24/7 is quite an endeavor. Systems will not function without constant maintenance and upkeep. Through rigorous testing and careful planning, IT is doing the best they can to minimize the impact of maintenance and system outages.

Although announcements are only made when systems are expected to be down, you are welcome to put your own positive spin on the announcement when the systems are up. Our suggestion? "[IT system] is available because of maintenance. Yay!"