Sunday, September 10, 2006

The Role of MMS in the American Express 9/11 Disaster Recovery


It’s been five years now, and I can look back over that nightmarish month that provided both a career high and a personal low. During America’s most horrific hours I experienced some of the most influential moments of my career and the ending of my first marriage. I’m talking about how 9/11 affected my then-employer, American Express, as well as my personal life and the lives of those around me.

While at American Express I worked for Global LAN Systems Engineering (GLSE) – a team more or less dedicated to defining the standards for Windows-based, non-web-facing technologies and infrastructure. My partner Mark Vinsen and I were responsible for architecting the new Active Directory infrastructure, writing all of the standards, building the core infrastructure, and testing the new features. By September of 2001 we had done a lot of testing and a lot of documentation, but very little in the way of deploying anything other than our new empty root. American Express, being the large multinational financial institution it is, is very slow to adopt new IT infrastructures; after all, IT is not its core competency, and whether you’re running eDirectory or Active Directory doesn’t impact the way card members utilize “The Card”.

Early on, Mark and I had decided to split the architecture so that we could each focus on specific related technologies. Mark took all things AD Services, which was principally Group Policy and OU design. I took all things Infrastructure Services, which kept me where I was happy – DNS/WINS, DFS, Forest/Domain design, and what would become Identity Management. You see, I had caught wind of another team within Amex that was working on deploying Microsoft’s Metadirectory Services (MMS) and integrating it with HR and the principal service directories. I recognized early on that allying with this “Metadirectory” team would seriously influence the design of the Active Directory and offer some serious TCO savings down the line; little did I know that it would prove to be a key success story for disaster recovery as well.

On the morning of September 11th, 2001 I was driving in to Scottsdale to attend a Cisco CCNA training course (part of my self-inflicted IT diversity) when I heard the announcement. With Arizona being three hours behind, we had already missed most of the early story, so we were trying to catch up to the rest of the country, who had already been staring at the smoldering buildings for some time. I never did finish that training class – I returned to work in order to help plan the Disaster Recovery efforts; ~6,000 American Express employees working in the World Financial Center were now displaced.

The next few weeks were a blur – I remember there essentially being two options before us: attempt to restore what had been there (an aging NT4/Win9x and Novell 3.x environment) or move forward with Active Directory. I remember being asked if we could do it, and a rare and risky chance was taken to move forward – all eyes were on us to deliver something just short of a miracle in a short span of time.

The solution consisted of a high-powered Citrix farm, supported by pristine new Windows 2000 File/Print clusters and our newly built Active Directory domain, with Thin Clients being installed at the remote locations being provisioned. Jason Willey (GLSE Architect) managed to design a pretty kick-ass Citrix environment, and he and Glenn Haggard (GLSE Architect) drove a top-notch team from Compaq to deploy the servers and build the standard desktops that the employees would utilize. Mark, I, and a whole cast of GLSE Architects and members from other teams pitched in to get all of the infrastructure components into place (Network, SAN storage, OS Builds, and the day-to-day support). I can remember walking around in my own shadow world, consumed by architectural diagrams, whiteboard drawings, and my Nomad Jukebox to help tune out the world so I could focus and make it through the 20-hour days. I remember being stopped in the hall or being called into a meeting by some executive manager four or five levels higher than I to make some split-second decision on some bit of the implementation – all the while trusting in our team to make this work, to restore operations, and to restore shareholder faith that the company would continue to operate after losing its world headquarters. I also remember the way my body decided to deal with the sudden onset of all of this stress. I remember throwing up much of what I ate, and as a result, not eating much. When I managed to grab a few hours of sleep I was plagued with nightmares about what had to be done – I was possessed by an overwhelming need to complete the task that we had asked for, begged for even (despite a nagging wife who wanted me to stay at home): the chance to do our little insignificant part to help those affected. We knew we couldn’t be there, we couldn’t make a difference like the true heroes did, but we could do everything in our power to affect the things around us.

Despite that looming horror and uncertainty, I have to pause for a moment and tell you about our little Road Warrior crew. Three of the members of our team volunteered to drive a rental van to New Jersey so we could deliver some critical server and network components to some of the locations and give a hand trying to get local operations back on track. George Middendorf, Michael McGibbney, and Steve Richins made the trek and saw Ground Zero firsthand. They saw the shock on the faces of our co-workers who were in the tower when the first plane hit, and who were all working diligently to restore service and to take their minds off what had happened. Remembering those guys driving all that way to lend their support always brings a smile to my face. I like to think that having their personalities and expertise there onsite after such an unthinkable tragedy helped to lift spirits.

So, MMS – yes, our little unsung hero. The folks on the Metadirectory Services team (Attila Erdos and Neville Lee) had managed to secure access to all of the HR data we would need, but we hadn’t yet had time to work on the AD MA. So, I asked Attila to provide us an LDIF export containing all of the New York and New Jersey employees provisioned into some location-specific OUs. In a few minutes I was able to load all 6,000 employees into Active Directory (LDIFDE) – MMS had done the really hard work of defining who was affected, and had clean data and pre-built sAMAccountNames all ready for me to work on. Once we had all of the accounts loaded, it was up to Mike Kasher (Lotus Notes Consultant) and me to prepare the accounts for their entitlements: home directories, terminal server profiles, and Notes ID files. What would have taken me a few hours to configure with the new Notes and AD MAs took Mike and me a marathon 40 to 60 hours (2+ days) to script and execute. Finally, all of the pieces started to fall into place. New facilities had been procured and were being appointed. The network carriers were busy running nice big pipes to the new facilities. Compaq was drop-shipping a ton of new Thin Clients, and arrangements were being made to ferry all of the displaced employees via train and bus to the new locations. In the span of about three weeks we and the other teams had accomplished the seemingly impossible – we had built a completely new infrastructure from the ground up. Everything from buildings, power, network, furniture and chairs had been procured and installed. Everything back in Phoenix had come together and was operating: a completely new Citrix and Active Directory environment, with over 6,000 ready-to-use accounts whose owners could access new virtual desktops secured by GPO, send and receive email, and use their basic Office applications.
Another team was already busy restoring the terabytes of backed-up data from tape, and still another team was well into prioritizing and restoring all of the localized business-critical applications. MMS had played a critical role in restoring service and is one of the many unsung heroes of American Express’s 9/11 Disaster Recovery.
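
For anyone who hasn’t seen one, an LDIF import file of the kind described above looks roughly like the output of the sketch below. This is an illustration only, not the export Attila actually produced: the Employee fields, the example.com base DN, the sample names, and the helper functions are all assumptions made up for the example, and real user objects would carry many more attributes.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    # Hypothetical fields; a real HR feed would hold far more than this.
    full_name: str
    account: str   # the pre-built sAMAccountName
    location: str  # used as the name of the location-specific OU


def escape_rdn(value: str) -> str:
    """Backslash-escape characters that are special inside a DN component.

    Simplified on purpose: leading/trailing spaces and a leading '#'
    are not handled here.
    """
    return "".join("\\" + ch if ch in ',+"\\<>;=' else ch for ch in value)


def to_ldif(employees, base_dn):
    """Render one 'changetype: add' record per employee, blank-line separated."""
    records = []
    for emp in employees:
        dn = f"CN={escape_rdn(emp.full_name)},OU={escape_rdn(emp.location)},{base_dn}"
        records.append("\n".join([
            f"dn: {dn}",
            "changetype: add",
            "objectClass: user",
            f"cn: {emp.full_name}",
            f"sAMAccountName: {emp.account}",
        ]))
    return "\n\n".join(records) + "\n"


# Example data (invented) and assumed base DN.
staff = [
    Employee("Doe, Jane", "jdoe", "New York"),
    Employee("John Smith", "jsmith", "New Jersey"),
]
ldif_text = to_ldif(staff, "DC=example,DC=com")
```

Written to a file, text like this can be loaded with LDIFDE in import mode, for example `ldifde -i -f employees.ldf`. The point is that once the metadirectory has already worked out who belongs in which OU and what their account names are, the import itself is the easy part.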

In the aftermath we had a lot to consider – how had we been able to accomplish so much in the face of tragedy and adversity when we normally spent months or even years attempting to implement projects of this magnitude? To this day I joke that it took terrorists crashing a plane into a building before we got the go-ahead to implement AD, but it’s not far from the truth. What we had experienced was a moment of horrific clarity, where everyone for a brief moment in time was focused on a single task; a feeling an entire nation shared post September 11th. No one stopped to question why, to throw procedure in our faces to slow us down, or to sabotage us for their own personal or political gains – we just did what had to be done, we did what was right, and it worked.

In my own personal aftermath, after experiencing such a career high, I experienced the ending of my first marriage. Instead of having the understanding and supportive wife I have now, who would have been there for me and done whatever it took to ease the burden, I had quite the opposite. The terrorists are responsible for one good thing at least – they ended what was already a failing marriage and made it terribly clear that I could no longer live the way I had been.

So, after my cathartic spiel, I pause to remember those who were affected by this senseless tragedy. I take many lessons away from this experience, not the least of which is an appreciation for teamwork, both at work and within a marriage, as well as my newfound love of Identity Management technologies and how they serve to make our lives just a little bit easier.

4 comments:

Anonymous said...

Brad,

This is a great post and you have a lot of the same feelings that I had and witnessed. I joke at my current company all of the time. The best way to move forward is to fly a plane into our data center or use a huge magnet in our data center, so we can start fresh and move on much quicker. People are not driving motor T's to work anymore and we should not move data with them either. Great Blog.

George Middenddorf

Brad Turner said...

Thanks George,

I think we've witnessed two events such as this in our lifetime - the first of which was not quite as disastrous as 9/11 and that was Y2k.

With Y2k, all of the paranoia and work was done before the disaster happened and was the only time I am aware of that so many companies and governments worked proactively to update their crap and refresh hardware.

Why does it take major motivating factors like the threat of impending doom to force people to move forward technology wise?

Jason Willey said...

You know...despite the rough 90 hour weeks, and having to sleep under my desk on more than one occasion - I'm really glad that I got to contribute something during that time. I think it was our turn to do great things, things that we may have even thought were impossible. Amex really bet a lot on all of us, and I have to give credit to the upper mgmt types that knew it was time to trust their own people, and remove the obstacles that were preventing us from succeeding. It's really too bad that the things we (and many others at other companies I am sure) accomplished didn't revolutionize the IT industry. All the silly things like rejecting changes for typos and empire building stopped and the results were amazing. We proved that if you have the right people in the right roles and you trust them - you can truly accomplish great things.

You know who really impressed me during that whole effort - Thomas Snawder from Microsoft. I think Thomas really brought meaning to the term partner, as I remember him being available to do anything to help even if it was not in line with his job description as a technical account manager. I remember him making food runs for us, bringing us the caffeine that we all so desperately needed to keep running on 5 hours sleep, and I remember him going to the store and buying the CDROM media that we needed to deploy the Citrix servers. He would stay in the office with us til all hours of the night, just waiting to see if anyone needed help.

I often think about all the things that were possible at that time, and wish we could have the same progress without the disaster.

On a more personal note - I don't think I will ever forget that trip we took to New York in October 2001, and seeing the site still burning from across the water. I haven't seen anything so tragic before that day, or since (thank goodness).

Brad Turner said...

You know, I had forgotten the role Thomas played - wow. I have never seen a Microsoft TAM ever come close to his dedication.

Regarding the trip in October, yes - that made the whole thing worth it and really drove home the impact of the whole ordeal.
