You Will Not Come Home From Space Today
Since the 10th anniversary of the Space Shuttle Columbia accident is coming up, and I’ve been reading the accident report, it seemed fitting to write up a discussion on the subject. I’ve learned a rather large number of things about the disaster that I hadn’t previously realized. So, we’ll do a dissection backwards — we’ll consider reality first, and then how fiction tends to get disasters wrong.
No spoiler alerts here — we all know how this one ended — but I will put up a warning for depressing tragedy. Don’t read this if you want something to cheer you up.
[Note: The title here references Up Goer Five.]
February 1st, 2003. Saturday. I don’t remember exactly what was going on that day, but it was only my mother and me at home. I think my dad and brothers were all out getting their hair cut. It was a nice sunny day, but I was inside, flipping through the channels for some reason. Maybe I was just putting off homework. Regardless, the image I found playing on the TV looked like this:
Let’s back up a bit, and address how this all started.
The Space Shuttle program was originally conceived, as part of the initial post-Apollo plan, to provide a medium-sized vehicle for getting to and from a series of low-Earth orbit space stations. The shuttle was to be reusable, to save on cost. The space stations obviously didn’t happen, and NASA only got funding for the shuttle, which meant they had to come up with a reason for the vehicle. To economically justify the shuttle, it ended up needing to be all things to all people: reusable, cheap, fast turn-around, cargo, crew. It had to be able to launch satellites and do experiments about weightlessness in orbit. Unfortunately, this combination made it a “jack of all trades, master of none.” Safety ended up being stacked against budget and schedule pressure, partly because NASA lacked a truly independent safety organization.
The pressure to stick to schedule with satellite launches earlier in the program strongly contributed to the Challenger accident in 1986. Concerns voiced about the O-rings, untested in cold weather and showing strain in previous launches, were ignored.
There was some reorganization thereafter, including the cessation of using the Shuttle to launch commercial satellites and a reduced launch rate. However, in the 1990s NASA again suffered budget cuts, massive reorganization mandated by Congress, and new leadership attempting to “privatize” the space program. Further schedule strain came from the Shuttle being the only vehicle capable of launching some of the larger International Space Station (ISS) modules.
Safety was, again, not truly independent of the people pressured to stick to schedule. In this case, STS-107, Columbia’s mission, had been repeatedly delayed. While this was a science mission, studying things like biology in zero-g and observing high-atmosphere lightning, further delays were very undesirable because they would affect the schedule for the next shuttle launch. The next mission would deliver a major component to the ISS, and everyone was worried about getting it up by a certain date in February.
Meanwhile, the issues with foam on previous missions were gradually being considered less and less of a problem, despite being far from design specifications… to the point that, between miscommunications, problematic culture, and pressure to stick to schedule, foam loss was judged not to be a “safety-of-flight” issue. Apparently, this kind of reaction is common enough to have a name: “go fever.”
Foam loss was the proximate cause of the accident. When the Shuttle launched, it had three main components. There was the orbiter itself, which was connected to the External Tank (ET). The ET contained liquid hydrogen and liquid oxygen in separate compartments, which fed the orbiter’s main engines during launch. Most of the thrust at liftoff, however, came from the two solid rocket boosters. These were attached one on each side of the ET. (See the picture for what this looks like.) The boosters and the orbiter were reusable; the ET was not, and a new ET was built for each launch.
Because the ET contained cryogenic materials, it was covered in foam on the outside. This prevented ice from forming, which could otherwise have fallen off during launch and posed a danger to the orbiter. The potential danger was that the falling ice could damage the tiles on the underside of the orbiter. These tiles were the heat shield for the orbiter on re-entry, protecting the wings and the contents and crew of the orbiter. The parts that got hottest were the leading edges of the wings. These were protected by a different material, reinforced carbon-carbon (RCC), in the form of panels rather than tiles. According to the specs, the tiles were not designed to withstand any impacts. There was concern about micrometeorite damage, but that generally was not a significant problem. Tiles that experienced some damage would be repaired or replaced after landing.
Unfortunately, while ice generally wasn’t a problem, the foam itself was. The foam had a history of shedding from the ET, despite design specifications that said it shouldn’t. (Examination after the Columbia disaster indicates that the shedding was due to thermal contraction of the foam after the ET was filled with cryogens, leading to cracking.) Earlier in the Shuttle program, this was considered a serious concern. However, over time, foam shedding and strikes against the orbiter’s tiles or other parts of the launch system were normalized. Two flights before Columbia’s last, a particularly large chunk of foam fell off the connector between the orbiter and the ET. This chunk of foam caused significant damage to one of the solid rocket boosters. Despite this, the orbiter launched okay, and re-entered okay. After this incident, foam strikes were downgraded from something closely checked after each flight to being not a significant issue — a “known risk” which was, nonetheless, far outside of design specifications.
For STS-107, the final flight of Columbia, another large chunk of foam fell from the same area, but this time, struck the underside of the orbiter’s left wing. Due to bad camera placement and focus, it wasn’t clear from launch video how bad the strike was. Engineers were concerned that it could damage the tiles, enough to permit excess heating on re-entry. They were especially worried that the excess heat might cause damage to the left landing gear and wheel.
A committee was formed to look into possible damage, while Columbia orbited the Earth. They made three requests for on-orbit imaging to examine the wing. All three requests went through unofficial or otherwise non-standard channels, and all were eventually refused. The refusal was due partly to confusion about how serious the issue was. The risk was not well communicated to higher management, who inferred that the foam strike wasn’t a problem. The foam impact committee interpreted the refusal as absolute, particularly given that they were asked to demonstrate that the issue was serious before they could get imaging, and they needed imaging to show the issue was serious…
They also did some testing with a computer program called Crater, which simulated impacts on the tiles. It was a “conservative tool”, which meant that it would over-predict the amount of damage to the tiles. They did the best they could, estimating the size, speed and location of the foam piece at impact. The simulation wasn’t calibrated for such large pieces of foam (this piece was several hundred times larger than anything that had been tested), but the predicted penetration depth was greater than the thickness of the tile. This result was watered down: Crater overpredicts depth, and given differences between the real and simulated tiles, the engineers thought the foam probably wouldn’t actually penetrate the tile. As a consequence, the message that got out was “this is going to be okay, not an issue.”
Later testing by the Columbia Accident Investigation Board fired a piece of foam at real RCC panels. The foam not only penetrated, but punched a hole more than a foot across. This explained the patterns of damage found in the debris recovered later and the in-flight telemetry.
One other item that was overlooked during the mission was an object tracked by the US Air Force’s space situational awareness units. They keep track of essentially everything the size of a golf ball or larger in near-Earth orbit, since space junk can pose a hazard to satellites. A piece of something was tracked moving away from the orbiter, entering the atmosphere and burning up a couple of days after the launch. In hindsight, this was likely a piece of tile that broke off at impact, got stuck inside the wing for the rest of the launch, and later floated out and away.
Columbia’s mission in orbit was just over two weeks. Had on-orbit imaging been taken, and the serious damage actually been observed, there were a couple of options.
The riskier option would have been an emergency EVA to repair the damage while in orbit. This would have been done by placing thick metal scavenged from the orbiter in the hole, and holding it in place with a bag of water allowed to expand and freeze in the cold of space.
The other option was to speed up the prep of space shuttle Atlantis for its own launch. Rather than sending up the next piece of the ISS, it would go up with a minimal crew and no cargo, and pick up the Columbia crew. They would then go back down to Earth in Atlantis. Of course, this carried the same risk of a foam strike on launch, but had a higher chance of success than the emergency repair if the launch of Atlantis could be done before Columbia ran out of consumable supplies for life support. In this case, Columbia could be left in orbit for possible future repairs.
But by the time Columbia was entering the atmosphere, it was too late.
February 1st was the date for re-entry. For any Space Shuttle, the orbiter is initially orbiting tail-first, then does a burn to go into a reentry trajectory. Then it turns around, with the nose and all the heat-resistant parts facing forward.
In the case of Columbia, this all went fine. Serious heating from reentry started as it was over California. It then flew east toward where it would have landed in Florida. As it went, sensors in the left wing started behaving strangely. Heat sensors went to “off-scale”, pressure sensors failed. Not all the telemetry made it to ground control, but they started wondering. Meanwhile, what was not sent to the ground was the fact that the orbiter was making large automatic corrections for extra drag from the left wing. People on the ground started to see pieces coming off of the orbiter. Ground control wasn’t watching the orbiter with cameras, so they didn’t know about the debris.
The superheated air had penetrated the wing and started to tear it apart.
Communications between ground and the orbiter cut off, but this was a typical event due to interference when the orbiter was going through the hottest part of reentry. Communications never resumed. Meanwhile, hot air thoroughly penetrated the left wing. More pieces flew off the orbiter, and eventually the control systems ceased to be able to compensate for the extra drag. The orbiter spun out of control and disintegrated over Texas and Louisiana.
No one in the control room knew anything more serious than a communications malfunction had happened until a friend of one of the operators called.
Issues in Fiction
Of course, complicated issues like these are often overlooked in fictional media. It’s always a giant robot made by one mad scientist going horribly wrong from the start, or the ethically compromised cybernetics guy experimenting on unwilling humans, or a CEO making the big, stupid decision, loudly overriding the warnings of the heroic representative engineer.
When considering examples, Michael suggested “Every Star Trek episode ever.” This… is pretty much true. Red shirts die all over the place due to poor procedure, problems with the ship are always simple, and the bridge is made of explodium. And the warp drive is made out of explodium, with far too few safeguards. A lot of other shows do the same. The Death Star of Star Wars may or may not be a good example; after all, having a small exhaust port lead to the explodium that powers the station is just the kind of unexpected connection between systems that could crop up and cause trouble. Though, if the Empire’s engineers didn’t spot the connection, it’s surprising that the rebels did… then again, maybe the engineers were still trying to convince other folks that it was a potential problem.
Real life is not so simple as just plugging in a few new components. It’s not just one guy, doing one thing very wrong. For Columbia, it was a series of individually smaller problems: the falling foam, the degree of resistance of the tiles, and the various systemic cultural and communications failures that prevented the issue from being corrected.
And on the technical rather than social side, it’s still not just one thing going wrong. Especially for well-constructed systems, there generally must be several things that go wrong in sequence for disaster to strike. For the shuttle, the falling foam wasn’t a problem unless a big enough piece hit a critical spot — which took until the 113th mission to happen. The Chernobyl disaster is another example of this. It required an older, less-safe design, being operated at night by inexperienced technicians doing a stupid test with most of the safety systems deliberately turned off and combined with a generator failure.
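To put rough numbers on the “several things in sequence” idea, here is a small Python sketch. It treats each step in the chain as an independent event, which is a simplifying assumption, and every probability in it is made up purely for illustration; none of these figures come from the accident report. The function names are hypothetical too.

```python
from math import prod


def per_mission_probability(step_probabilities):
    """Chance that every step in the chain goes wrong on a single mission.

    Assumes the steps are independent, so their probabilities multiply.
    """
    return prod(step_probabilities)


def probability_within(n_missions, p_per_mission):
    """Chance the full chain occurs at least once across n_missions flights."""
    return 1.0 - (1.0 - p_per_mission) ** n_missions


# Hypothetical, illustrative values only (not from any report):
# a large piece of foam sheds, it hits the orbiter, it hits a critical spot.
steps = [0.5, 0.1, 0.1]
p = per_mission_probability(steps)

for n in (1, 10, 113):
    print(f"{n:>3} missions: {probability_within(n, p):.3f}")
```

The point of the sketch is only the shape of the curve: a chain that is very unlikely on any single flight becomes far more likely once you fly often enough, which is part of why a run of successful missions says little about whether the underlying risk is acceptable.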
Shared blame is problematic. We want one monster to be at fault, so the hero can just deal with that guy. We don’t want to consider the larger cultural problems (as at NASA), or acknowledge that all the people involved were doing their best, or that everyone made mistakes. Or that a decent, sane person might contribute to those fatal mistakes.