Job interviews missed, work and wedding plans disrupted, children unable to fly home with their adoptive parents -- the consequences keep proliferating in the aftermath of a database outage that crippled the US State Department's process for issuing passports, visas, and other documents related to travel to the US.
As first reported last week, the Consolidated Consular Database (CCD) used by staff around the world to process applications for travel to the US has been experiencing severe performance problems, including outages, since Saturday, July 19. The performance of the system is apparently still impaired, making it difficult for consular staff to whittle down the backlog.
The State Department has published few updates since it originally acknowledged the issue -- except on its Facebook page.
The most recent post there, as of this writing, states:
The Department of State Bureau of Consular Affairs continues to make progress restoring our nonimmigrant visa system to full functionality. Over the weekend, the Department of State implemented system changes aimed at optimizing performance and addressing the challenges we have faced. We are now testing our system capacity to ensure stability. Processing of immigrant visas cases, including adoptions, remains a high priority. Some Embassies and Consulates may temporarily limit or reschedule nonimmigrant visa interview appointments until more system resources become available to process these new applications. We sincerely regret the inconvenience to travelers, and are committed to resolving the problem as soon as possible.
State Department staffers also made an effort to respond to the queries of frustrated travelers trying to understand whether the system was up or down ("We are still working to restore our systems to full functionality, so while the systems are working, they are not operating at full capacity") and why it was not possible to implement some manual workaround in the absence of a functioning computer system ("We cannot 'handwrite' visas because security measures prevent consular officers from printing a visa unless it is approved through our database system. Until the system is brought back to full capacity and we are able to work through the backlog, service to our customers will be below normal.")
A statement emailed to InformationWeek reiterated these points, adding that the system was not hacked but "crashed shortly after maintenance was performed. We believe the root cause of the problem was a combination of software optimization and hardware compatibility issues." The statement also said the State Department expects the system to be fully operational again "soon."
How soon is soon? "That will depend on how quickly we can bring additional resources to bear on the backlog," according to the statement. "We are increasing system capacity and efficiency, and are looking for opportunities to reduce the backlog through administrative actions."
The technical crisis didn't turn into a public relations crisis right away because this is not a web application, directly accessible to the public, but an application used by consular staff. However, because it interrupted their ability to get work done, the effects soon showed up in canceled appointments for visa interviews and paperwork that couldn't be produced when expected.
Described as "one of the largest Oracle-based data warehouses in the world" in a 2010 privacy impact assessment, the CCD at that time tracked more than 100 million visa cases and 75 million photographs, utilizing billions of rows of data, and was growing at 35,000 visa cases a day. The data warehouse works with a series of data marts, so that not every query has to go back to the central repository, but still the scale is intimidating. The system also features interfaces to other immigration and Homeland Security systems, such as the terrorist watchlist.
ScaleArc CTO and Founder Varun Singh ventured a guess -- just based on how long it is taking for the agency to recover -- that the database was corrupted perhaps by having its operations interrupted "in the middle of a major operation" such as a schema change. Something as simple as adding a column to a table, if not completed on the entire table, could leave all the data in that table scrambled.
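To make the failure mode Singh describes concrete, here is a minimal sketch of a schema change wrapped in a transaction, so that an interruption partway through leaves the table as it was rather than half-converted. It uses Python's built-in `sqlite3` module purely for illustration. The CCD runs on Oracle, which handles schema changes differently, and nothing here reflects the State Department's actual system. The `visa_cases` table, its columns, and the function names are all hypothetical.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a small, hypothetical table."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN/COMMIT/ROLLBACK statements below control transactions.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE visa_cases (id INTEGER PRIMARY KEY, applicant TEXT)"
    )
    conn.execute("INSERT INTO visa_cases (applicant) VALUES ('A'), ('B')")
    return conn


def column_names(conn):
    """Return the current column names of the hypothetical table."""
    return [row[1] for row in conn.execute("PRAGMA table_info(visa_cases)")]


def add_column_atomically(conn, fail_midway=False):
    """Add a column and backfill it as one all-or-nothing unit."""
    conn.execute("BEGIN")
    try:
        conn.execute("ALTER TABLE visa_cases ADD COLUMN status TEXT")
        if fail_midway:
            # Stand-in for a crash or shutdown in the middle of the change.
            raise RuntimeError("simulated interruption during schema change")
        conn.execute("UPDATE visa_cases SET status = 'pending'")
        conn.execute("COMMIT")
    except Exception:
        # Undo both the new column and any partial backfill.
        conn.execute("ROLLBACK")
        raise
```

Calling `add_column_atomically(conn, fail_midway=True)` raises the simulated error and leaves the table with its original two columns and both rows intact. A second call without the flag completes the change. Whether a given database can roll back a schema change this way depends on the engine, which is part of why an interrupted change can be so costly to recover from.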
"If you don't shut the database down right, and bring it up right, you're going to have a problem with database consistency," Singh told us.
A scalability issue, like the system crashing because it is unable to handle the volume of queries being directed at it, is less likely as the origin of the crisis -- but could be an aggravating factor now that the visa processing workload is backed up and personnel are trying to catch up, he said. Enterprise systems are less subject to database overload than public web applications, which can be overwhelmed by a rush of visitors. However, it's not unheard of: One mortgage industry customer that Singh's firm assisted with better caching of data was suffering regular system failures at the time of the month when mortgages or refinancings tended to close or following events like an interest rate change.
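The kind of caching Singh mentions can be sketched in a few lines: repeated reads of the same data are answered from memory for a short period, so a spike in identical queries does not all land on the database. This is a generic read-through cache with a time-to-live, not a description of ScaleArc's product or of the mortgage customer's setup. The class name, the `load` callback, and the default 60-second lifetime are assumptions made for illustration.

```python
import time


class ReadThroughCache:
    """Serve repeated lookups from memory until an entry expires."""

    def __init__(self, load, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load          # function that runs the real (slow) query
        self._ttl = ttl_seconds    # how long a cached value stays fresh
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # key -> (expiry time, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # fresh hit: the database is not touched
        value = self._load(key)    # miss or expired: query the database once
        self._entries[key] = (now + self._ttl, value)
        return value
```

With a cache like this in front of a hot query, a thousand identical requests inside the lifetime window cost one database call instead of a thousand. The trade-off is that readers may see data up to `ttl_seconds` old, which is acceptable for some workloads and not for others.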
Enterprise systems, whether in government or industry, are rarely tested for the full range of failure and overload they could encounter and often don't boast the level of system redundancy required to restore operations quickly.
Whatever the cause, the failure had wide-ranging impacts, affecting tens of thousands of people. In many cases, there seemed to be little doubt of the traveler's right to enter or reenter the US, except that consular officials were unable to complete the final paperwork requirements and print the necessary documents.
A Colorado couple traveling in China to adopt a child told Denver's CBS 4 TV station of being unable to return home because of the logjam. Having left the US on July 9, they were unsure when they would be able to get home in the absence of a visa for their adoptive son. Travelers' forums are full of stories about people who are on the verge of missing a job interview -- or already missing more work than they had planned -- because of their inability to book travel to the US.
Another person caught by surprise was Matt Kingham, director of sales for InformationWeek Financial Services, who had expected a routine renewal of his visa would allow him to return to the US after a trip home to the UK. As an employee of our parent company, London-based UBM, he is covered by a type of visa that allows multinational companies to transfer employees between offices. He normally works out of New York City. After attending a music festival, during which time he paid little attention to news or email, he found his visa renewal interview had been canceled and had to cancel a scheduled Tuesday flight to the US. "They tell you to never book travel until you have your passport and visa in hand," he says, but for a business traveler that's not always practical.
While Kingham can do most of his work just as well from the UK, he has been as frustrated as anyone by the lack of answers about how long the delay will last. Unable to find a phone number to call, he received only a generic form-letter response by email and has been relying on Facebook and travelers' forums for most of his news.
"We have no idea whether we're in line behind 10 people or 10,000 people," agrees Caitlin Crum. A native Floridian who has been living in Australia for the past two years, she is back in Naples but now separated from her husband, whom she met and married in Australia. They had traveled together from Western Australia to Sydney, an eight-hour flight, for their visa interview, and he was supposed to follow 12 days later to start a new job in the US. "He's going to be missing his flight Saturday, which means we're out like $2,000," she says.
They had been working through the frustrating paperwork process since their January wedding, but now everything seemed to be in order. "His visa was approved, his Green Card was approved -- but they're holding his passport hostage."
David F. Carr oversees InformationWeek's coverage of government and healthcare IT. He previously led coverage of social business and education technologies and continues to contribute in those areas. He is the editor of Social Collaboration for Dummies (Wiley, Oct. 2013) and ...