Dropbox messed up OS upgrade, caused two days of downtime

Dropbox went down Friday night after some normally routine upgrades went awry. While the company restored most functionality within three hours, problems for some users persisted until Sunday.

The outage was followed by spurious claims from hacking groups that they had successfully infiltrated Dropbox. There was no evidence to support such claims, and Dropbox quickly explained on Friday that the outage was due to an internal problem. Dropbox head of infrastructure Akhil Gupta then followed up last night with more details on what caused the downtime:

We use thousands of databases to run Dropbox. Each database has one master and two slave machines for redundancy. In addition, we perform full and incremental data backups and store them in a separate environment.

On Friday at 5:30 p.m. PT, we had a planned maintenance scheduled to upgrade the OS on some of our machines. During this process, the upgrade script checks to make sure there is no active data on the machine before installing the new OS.

A subtle bug in the script caused the command to reinstall a small number of active machines. Unfortunately, some master-slave pairs were impacted which resulted in the site going down.
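The check Gupta describes, refusing to reinstall any machine that still holds active data, can be illustrated with a minimal sketch. Dropbox has not published its upgrade script, so everything here is an assumption made for illustration: the `Machine` record, its fields, and the `split_reinstall_targets` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Machine:
    """Hypothetical inventory record; not Dropbox's actual data model."""
    hostname: str
    database: Optional[str]  # database served by this machine, None if idle
    role: Optional[str]      # "master", "slave", or None if idle

    @property
    def is_active(self) -> bool:
        # A machine counts as active if it still serves any database.
        return self.database is not None


def split_reinstall_targets(
    candidates: List[Machine],
) -> Tuple[List[Machine], List[Machine]]:
    """Separate candidates into machines safe to reinstall and machines to skip.

    Any machine that still holds active data is refused, which is the
    safeguard the quoted explanation says the script was meant to enforce.
    """
    approved: List[Machine] = []
    refused: List[Machine] = []
    for machine in candidates:
        if machine.is_active:
            refused.append(machine)
        else:
            approved.append(machine)
    return approved, refused


# Example inventory (hostnames are made up).
fleet = [
    Machine("db1-master", "db1", "master"),
    Machine("db1-slave-a", "db1", "slave"),
    Machine("spare-01", None, None),
]
approved, refused = split_reinstall_targets(fleet)
```

In this sketch only `spare-01` would be approved for reinstallation; both `db1` machines would be refused because they still serve a database.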

User files were never at risk during the outage, the company said. The databases in question are used to provide services like photo album sharing, camera uploads, and API features.


via Ars Technica http://feeds.arstechnica.com/~r/arstechnica/index/~3/ejCaBXol9To/
