Posts Tagged ‘Amazon Web Services’

Skynet is upon us, and it has chosen Amazon as its prime target. (For non-Terminator fans, Skynet is the main antagonist of the Terminator franchise: an artificially intelligent system that became self-aware and revolted against its creators. April 21st was the date on which Skynet began its attack on the human race.) The prophecy seemed to come true this time, with Skynet striking in the guise of massive outages at Amazon Web Services’ N. Virginia data center on April 21st and triggering immediate concerns over the security of data on the cloud. Within hours, a large number of people were already quoting this incident as an example of why you shouldn’t move to the cloud. Hundreds of popular websites, including Foursquare, Change.org, fotopedia and Wattpad, went down because of multiple failures at Amazon’s N. Virginia data center. The services were fully restored only on the 24th of April, making this one of the longest and most infamous disasters in the short history of cloud computing.

For starters, Amazon’s image took a serious beating after this incident. To make matters worse, Amazon refrained from making any statement about the incident even as the whole world was trying to understand what exactly had happened.

While there is no official word yet, speculation suggests that it could be Chinese hackers (probably state-sponsored) who brought the servers down. In a letter sent to its members, Change.org announced that the site had been the target of cyber attacks coming from China, probably ordered by the Chinese government. The Change.org site, hosted on Amazon Web Services, reported: “Change.org is currently experiencing intermittent downtime due to a denial of service attack from China on our web site. It appears the attack is in response to a Change.org petition signed by nearly 100,000 people worldwide, who are standing against the detention of Chinese artist and activist Ai WeiWei. Despite this attack on our members and our platform, we will continue to stand with the supporters of Ai Weiwei to defend free speech and the freedom to organize for people everywhere.”

There is thus a high likelihood that an attack trying to pull down Change.org actually brought down Amazon’s entire N. Virginia data center.

Now, before you decide against the cloud as an option for storing your data, consider this:

While Amazon kept trying to update customers on how they could re-mirror their data to different locations (though the tips didn’t work in many cases), most customers had more or less recovered from the downtime within two days. Even though two days sounds like quite a lot, the question is: how many small enterprises could sustain an attack of this magnitude, quite possibly a state-sponsored one? Would those who are criticizing the cloud for this disaster have been able to recover in two to three days had the attack hit their own infrastructure instead of the Amazon cloud? Would it have been possible for Change.org to partially recover from the attack within hours of impact? Do some analysis and you’ll get your answer.

How many organisations are ISO 27001 certified? How many organisations have been certified by multiple third-party auditors on innumerable parameters? If you analyse all this, you’ll see that the hundreds of organisations impacted by the attack didn’t really make a poor decision in going with the cloud. In fact, they were the wisest of the lot. The only flip side is that they suffered collateral damage from attacks that weren’t even targeted at them!

The problem doesn’t lie with the cloud but with how you manage it. Here is an example of how data needs to be managed on the cloud and how you can prevent such disasters from impacting you.
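By way of illustration, here is a minimal sketch of one such precaution: taking a routine EBS snapshot and copying it to a second region, so that a single data center never holds the only copy of your data. This is only an outline of the idea, written with the modern boto3 SDK (which postdates this incident); the volume ID and region names are placeholders, not anything used by the affected sites.

```python
# Minimal sketch: snapshot an EBS volume and copy the snapshot to another
# region, so no single data center holds the only copy of the data.
# Uses the modern boto3 SDK; the volume ID and regions are placeholders.
import boto3

SOURCE_REGION = "us-east-1"             # N. Virginia, the affected region
BACKUP_REGION = "us-west-2"             # any other region
VOLUME_ID = "vol-0123456789abcdef0"     # hypothetical volume

ec2_source = boto3.client("ec2", region_name=SOURCE_REGION)
ec2_backup = boto3.client("ec2", region_name=BACKUP_REGION)

# 1. Snapshot the volume in the source region and wait for it to complete.
snap = ec2_source.create_snapshot(
    VolumeId=VOLUME_ID,
    Description="Routine backup taken before any outage",
)
ec2_source.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Copy the completed snapshot into the backup region, so a regional
#    outage cannot take out both copies at once.
copy = ec2_backup.copy_snapshot(
    SourceRegion=SOURCE_REGION,
    SourceSnapshotId=snap["SnapshotId"],
    Description="Cross-region copy of " + snap["SnapshotId"],
)
print("Backup snapshot in " + BACKUP_REGION + ": " + copy["SnapshotId"])
```

From the copied snapshot, a fresh volume (or a replacement server image) can be brought up in the backup region whenever the primary one is unreachable.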

It has been close to 33 hours since the outages at Amazon’s N. Virginia data center began. The incident has brought down a large number of websites that were hosted on Amazon Web Services (AWS). Though the incident has affected only the company’s N. Virginia data center, a large number of customers are affected, which Amazon cannot, or rather should not, ignore. However, there has been no official statement, blog post or tweet from Amazon so far. The last press release from Amazon was on the 19th of April, announcing Live Streaming for Amazon CloudFront.

Amazon gives its customers a 99.9% uptime guarantee, which works out to roughly 8.76 hours of allowable downtime in a year. With over 33 hours of outage, Amazon has already exceeded that by almost four times. Yet Amazon doesn’t see any obligation to inform its customers about the reason for the outages. Logs from AWS’ Service Health Dashboard show more and more services going down. Through this platform, Amazon is trying to assure users that it is working hard to get things sorted, but 33 hours of outage isn’t something anyone expects from one of the biggest cloud providers in the world.
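For reference, the arithmetic behind those two figures, taking the quoted 99.9% number at face value, is straightforward:

```python
# Downtime allowed by a 99.9% annual uptime guarantee, per the figure above.
hours_per_year = 365 * 24                      # 8760 hours
allowed_downtime = hours_per_year * (1 - 0.999)
print(round(allowed_downtime, 2))              # 8.76 hours per year

# How far a 33-hour outage exceeds that allowance.
print(round(33 / allowed_downtime, 2))         # ~3.77, i.e. almost four times
```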

Here is the current log of events (a sketch of the suggested workarounds follows the log):

Amazon Elastic Compute Cloud (N. Virginia)

10:58 PM PDT Just a short note to let you know that the team continues to be all-hands on deck trying to add capacity to the affected Availability Zone to re-mirror stuck volumes. It’s taking us longer than we anticipated to add capacity to this fleet. When we have an updated ETA or meaningful new update, we will make sure to post it here. But, we can assure you that the team is working this hard and will do so as long as it takes to get this resolved.
2:41 AM PDT We continue to make progress in restoring volumes but don’t yet have an estimated time of recovery for the remainder of the affected volumes. We will continue to update this status and provide a time frame when available.
6:18 AM PDT We’re starting to see more meaningful progress in restoring volumes (many have been restored in the last few hours) and expect this progress to continue over the next few hours. We expect that we’ll reach a point where a minority of these stuck volumes will need to be restored with a more time consuming process, using backups made to S3 yesterday (these will have longer recovery times for the affected volumes). When we get to that point, we’ll let folks know. As volumes are restored, they become available to running instances, however they will not be able to be detached until we enable the API commands in the affected Availability Zone.
8:49 AM PDT We continue to see progress in recovering volumes, and have heard many additional customers confirm that they’re recovering. Our current estimate is that the majority of volumes will be recovered over the next 5 to 6 hours. As we mentioned in our last post, a smaller number of volumes will require a more time consuming process to recover, and we anticipate that those will take longer to recover. We will continue to keep everyone updated as we have additional information.

Amazon Relational Database Service (N. Virginia)

2:35 PM PDT We have restored access to the majority of RDS Multi AZ instances and continue to work on the remaining affected instances. A single Availability Zone in the US-EAST-1 region continues to experience problems for launching new RDS database instances. All other Availability Zones are operating normally. Customers with snapshots/backups of their instances in the affected Availability zone can restore them into another zone. We recommend that customers do not target a specific Availability Zone when creating or restoring new RDS database instances. We have updated our service to avoid placing any RDS instances in the impaired zone for untargeted requests.
11:42 PM PDT In line with the most recent Amazon EC2 update, we wanted to let you know that the team continues to be all-hands on deck working on the remaining database instances in the single affected Availability Zone. It’s taking us longer than we anticipated. When we have an updated ETA or meaningful new update, we will make sure to post it here. But, we can assure you that the team is working this hard and will do so as long as it takes to get this resolved.
7:08 AM PDT In line with the most recent Amazon EC2 update, we are making steady progress in restoring the remaining affected RDS instances. We expect this progress to continue over the next few hours and we’ll keep folks posted.
Amazon EC2 (N. Virginia)
6:18 PM PDT Earlier today we shared our high level ETA for a full recovery. At this point, all Availability Zones except one have been functioning normally for the past 5 hours. We have stabilized the remaining Availability Zone, but recovery is taking longer than we originally expected. We have been working hard to add the capacity that will enable us to safely re-mirror the stuck volumes. We expect to incrementally recover stuck volumes over the coming hours, but believe it will likely be several more hours until a significant number of volumes fully recover and customers are able to create new EBS-backed instances in the affected Availability Zone. We will be providing more information here as soon as we have it. Here are a couple of things that customers can do in the short term to work around these problems. Customers having problems contacting EC2 instances or with instances stuck shutting down/stopping can launch a replacement instance without targeting a specific Availability Zone. If you have EBS volumes stuck detaching/attaching and have taken snapshots, you can create new volumes from snapshots in one of the other Availability Zones. Customers with instances and/or volumes that appear to be unavailable should not try to recover them by rebooting, stopping, or detaching, as these actions will not currently work on resources in the affected zone.
AWS Elastic Beanstalk (N. Virginia)
2:16 PM PDT We have observed several successful launches of new and updated environments over the last hour. A single Availability Zone in US-EAST-1 is still experiencing problems. We recommend customers do not target a specific Availability Zone when launching instances. We have updated our service to avoid placing any instances in the impaired zone for untargeted requests.
The AWS downtime counter can be seen here.
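To make the workarounds quoted in the 6:18 PM PDT EC2 update concrete (launch replacements without pinning an Availability Zone, and recreate stuck EBS volumes from snapshots in an unaffected zone), here is a rough sketch of the corresponding API calls. It uses the modern boto3 SDK rather than the tooling available in 2011, and the AMI, snapshot and zone identifiers are placeholders.

```python
# Rough sketch of the workarounds described in the status updates above,
# written with the modern boto3 SDK; all identifiers are hypothetical.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Launch a replacement instance WITHOUT targeting a specific Availability
#    Zone, leaving EC2 free to place it in a healthy zone.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical AMI of the application
    InstanceType="t3.micro",           # any instance type will do
    MinCount=1,
    MaxCount=1,
    # Note: no Placement={"AvailabilityZone": ...} argument is passed here.
)

# 2. Recreate a stuck EBS volume from an existing snapshot in an unaffected
#    Availability Zone instead of waiting for the impaired zone to recover.
ec2.create_volume(
    SnapshotId="snap-0123456789abcdef0",   # hypothetical snapshot
    AvailabilityZone="us-east-1b",         # a zone other than the impaired one
)
```

The RDS guidance above is analogous: boto3’s rds client exposes restore_db_instance_from_db_snapshot, which can likewise be called without specifying an Availability Zone so that the restored instance lands in a healthy one.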

Today’s outages at Amazon’s N. Virginia data center, which brought down various websites, could be due to an attack from the Chinese government, according to Change.org.

Change.org is an online activism platform for social change that raises awareness about important causes and connects people to opportunities for powerful action. It works with more than 1,000 of the largest nonprofits in the world and has a team of hundreds of journalists and organizers spanning the globe.

In a digital letter sent to its members, Change.org announced that the site has been the target of cyber attacks coming from China, probably ordered by the Chinese government. This is the third day in a row that the Change.org site has been attacked by Chinese government-coordinated hackers.

Change.org, which is hosted on Amazon Web Services, now reports:

“Change.org is currently experiencing intermittent downtime due to a denial of service attack from China on our web site. It appears the attack is in response to a Change.org petition signed by nearly 100,000 people worldwide, who are standing against the detention of Chinese artist and activist Ai WeiWei. Despite this attack on our members and our platform, we will continue to stand with the supporters of Ai Weiwei to defend free speech and the freedom to organize for people everywhere.”

The hackers aren’t targeting the website itself so much as a petition hosted on it. The targeted petition demands the release of Chinese artist Ai Weiwei and has been signed by over 100,000 people.

Acclaimed dissident artist Ai Weiwei — who helped design the famed “Bird’s Nest” stadium for China’s Olympics — was arrested on April 3rd by Chinese security forces at the Beijing airport. His office and studio have been ransacked, and no one has heard from him since.

The international art community, including the directors of more than twenty leading museums (among them the Tate Modern, the Museum of Modern Art, and the Guggenheim), started a petition on Change.org. The petition quickly gained worldwide attention, including coverage in the New York Times, LA Times, and Guardian, and triggered reactions from political leaders around the world, who are calling for Ai Weiwei’s release. Activists have organized peaceful protests at Chinese embassies and consulates.

Due to these repeated attacks, the site may be slower than usual or unavailable at times over the next few days.

As Change.org’s Patrick says: “Autocratic governments know that the internet is a democratizing force, and they’ll do everything they can to suppress online activism. Know that we stand with you for change, and that we will continue to fight to make sure your voice can be heard.”