Tweets:

Posting tweet...

Powered by Twitter Tools

Getting the jet3t S3 GUI tool to work with Walrus (Eucalyptus S3)

Jets3t (pronounced “jet-set”) is a free, open-source Java toolkit and application suite for the Amazon Simple Storage Service (Amazon S3) and Amazon CloudFront content delivery network. For some reason almost all the standard tools for accessing S3 will not easily work with the Eucalyptus equivalent to S3 called Walrus. I am use to using the excellent S3Fox add-on for Firefox and wanted some GUI tool that had similar capabilities. I was able to piece together how to get Jets3t to work with Eucalyptus Walrus. This article puts it all together in one place.

The basic build procedure is based on the instructions for downloading and building Jet3t from Source. I got the hints for what to change to make things work with Walrus from the Jets3t Users Forum article eucalyptus walrus. And an almost unrelated article hadoop in eucalyptus private cloud in the Cloudera Support Forum. Search on the page for the section that says my jets3t file has these values.

Prerequisites

These instructions assume you are on Ubuntu (I had 10.4 Lucid) though it should be easily modifiable to work on any platform that supports Java.

You’ll need to install

  • Mercurial (hg)
  • Sun Java 6 (I couldn’t get it to work with openjdk-6)
  • Ant

You can use the command:

sudo apt-get install mercurial sun-java6-jdk ant1.8

Get the Source of Jets3t with Mercurial

cd to where you want to create the directory that will contain the source. Then use the following command to create a local mercurial repository of the source. Then cd into the repository directory jets3t

hg clone http://bitbucket.org/jmurty/jets3t/
cd jets3t

Edit files to make jets3t work with Walrus

Use your favorite editor (emacs of course :-) to make the following edits.

Edit LoginCredentialsPanel.java to not check the length of login credentials

Jets3t enforces the length of the Access Key and Access Secret Key. But some newer versions of Walrus do not fit the assumptions. This edit eliminates the checks.

This should be around line 230 of src/org/jets3t/apps/cockpit/gui/LoginCredentialsPanel.java. Comment out the lines that have errors.add. It should look something like the following after you comment out the two lines with errors.add.

    if (getAWSAccessKey().length() == 20) {
        // Correct length for AWS Access Key
    } else if (getAWSAccessKey().length() == 22) {
        // Correct length for Eucalyptus ID
    } else {
        // errors.add("Access Key must have 20 or 22 characters");
	}

    if (getAWSSecretKey().length() == 40) {
        // Correct length for AWS Access Key
	} else if (getAWSSecretKey().length() == 38) {
        // Correct length for Eucalyptus Secret Key
	} else {
        //  errors.add("Secret Key must have 40 or 38 characters");
    }

Edit jets3t.properties to use parameters for accessing Walrus instead of AWS S3

You’ll want to set the following for Walrus access in jets3t/configs/jets3t.properties. The following are the values you need to change. s3service.s3-endpoint should be set to the fully qualified domain name of the host that runs Walrus (you could use an ip address I believe). I had to set s3service.https-only to false since I don’t know what it would take to set up SSL/TLS between the Java environment and the Walrus environment. If you do, let me know!


s3service.https-only=false
s3service.s3-endpoint=your_walrus_host_name
s3service.s3-endpoint-http-port=8773
s3service.s3-endpoint-https-port=8443
s3service.disable-dns-buckets=true
s3service.s3-endpoint-virtual-path=/services/Walrus

Optionally edit build.properties

If you want to mark the build version in a way that distinguishes from the standard version and or change the debug level.
I changed the version to version=0.7.4-runa.

Build Jets3t with your changes to work with Walrus

The following use the default target of dist which will create a target tree in the top level directory dist.

ant

If that works (it will say BUILD SUCCESSFUL at the end if it was) then there will be a director dist/jets3t-0.7.4-runa (or whatever you set the version value to in build.properties). You should be able to:

cd dist/jets3t-0.7.4-runa/bin
bash cockpit.sh &

This should start up an application window and a window for you to enter your Eucalyptus credentials. Select the Direct Login tab and enter you Eucalyptus Access Key and Access Secret Key.

Jets3t Cockpit Login

After you click ok, you should see your Walrus buckets!

JetS3t Cockpit

Once it all works you can use the <i>Store Credentials</i> option on the login window to store your credentials on Walrus and use a login/password to access Walrus. But that is optional.

Bonjour / AVAHI & Netatalk to share files files between Ubuntu 10.4 & Mac OS X

It use to be somewhat difficult to have Filesystems on an Ubuntu system show up on the Mac Finder the same way that other Mac Filesystems would show up. There has been the Open Source Unix implementation of the Apple File System (afp) but for a long time the Ubuntu packages were not properly configured to work transparently with modern (Snow Leopard) Mac OS X.

One blog post, HowTo: Make Ubuntu A Perfect Mac File Server And Time Machine Volume did a great job going through all the steps needed to build Netatalk from source and configure it to work very transparently with Ubuntu releases of the past. But with the Ubuntu 10.4 Lucid release, the Netatalk that is in the Ubuntu repository is built and configure to support transparent Apple File Protocol based file sharing.

But there are a few configuration issues, mainly with the Unix implementation of Bonjour resource discovery protocol, that still needs to be done to make it so you can see your Ubuntu Filesystems on your Mac’s Finder like other Macintosh instances. Also we’ll see how to make it so that the Ubuntu instance will show up as an ssh server as well.

Installing Packages

You will need to install the following packages onto your Ubuntu 10.4 instance. This assumes that you already did a clean install of Ubuntu 10.4 and used the update manager to bring it up to date. If you have already installed some of these, it should not be a problem.

Install ssh server

I can’t believe that ubuntu doesn’t install an ssh server by default. But in any case its pretty easy. This is not needed to use netatalk but I wanted to make ssh and netatalk to work and be available via bonjour.

sudo apt-get install openssh-server

Then you’ll need to set up your authorized keys on the ubuntu server. In your home directory do the following:

mkdir -p .ssh
# Copy your public key[s] to .ssh/authorized_keys (not shown here)
# Set the permissions to only allow your user to access the .ssh directory and files in there
chmod -R og-rwx .ssh

Install Netatalk

sudo apt-get install netatalk

Configure Netatalk

You don’t need to change any of the configuration files for netatalk. The defaults will enable the sharing of your home directory. If you want to share any additional filesystems from your Ubuntu instance to your Macs, you can add them to the /etc/netatalk/AppleVolumes.default. That file has explanations of al the options.

You may want to change the default last item in /etc/netatalk/AppleVolumes.default from:

~/			"Home Directory"

to something like:

~/ "$h_$u Home Directory" options:upriv,usedots

This will change the name that shows up in listing to be “hostname_username Home Directory” and will use Unix Privilages. Most importantly the usedots says to not do Hex translation of dot files. If you don’t do this, you’ll see things like
:2e_somefilename instead of .somefilename where filenames start with “dot”.

Configure AVAHI

AVAHI is probably already installed if you did a standard installation.

Copy the avahi ssh service configuration into /etc/avahi/services

sudo cp /usr/share/doc/avahi-daemon/examples/ssh.service /etc/avahi/services/

Create an avahi afpd service configuration by creating a file /etc/avahi/services/afpd.service with the following content:

<?xml version="1.0" standalone='no'?><!--*-nxml-*-->
<!DOCTYPE service-group SYSTEM "avahi-service.dtd">
<service-group>
  <name replace-wildcards="yes">%h</name>
  <service>
    <type>_afpovertcp._tcp</type>
    <port>548</port>
  </service>
  <service>
    <type>_device-info._tcp</type>
    <port>0</port>
    <txt-record>model=Xserve</txt-record>
  </service>
</service-group>

You should now be able to see the Ubuntu host in your Finder under the SHARED section on the left side of the Finder. You should also see your Ubuntu host in the “New Remote Connection” window of the Mac Terminal app (CMD-SHIFT-K) if you select the “Secure Shell (ssh)” Service.

If you don’t see the Ubuntu hostname in the FInder or in the Terminal New Remote Connection service,  restart the avahi-daemon service:

sudo restart avahi-daemon

TimeMachine Support

The new Ubuntu Netatalk package is supposed to also support TimeMachine storage. You can enable this in /etc/netatalk/AppleVolumes.default and add tm as an option to the filesystems that is published in this file. I have not tried this and many sources consider this a risky way to store Time Machine backups.

Troubleshooting

You should make sure that there is at least one afpd process running on the Ubuntu instance. You can see the log info in /var/log/daemon.log.

That’s it!

HBase/Hadoop on Mac OS X

I wanted to do some experimenting with various tools for doing Hadoop and HBase activities and didn’t want to have to bother making it work with our Cluster in the Cloud. I just wanted a simple experimental environment on my Macbook Pro running Snow Leopard Mac OS X.

So I thought it was time to revisit installing Hadoop and HBase on the Mac using the latest versions of everything. This will be deployed as Psuedo-Distributed mode native to Mac OS X. Some folks actually create a set of Linux VMs with a full Hadoop/HBase stack and run that on the Mac, but that is a bit of overkill for now.

These instructions mainly follow the standard instructions for Apache Hadoop and Apache HBase

Prerequisits

Mac OS X Xcode developer tools which includes Java 1.6.x. You can get this for free from the Apple Mac Dev Center. You have to become a member but there is a free membership available.

Download and Unpack Latest Distros

You can get a link to a mirror for Hadoop via the Hadoop Apache Mirror link and for Hbase at the HBase Apache Mirror link. Each of those links will bring you to a suggested link to a mirror for Hadoop or HBase. Once you click on the suggest link, it will bring you to a mirror with the recent releases. You can click on the stable link which will then bring you to a directory that has the latest stable Hadoop (as of this writing: hadoop-0.20.2.tar.gz) or HBase (as of this writing: hbase-0.20.3.tar.gz ). Click on those tar.gz files to download them.

I am going to keep the distros in ~/work/pkgs. I usually create a directory ~/work/pkgs and unpack the tar files there as numbered versions and then create symbolic links to them in ~/work. But you can do this all in any directory that you can control.:

cd ~/work
mkdir -p pkgs
cd pkgs
tar xvzf hadoop-0.20.2.tar.gz
tar xvzf hbase-0.20.3.tar.gz
cd ..
ln -s pkgs/hadoop-0.20.2 hadoop
ln -s pkgs/hbase-020.3 hbase
mkdir -p hadoop/logs
mkdir -p hbase/logs

Now you can have your tools all access ~/work/hadoop or ~/work/hbase and not care what version it is. You can update to later version just by downloading, untarring the distro and then just change the symbolic links.

Configure Hadoop

All the configuration files mentioned here will be in ~/work/hadoop/conf. In this example we are assuming that the Hadoop servers will only be accessed from this localhost. If you need to make it accessable from other hosts or VMs on your lan that support Bonjour, you could use the bonjour name (ie. the name of your mac followed by .local such as mymac.local) instead of localhost in the following Hadoop and HBase configuraitons

hadoop-env.sh

Mainly need to tell Hadoop where your JAVA_HOME is.

Add the following line below the commented out JAVA_HOME line is in hadoop-env.sh

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home

core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

Make sure you can ssh without a password to the hostname used in the configs

The Hadoop and Hbase start/stop scripts use ssh to access the various servers. In this case of doing a Pseudo-Distributed mode, everything is running on the localhost, but we still need to allow the scripts to ssh to the localhost.

Check that you can ssh to the localhost (or whatever hostname you used in the above configs)

We’re assuming that we’ll be running the Hadoop/HBase servers as the same user as our login. You can set things up to run as the hadoop user, but its kind of complicated on Mac OS X. See the section File System Layout in an earlier post Hadoop, HDFS and Hbase on Ubuntu & Macintosh Leopard. That section and a few other points thru that post describe how to create and use a hadoop user to run the Hadoop and HBase servers.

Back to just doing this as our own user. Test that you can ssh to the localhost without a password:

ssh localhost

If you see something like the following paragraph that ends up with a password prompt, then you need to add a key to your ssh setup that does not need a password (you may need to say yes if you are asked if you want to continue connecting).

The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 3c:5d:6a:39:64:78:02:9d:a3:c9:69:68:50:23:71:eb.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Password:

To create a passwordless key and add it to your set of authorized keys that can access your host, do the following (as yourself, not as root. The id_dsa file name can be arbitrary):

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa_for_hadoop
cat ~/.ssh/id_dsa_for_hadoop.pub >> ~/.ssh/authorized_keys

If you have strong alternative opinions on how to set up your own keys to accomplish the same thing please do it your own way. This is just the basic way of doing a passwordless ssh. You may want to use a key you already have lying around or some other mechanism.

Start Hadoop

One time format of  Hadoop File System

Only once, before the first time you use Hadoop, you have to create a formated Hadoop File System. Don’t do this again once you have data in your Hadoop file system as it will erase anything you might have saved there. You may have to do this command again if somehow you screw up your file system. But its not something to do lightly the second time.

~/work/hadoop/bin/hadoop namenode -format

If all goes well, you should see something like:

10/05/02 18:45:04 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Psion.local/192.168.50.16
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/05/02 18:45:04 INFO namenode.FSNamesystem: fsOwner=rberger,rberger,admin,com.apple.access_screensharing,_developer,_lpoperator,_lpadmin,_appserveradm,_appserverusr,localaccounts,everyone,com.apple.sharepoint.group.2,com.apple.sharepoint.group.3,dev,com.apple.sharepoint.group.1,workgroup
10/05/02 18:45:04 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/02 18:45:04 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/02 18:45:04 INFO common.Storage: Image file of size 97 saved in 0 seconds.
10/05/02 18:45:04 INFO common.Storage: Storage directory /tmp/hadoop-rberger/dfs/name has been successfully formatted.
10/05/02 18:45:04 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Psion.local/192.168.50.16
************************************************************/

Starting and stopping Hadoop

Now you can start Hadoop. You will use this command to start Hadoop in general:

~/work/hadoop/bin/start-all.sh

You can stop Hadoop with the command

~/work/hadoop/bin/stop-all.sh

But remember if you are running HBase, stop that first, then stop Hadoop.

Making sure Hadoop is working

You can see the Hadoop logs in ~/work/hadoop/logs

You should be able to see the Hadoop Namenode web interface at http://localhost:50070/ and the JobTracker Web Interface at http://localhost:50030/. If not, check that you have 5 java processes running where each of those java processes have one of the following as their last command line (as seen from a ps ax | grep hadoop command) :

org.apache.hadoop.mapred.JobTracker
org.apache.hadoop.hdfs.server.namenode.NameNode
org.apache.hadoop.mapred.TaskTracker
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
org.apache.hadoop.hdfs.server.datanode.DataNode

If you do not see these 5 processes, check the logs in ~work/hadoop/logs/*.{out,log} for messages that might give you a hint as to what went wrong.

Run some example map/reduce jobs

The Hadoop distro comes with some example / test map / reduce jobs. Here we’ll run them and make sure things are working end to end.

cd ~/work/hadoop
# Copy the input files into the distributed filesystem
# (there will be no output visible from the command):
bin/hadoop fs -put conf input
# Run some of the examples provided:
# (there will be a large amount of INFO statements as output)
bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
# Examine the output files:
bin/hadoop fs -cat output/part-00000

The resulting output should be something like:

3	dfs.class
2	dfs.period
1	dfs.file
1	dfs.replication
1	dfs.servers
1	dfsadmin
1	dfsmetrics.log

Configuring HBase

The following config files all reside in ~/work/hbase/conf. As mentioned earlier, use a FQDN or a Bonjour name instead of localhost if you need remote clients to access HBase. But if you don’t use localhost here, make sure you do the same in the Hadoop config.

hbase-env.sh

Add the following line below the commented out JAVA_HOME line is in hbase-env.sh

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home

Add the following line below the commented out HBASE_CLASSPATH= line

export HBASE_CLASSPATH=${HOME}/work/hadoop/conf

hbase-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<?xml version="1.0"?><?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
    <description>The directory shared by region servers.
    </description>
  </property>
</configuration>

Making Sure HBase is Working

If you do a ps ax | grep hbase you should see two java processes. One should end with:
org.apache.hadoop.hbase.zookeeper.HQuorumPeer start
And the other should end with:
org.apache.hadoop.hbase.master.HMaster start
Since we are running in the Pseudo-Distributed mode, there will not be any explicit regionservers running. If you have problems, check the logs in ~/work/hbase/logs/*.{out,log}

Testing HBase using the HBase Shell

From the unix prompt give the following command:

~/work/hbase/bin/hbase shell

Here is some example commands from the Apache HBase Installation Instructions:

base> # Type "help" to see shell help screen
hbase> help
hbase> # To create a table named "mylittletable" with a column family of "mylittlecolumnfamily", type
hbase> create "mylittletable", "mylittlecolumnfamily"
hbase> # To see the schema for you just created "mylittletable" table and its single "mylittlecolumnfamily", type
hbase> describe "mylittletable"
hbase> # To add a row whose id is "myrow", to the column "mylittlecolumnfamily:x" with a value of 'v', do
hbase> put "mylittletable", "myrow", "mylittlecolumnfamily:x", "v"
hbase> # To get the cell just added, do
hbase> get "mylittletable", "myrow"
hbase> # To scan you new table, do
hbase> scan "mylittletable"

You can stop hbase with the command:

~/work/hbase/bin/stop-hbase.sh

Once that has stopped you can stop hadoop:

~/work/hadoop/bin/stop-all.sh

Conclusion

You should now have a fully working Pseudo-Distributed Hadoop / HBase setup on your Mac. This is not suitable for any kind of large data or production project. In fact it will probably fail if you try to do anything with lots of data or high volumes of I/O. HBase seems to not like to work well until you get 4 – 5 regionservers.

But this Pseudo-Distributed version should be fine for doing experiments with tools and small data sets.

Now I can get on with playing with Cascading-Clojure and Cascalog!

Copy an EBS AMI image to another Amazon EC2 Region

Since I’ve already created an image I liked in the us-west-1 region, I would like to reuse it in other regions. Turns out there is no mechanism within Amazon EC2 to do that. (See How do I launch an Amazon EBS volume from a snapshot across Regions?). I did find one post that talked a bit about how it can be done “out of band”. So I figured I would give that a try instead of doing a full recreation in the new region.

Prepare the Source Instance and Volume

Start an instance in the source region

Here I’ll start an instance in us-west-1a where I have the EBS image I want to copy. In this case I’ll use the image I want to copy, but it could be any image as long as its in the same region as the EBS AMI image that is to be copied. Though we are going to use the instance info to figure out some parameters for creating the new AMI. So if you don’t make the source instance the same AMI as the one you are copying you will need to supply some of the parameters yourself.

You can use a tool like ElasticFox to do the following creating of instances. Here we’ll do it with command line tools.

Set some Shell source variables on host machine

Just to make using these instructions as a cookbook, we’ll have some shell variables that you can set once and then all the instructions will use the variables so you can just cut and paste the instructions into your shell.

src_keypair=id_runa-staging-us-west
src_fullpath_keypair=~/.ssh/runa/id_runa-staging-us-west
src_availability_zone=us-west-1a
src_instance_type=m1.large
src_region=us-west-1
src_origin_ami=ami-1f4e1f5a
src_device=/dev/sdh
src_dir=/src
src_user=ubuntu

Start up the source instance and capture the instanceid

src_instanceid=$(ec2-run-instances \
  --key $src_keypair \
  --availability-zone $src_availability_zone \
  --instance-type $src_instance_type \
  $src_origin_ami \
  --region $src_region  | \
  egrep ^INSTANCE | cut -f2)
echo "src_instanceid=$src_instanceid"

# Wait for the instance to move to the “running” state
while src_public_fqdn=$(ec2-describe-instances --region $src_region "$src_instanceid" | \
  egrep ^INSTANCE | cut -f4) && test -z $src_public_fqdn; do echo -n .; sleep 1; done
echo src_public_fqdn=$src_public_fqdn

This should loop till you see something like:

$ echo src_public_fqdn=$src_public_fqdn
src_public_fqdn=ec2-184-72-2-93.us-west-1.compute.amazonaws.com

Create a volume from the EBS AMI snapshot

Normally when starting an EBS AMI instance, it automatically created a volume from the snapshot associated with the AMI. Here we create the volume from the snapshot ourselves

# Get the volume id
ec2-describe-instances --region $src_region "$src_instanceid" > /tmp/src_instance_info
src_volumeid=$(egrep ^BLOCKDEVICE /tmp/src_instance_info | cut -f3); echo $src_volumeid
# Now get the snapshot id from the volume id
ec2-describe-volumes --region $src_region $src_volumeid | egrep ^VOLUME > /tmp/volume_info
src_snapshotid=$(cut /tmp/volume_info | cut -f2)
echo $src_snapshotid
src_size=$(cut /tmp/volume_info | cut -f2)
echo $src_size
# Create a new volume from the snapshot
src_volumeid=$(ec2-create-volume --region $src_region --snapshot $src_snapshotid -z $src_availability_zone | egrep ^VOLUME | cut -f2)
echo $src_volumeid

Mount the EBS Image of the AMI you want to copy

Now we’ll mount the EBS AMI image as a plain mount on the running source instance. In this case we’re going to use the same image as we launched, but it doesn’t have to be the same image or even the same architecture.

ec2-attach-volume --region $src_region $src_volumeid -i $src_instanceid -d $src_device

You should see something like:

ATTACHMENT	vol-6e7fee06	i-fb0804be	/dev/sdh	attaching	2010-03-14T09:02:58+0000

Prepare the Destination Instance and Volume

Set some Shell destination variables on host machine

You’ll want to tune these to your needs. This example makes the destination size the same as the source. You could make the destination an arbitrary size as long as it fits the source data.

dst_keypair=runa-production-us-east
dst_fullpath_keypair=~/.ssh/runa/id_runa-production-us-east
dst_availability_zone=us-east-1b
dst_instance_type=m1.large
dst_region=us-east-1
dst_origin_ami=ami-7d43ae14
dst_size=$src_size
dst_device=/dev/sdh
dst_dir=/dst
dst_user=ubuntu

Start up the destination instance and capture the dst_instanceid

dst_instanceid=$(ec2-run-instances \
  --key $dst_keypair \
  --availability-zone $dst_availability_zone \
  --instance-type $dst_instance_type \
  $dst_origin_ami \
  --region $dst_region  | \
  egrep ^INSTANCE | cut -f2)
echo "dst_instanceid=$dst_instanceid"

# Wait for the instance to move to the “running” state
while dst_public_fqdn=$(ec2-describe-instances --region $dst_region "$dst_instanceid" | \
  egrep ^INSTANCE | cut -f4) && test -z $dst_public_fqdn; do echo -n .; sleep 1; done
echo dst_public_fqdn=$dst_public_fqdn

This should loop till you see something like:

$ echo dst_public_fqdn=$dst_public_fqdn
dst_public_fqdn=ec2-184-73-71-160.compute-1.amazonaws.com

Create an empty destination volume

dst_volumeid=$(ec2-create-volume --region $dst_region --size $dst_size -z $dst_availability_zone | egrep ^VOLUME | cut -f2)
echo $dst_volumeid

Mount the EBS Image of the AMI you want to copy

Now we’ll mount the EBS AMI image as a plain mount on the running source instance. In this case we’re going to use the same image as we launched, but it doesn’t have to be the same image or even the same architecture.

ec2-attach-volume --region $dst_region $dst_volumeid -i $dst_instanceid -d $dst_device

You should see something like:

ATTACHMENT	vol-450ed02c	i-65be1f0e	/dev/sdh	attaching	2010-03-14T09:39:20+0000

Copy the data from the Source Volume to the Destination Volume

Copy your credentials to the source machine

We’re going to use rsync to copy from the source to the destination tunneled thru ssh. This eliminates any issues with EC2 security groups. But it does mean you have to copy an ssh private key to the source machine that will then be able to access the destination machine via ssh.

scp -i $src_fullpath_keypair $dst_fullpath_keypair ${src_user}@${src_public_fqdn}:.ssh

Mount the source and destination volumes on their instances

ssh -i $src_fullpath_keypair ${src_user}@${src_public_fqdn} sudo mkdir -p $src_dir
ssh -i $src_fullpath_keypair ${src_user}@${src_public_fqdn} sudo mount $src_device $src_dir
ssh -i $dst_fullpath_keypair ${dst_user}@${dst_public_fqdn} sudo mkfs.ext3 -F $dst_device
ssh -i $dst_fullpath_keypair ${dst_user}@${dst_public_fqdn} sudo mkdir -p $dst_dir
ssh -i $dst_fullpath_keypair ${dst_user}@${dst_public_fqdn} sudo mount $dst_device $dst_dir

Get the FQDN of the Amazon internal address of the destination machine

We’re assuming that the dst instance is the us-east equivalent base AMI of the us-west source base AMI so we can use these kernel and ramdisk to build the new AMI later.

ec2-describe-instances --region $dst_region "$dst_instanceid" > /tmp/dst_instance_info
dst_internal_fqdn=$(egrep ^INSTANCE /tmp/dst_instance_info | cut -f5); echo $dst_internal_fqdn
dst_kernel=$(egrep ^INSTANCE /tmp/dst_instance_info | cut -f13); echo $dst_kernel
dst_ramdisk=$(egrep ^INSTANCE /tmp/dst_instance_info | cut -f14) ;echo $dst_ramdisk

Commands to run on the source machine

You could do the rsync by logging into the source machine and do the following. I tried to do this by using ssh commands, but the fact that the first ssh from source to destination has to be authenticated was a blocker for me. You could log into the source machine and then sudo ssh to the destination machine (you have to do sudo ssh since the rsync has to be run with sudo and the keys are stored separately for the sudo user and the regular user).
I’ll show both ways.
Here’s how you can ssh to the source machine:

ssh -i $src_fullpath_keypair ${src_user}@${src_public_fqdn}

Set up some shell variables on the source machine shell environment

# This is the key you just copied over
dst_fullpath_keypair=~/.ssh/id_runa-production-us-east
# You need to use the Public FQDN of the destination since its cross region
dst_keypair=runa-production-us-east
src_public_fqdn=ec2-184-72-2-93.us-west-1.compute.amazonaws.com
dst_public_fqdn=ec2-184-73-71-160.compute-1.amazonaws.com
dst_user=ubuntu
src_user=ubuntu
src_dir=/src
dst_dir=/dst

Do the rsync

We are using the rsync options

  • P Keep partial transferred files and Show Progress
  • H Preserve Hard Links
  • A Preserve ACLs
  • X Preserve extended attributes
  • a Archive mode
  • z Compress files for transfer
rsync -PHAXaz --rsh "ssh -i /home/${src_user}/.ssh/id_${dst_keypair}" --rsync-path "sudo rsync" ${src_dir}/ ${dst_user}@${dst_public_fqdn}:${dst_dir}/

If you want to do the rsync from your local host

I found that I still had to log into the source instance

ssh -i $src_fullpath_keypair ${src_user}@${src_public_fqdn}

and then on the source instance do:

sudo ssh -i /home/${src_user}/.ssh/id_${dst_keypair} ${dst_user}@${dst_public_fqdn}

and accept “The authenticity of host” for the first time so the destination host is in the known keys of the sudo user
Then back on your local host you can issue the remote command that will run on the source instance and rsync to the destination host:

ssh -i $src_fullpath_keypair ${src_user}@${src_public_fqdn} sudo "rsync -PHAXaz --rsh \"ssh -i /home/${src_user}/.ssh/id_${dst_keypair}\" --rsync-path \"sudo rsync\" ${src_dir}/ ${dst_user}@${dst_public_fqdn}:${dst_dir}/"

Complete the new AMI from your Local Host

The remaining steps will be done back on your local host. This assumes that the shell variables we set up earlier are still there.

Some Cleanup for new Region

Ubuntu has their apt sources tied to the region you are in. So we have to update the apt sources for the new region.
We’ll do this by chrooting to the mount /dst directory and running some commands as if they were being run on an ami with the /dst image. We might as well update things at the same time to the latest packages.

# Allow network access from chroot environment
ssh -i $dst_fullpath_keypair ${dst_user}@${dst_public_fqdn} sudo cp /etc/resolv.conf $dst_dir/etc/

# Upgrade the system and install packages
ssh -i $dst_fullpath_keypair ${dst_user}@${dst_public_fqdn} sudo -E chroot $dst_dir mount -t proc none /proc
ssh -i $dst_fullpath_keypair ${dst_user}@${dst_public_fqdn} sudo -E chroot $dst_dir mount -t devpts none /dev/pts

cat <<EOF > /tmp/policy-rc.d
#!/bin/sh
exit 101
EOF
scp -i $dst_fullpath_keypair /tmp/policy-rc.d ${dst_user}@${dst_public_fqdn}:/tmp
ssh -i $dst_fullpath_keypair ${dst_user}@${dst_public_fqdn} sudo mv /tmp/policy-rc.d $dst_dir/usr/sbin/policy-rc.d

ssh -i $dst_fullpath_keypair ${dst_user}@${dst_public_fqdn} chmod 755 $dst_dir/usr/sbin/policy-rc.d

# This has to be done to set up the Locale & apt sources
ssh -i $dst_fullpath_keypair ${dst_user}@${dst_public_fqdn} DEBIAN_FRONTEND=noninteractive sudo -E chroot $dst_dir /usr/bin/ec2-set-defaults

# Update the apt sources
ssh -i $dst_fullpath_keypair ${dst_user}@${dst_public_fqdn} DEBIAN_FRONTEND=noninteractive sudo -E chroot $dst_dir apt-get update

# Optionally update the packages
ssh -i $dst_fullpath_keypair ${dst_user}@${dst_public_fqdn} DEBIAN_FRONTEND=noninteractive sudo -E chroot $dst_dir apt-get dist-upgrade -y

# Optionally update your gems
ssh -i $dst_fullpath_keypair ${dst_user}@${dst_public_fqdn} sudo -E chroot $dst_dir gem update --system
ssh -i $dst_fullpath_keypair ${dst_user}@${dst_public_fqdn} sudo -E chroot $dst_dir gem update

Clean up from the building of the image

ssh -i $dst_fullpath_keypair ${dst_user}@${dst_public_fqdn} sudo chroot $dst_dir umount /proc
ssh -i $dst_fullpath_keypair ${dst_user}@${dst_public_fqdn} sudo -E chroot $dst_dir umount /dev/pts
ssh -i $dst_fullpath_keypair ${dst_user}@${dst_public_fqdn} sudo -E rm -f $dst_dir/usr/sbin/policy-rc.d

There are a few more shell variables we’ll need

I got the kernel and ramdisk from the destination instance since it has the alestic.com us-east-1 equivalent base AMI to the us-west-1 one that we are copying from.

# Some info for creating the name and description
codename=karmic
release=9.10
tag=server

# Make sure you set this as appropriate
# 64bit
arch=x86_64

# You will need to set the aki and ari values base on the actual base AMI you used
# It will be different for different regions.  These are set for x86_64 and us-east-1
ebsopts="--kernel=${dst_kernel} --ramdisk=${dst_ramdisk}"
ebsopts="$ebsopts --block-device-mapping /dev/sdb=ephemeral0"

now=$(date +%Y%m%d-%H%M)
# Make this specific to what you are making
chef_version="0.8.6"
prefix=runa-chef-${chef_version}-ubuntu-${release}-${codename}-${tag}-${arch}-${now}
description="Runa Chef ${chef_version} Ubuntu $release $codename $tag $arch $now"

Snapshot the Destination Volume and register the new AMI in the destination region

# Unmount the destination filesystem
ssh -i $dst_fullpath_keypair ${dst_user}@${dst_public_fqdn} sudo umount $dst_dir

# Detach the Destination Volume (it may speed up the snapshot)
ec2-detach-volume --region $dst_region "$dst_volumeid"

# Make the snapshot
dst_snapshotid=$(ec2-create-snapshot -region $dst_region -d "$description" $dst_volumeid | cut -f2)

# Wait for snapshot to complete. This can take a while
while ec2-describe-snapshots --region $dst_region "$dst_snapshotid" | grep -q pending
  do echo -n .; sleep 1; done

# Register the Destination Snapshot as a new AMI in the Destination Region
new_ami=$(ec2-register \
  --region $dst_region \
  --architecture $arch \
  --name "$prefix" \
  --description "$description" \
  $ebsopts \
  --snapshot "$dst_snapshotid")
echo $new_ami

Conclusion

You should now have a shiny new ami in your destination region. Use the value of $new_ami to start a new instance in your destination region using your favorite tool or technique.