Thursday, April 26, 2007

Speeding up Jigdo

What is Jigdo
Jigdo is a new tool that has been accepted by Debian as THE way to distribute their ISOs. The reasons are:
1. It's a simple tool that relies only on some very basic Linux utilities that are available on most Linux distros
2. Can make use of partially downloaded files
3. Can make use of the previous release of an ISO and download only the packages that have changed and recreate the latest release of the ISO locally.
4. Reduces time/resources for both the user and the server that is hosting the ISO

Normally the download speed is limited by the number of connections we make to a particular server, since servers often cap the rate of each connection. So the more connections I use to download a particular file, the faster the download. For example, if I download a file using wget and get 20 KBps, then with a tool that splits the file into 5 parts and fetches them in parallel I can get close to 100 KBps, provided the server and my own link allow it. Now that's FAST, don't you think?
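The arithmetic behind that example can be written down as a tiny shell function. This is only a rough model: it assumes every connection gets the same rate and that the only other limit is the local link. The function name and the link capacity figures are made up for illustration.

```shell
# Rough estimate: total rate = per-connection rate * connections,
# capped by the bandwidth of our own link. All numbers are in KBps.
estimate_rate() {
    per_conn=$1   # rate a single connection gets
    conns=$2      # number of parallel connections
    link_cap=$3   # maximum rate of the local link
    total=$((per_conn * conns))
    if [ "$total" -gt "$link_cap" ]; then
        total=$link_cap
    fi
    echo "$total"
}

estimate_rate 20 5 1000   # prints 100
estimate_rate 20 5 60     # prints 60
```

The second call shows why the speedup is not guaranteed: once the local link is saturated, extra connections do not help.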

Okay...so what tool can be used and how can we use it with jigdo?

My choice is aria2c. The reason is that I have used it for generic downloads and have had good performance. It supports mirrors, splitting the download, proxies, torrents, metalinks, HTTPS and more. For a complete feature list see: http://aria2.sourceforge.net/

Let's try to understand how jigdo works so we can modify it suitably:

jigdo-file is the tool used to prepare the ISO for distribution. It creates a file with a .template extension that contains the following:
1. The md5 sum of each file that is in the ISO
2. The directory structure for the files in the ISO
3. If files other than the ISO were passed to jigdo-file, then those files as well
4. The entire directory structure and padding information from the ISO

Another file created by jigdo-file has the extension .jigdo. This file uses the same format as Windows INI files and has the following information:
1. The [Parts] section maps the md5 sum of each file in the ISO to the URL from which it can be downloaded.
2. The [Servers] section contains the URLs for the Core and Updates folders. We can modify these to point to the closest mirrors that we have.
3. The [Image] section gives a list of images on those servers. This lets the user choose which ISO to download, say the DVD or the CD.

This file is pretty self-explanatory.
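To make the layout concrete, here is a small sketch that writes a made-up .jigdo fragment and prints its [Servers] section, which is the part we would edit to point at a closer mirror. The file names, URLs and checksum in the sample are placeholders, not taken from a real release, and list_servers is a helper invented for this example.

```shell
# Hypothetical .jigdo fragment; names, URLs and the checksum are placeholders.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[Image]
Filename=example.iso
Template=example.template

[Parts]
PLACEHOLDERMD5SUM=Core:Packages/example-1.0.rpm

[Servers]
Core=http://mirror.example.org/core/
Updates=http://mirror.example.org/updates/
EOF

# Print only the non-empty lines that belong to the [Servers] section.
list_servers() {
    awk '
        /^\[/ { in_servers = ($0 == "[Servers]"); next }
        in_servers && NF { print }
    ' "$1"
}

list_servers "$sample"
```

Each entry in [Parts] names a server label (Core here) plus a path, and the label is looked up in [Servers] to build the full URL.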

How to go about making the download:

We need to download the .jigdo file to the local machine and run jigdo-lite as follows:
jigdo-lite FC-20070401-6-x86_64.jigdo
Let's assume I'm trying to get the latest FC6 respin while having a copy of the first released FC6 disk.

1. jigdo-lite checks if the .template file is present in the directory. If not, it downloads it.
2. I mount the ISO of the old FC6 disk at some mount point and ask jigdo to scan it.
3. Jigdo scans the files present there and checks them against the md5 sums it has in the template file.
4. Based on this information, it generates a .list file that contains all the files that need to be downloaded.
5. It downloads files into the .tmpdir folder. After every 10 files, the files are written into an iso.tmp file in the same structure that is described in the .template file.
6. The downloaded and existing files are merged to recreate the new ISO with the updated files.

So by downloading just the changed files, the entire new release iso is recreated on our local machine.
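The scan-and-match step can be sketched in a few lines of shell. This is a simplification and not jigdo's actual code: jigdo uses its own checksum encoding and matching logic, while plain md5sum hex digests stand in for it here. The function name find_reusable and the demo files are invented for the example.

```shell
# $1 = directory with the old files (for example the mounted old ISO)
# $2 = file with one wanted md5 digest per line
# Prints the paths of local files whose digest is on the wanted list;
# anything not printed would still have to be downloaded.
find_reusable() {
    find "$1" -type f -exec md5sum {} + |
    while read -r sum path; do
        if grep -qxF "$sum" "$2"; then
            echo "$path"
        fi
    done
}

# Small demo: a.txt is unchanged in the new release, b.txt is not.
old=$(mktemp -d)
printf 'unchanged\n' > "$old/a.txt"
printf 'old version\n' > "$old/b.txt"
wanted=$(mktemp)
printf 'unchanged\n' | md5sum | cut -d' ' -f1 > "$wanted"

find_reusable "$old" "$wanted"   # prints only the path of a.txt
```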

Now coming back to where we started:
jigdo uses wget, which supports a lot of features but not splitting a download into multiple parts. This is a major disadvantage. So let us use aria2c, which supports that and a lot of other features.

jigdo-lite, as it turns out, is a simple shell script that calls wget to get the files. So all we need to do is find this line in jigdo-lite:

wget --user-agent="$userAgent" $wgetOpts "$@" || return 1


and replace it with

aria2c $ariaOpts "$@" || return 1

At the top of the file set ariaOpts as follows:
ariaOpts="-s4 --ftp-pasv --http-proxy=PROXY:PROXYPORT --http-proxy-user=ADUNAME --http-proxy-passwd=ADPASSWD"

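This will make the script split each file into 4 parts. PROXY, PROXYPORT, ADUNAME and ADPASSWD are placeholders for your proxy host, port, user name and password; leave the proxy options out if you connect directly.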
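If the same script has to run on machines without aria2c, one option is to keep wget as a fallback instead of replacing the line outright. The sketch below is a suggestion, not part of jigdo-lite: pick_downloader and fetch_file are names made up here, and it assumes the arguments passed in "$@" are plain URLs. Any wget-only options in "$@" would need translating before being handed to aria2c.

```shell
# Report which downloader is available, preferring aria2c.
pick_downloader() {
    if command -v aria2c >/dev/null 2>&1; then
        echo aria2c
    elif command -v wget >/dev/null 2>&1; then
        echo wget
    else
        return 1
    fi
}

# Hypothetical wrapper around the download line shown above.
# ariaOpts, wgetOpts and userAgent are the script variables used above;
# the option variables are left unquoted on purpose so that they split
# into separate arguments.
fetch_file() {
    case "$(pick_downloader)" in
        aria2c) aria2c $ariaOpts "$@" || return 1 ;;
        wget)   wget --user-agent="$userAgent" $wgetOpts "$@" || return 1 ;;
        *)      echo "neither aria2c nor wget found" >&2; return 1 ;;
    esac
}
```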

Now run jigdo-lite and you stand to get up to a 4x speedup when the per-connection rate is the bottleneck :)


Useful Links:

http://atterer.net/jigdo/
http://aria2.sourceforge.net/