PGP and Tranche

From OpenWetWare
Revision as of 15:10, 18 January 2009 by Andrea Loehr (talk | contribs)


Tranche

To increase the utility of project data and make more of it available to the public, the Personal Genome Project (PGP) has launched PersonalGenomes@Home. This effort uses Tranche for persistent storage. The Tranche Project is a free and open source file sharing tool that enables collections of computers to easily share and cite scientific data sets. Designed and built with scientists and researchers in mind, Tranche aims to solve the data sharing problem in a secure and scalable fashion.

Tranche User Account

To apply for a user account, fill out the form at Tranche User Account Application. Pending account applications are reviewed weekly on Mondays.

System Requirements

Java Runtime Environment 5.0 or later; see System Requirements.

Tranche User Guide and Instructions for Up- and Downloads

A detailed user guide can be found in the Tranche User Guide.

There are three ways to add or get data from the network:

  1. GUI: Go to the Tranche homepage and click "Launch Tranche". (Requires Java 5+ with Web Start)
  2. Command-line tools: See below
  3. Java API: For custom tools development

The most popular of the three is the GUI, as it is easy to use. The command-line tools are useful for automating tasks or working in headless environments, and the API is useful when integrating Tranche into a software project or creating a custom tool.

Tranche up- and downloads can be run over the command line using the upload tool and the download tool.

wget http://tranche.proteomecommons.org/files/CommandLineAddFileTool.zip  
wget http://tranche.proteomecommons.org/files/CommandLineGetFileTool.zip   

In order to use these tools you also need a certificate, which you can get instantly at Tranche Autocert. It comes in the form of USER.zip.encrypted.

Download each tool, unzip the file, change into the unzipped directory, and type java -jar NAME.jar --help to obtain usage information. (If java is not in your system path, add it to your path or type the full path: /path/to/java -jar NAME.jar --help.)

For usage information: java -jar Tranche-Downloader.jar --help
Download a project with a certain hash: java -jar Tranche-Downloader.jar HASH

For usage information: java -jar Tranche-Uploader.jar --help
Upload a file:
java -Xmx512m -jar Tranche-Uploader.jar -u USER.zip.encrypted -p PASSWORD -c true -t "MY TITLE" -d "MY DESCRIPTION" /home/DataForUpload

There is the option to download/upload encrypted data:
java -jar Tranche-Downloader.jar -e supersecret HASH
java -jar Tranche-Uploader.jar -u FILE.zip.encrypted -p supersecret /home/DataForUpload

Example scripts are provided: download script and upload script.

To get notified about changes and upgrades one can join the automated tool group for command-line tools and API.


Transferring Data onto Tranche

For the initial data transfer, we could ship (two?) USB drives to the Biopolymers Facility (BPF):

    Attn: Andrew Gagne
    Biopolymers Facility
    77 Ave. Louis Pasteur
    Room 0088
    Boston, MA 02115

We have:

We need:

  • PGP1 - FC37_3
  • PGP3 - FC35_3
  • PGP5 - FC44_2
  • PGP7 - FC44_4
  • PGP8 - FC37_1,FC51_2,FC51_6
  • PGP9 - FC43_3,FC51_3,FC51_7
  • PGP10 - FC41_3

Also, could use:

  • CONTROL - FC35, FC37, FC41, FC43, FC44, FC51.

For all of the above there is a top-level directory (e.g. pgp2-FC_00037_L002) with exactly 36 directories below it. Each of those directories contains 4x100 files. For this release, it would be ideal if the data were organized in Tranche as 18x100 "randomly addressable" data sets that a volunteer computer could request as desired. Each addressable "bundle" of data would then be 4x36 files.
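
The regrouping into per-tile bundles can be sketched as follows. This is only an illustration under stated assumptions, not part of Tranche: the file name ending _<tile>_<base>.tif (optionally with .gz) is inferred from the download patterns used elsewhere on this page, and the "img" prefix and the function name group_by_tile are hypothetical.

```python
import re
from collections import defaultdict

# Assumed file name ending: _<tile>_<base>.tif or .tif.gz (hypothetical convention).
TILE_FILE = re.compile(r"_(\d+)_([acgt])\.tif(\.gz)?$")


def group_by_tile(paths):
    """Map tile number -> sorted list of paths that belong to that tile."""
    bundles = defaultdict(list)
    for path in paths:
        match = TILE_FILE.search(path)
        if match:  # skip anything that is not a tile image
            bundles[int(match.group(1))].append(path)
    return {tile: sorted(files) for tile, files in bundles.items()}


# Build the layout described above: 36 directories x 100 tiles x 4 bases.
paths = [
    f"C{cycle}.1/img_{tile}_{base}.tif"
    for cycle in range(1, 37)
    for tile in range(1, 101)
    for base in "acgt"
]
bundles = group_by_tile(paths)
print(len(bundles), len(bundles[1]))
```

Under these assumptions each lane yields 100 bundles of 4x36 = 144 files, one bundle per tile, which a volunteer computer could request independently.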

Example: Upload project, download a portion using command-line tools

  • Get directory to test.

besmit@besmit-kubuntu:~/PGP-Test$ wget -r -l 1 http://genomerator.freelogy.org/~awz/pgp2-FC_00037_L002/C36.1/

  • Move the downloaded directory contents to C36.1/ and upload this directory to Tranche. Uploading requires a login. See -h or --help for information about parameters. The very last argument is the directory to upload.

besmit@besmit-kubuntu:~/Desktop/TrancheLabs/Upload$ java -Xmx512m -jar Tranche-Uploader.jar -U bryan -P ********** -d "This is my description. Passphrase required for download." -t "This is my title: C35.1 encrypted" -e pgptest4 -c true C36.1/

  • This is the stderr for the project. Intended for debugging, etc.

Using batch chunk upload?: yes Started total of 10 file encoding threads.

  • This is the stdout for the project - the hash used to identify the project. This should be saved.

uiRL5wtqG5FyzE9PnJG47dbxuU3PqpX3aE2Gq9SNJa5vRvlgn14hwUEBW8UZyXIeQWLP9B49sb6/W8dBOz1+QfRC5UkAAAAAAAEnnA==

  • Download only the tif files whose filenames refer to G or C nucleotides.

besmit@besmit-kubuntu:~/Desktop/TrancheLabs/Download$ java -Xmx512m -jar Tranche-Downloader.jar -e pgptest4 -r _[gc].tif.gz$ uiRL5wtqG5FyzE9PnJG47dbxuU3PqpX3aE2Gq9SNJa5vRvlgn14hwUEBW8UZyXIeQWLP9B49sb6/W8dBOz1+QfRC5UkAAAAAAAEnnA==

  • The only output is the path to the download directory, shown when the download is complete.

/home/besmit/Desktop/TrancheLabs/Download/tranche-downloads/C36.1
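
How the -r pattern selects files can be illustrated with a short sketch. It assumes the downloader applies the pattern as an ordinary regular expression searched anywhere in the file name, and that Python's re module behaves like Java's regex engine for a pattern this simple; neither is confirmed here. The file names are hypothetical.

```python
import re

# Pattern exactly as passed to -r in the example above.
pattern = re.compile(r"_[gc].tif.gz$")

# Hypothetical file names; the real naming scheme is not shown on this page.
names = [
    "img_1_a.tif.gz",
    "img_1_c.tif.gz",
    "img_1_g.tif.gz",
    "img_1_t.tif.gz",
    "index.html",
]

selected = [name for name in names if pattern.search(name)]
print(selected)

# The unescaped dots match any character, so this odd name is also accepted.
print(bool(pattern.search("img_1_gXtifXgz")))

# Escaping the dots restricts the match to the literal suffix.
strict = re.compile(r"_[gc]\.tif\.gz$")
print(bool(strict.search("img_1_gXtifXgz")))
```

On the command line, quoting the pattern (for example '_[gc]\.tif\.gz$') keeps the shell from interpreting the brackets and the dollar sign before the tool sees them.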



Upload/Download Test from Boinc

These tests were conducted with build 3432.
A test_tree/ directory was created on boinc with 36 subdirectories, C1.1 - C36.1, each containing the four ACGT tif images for each of four tiles (1 - 4). The raw tif images were gzipped, giving test_tree/ a total size of 2.503GB.

Upload

All attempts by AL to upload test_tree/ from boinc have so far been unsuccessful.
BS uploaded the data from http://boinc-dev.freelogy.org/~aloehr in 68 minutes.
At this speed, the upload tool appears too slow for the initial ~1TB of data that PGP will need to transfer.
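
A rough extrapolation from the test figures shows the scale of the problem. The sketch assumes the upload rate stays constant and uses decimal units (1 GB = 1000 MB, 1 TB = 1000 GB); both are simplifications.

```python
# Figures from the test above: 2.503 GB uploaded in 68 minutes.
test_size_gb = 2.503
upload_minutes = 68

# Average rate of the test upload (assumes 1 GB = 1000 MB).
rate_mb_per_s = test_size_gb * 1000 / (upload_minutes * 60)

# Linear extrapolation to the ~1 TB initial data set (assumes 1 TB = 1000 GB).
total_gb = 1000
estimated_days = total_gb / test_size_gb * upload_minutes / 60 / 24

print(f"{rate_mb_per_s:.2f} MB/s, about {estimated_days:.0f} days for 1 TB")
```

At roughly 0.6 MB/s, the initial transfer would take on the order of 19 days of uninterrupted uploading.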

Download

The entire test_tree/ directory was downloaded from boinc in 17 minutes. The download, however, was faulty: 38 images were missing. This result was reproduced twice.

 INFO: I/O exception (java.net.ConnectException) caught when processing request: Connection refused
Jan 18, 2009 2:15:37 PM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: Retrying request
Tranche downloader error: A total of 38 errors occurred while downloading project:

Also, the subdirectories C1.1 - C36.1 contained these files:
C1.1/index.html
C1.1/index.html?C=D;O=A
C1.1/index.html?C=D;O=D
C1.1/index.html?C=M;O=A
C1.1/index.html?C=M;O=D
C1.1/index.html?C=N;O=A
C1.1/index.html?C=N;O=D
C1.1/index.html?C=S;O=A
C1.1/index.html?C=S;O=D
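
The index.html?C=...;O=... names look like the column-sort links of a web server directory listing, as saved by a recursive wget such as the one in the example above. A minimal sketch for identifying the missing images while ignoring those listing pages might look like this; the function names and the sample paths are hypothetical, and the path lists would have to be collected from the uploaded and downloaded trees first.

```python
import posixpath


def is_listing_page(path):
    """True for index.html and its sort-order variants such as index.html?C=D;O=A."""
    return posixpath.basename(path).startswith("index.html")


def missing_files(uploaded, downloaded):
    """Files that were uploaded but are absent from the download, ignoring listing pages."""
    expected = {path for path in uploaded if not is_listing_page(path)}
    return sorted(expected - set(downloaded))


# Hypothetical relative paths; real lists could be collected with os.walk.
uploaded = [
    "C1.1/img_1_a.tif.gz",
    "C1.1/img_1_c.tif.gz",
    "C1.1/index.html",
    "C1.1/index.html?C=D;O=A",
]
downloaded = ["C1.1/img_1_a.tif.gz", "C1.1/index.html"]

print(missing_files(uploaded, downloaded))
print([path for path in uploaded if is_listing_page(path)])
```

Removing the listing pages from test_tree/ before uploading would keep them out of the project altogether.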


Attempts to download the data for a single tile, i.e. the four ACGT tif images in each directory, were unsuccessful. The directory structure and the html files listed above were downloaded, but no images were copied. The following commands were used:

 time java -Xmx512m -jar Tranche-Downloader.jar -r _1* HASH 
time java -Xmx512m -jar Tranche-Downloader.jar -r _1_[acgt].tif$
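
Three details of these commands may be worth checking, independent of the server error: the second command has no HASH argument; an unquoted _1* is subject to shell filename expansion and, read as a regular expression, means an underscore followed by zero or more 1s; and a pattern ending in .tif$ would not match names ending in .tif.gz, if the gzipped images kept that suffix (an assumption). The sketch below builds the argument list in Python so that no shell expansion occurs. It uses only flags that appear on this page; the function name and the sample pattern are hypothetical.

```python
import re
import shlex


def build_download_command(project_hash, regex=None, passphrase=None,
                           jar="Tranche-Downloader.jar", heap="512m"):
    """Build an argument list for the downloader using only flags shown on this page."""
    if not project_hash:
        raise ValueError("the project hash is required as the last argument")
    command = ["java", f"-Xmx{heap}", "-jar", jar]
    if passphrase is not None:
        command += ["-e", passphrase]
    if regex is not None:
        re.compile(regex)  # rough syntax check only; Java regex syntax differs in places
        command += ["-r", regex]
    return command + [project_hash]


# Hypothetical pattern accepting both .tif and .tif.gz names for tile 1.
command = build_download_command("HASH", regex=r"_1_[acgt]\.tif(\.gz)?$")
print(shlex.join(command))  # quoted form that can be pasted into a shell

# If the stored names end in .tif.gz (an assumption), the original pattern finds nothing.
print(re.search(r"_1_[acgt].tif$", "img_1_a.tif.gz"))
```

Passing the list to subprocess.run(command, check=True) would start the downloader without involving a shell at all.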

The error messages were:

 java.io.IOException: The RemoteTrancheServer is in an inoperable state. Status: download thread is finished
at org.tranche.remote.RemoteTrancheServer.checkUploadAndDownloadThreads(RemoteTrancheServer.java:2035)
at org.tranche.remote.RemoteTrancheServer.addCallback(RemoteTrancheServer.java:294)
at org.tranche.remote.RemoteTrancheServer.hasDataInternal(RemoteTrancheServer.java:1037)
at org.tranche.remote.RemoteTrancheServer.hasData(RemoteTrancheServer.java:433)
at org.tranche.get.GetFileTool$DataDownloadingThread.run(GetFileTool.java:4691)