PGP and Tranche

From OpenWetWare
Revision as of 17:04, 19 April 2009 by Andrea Loehr


Tranche

To increase the utility of project data and make more of it available to the public, the Personal Genome Project (PGP) has launched PersonalGenomes@Home. This effort uses ProteomeCommons.org's Tranche Network for persistent storage. The Tranche Project is a free and open-source file-sharing tool that enables collections of computers to share and cite scientific data sets. It was designed and built with scientists and researchers in mind, with the aim of making data sharing secure and scalable.

Tranche User Account

To apply for a user account, fill out the ProteomeCommons User Account form. Pending applications are reviewed each business day.

System Requirements

Java Runtime Environment 5.0 or later (see System Requirements).

Tranche User Guide and Instructions for Uploads and Downloads

A detailed user guide is available in the Tranche User Guide.

There are three ways to add data to or get data from the network:

  1. GUI: Go to the Tranche homepage and click "Launch Tranche". (Requires Java 5+ with Web Start)
  2. Command-line tools: See below
  3. Java API: For custom tools development

The GUI is the most popular of the three because it is easy to use. The command-line tools are useful for automating tasks or for working in headless environments, and the API is useful for integrating Tranche into a software project or for creating a custom tool.

Uploads and downloads can also be run from the command line with the upload tool and the download tool:

wget --no-check-certificate https://proteomecommons.org/tranche/files/CommandLineAddFileTool.zip  
wget --no-check-certificate https://proteomecommons.org/tranche/files/CommandLineGetFileTool.zip   

Uploading with these tools also requires a login, which you can get at ProteomeCommons.org.

Download each tool, unzip it, change into the unzipped directory, and run java -jar NAME.jar --help to obtain usage information. (If java is not on your system path, add it to your path or type the full path: /path/to/java -jar NAME.jar --help.)
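The Java 5 requirement and the "full path" advice can be checked with a short script before running either tool. This is a sketch, not part of the Tranche distribution: the function names are hypothetical, and it assumes that `java -version` prints a line containing `version "X.Y..."`, as Sun/Oracle and OpenJDK builds do.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): find a java binary and check that it is Java 5+.

# Print the java to use: $JAVA_HOME/bin/java if present, else java on PATH.
find_java() {
  if [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]; then
    printf '%s\n' "$JAVA_HOME/bin/java"
  else
    command -v java
  fi
}

# Print the quoted version string reported by `JAVA -version` (written to stderr).
java_version_string() {
  "$1" -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Reduce a version string to its major number:
# "1.5.0_22" -> 5 (old 1.x numbering), "11.0.2" -> 11.
java_major() {
  local v="$1" first
  first="${v%%.*}"
  if [ "$first" = "1" ]; then
    v="${v#1.}"
    first="${v%%.*}"
  fi
  printf '%s\n' "${first%%[!0-9]*}"
}
```

For example, after sourcing the file: `JAVA=$(find_java) && [ "$(java_major "$(java_version_string "$JAVA")")" -ge 5 ] && "$JAVA" -jar Tranche-Downloader.jar --help`.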

For usage information: java -jar Tranche-Downloader.jar --help
Download the project with a given hash: java -jar Tranche-Downloader.jar HASH

For usage information: java -jar Tranche-Uploader.jar --help
Upload a file or directory:
java -Xmx512m -jar Tranche-Uploader.jar -u USER.zip.encrypted -p PASSWORD -c true -t "MY TITLE" -d "MY DESCRIPTION" /home/DataForUpload

Encrypted data can be downloaded and uploaded with a passphrase (-e):
java -jar Tranche-Downloader.jar -e supersecret HASH
java -jar Tranche-Uploader.jar -u USER.zip.encrypted -p PASSWORD -e supersecret /home/DataForUpload
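Typing the passphrase directly on the command line, as in the encrypted download above, leaves it in the shell history. A minimal sketch that prompts for it instead; the function names and the `JAVA` variable are hypothetical, while `-e` and `HASH` are used exactly as above. The passphrase is still passed to java as an argument, so other users on the same machine may be able to see it in the process list while the download runs.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): download an encrypted project without
# typing the passphrase on the command line.

JAVA="${JAVA:-java}"   # set to a full path if java is not on PATH

# Read a passphrase from stdin without echoing it (when stdin is a terminal).
read_passphrase() {
  local prompt="${1:-Passphrase: }" pass
  IFS= read -rs -p "$prompt" pass || return 1
  printf '\n' >&2          # move past the prompt line
  printf '%s' "$pass"
}

# Download project HASH, decrypting it with a passphrase read from stdin.
download_encrypted() {
  local hash="$1" pass
  pass=$(read_passphrase "Passphrase for $hash: ") || return 1
  "$JAVA" -jar Tranche-Downloader.jar -e "$pass" "$hash"
}
```

After sourcing the file, run `download_encrypted HASH` and type the passphrase at the prompt.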

Example scripts are provided for downloading and uploading.

To be notified about changes and upgrades, join the automated tool group for the command-line tools and the API.

Transferring Data onto Tranche

For the initial data transfer, we could ship USB drives (possibly two) to the Biopolymers Facility (BPF):

    Attn: Andrew Gagne
    Biopolymers Facility
    77 Ave. Louis Pasteur
    Room 0088
    Boston, MA 02115

We have:

We need:

  • PGP1 - FC37_3
  • PGP3 - FC35_3
  • PGP5 - FC44_2
  • PGP7 - FC44_4
  • PGP8 - FC37_1, FC51_2, FC51_6
  • PGP9 - FC43_3, FC51_3, FC51_7
  • PGP10 - FC41_3

We could also use:

  • CONTROL - FC35, FC37, FC41, FC43, FC44, FC51.

Each data set above has a top-level directory (e.g. pgp2-FC_00037_L002) with exactly 36 directories below it, and each of those directories contains 4 x 100 files. For this release, it would be ideal to organize the data on Tranche as 18 x 100 "randomly addressable" data sets that a volunteer computer could request as needed. Each addressable "bundle" of data would then contain 4 x 36 files.
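The bundle sizes follow from the directory layout. A small arithmetic check, assuming that the "18" counts the lanes in the release (this matches the 12 PGP and 6 CONTROL paths listed under PGP on Tranche) and that each bundle takes 4 files from every one of the 36 directories:

```bash
#!/usr/bin/env bash
# Sketch: file counts implied by the layout described above.
# Assumption: "18" is the number of lanes (top-level directories) released.

lanes=18
dirs_per_lane=36            # directories below each top-level directory
files_per_dir=$((4 * 100))  # 4 x 100 files in each of those directories
bundles_per_lane=100        # the "x 100" in 18 x 100

files_per_lane=$((dirs_per_lane * files_per_dir))   # 14400
files_per_bundle=$((4 * dirs_per_lane))             # 4 x 36 = 144
total_bundles=$((lanes * bundles_per_lane))         # 1800
total_files=$((lanes * files_per_lane))             # 259200

# Regrouping must neither lose nor duplicate files.
if [ $((bundles_per_lane * files_per_bundle)) -ne "$files_per_lane" ]; then
  echo "bundle layout does not cover every file" >&2
  exit 1
fi

printf 'files per lane:   %d\n' "$files_per_lane"
printf 'files per bundle: %d\n' "$files_per_bundle"
printf 'bundles in total: %d\n' "$total_bundles"
printf 'files in total:   %d\n' "$total_files"
```

In other words, 100 bundles of 144 files cover each lane's 14,400 files exactly, giving 1,800 addressable bundles across the release.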

Example: Upload project, download a portion using command-line tools

  • Get a directory to test with:

besmit@besmit-kubuntu:~/PGP-Test$ wget -r -l 1 http://genomerator.freelogy.org/~awz/pgp2-FC_00037_L002/C36.1/

  • Move the downloaded directory contents into C36.1/ and upload this directory to Tranche. Uploading requires a login. See -h or --help for information about the parameters; the last argument is the directory to upload.

besmit@besmit-kubuntu:~/Desktop/TrancheLabs/Upload$ java -Xmx512m -jar Tranche-Uploader.jar -U bryan -P ********** -d "This is my description. Passphrase required for download." -t "This is my title: C35.1 encrypted" -e pgptest4 -c true C36.1/

  • This is the stderr output of the upload, intended for debugging:

Using batch chunk upload?: yes
Started total of 10 file encoding threads.

  • This is the stdout output of the upload: the hash that identifies the project. Save it, since it is needed to download the project:

uiRL5wtqG5FyzE9PnJG47dbxuU3PqpX3aE2Gq9SNJa5vRvlgn14hwUEBW8UZyXIeQWLP9B49sb6/W8dBOz1+QfRC5UkAAAAAAAEnnA==

  • Download only the TIFF files whose names refer to the G or C nucleotides:

besmit@besmit-kubuntu:~/Desktop/TrancheLabs/Download$ java -Xmx512m -jar Tranche-Downloader.jar -e pgptest4 -r '_[gc]\.tif\.gz$' uiRL5wtqG5FyzE9PnJG47dbxuU3PqpX3aE2Gq9SNJa5vRvlgn14hwUEBW8UZyXIeQWLP9B49sb6/W8dBOz1+QfRC5UkAAAAAAAEnnA==

  • The only output is the path to the download directory, shown when the download completes:

/home/besmit/Desktop/TrancheLabs/Download/tranche-downloads/C36.1
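The walkthrough above can be scripted so that the project hash is written to a file instead of being copied by hand. This is a sketch: it reuses only the flags shown in the example (`-U`, `-P`, `-d`, `-t`, `-e`, `-c true`, `-r`), while the function names, the `TRANCHE_USER`/`TRANCHE_PASS` variables and the `.hash` file convention are hypothetical. It also assumes that the uploader prints nothing but the hash on stdout, as in the output shown above, and exits with a non-zero status when an upload fails.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): upload a directory, save its project hash,
# then download only the files matching a pattern.

JAVA="${JAVA:-java}"

# Upload DIR encrypted with PASSPHRASE and save the hash printed on stdout
# to DIR.hash. Progress messages on stderr still go to the terminal.
upload_and_save_hash() {
  local dir="${1%/}" passphrase="$2" title="$3" desc="$4" tmp
  tmp=$(mktemp "$dir.hash.XXXXXX") || return 1
  if "$JAVA" -Xmx512m -jar Tranche-Uploader.jar \
       -U "$TRANCHE_USER" -P "$TRANCHE_PASS" \
       -d "$desc" -t "$title" -e "$passphrase" -c true "$dir/" > "$tmp"; then
    mv "$tmp" "$dir.hash"    # keep the hash only if the upload succeeded
  else
    rm -f "$tmp"
    return 1
  fi
}

# Download the files matching REGEX from the project whose hash is in HASHFILE.
download_matching() {
  local hashfile="$1" passphrase="$2" regex="$3"
  "$JAVA" -Xmx512m -jar Tranche-Downloader.jar \
    -e "$passphrase" -r "$regex" "$(cat "$hashfile")"
}
```

For the example above: `upload_and_save_hash C36.1/ pgptest4 "This is my title" "This is my description."`, then `download_matching C36.1.hash pgptest4 '_[gc]\.tif\.gz$'`. Quoting the pattern keeps the shell from treating `[gc]` as a filename glob.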


PGP on Tranche

The public PGP data are now available on Tranche.
To download them with the command-line tool:

  java -Xmx512m -jar Tranche-Downloader.jar -r 'PATH' PGP_HASH 

The following data are available on Tranche. Use one of these paths as 'PATH' in the command above:

PGP/PGP_1/PGP_1_FC_00037/PGP_1_FC_00037_L003/
PGP/PGP_3/PGP_3_FC_00035/PGP_3_FC_00035_L003/
PGP/PGP_3/PGP_3_FC_00037/PGP_3_FC_00037_L002/
PGP/PGP_5/PGP_5_FC_00044/PGP_5_FC_00044_L002/
PGP/PGP_7/PGP_7_FC_00044/PGP_7_FC_00044_L004/
PGP/PGP_8/PGP_8_FC_00037/PGP_8_FC_00037_L001/
PGP/PGP_8/PGP_8_FC_00051/PGP_8_FC_00051_L002/
PGP/PGP_8/PGP_8_FC_00051/PGP_8_FC_00051_L006/
PGP/PGP_9/PGP_9_FC_00043/PGP_9_FC_00043_L001/
PGP/PGP_9/PGP_9_FC_00051/PGP_9_FC_00051_L003/
PGP/PGP_9/PGP_9_FC_00051/PGP_9_FC_00051_L007/
PGP/PGP_10/PGP_10_FC_00041/PGP_10_FC_00041_L003/

CONTROL/CONTROL_FC00035/CONTROL_FC00035_L001/
CONTROL/CONTROL_FC00037/CONTROL_FC00037_L008/
CONTROL/CONTROL_FC00041/CONTROL_FC00041_L001/
CONTROL/CONTROL_FC00043/CONTROL_FC00043_L001/
CONTROL/CONTROL_FC00044/CONTROL_FC00044_L001/
CONTROL/CONTROL_FC00051/CONTROL_FC00051_L001/
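To fetch several of the listed lanes in one go, the command above can be run once per path. A minimal sketch: `PGP_HASH` is the project hash from the command above (not reproduced here) and must be supplied by the reader; the function name and the `JAVA` variable are hypothetical. Running one download per path means a failure on one lane does not stop the others, and the function returns a non-zero status if any of them failed.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): download several listed paths from one project.

JAVA="${JAVA:-java}"

# download_paths HASH PATH...: one Tranche-Downloader run per PATH.
download_paths() {
  local hash="$1" path status=0
  shift
  for path in "$@"; do
    printf 'Fetching %s\n' "$path" >&2
    "$JAVA" -Xmx512m -jar Tranche-Downloader.jar -r "$path" "$hash" || status=1
  done
  return "$status"
}
```

For example: `download_paths "$PGP_HASH" PGP/PGP_1/PGP_1_FC_00037/PGP_1_FC_00037_L003/ CONTROL/CONTROL_FC00037/CONTROL_FC00037_L008/`.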