Endy:Data storage

From OpenWetWare

Problem summary

We need an easy, secure and efficient way to store all our files:

  • individual user files (backup)
  • microscope images - processed and unprocessed (storage and backup)
  • shared project files (centralized/shared storage and backup)
  • old user and project files (centralized/shared storage and backup)

Ideal functionality

(lab and individual data storage, sharing and backup needs: please list what you would like to have available)

  • Capacity: we want to be able to store all files in a single location
  • Easy: automatic backup
  • Secure: the backup system shouldn't be located in building 68, in case of a fire
  • Efficient: backing up or retrieving files should be fast
  • Affordable

Types of data

  • Individual user data and project data (except microscope images)
    • Current data (~100GB?) can be stored on Bionet (backed up automatically)
    • Old data (~100GB?) can be archived on one of the existing lab servers (e.g., shmoo, which may need an additional hard drive) but needs backup space.
    • ~200GB total plus backup space
  • Microscope data (processed and unprocessed?)
    • Current and old data (~170GB) can be stored on the Mac G5 attached to the scope (may need an additional hard drive)
    • New data (6 months from June 2007: Samantha ~100GB, Jason ~500GB, Francois <100GB) needs both additional primary storage and backup space
    • ~900GB total plus backup space
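The totals above are rounded sums of the individual estimates. A minimal Python sketch of that arithmetic, treating each "~" figure as a point estimate and Francois's "<100GB" as an upper bound of 100GB (both assumptions), puts primary storage at roughly 1.1TB before any backup copy is counted:

```python
# Storage estimates from the "Types of data" list, in GB.
# "~" figures are treated as point estimates; "<100GB" is counted as 100GB.
user_project = {"current": 100, "old": 100}
microscope = {
    "current_and_old": 170,
    "new_samantha": 100,
    "new_jason": 500,
    "new_francois": 100,  # listed as "<100GB"; 100 used as an upper bound
}


def total_gb(estimates: dict[str, int]) -> int:
    """Sum a dictionary of size estimates given in GB."""
    return sum(estimates.values())


user_total = total_gb(user_project)
scope_total = total_gb(microscope)
print(f"user/project: {user_total} GB")  # user/project: 200 GB
print(f"microscope: {scope_total} GB")  # microscope: 870 GB
print(f"primary total: {user_total + scope_total} GB, before backup copies")  # 1070 GB
```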

Available resources

Bionet

  • Fast, reliable, automatically backed up but fairly small storage space
  • Consists of two Fibre Channel disk filers (bionet and bionet2, located on the 3rd floor of building 68) with about 3TB of usable storage shared among several labs. Data on this system is mirrored to a NearStore R200 in building NE47 (as of 2007-06-04).
  • Lab storage space (as of 2007-06-19):
    • total: 200GB
    • available: 88GB
  • Problems with bionet:
    • Not enough space to back up microscope images
    • As of 2007-06-04, bionet and bionet2 are out of their support contract; the R200 is under a support contract paid for by the Biology Department. This means that if the bionet filers fail, a copy of the data should still exist on the R200 in building NE47.

NearStore R200

  • Has 16TB of total storage.
  • Lab storage space (as of 2007-12-05):
    • total: 475GB
    • available: 475GB
  • Problems with the R200:
    • Not enough space to back up all microscope images
    • Not backed up anywhere: no off-site backup and no snapshots (only a single copy of the data exists). If you use this space, keep a backup of the data elsewhere.

Potential solutions

One server

  • Primary storage:
    • current user and project data on lab desktops and Bionet
    • microscope data on the scope G5 with an extra hard drive
    • old user and project data on shmoo with an extra hard drive
  • Backup storage: Bionet mirror in NE47 and a NAS box with 1.5TB of usable storage; cost: ~$1,500. Biosupport would provide maintenance for free.

Two servers

  • Primary storage:
    • current individual and project data on lab desktops and Bionet
    • microscope and old data on a new storage server with a RAID array (e.g., 4 x 500GB drives in RAID 5 configuration would provide about 1.5TB of usable storage space). Cost: ~$2,000. Host it in the BioMicro Center on the 3rd floor of building 68. Biosupport would provide maintenance for free.
  • Backup storage: second identical server (~$2,000) or a NAS box (~$1,500) and host it in Tech Square (NE47). Biosupport would provide maintenance for free.
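The "about 1.5TB" figure follows from how RAID 5 works: the equivalent of one drive holds parity, so an array of n identical drives yields (n - 1) drives of usable space. A minimal sketch, assuming identical drives and ignoring filesystem overhead and the GB/GiB difference (so real formatted capacity will be somewhat lower):

```python
def raid5_usable_gb(num_drives: int, drive_gb: int) -> int:
    """Usable capacity of a RAID 5 array of identical drives.

    One drive's worth of space is consumed by parity, so usable space is
    (num_drives - 1) * drive_gb. Filesystem overhead is ignored.
    """
    if num_drives < 3:
        raise ValueError("RAID 5 requires at least 3 drives")
    return (num_drives - 1) * drive_gb


print(raid5_usable_gb(4, 500))  # 1500: the proposed 4 x 500GB server
print(raid5_usable_gb(4, 250))  # 750: the TeraStation Home example configuration
```

The same rule explains the TeraStation Home figure listed under Network Attached Storage (4 x 250GB giving 750GB in RAID5).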

MIT TSM

  • Primary storage:
    • current user and project data on lab desktops and Bionet
    • microscope data on the scope G5 with an extra hard drive
    • old user and project data on shmoo with an extra hard drive
  • Backup storage: MIT TSM for backing up microscope data from the scope G5 ("soft limit" of 300GB per machine as of June 2007). Cost: a monthly fee, currently $7.50/month for 300GB; future pricing unknown.

Tape drive

Shelf for R200

At $7 per GB, an 8TB shelf (minimum increment available) would cost on the order of $50,000.
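The shelf estimate is a single multiplication. Depending on whether a terabyte is counted as 1,000 or 1,024GB (an assumption, since the unit is not stated), 8TB at $7/GB comes to $56,000 or $57,344, consistent with "on the order of $50,000". A minimal sketch:

```python
PRICE_PER_GB = 7  # USD per GB, the figure quoted for an R200 shelf


def shelf_cost(capacity_tb: int, gb_per_tb: int = 1000) -> int:
    """Cost of a shelf at a flat per-GB price.

    gb_per_tb is an assumption: 1000 for decimal units, 1024 for binary.
    """
    return capacity_tb * gb_per_tb * PRICE_PER_GB


print(shelf_cost(8))  # 56000
print(shelf_cost(8, 1024))  # 57344
```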

Implemented solutions

Purchased an internal 750GB hard drive (primary storage) plus an external 750GB hard drive (backup storage) for Jason's machine on 2007-07-11


The win.mit.edu Domain

MIT TSM Backup Service

  • Monthly service charge: $7.50 per month per computer
  • Storage limit: 300GB
    • a soft limit, some users go over
    • an approximate figure: it counts both "active" and "inactive" files, but this is offset by data compression
  • TSM software is required to use the service and is available for Windows, Mac and Linux (free to the MIT community under a site license)
  • Backups are stored on one of the TSM backup servers in buildings W91 and E40 (no mirroring)
  • Types of backup:
    • Scheduled: everything by default but can be configured to exclude directories
    • Manual: nothing by default; you must specify which directories to back up
  • Inactive files (old versions of current files and deleted files) are kept for 30 days using incremental storage (only changes are stored)
  • Need a separate account for each computer to be backed up
  • Performance varies depending on time of day, network conditions and the machine itself
  • Scale: 5,000 users; 250,000 files restored per quarter
  • 128-bit encryption available
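The 30-day rule for inactive files can be pictured as a filter over file versions. The sketch below is a hypothetical model, not TSM's implementation or API: the `Version` record and `retained` function are illustrative names, and real TSM retention is governed by server-side policy settings (such as limits on the number of versions kept) that this sketch ignores.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

RETENTION = timedelta(days=30)  # inactive-file retention quoted for MIT TSM


@dataclass
class Version:
    """Hypothetical record of one stored version of a file."""
    path: str
    deactivated: Optional[date]  # None = still the active (current) version


def retained(versions: list[Version], today: date) -> list[Version]:
    """Keep active versions plus inactive ones deactivated at most 30 days ago.

    Simplified model only: ignores per-file version-count limits and any
    other server-side policy settings.
    """
    return [
        v for v in versions
        if v.deactivated is None or today - v.deactivated <= RETENTION
    ]


today = date(2007, 7, 31)
history = [
    Version("notes.txt", None),  # current version: always kept
    Version("notes.txt", date(2007, 7, 10)),  # replaced 21 days ago: kept
    Version("old.csv", date(2007, 6, 15)),  # deleted 46 days ago: expired
]
for v in retained(history, today):
    print(v.path, v.deactivated)
```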

From January 2008, the service will consist of three levels:

  • Basic
    • targeted at desktops and laptops
    • 15GB of data in selected directories/folders
    • Fee: NO COST
  • Standard
    • based on current service
    • 300GB of data
    • Fee: $15/month (effective January 2008), double the current fee of $7.50/month
  • Enterprise
    • designed for server-class machines
    • 10TB (Terabytes) of data
    • Fee: $65/month
    • Backup data protected in a redundant environment, including tape duplication and an option to send data offsite
    • Dedicated hardware for increased backup and restore performance
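Once the three levels take effect, the choice for a given machine depends mostly on how much data it holds. A minimal sketch that picks the cheapest level covering a given amount of data, assuming the limits are hard caps per machine (the 300GB limit was described as soft) and that 10TB means 10,000GB; the `Tier` type and `cheapest_tier` function are illustrative names. The 170GB case is the current and old microscope data; 870GB is the sum of all microscope estimates under "Types of data".

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    name: str
    limit_gb: int
    monthly_usd: float


# Levels announced for January 2008, ordered from cheapest to most expensive.
TIERS = [
    Tier("Basic", 15, 0.0),
    Tier("Standard", 300, 15.0),
    Tier("Enterprise", 10_000, 65.0),  # 10TB counted as 10,000GB
]


def cheapest_tier(data_gb: float) -> Optional[Tier]:
    """Return the cheapest level whose limit covers data_gb, or None.

    Treats limits as hard caps and ignores other restrictions (Basic covers
    only selected directories; Enterprise targets server-class machines).
    """
    for tier in TIERS:
        if data_gb <= tier.limit_gb:
            return tier
    return None


for size in (10, 170, 870):
    tier = cheapest_tier(size)
    print(size, tier.name, tier.monthly_usd * 12)  # annual cost in USD
# 10 Basic 0.0
# 170 Standard 180.0
# 870 Enterprise 780.0
```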


Network Attached Storage

A Tale of Two Terabyte NAS Boxes

Buffalo Technology

  • Buffalo TeraStation Home
    • Example disk configuration: 4 x 250GB IDE (750GB in RAID5)
    • Protocols: FTP, SMB
    • USB 2.0 port for external hard drive (backup or additional storage)
    • Review by PC Magazine
      • Bottom line: Flexible and reliable storage for everyone on your network. Print sharing is a plus, as is expandable USB disk storage.
      • Pros: Offers RAID level data protection; easy-to-configure shared and private storage for all workgroup members; print sharing is a plus.
      • Cons: Large footprint. No logging or reporting features.
    • Review by ExtremeTech
    • TeraStation wiki
  • Buffalo TeraStation Pro
    • Released in March 2006
    • S-ATA drives

Infrant ReadyNAS

  • ReadyNAS NV
  • ReadyNAS NV+
  • Infrant ReadyNAS NV+ and 1100: Small steps forward - review
    • comes with a 5+5-user license for EMC's Retrospect for Windows and Macintosh client backup software
    • The NV+ is a slight improvement over the NV, with most of the value coming in the Retrospect backup client bundle
    • Since both the NV and NV+ use the same processor and have the same memory, the performance difference I saw is more due to better drives in the NV+ and newer firmware than anything else
    • with the lowest price at time of review at $831 for a driveless NV+ and $517 for an NV, you might be better served by using the $300 to buy drives
  • Infrant ReadyNAS NV Review
    • X-RAID (Expandable RAID) allows adding capacity without deleting existing data and automatically adjusts the RAID level and formatted capacity to match the available drives
  • ReadyNAS NV - user review
  • ReadyNAS NV - AnandTech review
  • ReadyNAS NV - PracticallyNetworked.com review

LaCie

  • LaCie Ethernet Disk
    • 1TB $740, 2TB $1,050 (June 2007)
    • Rack format
    • Powered by Windows XP® Embedded
    • Gigabit Ethernet
  • LaCie Ethernet Disk RAID
    • 1TB $840, 2TB $1,160, 4TB $3,000 (June 2007)
    • 4 removable, hot-swappable drives (spare drives: 250GB $210, 500GB $390 - June 2007)
    • Gigabit Ethernet
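For comparison with the $7/GB quoted for an R200 shelf, a short sketch converting the June 2007 LaCie Ethernet Disk RAID prices to dollars per GB. It assumes decimal terabytes and takes the listed capacities at face value, since it is not stated whether they are raw or post-RAID figures; `price_per_gb` is an illustrative name.

```python
# June 2007 list prices for the LaCie Ethernet Disk RAID (capacity in TB -> USD).
LACIE_RAID_PRICES = {1: 840, 2: 1160, 4: 3000}


def price_per_gb(capacity_tb: int, price_usd: float) -> float:
    """Dollars per GB, assuming decimal units (1TB = 1000GB)."""
    return price_usd / (capacity_tb * 1000)


for tb, usd in LACIE_RAID_PRICES.items():
    print(f"{tb}TB: ${price_per_gb(tb, usd):.2f}/GB")
# 1TB: $0.84/GB
# 2TB: $0.58/GB
# 4TB: $0.75/GB
```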