Endy:Data storage: Difference between revisions

From OpenWetWare

Latest revision as of 13:54, 5 December 2007

Problem summary

We need an easy, secure and efficient way to store all our files:

  • individual user files (backup)
  • microscope images - processed and unprocessed (storage and backup)
  • shared project files (centralized/shared storage and backup)
  • old user and project files (centralized/shared storage and backup)

Ideal functionality

(lab and individual data storage, sharing and backup needs - please list what you would like to have available)

  • Capacity: we want to be able to store all files in a single location
  • Easy: automatic backup
  • Secure: the backup system shouldn't be located in building 68, in case of a fire
  • Efficient: backing up or retrieving files should be speedy
  • Affordable

Types of data

  • Individual user data and project data (except microscope images)
    • Current data (~100GB?) can be stored on Bionet (backed up automatically)
    • Old data (~100GB?) can be archived on one of the existing lab servers (e.g., shmoo which may need an additional hard drive) but needs backup space.
    • ~200GB total plus backup space
  • Microscope data (processed and unprocessed?)
    • Current and old data (~170GB) can be stored on the Mac G5 attached to the scope (may need an additional hard drive)
    • New data (6 months from June 2007: Samantha ~100GB, Jason ~500GB, Francois <100GB) needs both additional primary storage and backup space
    • ~900GB total plus backup space
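
As a rough check of the totals above, the estimates can be added up. This is only a sketch: the dictionary and function names are illustrative, and 100GB for Francois is an assumed upper bound (the list only says <100GB).

```python
# Storage estimates in GB, copied from the list above.
user_and_project = {"current": 100, "old": 100}
microscope = {
    "current_and_old": 170,
    "Samantha": 100,
    "Jason": 500,
    "Francois": 100,  # assumption: upper bound, the list says <100GB
}


def total_gb(estimates):
    """Sum a dict of per-item estimates given in GB."""
    return sum(estimates.values())


user_total = total_gb(user_and_project)
scope_total = total_gb(microscope)

print("user and project data:", user_total, "GB")
print("microscope data:", scope_total, "GB")
print("primary storage overall:", user_total + scope_total, "GB")
```

With these inputs the microscope figure comes out slightly under the ~900GB quoted above, which is consistent with that number being a rounded estimate. Backup space would come on top of these totals.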

Available resources

Bionet

  • Fast, reliable, automatically backed up but fairly small storage space
  • Consists of two fiber channel disk filers (bionet and bionet2 located on the 3rd floor in building 68) that have about 3TB of usable storage shared among several labs. Data on this system is mirrored to NearStore R200 in building NE47 (as of 2007-06-04).
  • Lab storage space (as of 2007-06-19):
    • total: 200GB
    • available: 88GB
  • Problems with bionet:
    • Not enough space to back up microscope images
    • As of 2007-06-04, bionet and bionet2 are out of the support contract; R200 is under a support contract paid for by the Biology Department. This means that if the bionet filers fail, there should still be a copy of the data on R200 in building NE47.

R200

  • Has 16TB of total storage.
  • Lab storage space (as of 2007-12-05):
    • total: 475GB
    • available: 475GB
  • Problems with r200:
    • Not enough space to back up all microscope images
    • Not backed up anywhere: no off-site backup and no snapshots (only a single copy of the data exists). This means that if you use this space, you should keep a backup of the data elsewhere.

Potential solutions

One server

  • Primary storage:
    • current user and project data on lab desktops and Bionet
    • microscope data on the scope G5 with an extra hard drive
    • old user and project data on shmoo with an extra hard drive
  • Backup storage: Bionet mirror in NE47 and a NAS box with 1.5TB of usable storage; cost: ~$1,500. Biosupport would provide maintenance for free.

Two servers

  • Primary storage:
    • current individual and project data on lab desktops and Bionet
    • microscope and old data on a new storage server with a RAID array (e.g., 4 x 500GB drives in RAID 5 configuration would provide about 1.5TB of usable storage space). Cost: ~$2,000. Host it in the BioMicro Center on the 3rd floor of building 68. Biosupport would provide maintenance for free.
  • Backup storage: second identical server (~$2,000) or a NAS box (~$1,500) and host it in Tech Square (NE47). Biosupport would provide maintenance for free.
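
The RAID 5 figure quoted above follows from the fact that one drive's worth of space goes to parity. A minimal sketch of that arithmetic, assuming identical drives and ignoring filesystem overhead (the function name is illustrative):

```python
def raid5_usable_gb(num_drives, drive_gb):
    """Usable capacity of a RAID 5 array of identical drives.

    One drive's worth of space holds parity, so the usable
    capacity is (n - 1) * drive size. RAID 5 needs at least 3 drives.
    """
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (num_drives - 1) * drive_gb


# The 4 x 500GB example from the list above.
print(raid5_usable_gb(4, 500), "GB usable")
```

The formatted capacity seen in practice will be somewhat lower than this, so the result is best read as an upper bound when sizing the server against the data estimates.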

Temporary

  • Primary storage:
    • current user and project data on lab desktops and Bionet
    • microscope data on the scope G5 with an extra hard drive
    • old user and project data on shmoo with an extra hard drive
  • Backup storage: MIT TSM for backing up microscope data from the scope G5 ("soft limit" of 300GB per machine as of June 2007). Cost: unknown monthly fee (currently $7.50/month for 300GB).

Tape drive

  • Dell PowerVault
    • Capacity: 36GB - 800GB per tape
    • Cost: $600 - $4,000

Shelf for R200

At $7 per GB, an 8TB shelf (minimum increment available) would cost on the order of $50,000.
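
The shelf estimate can be reproduced as follows. This is a sketch only; whether a TB is counted as 1000GB or 1024GB is an assumption, and both variants are shown.

```python
def shelf_cost_usd(capacity_tb, usd_per_gb, gb_per_tb=1000):
    """Cost of a storage shelf given its size and a per-GB price."""
    return capacity_tb * gb_per_tb * usd_per_gb


# 8TB shelf at $7 per GB, decimal and binary terabytes.
print(shelf_cost_usd(8, 7))
print(shelf_cost_usd(8, 7, gb_per_tb=1024))
```

Either way the result is in the mid-$50,000s, in line with the order-of-magnitude figure quoted above.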

Implemented solutions

Purchased an internal 750GB hard drive (primary storage) plus an external 750GB hard drive (backup storage) for Jason's machine on 2007-07-11.

Reference

The win.mit.edu Domain

MIT TSM Backup Service

  • Monthly service charge: $7.50 per month per computer
  • Storage limit: 300GB
    • a soft limit, some users go over
    • an approximate figure because it includes both "active" and "inactive" files but this is offset by data compression
  • TSM software is required to use the service and is available for Windows, Mac and Linux (free to MIT community per site license)
  • Backups are stored on one of the TSM backup servers in buildings W91 and E40 (no mirroring)
  • Types of backup:
    • Scheduled: everything by default but can be configured to exclude directories
    • Manual: nothing by default, need to specify which directories to back up
  • Inactive files (old versions of current files and deleted files) are kept for 30 days using incremental storage (only changes are stored)
  • Need a separate account for each computer to be backed up
  • Performance will vary depending on the time of day, network conditions and the machine itself
  • 5,000 users, 250,000 files restored per quarter
  • 128-bit encryption available

From January 2008 the service will consist of three levels:

  • Basic
    • targeted at desktops and laptops
    • 15GB of data in selected directories/folders
    • Fee: no cost
  • Standard
    • based on current service
    • 300GB of data
    • Fee: $15/month (effective January 2008) - this represents an increase of $7.50/month from the current fee of $7.50/month
  • Enterprise
    • designed for server-class machines
    • 10TB (Terabytes) of data
    • Fee: $65/month
    • Backup data protected in a redundant environment, including tape duplication and an option to send data offsite
    • Dedicated hardware for increased backup and restore performance
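
To see which level a given machine would fall into, the limits and fees listed above can be put in a small lookup. This is a hedged sketch rather than anything provided by the TSM service: the names are illustrative, 10TB is taken as 10,000GB, and the 300GB Standard limit is treated as hard even though it has been described as a soft limit.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    name: str
    limit_gb: int
    usd_per_month: float


# Levels from the list above, ordered from cheapest to most expensive.
TIERS = [
    Tier("Basic", 15, 0.0),
    Tier("Standard", 300, 15.0),
    Tier("Enterprise", 10_000, 65.0),  # assumption: 10TB taken as 10,000GB
]


def cheapest_tier(data_gb: float) -> Optional[Tier]:
    """Return the cheapest level whose limit covers data_gb, or None."""
    for tier in TIERS:
        if data_gb <= tier.limit_gb:
            return tier
    return None


# Example sizes in GB: a small desktop, the current microscope data,
# and roughly the projected microscope total.
for size in (10, 170, 870):
    tier = cheapest_tier(size)
    print(size, "GB ->", tier.name, "at", tier.usd_per_month * 12, "USD/year")
```

Because a separate account is needed for each computer, the lookup applies per machine, not to the lab's data as a whole.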

Misc

Network Attached Storage

A Tale of Two Terabyte NAS Boxes

Buffalo Technology

  • Buffalo TeraStation Home
    • Example disk configuration: 4 x 250GB IDE (750GB in RAID5)
    • Protocols: FTP, SMB
    • USB 2.0 port for external hard drive (backup or additional storage)
    • Review by PC Magazine
      • Bottom line: Flexible and reliable storage for everyone on your network. Print sharing is a plus, as is expandable USB disk storage.
      • Pros: Offers RAID level data protection; easy-to-configure shared and private storage for all workgroup members; print sharing is a plus.
      • Cons: Large footprint. No logging or reporting features.
    • Review by ExtremeTech
    • TeraStation wiki
  • Buffalo TeraStation Pro
    • Released in March 2006
    • S-ATA drives

Infrant ReadyNAS

  • ReadyNAS NV
  • ReadyNAS NV+
  • Infrant ReadyNAS NV+ and 1100: Small steps forward - review
    • comes with a 5+5-user license for EMC's Retrospect for Windows and Macintosh client backup software
    • The NV+ is a slight improvement over the NV, with most of the value coming in the Retrospect backup client bundle
    • Since both the NV and NV+ use the same processor and have the same memory, the performance difference I saw is more due to better drives in the NV+ and newer firmware than anything else
    • with the lowest price at time of review at $831 for a driveless NV+ and $517 for an NV, you might be better served by using the $300 to buy drives
  • Infrant ReadyNAS NV Review
    • X-RAID (Expandable RAID) allows capacity to be added without deleting existing data, and automatically adjusts RAID level and formatted capacity to match the available drives
  • ReadyNAS NV - user review
  • ReadyNAS NV - AnandTech review
  • ReadyNAS NV - PracticallyNetworked.com review

LaCie NAS

  • LaCie Ethernet Disk
    • 1TB $740, 2TB $1,050 (June 2007)
    • Rack format
    • Powered by Windows XP® Embedded
    • Gigabit Ethernet
  • LaCie Ethernet Disk RAID
    • 1TB $840, 2TB $1,160, 4TB $3,000 (June 2007)
    • 4 removable, hot-swappable drives (spare drives: 250GB $210, 500GB $390 - June 2007)
    • Gigabit Ethernet