User:Jarle Pahr/Bioinformatics

From OpenWetWare

Current revision (14:44, 28 September 2013)
 
Links and notes regarding bioinformatics.

https://beta.stepic.org/Bioinformatics-Algorithms-2/

https://class.coursera.org/bioinformatics-001/class/index

http://etool.me/

http://bioinformaticsonline.com/

http://ihg.gsf.de/ihg/databases.html

http://gtpb.igc.gulbenkian.pt/bicourses/index.html

http://nxseq.bitesizebio.com/articles/bioinformatics-for-ngs-open-source-or-proprietary/

The elements of bioinformatics: http://elements.eaglegenomics.com/

Bioinformatics for newbies

http://www.homolog.us/blogs/blog/2012/10/24/simple-examples-to-learn-bioinformatics-programming/

http://www.homolog.us/blogs/blog/2011/09/19/must-have-tools-for-a-bioinformatician/

Links

http://bioinformatics.ca/links_directory/

http://www.vls3d.com/links.html#

http://bioinformaticssoftwareandtools.co.in/

http://www.hsls.pitt.edu/obrc/

Tips and advice

http://www.biostars.org/p/75925/

http://pathogenomics.bham.ac.uk/blog/2013/07/i-want-to-learn-bioinformatics-a-guide-for-complete-beginners/

http://bioinfoblog.it/2013/09/my-attempt-at-following-every-possible-best-practice-in-bioinformatics/

Discussion

http://www.protocol-online.org/forums/forum/55-bioinformatics-and-biostatistics/

Organizations

http://www.embnet.org/

GOBLET: http://www.mygoblet.org/


http://www.biotnet.org/

Education and training

CUBELP - Cranfield University Bioinformatics Electronic Learning Platform : http://elvis.ccc.cranfield.ac.uk/CUBELP2/

4273π: Bioinformatics education on low cost ARM hardware: http://www.biomedcentral.com/1471-2105/14/243

http://eggg.st-andrews.ac.uk/4273pi/

http://paper.li/f-1334858808

An online bioinformatics curriculum: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002632

Best practices for bioinformatics training: http://bib.oxfordjournals.org/content/early/2013/06/25/bib.bbt043.abstract

Titus Brown's list of bioinformatics courses: http://ged.msu.edu/angus/bioinformatics-courses.html

UC Davis Bioinformatics Training Program: training.bioinformatics.ucdavis.edu

UC Riverside Bioinformatics Manuals: manuals.bioinformatics.ucr.edu

Bioconductor Course Materials: bioconductor.org/help/course-materials

http://www.genomeweb.com/informatics/genomeweb-feature-many-options-formal-and-informal-those-seeking-bioinformatics?utm_source=twitterfeed&utm_medium=twitter&utm_campaign=Feed%3A+genomeweb%2Fgenomeweb-daily-news+%28GenomeWeb+Daily+News%29

Bioinformatic Training links by Stephen Turner: http://stephenturner.us/p/edu

Conference on Bioinformatics education: http://bioinf.spbau.ru/be2012/

http://bioinformaticssoftwareandtools.co.in/

Bioplanet Bioinformatics FAQ: http://www.bioplanet.com/bioinformatics-faq/#/vanilla/discussion/embed/?vanilla_discussion_id=0

http://www.bioplanet.com/bioinformatics-tutorials/#/vanilla/discussion/embed/?vanilla_discussion_id=0

http://www.bioinformatics.org/edu/course/

EMBER: http://www.bioinf.man.ac.uk/dbbrowser/ember/PDF/CALreport.pdf

http://www.ember.man.ac.uk/intro.php

Open Bioinformatics Foundation: http://www.open-bio.org/wiki/News

BTI Plant bioinformatics course: http://btiplantbioinfocourse.wordpress.com/2012-course/core-program/

http://thinking.bioinformatics.ucla.edu/teaching/

http://www.molecularevolution.org/resources

http://www.bio.brandeis.edu/InterpGenes/Project/menu.htm

http://openwetware.org/wiki/Wikiomics:Bioinfo_tutorial

Bioinformatics tools:


http://bioconductor.org/

http://ipython.org/notebook.html


DNA stability and secondary structure prediction:


RNAfold: http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi

DINAmelt - quickfold: http://mfold.rna.albany.edu/?q=DINAMelt/Quickfold

Mfold: http://mfold.rna.albany.edu/?q=mfold



Expressed Sequence Tags (EST):


Serial Analysis of Gene Expression (SAGE):


Expression quantitative trait loci (eQTLs)

Gmod

Generic Model Organism Database project.

http://www.gmod.org/wiki/Main_Page


Ontologies

Sequence ontology: http://www.sequenceontology.org/

Gene ontology: http://www.geneontology.org/

Sequence alignment

Scoring matrices

See also http://en.wikipedia.org/wiki/Substitution_matrix

PAM (MDM) substitution matrices:

Point Accepted Mutation (PAM) matrices, also called Mutation Data Matrices (MDM), were developed by Margaret Dayhoff et al. from analysis of multiple alignments within protein families.

A mutation probability matrix, M, is defined where each element M(a,b) gives the probability that a residue of type b will have been replaced by one of type a after a given amount of evolutionary time [Zvelebil & Baum].

The PAM unit measures evolutionary distance as the number of accepted (retained) point mutations per 100 residues.

The PAM matrix number thus indicates evolutionary distance. PAM250 corresponds to 250 accepted point mutations per 100 residues (an average of more than one mutation per residue, meaning that many residues have changed more than once). 250 PAM is at the limit of detection of evolutionary relationships [Zvelebil & Baum]. A PAM250 matrix is obtained by raising the PAM-1 matrix to the 250th power, based on a model of evolution as a Markov process [Zvelebil & Baum].
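
The matrix-power step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 3-letter alphabet and the probabilities in PAM1_TOY are invented (they are not Dayhoff's data), and the helper names mat_mul and mat_pow are made up for this note. Each column b holds the probabilities that residue b is replaced by residue a, so every column sums to 1.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power (repeated squaring)."""
    n = len(m)
    # Start from the identity matrix.
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in m]
    while power > 0:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result


# Hypothetical 1-PAM mutation probability matrix for a 3-letter alphabet.
# Entry [a][b] is the probability that residue b has been replaced by a.
PAM1_TOY = [
    [0.990, 0.004, 0.006],
    [0.004, 0.990, 0.004],
    [0.006, 0.006, 0.990],
]

# Under the Markov model, 250 PAM of divergence is the 250th matrix power.
PAM250_TOY = mat_pow(PAM1_TOY, 250)

for row in PAM250_TOY:
    print([round(value, 3) for value in row])
```

The columns of the result still sum to 1, while the diagonal entries (probability of seeing the original residue) drop well below their 1-PAM values, reflecting the repeated substitutions at larger distances.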

PAM250: http://rosalind.info/glossary/pam250/

Block Substitution Matrices (BLOSUM):

Developed in the 1990s using local multiple alignments. A set of aligned, highly conserved short regions is generated and clustered into groups according to similarity. Sequences are grouped together if they exceed a specified percentage similarity threshold. Substitution frequencies for all possible pairs of amino acids are then calculated between the clustered groups [Zvelebil & Baum].

The BLOSUM matrices are based on data from the BLOCKS database, published in 1991, which contains ungapped multiple local alignments of conserved protein regions. Various BLOSUM matrices can be generated by varying the percentage cutoff for similarity group clustering [Zvelebil & Baum]. The BLOSUM-62 matrix was generated using a threshold of 62 % identity. For the sequences used to produce the original Dayhoff PAM matrices, the threshold giving a single cluster is 85 %, indicating that those sequences were more similar.
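
The clustering step could look roughly like the sketch below. It assumes ungapped sequences of equal length (as in a block) and single-linkage grouping, meaning a sequence joins a cluster when it reaches the threshold against any member. The sequences and the function names percent_identity and cluster_by_identity are invented for illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions between two ungapped, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def cluster_by_identity(sequences, threshold):
    """Group sequences; any pair at or above the threshold ends up in one cluster."""
    clusters = []
    for seq in sequences:
        merged = [seq]
        remaining = []
        for cluster in clusters:
            # Single linkage: one sufficiently similar member is enough.
            if any(percent_identity(seq, member) >= threshold for member in cluster):
                merged.extend(cluster)
            else:
                remaining.append(cluster)
        clusters = remaining + [merged]
    return clusters


# Invented 10-residue block, for illustration only.
block = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHLMV", "WYWYWYWYWY"]

for cutoff in (62, 80, 90):
    print(cutoff, cluster_by_identity(block, cutoff))
```

Raising the cutoff splits the block into more clusters, which is why matrices built with a higher threshold reflect more closely related sequences.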


BLOSUM62 http://rosalind.info/glossary/blosum62/

http://en.wikipedia.org/wiki/BLOSUM

Article: S Henikoff and J G Henikoff, 1992. Amino acid substitution matrices from protein blocks: http://www.pnas.org/content/89/22/10915.abstract

Gap scoring

Linear gap penalty:

The simplest method for scoring gaps is to assign a penalty g for every residue aligned to a gap.

g = -E (where E is a positive number)

For a gap of n_gap residues, the total penalty is:

g(n_gap) = -n_gap * E

To better account for the observed pattern of fewer, longer gaps, a combination of a high gap opening penalty and a lower gap extension penalty can be used:

Gap opening penalty (GOP): The gap opening penalty, designated I, is the score penalty (the amount by which the score is reduced) associated with introducing a gap in the alignment.

Gap extension penalty (GEP): The GEP, designated E, is the score penalty for each residue aligned to a gap after the first one. (That is, no GEP is applied to a single-residue gap.)

Using the combination of a gap opening penalty and gap extension penalty gives the affine gap penalty formula:

g(n_gap) = -I - (n_gap - 1) * E

Typical values for I and E in protein alignment applications are 7-15 and 0.5-2, respectively [Zvelebil & Baum].
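
The two gap penalty schemes can be compared with a short Python sketch. The function names are made up for this note, and the values I = 10 and E = 1 are simply picked from within the typical ranges quoted above.

```python
def linear_gap_penalty(n_gap, extension):
    """Linear scheme: g(n_gap) = -n_gap * E."""
    return -n_gap * extension


def affine_gap_penalty(n_gap, opening, extension):
    """Affine scheme: g(n_gap) = -I - (n_gap - 1) * E for a gap of at least one residue."""
    if n_gap <= 0:
        return 0  # no gap, no penalty
    return -opening - (n_gap - 1) * extension


I, E = 10, 1  # illustrative values only

# One gap of five residues versus five separate single-residue gaps.
one_long_gap = affine_gap_penalty(5, I, E)
five_short_gaps = 5 * affine_gap_penalty(1, I, E)

print("affine, one 5-residue gap:", one_long_gap)
print("affine, five 1-residue gaps:", five_short_gaps)
print("linear, either arrangement:", linear_gap_penalty(5, E))
```

With the affine scheme the single long gap is penalised far less than five scattered gaps, whereas the linear scheme cannot tell the two arrangements apart.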

Log odds ratios

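A log odds value is the logarithm of an odds ratio. In a substitution matrix, the ratio compares the observed frequency of an aligned residue pair with the frequency expected by chance from the background frequencies of the two residues.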
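
A small worked example in Python, assuming the usual substitution-matrix form of the ratio (observed pair frequency divided by the product of the two background frequencies). The frequencies are invented and the function name log_odds is made up for this note.

```python
import math


def log_odds(observed, expected, base=2.0):
    """Logarithm of the odds ratio observed / expected (in bits when base is 2)."""
    if observed <= 0 or expected <= 0:
        raise ValueError("frequencies must be positive")
    return math.log(observed / expected, base)


# Invented numbers: the pair (a, b) is seen in 2 % of aligned positions,
# while the background frequencies of a and b are 5 % and 10 %.
pair_frequency = 0.02
chance_frequency = 0.05 * 0.10

score = log_odds(pair_frequency, chance_frequency)
print(round(score, 3))
```

Here the pair occurs four times more often than expected by chance, giving a score of roughly 2 bits. Pairs seen less often than expected get negative scores, and a ratio of exactly 1 gives 0.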


See also http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2904766/

See also http://www.biostars.org/p/14855/

http://www.bio.brandeis.edu/InterpGenes/Project/align16.htm

Gene prediction

GLIMMER: http://www.cbcb.umd.edu/software/glimmer/

GeneMark: http://exon.gatech.edu/

Transcription factor binding prediction

http://alggen.lsi.upc.es/cgi-bin/promo_v3/promo/promoinit.cgi?dirDB=TF_8.3

Promoter prediction

Bprom: http://linux1.softberry.com/berry.phtml?topic=bprom&group=programs&subgroup=gfindb

Protein structure prediction

PHYRE2 protein fold recognition server: http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index

Fold recognition (threading): Protein fold recognition (threading) is a method for modelling proteins that share one or more folds with proteins of known structure. Threading is distinct from homology modelling, although the boundary is not sharp, as both are template-based methods. Homology modelling can be used when the structure of a protein homologous to the modelling target is known, while threading is used when only structures with fold-level similarity are known.

Structure predictions are made by "threading" (aligning) each amino acid in the target sequence to one of several templates and evaluating the fit of each template. The structure model is then based on the alignment with the best-fitting template.

See also http://en.wikipedia.org/wiki/Threading_%28protein_sequence%29

PHYRE/PHYRE2 - threading server: http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index

Hidden Markov Models

http://en.wikipedia.org/wiki/Hidden_Markov_model

A Hidden Markov Model (HMM) is a probabilistic model that can be used to analyze biological sequences and other sequential data [Zvelebil & Baum].


Profile HMMs: A profile HMM represents the common features of a set of sequences and is used to align further sequences to that set [Zvelebil & Baum].


Null model:

In the context of profile HMMs, the null model represents sequences which are not related to the profile sequences [Zvelebil & Baum]. The choice of null model will affect the results obtained with a profile HMM.
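
To illustrate how the null model enters the score, here is a deliberately simplified Python sketch. It assumes a profile with match states only (no insert or delete states and no transition probabilities), scores a sequence as the sum of log2(emission probability / null probability), and uses invented numbers throughout. The names PROFILE, UNIFORM_NULL, AT_RICH_NULL and log_odds_score are hypothetical.

```python
import math

# Hypothetical match-state emission probabilities for a 3-column DNA profile.
PROFILE = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]

# Two candidate null models: uniform background versus an AT-rich background.
UNIFORM_NULL = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
AT_RICH_NULL = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}


def log_odds_score(sequence, profile, null_model):
    """Sum of per-position log2 ratios between profile and null probabilities."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must match the number of profile columns")
    return sum(
        math.log2(column[residue] / null_model[residue])
        for residue, column in zip(sequence, profile)
    )


for name, null in (("uniform", UNIFORM_NULL), ("AT-rich", AT_RICH_NULL)):
    print(name, round(log_odds_score("AGC", PROFILE, null), 3))
```

The same sequence and profile give different scores under the two backgrounds, so score thresholds are only comparable when the same null model is used.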


Software:

https://simtk.org/home/emma

Annotation

BASYS bacterial annotation system: http://basys.ca/

Challenges

Critical Assessment of Genome Interpretation (CAGI): https://genomeinterpretation.org/

Critical assessment of methods of protein structure prediction (CASP):

http://predictioncenter.org/

http://www.ncbi.nlm.nih.gov/pubmed/14579322

Software

http://www.ebi.ac.uk/services

https://github.com/BioinformaticsArchive/

FASTA package: http://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml

http://www.petercollingridge.co.uk/python-bioinformatics-tools/

http://biogenie.sourceforge.net/

BWA:

Bowtie:

Tophat:

Cufflinks, RNAseq analysis tool: http://cufflinks.cbcb.umd.edu/manual.html

SOAP:

T-Coffee

Khmer: http://khmer.readthedocs.org/en/latest/#


Bioinformatics software on SourceForge: http://sourceforge.net/directory/science-engineering/bioinformatics/os:windows/freshness:recently-updated/

Courses and conferences

http://ged.msu.edu/angus/bioinformatics-courses.html

http://www.bioplanet.com/bioinformatics-courses/


http://octette.cs.man.ac.uk/bioinformatics/applications/index.html

http://stephenturner.us/p/edu


Bioinformatics Open Source Conference (BOSC):

Codefest: http://www.open-bio.org/wiki/Codefest_2013

Databases

Sequence databases:


NCBI Sequence Read Archive (SRA): http://www.ncbi.nlm.nih.gov/sra (stores raw data from high-throughput sequencing)


Structural databases:

http://www.cathdb.info/

http://www.rcsb.org/pdb/home/home.do


http://www.brenda-enzymes.info/

File formats

NCBI file format guide: http://www.ncbi.nlm.nih.gov/books/NBK47537/


SRA


Bibliography

http://www.retrovirology.com/content/5/1/110

http://arxiv.org/abs/1010.1092


Books: https://www.facebook.com/media/set/?set=a.32322088334.42378.28854648334&type=3

Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform.: http://www.ncbi.nlm.nih.gov/pubmed/22556365

Blogs & Websites

Websites:

http://www.biocodershub.net/community/

http://www.bioplanet.com/

Blogs:

http://biogeeks.wordpress.com/

http://www.bioinformaticszen.com/

http://thinking.bioinformatics.ucla.edu/

http://www.davelunt.net/evophylo/

http://www.homolog.us/blogs/

See also http://www.homolog.us/blogs/blog/2012/07/27/how-to-stay-current-in-bioinformaticsgenomics/

http://kevin-gattaca.blogspot.no/

http://manuelcorpas.com/

Discussion sites:

http://www.reddit.com/r/bioinformatics

Bioinformatics in Norway

http://www.bioinfo.no/


http://www.bioportal.uio.no/

Journals

Briefings in Bioinformatics: http://bib.oxfordjournals.org/

Bioinformatics:

See also http://openwetware.org/wiki/Abhishek_Tiwari:Hot_Computational_Biology_Papers-By_Category

Journal of Computational Biology: http://www.liebertpub.com/cmb


http://bioinformaticsonline.com/pages/view/938/list-of-bioinformatics-and-computational-biology-journals
