User:Janet B. Matsen: Difference between revisions

From OpenWetWare
(34 intermediate revisions by the same user not shown)
Line 19: Line 19:
|}
|}


I am a 5th year Chemical Engineering PhD student at the University of Washington working in Mary Lidstrom's Lab.  We are engineering E. coli to make biofuel precursors from electricity and CO<sub>2</sub> using a metabolic pathway that doesn't exist in nature.  Success would enable production of biofuel from renewable electricity.  Getting the pathway to work in living cells has been challenging.  We are combining metabolic engineering, synthetic biology, metabolomics, enzyme engineering, and directed evolution in E. coli and a novel methylotroph to achieve this goal.
I am a Chemical Engineering PhD candidate graduating in Winter 2016.  My major project has been implementation of a novel carbon-fixation pathway, which included a computationally designed enzyme and three enzyme reactions not found in nature.  For my last year I have transitioned to fully computational work.  My new project involves metagenomics and metatranscriptomics of a methane oxidizing community using collaborative programming, remote & cloud computing, machine learning, and visualization of large multivariate data sets.  In addition, I will graduate with an Advanced Data Science certificate for coursework in statistics, machine learning, data management, and data visualization.


My first year was spent investigating methanotrophic metabolism in pure cultures and a model ecosystem in a team that combined transcriptomics, metabolomics, and single-cell observation.  
Please see my [http://openwetware.org/images/e/e5/2015-Matsen-resume.pdf resume] or [https://www.linkedin.com/in/janetmatsen LinkedIn] for more professional information and [https://github.com/JanetMatsen GitHub] for some of my code.
 
I work with [http://depts.washington.edu/mllab/ Mary Lidstrom], [http://www.bakerlab.org/ David Baker], and [http://faculty.washington.edu/dacb/ David Beck].
   
   
[[Image:Janet_Matsen.png]]
[[Image:Janet_Matsen.png]]


I started the [[Lidstrom|Lidstrom Lab OWW wiki]] and love posting what I learn!  It has been a lot of fun to record what I have learned about lab techniques, and my pages are viewed by many scientists outside the lab.  This wiki is also a fun place for me to share tips/tricks and results of experiments that probe dogma in experimental techniques.
== OpenWetWare Contributions ==
 
I started the [[Lidstrom|Lidstrom Lab OWW wiki]] in 2011 and love posting what I learn!   
Wet lab biology is full of dogmas that I enjoy challenging.
When I learn that a dogma is false, or that a particular variable in a method matters, I post it on [[Lidstrom:Protocols|the Lidstrom lab's wiki]].
I have over 27,000 contributions across dozens of pages.
Two popular ones are my [[Janet_B._Matsen:Guide_to_Gibson_Assembly|Guide to Gibson Assembly]] and [[Lidstrom:_SDS-PAGE|SDS-PAGE]].
 
In addition, I posted some protocols specific to my PhD work to a GitHub-driven [http://janetmatsen.github.io/protocols/ web page] for use in working with [https://www.linkedin.com/in/helen-chan-a2665891 Helen Chan], a fabulous Chemical Engineering undergraduate.


== Research Interests ==
== Research Interests ==
<blockquote>  
<blockquote>  
* production of chemicals using microbes
* production of chemicals using microbes
*metabolic engineering
* protein engineering
*synthetic biology
* metabolic engineering
*transcriptomics  
* synthetic biology
*chemical engineering
* transcriptomics  
* chemical engineering applied to biology
</blockquote>
</blockquote>


Line 68: Line 79:
== My Personal Pages ==
== My Personal Pages ==
<blockquote>
<blockquote>
* [https://www.linkedin.com/in/janetmatsen Linkedin]
* [https://github.com/JanetMatsen GitHub]
* [http://janetmatsen.github.io/protocols/ Protocols] specific to my project, as of 3/2015.  Hosted on GitHub.
*[[Janet B. Matsen:Guide to Gibson Assembly|Guide to Gibson Assembly]]  
*[[Janet B. Matsen:Guide to Gibson Assembly|Guide to Gibson Assembly]]  
*[[Janet B. Matsen:Lab Tips & Tricks|Lab Tips & Tricks]]
*[[Janet B. Matsen:Lab Tips & Tricks|Lab Tips & Tricks]]
*[[Janet B. Matsen:Useful Links|Useful Links]]
*[[Janet B. Matsen:Books I like|Books I like]]
*[[Janet B. Matsen:Thesis Project|Personal Notes for Thesis Project]]
Not maintained any longer:
*[[Janet B. Matsen:Open Lab Questions|Open Lab Questions]]
*[[Janet B. Matsen:Open Lab Questions|Open Lab Questions]]
*[[Janet B. Matsen:Closed Lab Questions|Closed Lab Questions]]
*[[Janet B. Matsen:Closed Lab Questions|Closed Lab Questions]]
*[[Janet B. Matsen:Best Lab Practices|Best Lab Practices]]
*[[Janet B. Matsen:Best Lab Practices|Best Lab Practices]]
*[[Janet B. Matsen:Useful Links|Useful Links]]
 
*[[Janet B. Matsen:Books I like|Books I like]]
*[[Janet B. Matsen:Thesis Project|Personal Notes for Thesis Project]]
<br>
<br>
</blockquote>
</blockquote>
Line 82: Line 99:
=== GitHub Repository ===
=== GitHub Repository ===
* [https://github.com/JanetMatsen Janet on GitHub]
* [https://github.com/JanetMatsen Janet on GitHub]
** I'm beginning to contribute code to the [https://github.com/dacb/lidlab LidLab GitHub repository].
** All of my plasmid files are found in a [https://github.com/JanetMatsen/Plasmids version controlled repository]
** Protocols for my personal use and collaboration with Helen Chan (Chemical Engineering Undergrad) are here: [http://janetmatsen.github.io/protocols/ GitHub Pages: Janet Matsen].
** I'm beginning to contribute code to the [https://github.com/dacb/lidlab/ LidLab GitHub repository]. My sub-folder is [https://github.com/dacb/lidlab/tree/master/jm here].
*** Favorite function for exporting data from the SpectraMax 190 plate reader: [https://github.com/dacb/lidlab/tree/master/jm/SpectraMax_190_plate_reader_data_importer SpectraMax_190_plate_reader_data_importer]
 
=== APE annotation library generator & list of primers to share with our lab ===  
=== APE annotation library generator & list of primers to share with our lab ===  
* This is the first script I ever wrote, and remains important in my research.  Feel free to download it and enjoy it yourself. 
*[https://www.dropbox.com/s/3p5bnfip4ks8daa/APE_AnnotationFeatureLibraryCreator.R Ape Annotation Feature Library Creator]  
*[https://www.dropbox.com/s/3p5bnfip4ks8daa/APE_AnnotationFeatureLibraryCreator.R Ape Annotation Feature Library Creator]  
** This is an R script that converts the info in [https://docs.google.com/spreadsheet/ccc?key=0AlVxrZi130nMdHlsaml2OGFDUW9zRlVBdkRKaXVEbkE#gid=22 my list of primers] into a file that I can use to annotate DNA files in APE with.  It:  
** This is an R script that converts the info in [https://docs.google.com/spreadsheet/ccc?key=0AlVxrZi130nMdHlsaml2OGFDUW9zRlVBdkRKaXVEbkE#gid=22 my list of primers] into a file that I can use to annotate DNA files in APE with.   
***trims out sequences not intended for sequencing such as Gibson assembly primers
*It:  
***makes a label that combines the unique primer number, the melting temperature, and the letter F or R for forward or reverse, and an asterisk if you should consult the primer spreadsheet comments before using it
** trims out sequences not intended for sequencing such as Gibson assembly primers
***assigns colors in APE that communicate whether the primer binds in the forward direction or the reverse direction.
** makes a label that combines the unique primer number, the melting temperature, and the letter F or R for forward or reverse, and an asterisk if you should consult the primer spreadsheet comments before using it
***saves the info in the format APE needs, with the date it was generated in the title.
** assigns colors in APE that communicate whether the primer binds in the forward direction or the reverse direction.
** This allows me to instantly see where all of the primers I own bind to a DNA sequence for a given project I am working on.  It also allows me to share these primers very easily: sharing the file it outputs lets my lab mates instantly see whether I have any primers that can be used in their projects.  It has been very handy for them!
** saves the info in the format APE needs, with the date it was generated in the title.
* This allows me to instantly see where all of the primers I own bind to a DNA sequence for a given project I am working on.  It also allows me to share these primers very easily: sharing the file it outputs lets my lab mates instantly see whether I have any primers that can be used in their projects.  It has been very handy for them!
** I am happy to help friends modify this script to be useful with their own primer libraries!  No R experience is necessary.
** I am happy to help friends modify this script to be useful with their own primer libraries!  No R experience is necessary.
** Anyone can access my most current primer "Annotation Feature Library" [https://www.dropbox.com/sh/5w53jl3jhbdddvp/iW7cOtZ2Wd here].  You can also see the files used to generate it there.
** Anyone can access my most current primer "Annotation Feature Library" [https://www.dropbox.com/sh/5w53jl3jhbdddvp/iW7cOtZ2Wd here].  You can also see the files used to generate it there.
Line 102: Line 125:
** I designed some primers for a Kan cassette.  The Kan cassette in pCM66 is read in the reverse direction, so all the primers built for a forward Kan cassette appear red.  [[image:2013_05_08_Kan_casette.jpg||thumb|center|Kan primers binding in the opposite direction relative to my database appear red]]
** I designed some primers for a Kan cassette.  The Kan cassette in pCM66 is read in the reverse direction, so all the primers built for a forward Kan cassette appear red.  [[image:2013_05_08_Kan_casette.jpg||thumb|center|Kan primers binding in the opposite direction relative to my database appear red]]


== Skills I'm developing ==
== Skills ==
*molecular biology
* Metabolic engineering, molecular biology, enzyme assays, enzyme evolution, high-throughput screening, metabolomics, Gibson cloning
* enzyme assays
* R & ggplot2, Python, Git/GitHub, LaTeX, Inkscape, Linux
*mass spectrometry based metabolomics
*R & ggplot2
*Inkscape
*Gibson cloning


== Why I love ggplot2 (and R) ==  
== Why I love ggplot2 (and R) ==  
R is a very easy language for people with experience to pick up, and it is one of the easiest for people without experience as well.  It is definitely the best language for noodling with data and doing statistics.   
Data is beautiful.  Interacting and communicating data elegantly makes me happy.
 
R is a relatively easy language to pick up, whether or not you have prior programming experience.  It is one of the best languages for noodling with tabular data and doing statistics, though Python's emulations of R's strengths are growing more appealing.  I use R because I am in love with the ggplot2 plotting package in R.  To get a sense of its power, just [https://www.google.com/search?q=ggplot2&espv=210&es_sm=119&source=lnms&tbm=isch&sa=X&ei=IVYiU-mIG8nFoAS974DwCg&ved=0CAkQ_AUoAQ&biw=1665&bih=929 type "ggplot2" into google images].  The book that introduces the fundamentals is [http://www.bioinformaticslaboratory.nl/twikidata/pub/Education/ComputinginR/ggplot2-book.pdf freely available online].
 
Two cool features of ggplot2:


R was developed at the Hutch, but is a "big deal" worldwide.  ggplot2 is a more recent package that can be used within R.   To get a sense of its power, just [https://www.google.com/search?q=ggplot2&espv=210&es_sm=119&source=lnms&tbm=isch&sa=X&ei=IVYiU-mIG8nFoAS974DwCg&ved=0CAkQ_AUoAQ&biw=1665&bih=929 type "ggplot2" into google images].  The book that introduces the fundamentals is [http://www.bioinformaticslaboratory.nl/twikidata/pub/Education/ComputinginR/ggplot2-book.pdf freely available online].
(1) '''Layers'''.  Imagine you have defined a plot called p in a program.  If you want to add some layer to the plot, you just say p + layer.  You can just layer in data, aesthetics, statistics, etc.  You can also make one base plot, then make a bunch of variants of it by adding different layers of interest.  It is hard to imagine going back once you have this freedom.  Layers have different types of geometries you can apply.


I like to use ggplot2 for two main reasons:
(2) [https://www.google.com/search?q=ggplot2+facet&espv=210&es_sm=119&source=lnms&tbm=isch&sa=X&ei=gFciU5TkAcjcoASAzIGoBg&ved=0CAoQ_AUoAg&biw=1665&bih=929 '''Facets'''].  Biological data and experimental data are complex!  ggplot2 can help you plot complex data by spatially separating out variables and mapping multiple aesthetics (color, size, shape, outline, etc.) to one point, communicating more information than an Excel plot can.
(1) Layers.  Imagine you have defined a plot called p in a program.  If you want to add anything to the plot, you just say p + thing.  You can just layer in data, aesthetics, statistics, etc.  You can also make one base plot, then make a bunch of variants of it by adding different layers of interest.  It is hard to imagine going back once you have this freedom.  Layers have different types of geometries you can apply.   
(2) [https://www.google.com/search?q=ggplot2+facet&espv=210&es_sm=119&source=lnms&tbm=isch&sa=X&ei=gFciU5TkAcjcoASAzIGoBg&ved=0CAoQ_AUoAg&biw=1665&bih=929 Facets].  Biological data is complex! Experimental data is complex too. ggplot2 can help you plot complex data by spatially separating out variables.


Having scripts that I can recycle when doing similar experiments allows me to do in-depth quality checks and make summary statistics that would be impractical to do with Excel.  The quality check plots are automatically generated for each data set.  Then I add plots that are specific to the questions I was investigating with my experiment.  This series of plots paints a story about how the experiment was performed, what variables were important, and what the key findings are.  I can compare elements of different experiments by comparing similar plots that are generated (almost) automatically for each experiment with (almost) identical scripts.  Here is a [https://www.dropbox.com/sh/08h7uuov7dvhf7o/l-kZLXS-0S sample folder] from a recent experiment to give a small sense of this.
Having scripts that I can recycle when doing similar experiments allows me to do in-depth quality checks and make summary statistics that would be impractical to do with Excel.  The quality check plots are automatically generated for each data set.  Then I add plots that are specific to the questions I was investigating with my experiment.  This series of plots paints a story about how the experiment was performed, what variables were important, and what the key findings are.  I can compare elements of different experiments by comparing similar plots that are generated (almost) automatically for each experiment with (almost) identical scripts.  Here is a [https://www.dropbox.com/sh/08h7uuov7dvhf7o/l-kZLXS-0S sample folder] from a recent experiment to give a small sense of this.

Revision as of 08:26, 26 November 2015



Janet B. Matsen

Department of Chemical Engineering
Seattle, Washington

jmatsen@uw.edu

I am a Chemical Engineering PhD candidate graduating in Winter 2016. My major project has been implementation of a novel carbon-fixation pathway, which included a computationally designed enzyme and three enzyme reactions not found in nature. For my last year I have transitioned to fully computational work. My new project involves metagenomics and metatranscriptomics of a methane oxidizing community using collaborative programming, remote & cloud computing, machine learning, and visualization of large multivariate data sets. In addition, I will graduate with an Advanced Data Science certificate for coursework in statistics, machine learning, data management, and data visualization.

Please see my resume or LinkedIn for more professional information and GitHub for some of my code.

I work with Mary Lidstrom, David Baker, and David Beck.

OpenWetWare Contributions

I started the Lidstrom Lab OWW wiki in 2011 and love posting what I learn! Wet lab biology is full of dogmas that I enjoy challenging. When I learn that a dogma is false, or that a particular variable in a method matters, I post it on the Lidstrom lab's wiki. I have over 27,000 contributions across dozens of pages. Two popular ones are my Guide to Gibson Assembly and SDS-PAGE.

In addition, I posted some protocols specific to my PhD work to a GitHub-driven web page for use in working with Helen Chan, a fabulous Chemical Engineering undergraduate.

Research Interests

  • production of chemicals using microbes
  • protein engineering
  • metabolic engineering
  • synthetic biology
  • transcriptomics
  • chemical engineering applied to biology

Education

PhD (in progress) University of Washington, Seattle

B.S. University of California, Berkeley

  • Chemical Engineering, 2010


Publications

  1. Matsen, Yang, Stein, Beck, & Kalyuzhnaya. Global molecular analyses of methane metabolism in methanotrophic alphaproteobacterium, Methylosinus trichosporium OB3b. Part I: transcriptomic study. Frontiers in Microbiology (open access), 2013
  2. Yang, Matsen, Konopka, Green-Saxena, Clubb, Sadilek, Orphan, Beck, & Kalyuzhnaya. Global molecular analyses of methane metabolism in methanotrophic Alphaproteobacterium, Methylosinus trichosporium OB3b. Part II. metabolomics and 13C-labeling study. Frontiers in Microbiology (open access), 2013

Awards & Activities

  • 2012 honorable mention for the National Science Foundation's Graduate Research Fellowship Program

Outreach:

  • 2011-present Outreach Coordinator for the Puget Sound chapter of the American Institute of Chemical Engineers
    • Leading a mentoring project with 8 chemical engineering mentors and 8 students from the Technology Access Foundation Academy in Kent, WA.
  • 2010-2011 Outreach Coordinator for the University of Washington chapter of the American Chemical Engineering Society
    • Organized two half-day events and one all-day event for students from MESA, the Math, Engineering, Science Achievement organization of Washington, involving 60 volunteer-hours and resulting in 660 student-hours of outreach to disadvantaged minority students.
  • Misc. outreach:
    • Gave a presentation to high school students describing statistical challenges associated with transcriptomics research.
    • Hosted a booth at Engineering Discovery Days at University of Washington, engaging and educating the public about chemical engineering.

My Personal Pages

Not maintained any longer:


Tools to Share

GitHub Repository

APE annotation library generator & list of primers to share with our lab

  • This is the first script I ever wrote, and remains important in my research. Feel free to download it and enjoy it yourself.
  • Ape Annotation Feature Library Creator
    • This is an R script that converts the info in my list of primers into a file that I can use to annotate DNA files in APE with.
  • It:
    • trims out sequences not intended for sequencing such as Gibson assembly primers
    • makes a label that combines the unique primer number, the melting temperature, and the letter F or R for forward or reverse, and an asterisk if you should consult the primer spreadsheet comments before using it
    • assigns colors in APE that communicate whether the primer binds in the forward direction or the reverse direction.
    • saves the info in the format APE needs, with the date it was generated in the title.
  • This allows me to instantly see where all of the primers I own bind to a DNA sequence for a given project I am working on. It also allows me to share these primers very easily: sharing the file it outputs lets my lab mates instantly see whether I have any primers that can be used in their projects. It has been very handy for them!
    • I am happy to help friends modify this script to be useful with their own primer libraries! No R experience is necessary.
    • Anyone can access my most current primer "Annotation Feature Library" here. You can also see the files used to generate it there.
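
The filtering and labeling steps listed above can be sketched roughly as follows. This is a minimal illustration in Python, not the R script itself: the `Primer` fields, the label layout (`number_TmC_direction`), and the output file name are all assumptions made for the example, and the actual feature-library file format that APE expects is not reproduced here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Primer:
    # All field names are hypothetical stand-ins for spreadsheet columns.
    number: int            # unique primer number
    sequence: str          # primer sequence, 5' to 3'
    tm: float              # melting temperature in degrees C
    direction: str         # "F" (forward) or "R" (reverse)
    for_sequencing: bool   # False for e.g. Gibson assembly primers
    check_comments: bool   # True if the spreadsheet comments should be read first


def make_label(p: Primer) -> str:
    """Combine primer number, Tm, direction, and an optional asterisk."""
    star = "*" if p.check_comments else ""
    return f"{p.number}_{p.tm:.0f}C_{p.direction}{star}"


def build_library(primers):
    """Drop primers not intended for sequencing and label the rest."""
    return [(make_label(p), p.sequence) for p in primers if p.for_sequencing]


def output_filename(today: date) -> str:
    """Put the generation date in the file name (name pattern is assumed)."""
    return f"primer_feature_library_{today.isoformat()}.txt"
```

Calling `build_library` on the rows of a primer table gives (label, sequence) pairs that could then be written out in whatever layout the annotation tool requires.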

Use notes

  • If the primer binds in the forward direction, the primer will be light gray
  • If the primer binds in the reverse direction, it will be dark gray
  • If the primer binds in the direction opposite to the one stated in my primer table, it will appear red. (If the primer name says F, it is binding as a reverse primer, and vice versa.)
demo of APE primer library tool
  • Examples:
    • Primer 7 is VF2 in BioBricks. Primer 60 is its reverse complement. In a BioBrick vector, primer 7 appears light gray and primer 60 appears dark gray. pCM66 happens to have this same sequence in the region upstream from the multiple cloning site, except it is REVERSED. Both primers will appear red because they bind in the direction opposite to the one expected.
    • I designed some primers for a Kan cassette. The Kan cassette in pCM66 is read in the reverse direction, so all the primers built for a forward Kan cassette appear red.
      Kan primers binding in the opposite direction relative to my database appear red
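
The color rule in the use notes can be expressed as a small decision function. The sketch below is Python written for illustration and is not how APE or the R script implements it: it assumes exact matching of the whole primer against the top strand of the template, and the function names are hypothetical.

```python
# Translation table for complementing DNA bases (upper case only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def binding_strand(primer: str, template: str):
    """Return "F" if the primer matches the top strand, "R" if its
    reverse complement does, or None if there is no exact match."""
    primer, template = primer.upper(), template.upper()
    if primer in template:
        return "F"
    if reverse_complement(primer) in template:
        return "R"
    return None


def display_color(stated_direction: str, primer: str, template: str):
    """Apply the use-note rule: light gray for forward, dark gray for
    reverse, red when binding disagrees with the primer table."""
    strand = binding_strand(primer, template)
    if strand is None:
        return None  # primer does not bind this template
    if strand != stated_direction:
        return "red"
    return "light gray" if strand == "F" else "dark gray"
```

With this rule, a forward primer and its reverse complement come out light gray and dark gray on a template that carries the site in the expected orientation, and both come out red when the site is flipped, which matches the pCM66 example above. A primer whose sequence happens to match both strands is reported as forward here, which is a simplification.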

Skills

  • Metabolic engineering, molecular biology, enzyme assays, enzyme evolution, high-throughput screening, metabolomics, Gibson cloning
  • R & ggplot2, Python, Git/GitHub, LaTeX, Inkscape, Linux

Why I love ggplot2 (and R)

Data is beautiful. Interacting and communicating data elegantly makes me happy.

R is a relatively easy language to pick up, whether or not you have prior programming experience. It is one of the best languages for noodling with tabular data and doing statistics, though Python's emulations of R's strengths are growing more appealing. I use R because I am in love with the ggplot2 plotting package in R. To get a sense of its power, just type "ggplot2" into google images. The book that introduces the fundamentals is freely available online.

Two cool features of ggplot2:

(1) Layers. Imagine you have defined a plot called p in a program. If you want to add some layer to the plot, you just say p + layer. You can just layer in data, aesthetics, statistics, etc. You can also make one base plot, then make a bunch of variants of it by adding different layers of interest. It is hard to imagine going back once you have this freedom. Layers have different types of geometries you can apply.

(2) Facets. Biological data and experimental data are complex! ggplot2 can help you plot complex data by spatially separating out variables and mapping multiple aesthetics (color, size, shape, outline, etc.) to one point, communicating more information than an Excel plot can.
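
The "p + layer" idea from (1) can be mimicked with a toy model. The Python sketch below is not ggplot2 and draws nothing; the `Plot` and `Layer` classes are hypothetical and exist only to show why adding a layer to a base plot leaves the base untouched, so several variants can be built from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str  # hypothetical label, e.g. "points" or "smooth"


@dataclass(frozen=True)
class Plot:
    layers: tuple = ()

    def __add__(self, layer: Layer) -> "Plot":
        # Return a new Plot so the base plot is never modified.
        return Plot(self.layers + (layer,))

    def describe(self) -> str:
        # Summarize the stacked layers in the order they were added.
        return " + ".join(layer.name for layer in self.layers) or "(empty plot)"


# One base plot, two variants built from it.
p = Plot() + Layer("points")
variant_a = p + Layer("smooth")
variant_b = p + Layer("facet by strain")
```

Because `+` returns a new object, `p` still holds only the "points" layer after both variants are created, which is the property that makes recycling a base plot across experiments convenient.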

Having scripts that I can recycle when doing similar experiments allows me to do in-depth quality checks and make summary statistics that would be impractical to do with Excel. The quality check plots are automatically generated for each data set. Then I add plots that are specific to the questions I was investigating with my experiment. This series of plots paints a story about how the experiment was performed, what variables were important, and what the key findings are. I can compare elements of different experiments by comparing similar plots that are generated (almost) automatically for each experiment with (almost) identical scripts. Here is a sample folder from a recent experiment to give a small sense of this.