User:Janet B. Matsen: Difference between revisions

Revision as of 08:26, 26 November 2015

Janet B. Matsen

Department of Chemical Engineering
Seattle, Washington

jmatsen@uw.edu

I am a Chemical Engineering PhD candidate graduating in Winter 2016. My major project has been the implementation of a novel carbon-fixation pathway, which included a computationally designed enzyme and three enzyme reactions not found in nature. For my last year I have transitioned to fully computational work. My new project involves metagenomics and metatranscriptomics of a methane-oxidizing community using collaborative programming, remote & cloud computing, machine learning, and visualization of large multivariate data sets. In addition, I will graduate with an Advanced Data Science certificate for coursework in statistics, machine learning, data management, and data visualization.

Please see my resume or LinkedIn for more professional information and GitHub for some of my code.

I work with Mary Lidstrom, David Baker, and David Beck.

OpenWetWare Contributions

I started the Lidstrom Lab OWW wiki in 2011 and love posting what I learn! Wet lab biology is full of dogmas that I enjoy challenging. When I find that a dogma is false, or learn how much a particular variable matters in a method, I post it in the Lidstrom Lab's wiki. I have over 27,000 contributions across dozens of pages. Two popular ones are my Guide to Gibson Assembly and SDS-PAGE.

In addition, I posted some protocols specific to my PhD work to a GitHub-driven web page for use in working with Helen Chan, a fabulous Chemical Engineering undergraduate.

Research Interests

production of chemicals using microbes

protein engineering

metabolic engineering

synthetic biology

transcriptomics

chemical engineering applied to biology

Education

PhD (in progress) University of Washington, Seattle

Chemical Engineering, expected graduation: 2015-16

Lidstrom Lab

B.S. University of California, Berkeley

Chemical Engineering, 2010

Publications

Matsen, Yang, Stein, Beck, & Kalyuzhnaya. Global molecular analyses of methane metabolism in methanotrophic alphaproteobacterium, Methylosinus trichosporium OB3b. Part I: transcriptomic study. Frontiers in Microbiology (open access), 2013
Yang, Matsen, Konopka, Green-Saxena, Clubb, Sadilek, Orphan, Beck, & Kalyuzhnaya. Global molecular analyses of methane metabolism in methanotrophic Alphaproteobacterium, Methylosinus trichosporium OB3b. Part II. metabolomics and 13C-labeling study. Frontiers in Microbiology (open access), 2013

Awards & Activities

2012 honorable mention for the National Science Foundation's Graduate Research Fellowship Program

Outreach:

2011-present Outreach Coordinator for the Puget Sound chapter of the American Institute of Chemical Engineers
Leading a mentoring project with 8 chemical engineering mentors and 8 students from the Technology Access Foundation Academy in Kent, WA.

2010-2011 Outreach Coordinator for the University of Washington chapter of the American Chemical Engineering Society
Organized two half-day events and one all-day event for students from MESA, the Math, Engineering, Science Achievement organization of Washington, involving 60 volunteer-hours and resulting in 660 student-hours of outreach to disadvantaged minority students.

Misc. outreach:
Gave a presentation to high school students describing statistical challenges associated with transcriptomics research.

Hosted a booth at Engineering Discovery Days at University of Washington, engaging and educating the public about chemical engineering.

My Personal Pages

Linkedin

GitHub

Protocols specific to my project, as of 3/2015. Hosted on GitHub.

Guide to Gibson Assembly

Lab Tips & Tricks

Useful Links

Books I like

Personal Notes for Thesis Project

Not maintained any longer:

Open Lab Questions

Closed Lab Questions

Best Lab Practices

Tools to Share

GitHub Repository

Janet on GitHub
- All of my plasmid files are found in a version controlled repository
- Protocols for my personal use and collaboration with Helen Chan (Chemical Engineering Undergrad) are here: GitHub Pages: Janet Matsen.
- I'm beginning to contribute code to the LidLab GitHub repository. My sub-folder is here.
  - Favorite function for exporting data from the SpectraMax 190 plate reader: SpectraMax_190_plate_reader_data_importer

APE annotation library generator & list of primers to share with our lab

This is the first script I ever wrote, and remains important in my research. Feel free to download it and enjoy it yourself.
Ape Annotation Feature Library Creator
- This is an R script that converts the info in my list of primers into a file that I can use to annotate DNA files in APE.
It:
- trims out sequences not intended for sequencing such as Gibson assembly primers
- makes a label that combines the unique primer number, the melting temperature, and the letter F or R for forward or reverse, and an asterisk if you should consult the primer spreadsheet comments before using it
- assigns colors in APE that communicate whether a primer binds in the forward direction or the reverse direction.
- saves the info in the format APE needs, with the date it was generated in the title.
This allows me to instantly see where all of the primers I own bind to a DNA sequence for a given project I am working on. It also makes these primers very easy to share: the file it outputs lets my lab mates instantly see whether I have any primers that can be used in their project. It has been very handy for them!
- I am happy to help friends modify this script to be useful with their own primer libraries! No R experience is necessary.
- Anyone can access my most current primer "Annotation Feature Library" here. You can also see the files used to generate it there.
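
The steps listed above can be sketched in a few lines. This is a minimal Python illustration, not the R script itself. The field names (`purpose`, `has_comment`), the label layout, the file name pattern, and the example sequences are all assumptions made for illustration, and the column layout that APE actually expects is not reproduced here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Primer:
    number: int        # unique primer number from the primer table
    sequence: str      # placeholder sequences are used below
    tm: float          # melting temperature in degrees C
    direction: str     # "F" or "R", as stated in the primer table
    purpose: str       # hypothetical column: "sequencing" or "gibson"
    has_comment: bool  # True if the spreadsheet row has a caveat to read


def make_label(p: Primer) -> str:
    """Combine primer number, Tm, direction, and an asterisk for caveats."""
    star = "*" if p.has_comment else ""
    return f"{p.number}_{p.tm:.0f}_{p.direction}{star}"


def build_library(primers):
    """Drop Gibson assembly primers and return (label, sequence) rows."""
    return [(make_label(p), p.sequence) for p in primers if p.purpose != "gibson"]


def library_filename(today: date) -> str:
    """Put the generation date in the output file name."""
    return f"{today.isoformat()}_annotation_feature_library.txt"


# Placeholder rows; these are not real primers from the shared table.
example_primers = [
    Primer(7, "ATGCATGCATGCATGCATGC", 58.0, "F", "sequencing", False),
    Primer(60, "GCATGCATGCATGCATGCAT", 61.2, "R", "sequencing", True),
    Primer(101, "ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC", 72.0, "F", "gibson", False),
]

for label, sequence in build_library(example_primers):
    print(label, sequence, sep="\t")
print(library_filename(date.today()))
```

Keeping the label builder separate from the filter makes it easier to adapt the sketch to a primer table with different columns.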

Use notes

If the primer binds in the forward direction, the primer will be light gray
If the primer binds in the reverse direction, it will be dark gray
If the primer binds in the direction opposite to the one stated in my primer table, it will appear red. (For example, a primer with F in its name that is binding as a reverse primer, and vice versa.)

Examples:
- Primer 7 is VF2 in BioBricks. Primer 60 is its reverse complement. In a BioBrick vector, primer 7 appears light gray and primer 60 appears dark gray. pCM66 happens to have this same sequence in the region upstream from the multiple cloning site, except it is REVERSED. Both primers will appear red there, as they bind in the direction opposite to the one expected.
- I designed some primers for a Kan cassette. The Kan cassette in pCM66 is read in the reverse direction, so all the primers built for a forward Kan cassette appear red.
  Kan primers binding in the opposite direction relative to my database appear red
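
The color rule in these use notes can be written out as a small decision function. The sketch below is a hedged Python illustration of the rule only. APE does the sequence matching itself, the function names are hypothetical, the short sequences are made up, and "reversed" is assumed to mean that the region appears as its reverse complement.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def observed_direction(primer: str, template: str):
    """Return "F" if the primer matches the template as written, "R" if its
    reverse complement matches, or None if it does not bind at all."""
    primer, template = primer.upper(), template.upper()
    if primer in template:
        return "F"
    if reverse_complement(primer) in template:
        return "R"
    return None


def display_color(stated: str, observed):
    """Apply the use notes: red when the observed direction disagrees with
    the primer table, otherwise light gray (F) or dark gray (R)."""
    if observed is None:
        return None
    if observed != stated:
        return "red"
    return "light gray" if observed == "F" else "dark gray"


# Made-up template and primer pair, standing in for a forward primer
# and its reverse complement.
template = "AAAAGATTACAGGTTTT"
forward_primer = "GATTACAGG"
reverse_primer = reverse_complement(forward_primer)

print(display_color("F", observed_direction(forward_primer, template)))
print(display_color("R", observed_direction(reverse_primer, template)))

# The same region in the opposite orientation, as in the pCM66 example.
flipped = reverse_complement(template)
print(display_color("F", observed_direction(forward_primer, flipped)))
print(display_color("R", observed_direction(reverse_primer, flipped)))
```

On the made-up template the pair is colored light gray and dark gray. On the flipped template both primers come out red, which mirrors the VF2 example above.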

Skills

Metabolic engineering, molecular biology, enzyme assays, enzyme evolution, high-throughput screening, metabolomics, Gibson cloning
R & ggplot2, Python, Git/GitHub, LaTeX, Inkscape, Linux

Why I love ggplot2 (and R)

Data is beautiful. Interacting with data and communicating it elegantly makes me happy.

R is a relatively easy language to pick up, whether or not you have prior programming experience. It is one of the best languages for noodling with tabular data and doing statistics, though Python's emulations of R's strengths are growing more appealing. I use R because I am in love with the ggplot2 plotting package in R. To get a sense of its power, just type "ggplot2" into google images. The book that introduces the fundamentals is freely available online.

Two cool features of ggplot2:

(1) Layers. Imagine you have defined a plot called p in a program. If you want to add some layer to the plot, you just say p + layer. You can just layer in data, aesthetics, statistics, etc. You can also make one base plot, then make a bunch of variants of it by adding different layers of interest. It is hard to imagine going back once you have this freedom. Layers have different types of geometries you can apply.

(2) Facets. Biological data and experimental data are complex! ggplot2 can help you plot complex data by spatially separating out variables and by mapping multiple aesthetics (color, size, shape, outline, etc.) to one point, communicating more information than an Excel plot can.

Having scripts that I can recycle when doing similar experiments allows me to do in-depth quality checks and make summary statistics that would be impractical to do with Excel. The quality check plots are automatically generated for each data set. Then I add plots that are specific to the questions I was investigating with my experiment. This series of plots paints a story about how the experiment was performed, what variables were important, and what the key findings are. I can compare elements of different experiments by comparing similar plots that are generated (almost) automatically for each experiment with (almost) identical scripts. Here is a sample folder from an experiment I did recently, to give a small sense of this.

@@ Line 19: / Line 19: @@
 |}
-I am a 5th year Chemical Engineering PhD student at the University of Washington working in Mary Lidstrom's Lab.  We are engineering E. coli to make biofuel precursors from electricity and CO<sub>2</sub> using a metabolic pathway that doesn't exist in nature.  Success would enable production of biofuel from renewable electricity.  Getting the pathway to work in living cells has been challenging.  We are combining metabolic engineering, synthetic biology, metabolomics, enzyme engineering, and directed evolution in E. coli and a novel methylotroph to achieve this goal.
+I am a Chemical Engineering PhD candidate graduating in Winter 2016.  My major project has been implementation of a novel carbon-fixation pathway, which included a computationally designed enzyme and three enzyme reactions not found in nature.  For my last year I have transitioned to fully computational work.  My new project involves metagenomics and metatranscriptomics of a methane oxidizing community using collaborative programming, remote \& cloud computing, machine learning, and visualization of large multivariate data sets.  In addition, I will graduate with an Advanced Data Science certificate for coursework in statistics, machine learning, data management, and data visualization.
-My first year was spent investigating methanotrophic metabolism in pure cultures and a model ecosystem in a team that combined transcriptomics, metabolomics, and single-cell observation.
+Please see my [http://openwetware.org/images/e/e5/2015-Matsen-resume.pdf resume] or [https://www.linkedin.com/in/janetmatsen LinkedIn] for more professional information and [https://github.com/JanetMatsen GitHub] for some of my code.
+I work with [http://depts.washington.edu/mllab/ Mary Lidstrom], [http://www.bakerlab.org/ David Baker], and [http://faculty.washington.edu/dacb/ David Beck].
 [[Image:Janet_Matsen.png]]
-I started the [[Lidstrom|Lidstrom Lab OWW wiki]] and love posting what I learn!  It has been a lot of fun to record what I have learned about lab techniques, and my pages are viewed by many scientists outside the lab.  This wiki is also a fun place me to share tips/tricks and results of experiments that probe dogma in experimental techniques.
+== OpenWetWare Contributions ==
+I started the [[Lidstrom|Lidstrom Lab OWW wiki]] in 2011 and love posting what I learn!
+Wet lab biology is full of dogmas that I enjoy challenging.
+When I learn that dogmas are false, or the importance of particular variables in methods, I post in [[Lidstrom:Protocols|The Lidstrom lab's wiki]].
+I have over 27,000 contributions over dozens of pages.
+Two popular ones are my [[Janet_B._Matsen:Guide_to_Gibson_Assembly|Guide to Gibson Assembly], and [[Lidstrom:_SDS-PAGE|SDS-PAGE]].
+In addition, I posted some protocols specific to my PhD work to a GitHub driven [http://janetmatsen.github.io/protocols/ web page] for use in working with [https://www.linkedin.com/in/helen-chan-a2665891 Helen Chan], a fabulous Chemical Engineering undergraduate .
 == Research Interests ==
 <blockquote>
 * production of chemicals using microbes
-*metabolic engineering
+* protein engineering
-*synthetic biology
+* metabolic engineering
-*transcriptomics
+* synthetic biology
-*chemical engineering
+* transcriptomics
+* chemical engineering applied to biology
 </blockquote>
@@ Line 68: / Line 79: @@
 == My Personal Pages ==
 <blockquote>
+* [https://www.linkedin.com/in/janetmatsen Linkedin]
+* [https://github.com/JanetMatsen GitHub]
+* [http://janetmatsen.github.io/protocols/ Protocols] specific to my project, as of 3/2015.  Hosted on GitHub.
 *[[Janet B. Matsen:Guide to Gibson Assembly|Guide to Gibson Assembly]]
 *[[Janet B. Matsen:Lab Tips & Tricks|Lab Tips & Tricks]]
+*[[Janet B. Matsen:Useful Links|Useful Links]]
+*[[Janet B. Matsen:Books I like|Books I like]]
+*[[Janet B. Matsen:Thesis Project|Personal Notes for Thesis Project]]
+Not maintained any longer:
 *[[Janet B. Matsen:Open Lab Questions|Open Lab Questions]]
 *[[Janet B. Matsen:Closed Lab Questions|Closed Lab Questions]]
 *[[Janet B. Matsen:Best Lab Practices|Best Lab Practices]]
-*[[Janet B. Matsen:Useful Links|Useful Links]]
-*[[Janet B. Matsen:Books I like|Books I like]]
-*[[Janet B. Matsen:Thesis Project|Personal Notes for Thesis Project]]
 <br>
 </blockquote>
@@ Line 82: / Line 99: @@
 === GitHub Repository ===
 * [https://github.com/JanetMatsen Janet on GitHub]
-** I'm beginning to contribute code to the [https://github.com/dacb/lidlab LidLab GitHub repository.
+** All of my plasmid files are found in a [https://github.com/JanetMatsen/Plasmids version controlled repository]
+** Protocols for my personal use and collaboration with Helen Chan (Chemical Engineering Undergrad) are here: [http://janetmatsen.github.io/protocols/ GitHub Pages: Janet Matsen].
+** I'm beginning to contribute code to the [https://github.com/dacb/lidlab/ LidLab GitHub repository].  My sub-folder is [https://github.com/dacb/lidlab/tree/master/jm here].
+*** Favorite function for exporting data from the SpectraMax 190 plate reader: [https://github.com/dacb/lidlab/tree/master/jm/SpectraMax_190_plate_reader_data_importer SpectraMax_190_plate_reader_data_importer]
 === APE annotation library generator & list of primers to share with our lab ===
+* This is the first script I ever wrote, and remains important in my research.  Feel free to download it and enjoy it yourself.
 *[https://www.dropbox.com/s/3p5bnfip4ks8daa/APE_AnnotationFeatureLibraryCreator.R Ape Annotation Feature Library Creator]
-** This is an R script that converts the info in [https://docs.google.com/spreadsheet/ccc?key=0AlVxrZi130nMdHlsaml2OGFDUW9zRlVBdkRKaXVEbkE#gid=22 my list of primers] into a file that I can use to annotate DNA files in APE with.  It:
+** This is an R script that converts the info in [https://docs.google.com/spreadsheet/ccc?key=0AlVxrZi130nMdHlsaml2OGFDUW9zRlVBdkRKaXVEbkE#gid=22 my list of primers] into a file that I can use to annotate DNA files in APE with.
-***trims out sequences not intended for sequencing such as Gibson assembly primers
+*It:
-***makes a label that combines the unique primer number, the melting temperature, and the letter F or R for forward or reverse, and an asterisk if you should consult the primer spreadsheet comments before using it
+** trims out sequences not intended for sequencing such as Gibson assembly primers
-***assigns colors in APE that communicate whether it primers in the forward direction or the reverse direction.
+** makes a label that combines the unique primer number, the melting temperature, and the letter F or R for forward or reverse, and an asterisk if you should consult the primer spreadsheet comments before using it
-***saves the info in the format APE needs, with the date it was generated in the title.
+** assigns colors in APE that communicate whether it primers in the forward direction or the reverse direction.
-** This allows me to instantly see where all of the primers I own bind to a DNA sequence for a given project I am working on.  It also allows me to share these primers very easily; by sharing the file it outputs allows my lab mates to instantly see if I have any primers that can be used in their project.  It has been very handy for them!
+** saves the info in the format APE needs, with the date it was generated in the title.
+* This allows me to instantly see where all of the primers I own bind to a DNA sequence for a given project I am working on.  It also allows me to share these primers very easily; by sharing the file it outputs allows my lab mates to instantly see if I have any primers that can be used in their project.  It has been very handy for them!
 ** I am happy to help friends modify this script to be useful with their own primer libraries!  No R experience is necessary.
 ** Anyone can access my most current primer "Annotation Feature Library" [https://www.dropbox.com/sh/5w53jl3jhbdddvp/iW7cOtZ2Wd here].  You can also see the files used to generate it there.
@@ Line 102: / Line 125: @@
 ** I designed some primers for a Kan cassette.  The Kan cassette in pCM66 is read in the reverse direction, so all the primers built for a forward Kan cassette appear red.  [[image:2013_05_08_Kan_casette.jpg||thumb|center|Kan primers binding in the opposite direction relative to my database appear red]]
-== Skills I'm developing ==
+== Skills ==
-*molecular biology
+* Metabolic engineering, molecular biology, enzyme assays, enzyme evolution, high-throughput screening, metabolomics, Gibson cloning
-* enzyme assays
+* R & ggplot2, Python, Git/GitHub, LaTeX, Inkscape, Linux
-*mass spectrometry based metabolomics
-*R & ggplot2
-*Inkscape
-*Gibson cloning
 == Why I love ggplot2 (and R) ==
-R is a very easy language for people with experience to pick up, and it is one of the easiest for people without experience as well.  It is definitely the best language for noodling with data and doing statistics.
+Data is beautiful.  Interacting and communicating data elegantly makes me happy.
+R is a relatively easy language to pick up, whether or not you have prior programming experience.  It is one of the best languages for noodling with tabular data and doing statistics, though Python's emulations of R's strengths are growing more appealing.  I use R because I am in love with the ggplot2 plotting package in R.   To get a sense of its power, just [https://www.google.com/search?q=ggplot2&espv=210&es_sm=119&source=lnms&tbm=isch&sa=X&ei=IVYiU-mIG8nFoAS974DwCg&ved=0CAkQ_AUoAQ&biw=1665&bih=929 type "ggplot2" into google images].  The book that introduces the fundamentals is [http://www.bioinformaticslaboratory.nl/twikidata/pub/Education/ComputinginR/ggplot2-book.pdf freely available online].
+Two cool features of ggplot2:
-R was developed at the Hutch, but is a "big deal" worldwide.  ggplot2 is a more recent package that can be used within R.   To get a sense of its power, just [https://www.google.com/search?q=ggplot2&espv=210&es_sm=119&source=lnms&tbm=isch&sa=X&ei=IVYiU-mIG8nFoAS974DwCg&ved=0CAkQ_AUoAQ&biw=1665&bih=929 type "ggplot2" into google images].  The book that introduces the fundamentals is [http://www.bioinformaticslaboratory.nl/twikidata/pub/Education/ComputinginR/ggplot2-book.pdf freely available online].
+(1) '''Layers'''.  Imagine you have defined a plot called p in a program.  If you want to add some layer to the plot, you just say p + layer.  You can just layer in data, aesthetics, statistics, etc.  You can also make one base plot, then make a bunch of variants of it by adding different layers of interest.  It is hard to imagine going back once you have this freedom.  Layers have different types of geometries you can apply.
-I like to use ggplot2 for two main few reasons:
+(2) [https://www.google.com/search?q=ggplot2+facet&espv=210&es_sm=119&source=lnms&tbm=isch&sa=X&ei=gFciU5TkAcjcoASAzIGoBg&ved=0CAoQ_AUoAg&biw=1665&bih=929 '''Facets'''].  Biological data and experimental data are complex!  ggplot2 can help you plot complex data by spatially separating out variables, mapping multiple aesthetics (color, size, shape, outline, etc.) to one point, communicating more information than an excel plot can.
-(1) Layers.  Imagine you have defined a plot called p in a program.  If you want to add anything to the plot, you just say p + thing.  You can just layer in data, aesthetics, statistics, etc.  You can also make one base plot, then make a bunch of variants of it by adding different layers of interest.  It is hard to imagine going back once you have this freedom.  Layers have different types of geometries you can apply.
-(2) [https://www.google.com/search?q=ggplot2+facet&espv=210&es_sm=119&source=lnms&tbm=isch&sa=X&ei=gFciU5TkAcjcoASAzIGoBg&ved=0CAoQ_AUoAg&biw=1665&bih=929 Facets].  Biological data is complex!  Experimental data is complex too.  ggplot2 can help you plot complex data by spatially separating out variables.
 Having scripts that I can recycle when doing similar experiments allows me to do in-depth quality checks and make summary statistics that would be impractical to do with Excel.  The quality check plots are automatically generated for each data set.  Then I add plots that are specific to the questions I was investigating with my experiment.  This series of plots paint a story about how the experiment was performed, what variables were important, and what the key findings are.  I can compare elements of different experiments by comparing similar plots that are generated (almost) automatically for each experiment with the (almost) identical scripts.  Here is a [https://www.dropbox.com/sh/08h7uuov7dvhf7o/l-kZLXS-0S sample folder]  sample folder] from an experiment I did recently to get a small sense.
