Physics307L F09:People/Smith/Notebook/7
Contents
Lab 7: Poisson Statistics
Lab Partner: Kyle Martin
Preface
The lab manual we have been using for this class, which is last year's lab manual written by Dr. Gold, has a very sparse section for the Poisson Statistics lab. Kyle and I referred to it for the basic premise of the measurements we took and I referred to it for some basic ideas for data analysis.
Purpose
The overall goal of this lab is familiarization with multichannel analyzers and Poissonian data. Reading through my colleague's notebooks (yes, there is something to be said for open science, and for the use of wikis), I came up with some important questions I wanted to answer in the course of this lab. Specifically,
 Are the random, independent events of muons (see here for more information about muons and muon sources) striking a scintillation detector in our laboratory described accurately by a Poisson distribution?
 Is the standard deviation of the number of events we measured described accurately by the standard deviation of a Poisson distribution?
 Does the goodnessoffit of the Poisson distribution change with the anticipated number of events?
 Does a Gaussian distribution accurately represent random, independent events?
 Does the goodnessoffit of the Gaussian distribution change with the anticipated number of events?
Materials
For this lab, we used
 A box of lead bricks
 A Thallium doped sodium iodine crystal scintillator (see here, here and here for more about this)
 A Photomultiplier tube (see here for more about this)
 A preamp, amplifier and discriminator ("PAD")
 01:06, 10 December 2007 (CST):Ohhhh, so that's what that is? I tried looking it up, and look what I found: constant fraction discriminator How useful would that be for the speed of light lab, huh?
 A PC data acquisition card with "hydra breakout cable" (which is a connector with many "heads") (a photo would be nice here... I don't have one)
 Multichannel Analyzer Software
 A NIMbin
 A HighVoltage DC power supply for the PMT
 Mathworks' MATLAB software for analyzing the data collected and answering my questions above
Setup
The photomultiplier tube (PMT) and scintillator are placed in the box of lead bricks, in order to block sources of radiation we don't wish to measure. The PMT has a large potential across it, provided by the high voltage DC power supply. The scintillator is attached to the PMT. Muons striking the scintillator will create ultraviolet photons which are "detected" by the PMT  incident photons will cause a drop in voltage across the PMT. The scintillation detector (the phototube and attached scintillator) is connected with a BNC cable to the amplifier and discriminator module, which in turn is connected to the data acquisition card in the PC. Voltage drops across the PMT, corresponding to incident muons, will be recorded by the multichannel analyzer software. We don't care about the energy of the muons here, so we don't bother to record the current produced by the PMT, we just record when they occur.
For acquiring data, Kyle and I turned the amplifier down ("minimum gain") and didn't mess with the discriminator much. I wish I had remembered to record what settings we used, but they aren't really all that important. The goal here is to gather the times when some random events happen and examine their distribution; setting the amplifier higher would probably just increase the sensitivity of the muon detection, making the number of events we record higher for any given time interval, and setting the discriminator would do something similar by selecting the threshold voltage of events to pass along to the PC data acquisition card.
Methods
 The multichannel analyzer software will record events it receives for a given time interval ("dwell time") in one "channel". Setting the MCA software to collect 512 channels at a dwell time of 1 millisecond, for instance, will record events which happen between 01ms in channel 1, events that happen between 12ms in channel 2 and so on until it records 512 channels. The MCA software will display the results on screen as a plot of number of events vs. channel number. Saving the results of a measurement of 512 channels will create an ASCII (or text, if you're like me and don't care for highfalutin' nonsense) file. This file will have several lines of text to start with which record some parameters of the MCA software, followed by 512 rows of 3 columns of numbers, separated by commas. The first column of this file is the channel number, the second column is the number of events recorded and the third column is something else (I've got not idea what it's for, maybe for pulse height analysis.)
 Using the multichannel analyzer software on the PC, Kyle and I took several measurements using our setup. We measured the number of events occurring for dwell times of 1 ms, 2 ms, 4 ms, 8 ms, 10 ms, 20 ms, 40 ms, 80 ms, 100 ms, 200 ms, 400 ms, 800 ms, 1 s, 2 s and 4 s. The 1 ms, 2 ms, 4 ms, 8 ms, 10 ms, 20 ms, and 40 ms dwelltime measurements were taken with 1024 channels, the 80 ms, 100 ms, 200 ms, 400 ms, 800 ms, 1 s, 2 s, and 4 s dwelltime measurements were taken with 256 channels. The output of each of these measurements were saved to file.
 I loaded these files in MATLAB to examine the distribution of events we recorded.
Data and Analysis
Using MATLAB, I wrote an "Mfile", or MATLAB script, to load and examine our data. I used MATLAB's "Publish to HTML" function to save both the code and output (with figures, saved as .png files) as HTML files. I zipped these files, and uploaded the zip file. It can be downloaded here. I went a little bit overboard in scripting this: the script itself is more than 250 lines long. I'm sure if I were actually good at coding in MATLAB, this script would be significantly different (i.e. better!), but I'm just learning MATLAB.^{SJK 01:11, 10 December 2007 (CST)}I will post a description of what I did in my MATLAB script, along with relevant figures from its output. But, first, I will try to describe some relevant details about Poisson and Gaussian distributions.
About Gaussian Distributions
The Gaussian (or "normal") probability density function is
Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \frac1{\sigma\sqrt{2\pi}}\; \exp\left(\frac{\left(x\mu\right)^2}{2\sigma^2} \right) \!}
Where Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \sigma} is the standard deviation of the data, and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mu} is the expected value. In our case, I believe the "expected value" is the mean of our data and the "standard deviation" is the standard deviation of our data.
About Poisson Distributions
The Poisson probability mass function (analogous to the probability density function, but for discrete values) is
Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle f(k;\lambda )=\frac{e^{\lambda } \lambda^k}{k!},\,\!}
Where Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \lambda} is the number of events per time interval and k is the number of events.
MATLAB Script Summary and Output
 I first calculated the standard deviations of the number of events we recorded (separately for each run of different dwell times, of course.) These standard deviations will later be compared to the standard deviations of a Poisson distribution with parameters found from our data (λ , or number of events per time interval).
 For calculating standard deviation, I used the equation Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle s = \left( \frac{1}{n1} \sum_{i=1}^n (x_i  \bar{x})^2 \right) ^{\frac{1}{2}}} , where s is the standard deviation, n is the number of items in the sample, Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle x_i} is the ith item in the sample x, and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \bar{x}} is the mean of the sample x.
 I then plotted our data. I created 3 figures with 4 subplots and 1 figure with 3 subplots. The subplots show something similar to what was displayed by the MCA software: a plot of the number of events vs. the channel.
 These figures are shown on this page as Figures 14.
 In order to compare the distribution of our data to the a Poisson distribution and Gaussian distribution, I did the following:
 I estimated the lambda for each run. These numbers can be seen in Table 1, and I made of plot of these numbers vs. dwell time (see Figure 5).
 In order to estimate the lambda, I used the equation Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \lambda_{MLE} = \frac{1}{n} \sum_{i=1}^n x_i} , where Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \lambda_{MLE}} is the "Maximum Likelihood Estimate" of lambda, n is the number of items in the sample, and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle x_i} is the ith item in sample x. I tend to use Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \lambda} in place of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \lambda_{MLE}} in my figures, so keep that in mind. If my method of determining the Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \lambda_{MLE}} produces an inaccurate estimate of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \lambda} , most of my figures will be affected.
 I used the lambda estimated above to create vectors of the probability of seeing k number of events using a Poisson probability density function, and to create vectors of the probability of seeing k number of events using a Gaussian probability density function.
 In order to estimate the probability of seeing k number of events using a Poisson MDF with Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \lambda} events per time interval, I used Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle y = \frac{\lambda^k}{k!}e^{k}} , where Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \lambda} is Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \lambda_{MLE}} determined earlier and k is the number of events I wish to evaluate at.
 In order to estimate the probability of seeing k number of events using a Gaussian PDF with Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \lambda} as the mean and the standard deviation I calculated earlier, I used Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle y = \frac{1}{\sigma \sqrt{2\pi}} e^{\frac{(k\mu)^2}{2\sigma ^2}}} , where Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mu} is the mean (also, interestingly, Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \lambda_{MLE}} ), Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \sigma} is the standard deviation, and k is the number of events I wish to evaluate at.
 I created a histogram vector from our data: the probability of seeing k number of events. To calculate the "probability" of seeing k number of events, I divided the frequency of k number of events by the total number of events. I think this makes sense, since the sum of this vector is 1.
 I plotted the three distributions (our data pdf, Poisson pdf, and Gaussian pdf) vs. k number of events. I created separate figures for each run. These figures can be seen on this page as Figures 620.
 For each run, I found what I think is the ChiSquare goodnessoffit of the Poisson PDF to our data, and the ChiSquare goodnessoffit of the Gaussian PDF to our data. I had a bit of trouble figuring out how to use the ChiSquare distribution to measure the goodnessoffit of distributions to our data, and I may have ended up doing it wrong, but heres what I did:
 In order to calculate the ChiSquare goodnessoffit of the two different probability functions to our data, I used Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \chi^2 = \sum \frac{(X_i  \mu_i)^2}{\sigma_i^2}} , where Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle X_i} is the probability of seeing k events we measured,Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \mu_i} is the probability 'prediction' for that number of events and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \sigma_i} is the standard deviation.^{SJK 01:21, 10 December 2007 (CST)}
 In order to determine whether the goodnessoffit of Poisson distribution and the goodnessoffit of a Gaussian distribution varies with λ, I created a loglog plot of ChiSquare vs. Lambda. This plot can be seen as Figure 21.
 I estimated the lambda for each run. These numbers can be seen in Table 1, and I made of plot of these numbers vs. dwell time (see Figure 5).
 In order to compare the standard deviation of our data to the standard deviation of a Poisson distribution, I created a loglog plot of the standard deviation of our data and the standard deviation of a Poisson distribution vs. dwell time. This figure can be seen as Figure 22.
Figures and Tables
Dwell Time  Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \lambda_{MLE}}  Standard Deviation of Data 
Standard Deviation of Poisson Distribution with Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \lambda_{MLE}} 

1ms  0.00586  0.088  0.077 
2ms  0.0117  0.125  0.108 
4ms  0.0264  0.183  0.162 
8ms  0.0684  0.330  0.252 
10ms  0.0723  0.311  0.269 
20ms  0.147  0.503  0.384 
40ms  0.287  0.644  0.536 
80ms  0.641  1.02  0.800 
100ms  0.648  1.05  0.805 
200ms  1.55  1.46  1.24 
400ms  2.75  2.22  1.66 
800ms  5.86  2.90  2.42 
1s  7.14  3.28  2.67 
2s  14.6  4.83  3.82 
4s  28.6  7.03  5.35 
Dwell Time  Poisson PDF Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \chi^2}
to data histogram 
Gaussian PDF Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \chi^2}
to data histogram 

1ms  0.0007  2267 
2ms  0.0014  461.1 
4ms  0.0023  62.49 
8ms  0.0132  2.530 
10ms  0.0092  2.498 
20ms  0.0268  0.023 
40ms  0.0323  0.1548 
80ms  0.0259  0.1242 
100ms  0.0318  0.1312 
200ms  0.0081  0.0154 
400ms  0.0027  0.0037 
800ms  0.0007  0.0008 
1s  0.0005  0.0007 
2s  0.0003  0.0003 
4s  0.0001  0.0001 
Figure 23: Probability vs. Number of Events, Dwell Time 4 s. Shown are the data histogram, Poisson PMF and Gaussian PDF. For the Gaussian PDF, I used the Poisson standard deviation (Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://api.formulasearchengine.com/v1/":): {\displaystyle \sqrt{\lambda}} ) as the standard deviation of the Gaussian distribution. The resulting Gaussian distribution fits the Poisson distribution very closely, shifted by a bit (maybe half of a bin?)
Conclusions and Remarks
In my purpose, I wanted to answer several questions. I'll try briefly answer them here.
 Are the random, independent events of muons striking a scintillation detector in our laboratory described accurately by a Poisson distribution?
 While difficult to answer conclusively, I believe the answer to this question is yes. As evidence, I present the ChiSquare goodnessoffit of the Poisson PMD for my data. From what I've read, if this number is less than 1 or so, the fit is good. Since my ChiSquare ranges from 0.0001 to 0.032, I believe my fit is excellent.
 01:26, 10 December 2007 (CST):This is what I was hinting above. It may be that you were overestimating your sigma or something. I haven't spent time looking at it unfortunately. When the distribution is normal, the reduced chisquared (chisquared per degree of freedom) should be about 1 for a good fit, I think. If it is way too low, it means the errors are overestimated. The fact that yours tends to zero, probably means it is not quite a chisquared parallel. But a great method nonetheless.
 While difficult to answer conclusively, I believe the answer to this question is yes. As evidence, I present the ChiSquare goodnessoffit of the Poisson PMD for my data. From what I've read, if this number is less than 1 or so, the fit is good. Since my ChiSquare ranges from 0.0001 to 0.032, I believe my fit is excellent.
 Is the standard deviation of the number of events we measured described accurately by the standard deviation of a Poisson distribution?
 Qualitatively, the answer is sort of. In Figure 22, I demonstrate that the standard deviation of my data is always a bit larger than the standard deviation of a Poisson distribution (the square root of lambda). I don't know why this is; systematic error is a probable culprit, although I'm unsure what exactly would cause this. Somehow, I think the events that the MCA counted were not completely random and independent, or maybe the discriminator didn't pass along many events it should have, or something else happened.
 Does the goodnessoffit of the Poisson distribution change with the anticipated number of events?
 Yes. As lambda increases, as seen in Table 2 and Figure 21, the goodnessoffit improves. This is a relatively small amount compared to the Gaussian distribution's goodnessoffit, though.
 Does a Gaussian distribution accurately represent random, independent events? And does the goodnessoffit of the Gaussian distribution change with the anticipated number of events?
 If the number of anticipated events you are examining is large enough, I believe the Gaussian distribution accurately represents what you see. As Figure 22 and Table 2 demonstrate, the goodnessoffit drastically changes as lambda changes.
I was wholly unfamiliar with the Poisson distribution before this lab, and became fairly comfortable with it by the end. I also got a bit more comfortable with the Gaussian distribution and using it.
I did struggle with Matlab at first, but I think I got a handle on it. And I got some nice figures out of Matlab, with a bit of work, that I'm pleased with.
I was a little unclear about how to use the Poisson PMF and Gaussian PDF to estimate the number of events one would expect to see, and still am slightly troubled that I didn't ever integrate anything.
Links to other entries
My Wednesday Labs  



