Physics307L:People/Smith/Notebook/7

=Lab 7: Poisson Statistics= Lab Partner: Kyle Martin

Preface
The lab manual we have been using for this class, which is last year's lab manual written by Dr. Gold, has a very sparse section for the Poisson Statistics lab. Kyle and I referred to it for the basic premise of the measurements we took and I referred to it for some basic ideas for data analysis.

Purpose
The overall goal of this lab is familiarization with multichannel analyzers and Poissonian data. Reading through my colleague's notebooks (yes, there is something to be said for open science, and for the use of wikis), I came up with some important questions I wanted to answer in the course of this lab. Specifically,
 * Are the random, independent events of muons (see here for more information about muons and muon sources) striking a scintillation detector in our laboratory described accurately by a Poisson distribution?
 * Is the standard deviation of the number of events we measured described accurately by the standard deviation of a Poisson distribution?
 * Does the goodness-of-fit of the Poisson distribution change with the anticipated number of events?
 * Does a Gaussian distribution accurately represent random, independent events?
 * Does the goodness-of-fit of the Gaussian distribution change with the anticipated number of events?

Materials
For this lab, we used
 * A box of lead bricks
 * A Thallium doped sodium iodine crystal scintillator (see here, here and here for more about this)
 * A Photomultiplier tube (see here for more about this)
 * A preamp, amplifier and discriminator ("PAD")
 * 01:06, 10 December 2007 (CST):Ohhhh, so that's what that is? I tried looking it up, and look what I found: constant fraction discriminator How useful would that be for the speed of light lab, huh?
 * A PC data acquisition card with "hydra breakout cable" (which is a connector with many "heads") (a photo would be nice here...  I don't have one)
 * Multichannel Analyzer Software
 * A NIM-bin
 * A High-Voltage DC power supply for the PMT
 * Mathworks' MATLAB software for analyzing the data collected and answering my questions above

Setup
The photomultiplier tube (PMT) and scintillator are placed in the box of lead bricks, in order to block sources of radiation we don't wish to measure. The PMT has a large potential across it, provided by the high voltage DC power supply. The scintillator is attached to the PMT. Muons striking the scintillator will create ultraviolet photons which are "detected" by the PMT - incident photons will cause a drop in voltage across the PMT. The scintillation detector (the phototube and attached scintillator) is connected with a BNC cable to the amplifier and discriminator module, which in turn is connected to the data acquisition card in the PC. Voltage drops across the PMT, corresponding to incident muons, will be recorded by the multichannel analyzer software. We don't care about the energy of the muons here, so we don't bother to record the current produced by the PMT, we just record when they occur.

For acquiring data, Kyle and I turned the amplifier down ("minimum gain") and didn't mess with the discriminator much. I wish I had remembered to record what settings we used, but they aren't really all that important. The goal here is to gather the times when some random events happen and examine their distribution; setting the amplifier higher would probably just increase the sensitivity of the muon detection, making the number of events we record higher for any given time interval, and setting the discriminator would do something similar by selecting the threshold voltage of events to pass along to the PC data acquisition card.

Methods

 * The multichannel analyzer software will record events it receives for a given time interval ("dwell time") in one "channel". Setting the MCA software to collect 512 channels at a dwell time of 1 millisecond, for instance, will record events which happen between 0-1ms in channel 1, events that happen between 1-2ms in channel 2 and so on until it records 512 channels.  The MCA software will display the results on screen as a plot of number of events vs. channel number.  Saving the results of a measurement of 512 channels will create an ASCII (or text, if you're like me and don't care for high-falutin' nonsense) file.  This file will have several lines of text to start with which record some parameters of the MCA software, followed by 512 rows of 3 columns of numbers, separated by commas.  The first column of this file is the channel number, the second column is the number of events recorded and the third column is something else (I've got not idea what it's for, maybe for pulse height analysis.)
 * Using the multichannel analyzer software on the PC, Kyle and I took several measurements using our setup. We measured the number of events occurring for dwell times of 1 ms, 2 ms, 4 ms, 8 ms, 10 ms, 20 ms, 40 ms, 80 ms, 100 ms, 200 ms, 400 ms, 800 ms, 1 s, 2 s and 4 s.  The 1 ms, 2 ms, 4 ms, 8 ms, 10 ms, 20 ms, and 40 ms dwell-time measurements were taken with 1024 channels, the 80 ms, 100 ms, 200 ms, 400 ms, 800 ms, 1 s, 2 s, and 4 s dwell-time measurements were taken with 256 channels.  The output of each of these measurements were saved to file.
 * I loaded these files in MATLAB to examine the distribution of events we recorded.

Data and Analysis
Using MATLAB, I wrote an "M-file", or MATLAB script, to load and examine our data. I used MATLAB's "Publish to HTML" function to save both the code and output (with figures, saved as .png files) as HTML files. I zipped these files, and uploaded the zip file. It can be downloaded. I went a little bit overboard in scripting this: the script itself is more than 250 lines long. I'm sure if I were actually good at coding in MATLAB, this script would be significantly different (i.e. better!), but I'm just learning MATLAB.

I will post a description of what I did in my MATLAB script, along with relevant figures from its output. But, first, I will try to describe some relevant details about Poisson and Gaussian distributions.

About Gaussian Distributions
The Gaussian (or "normal") probability density function is

$$\frac1{\sigma\sqrt{2\pi}}\; \exp\left(-\frac{\left(x-\mu\right)^2}{2\sigma^2} \right) \!$$

Where $$\sigma$$ is the standard deviation of the data, and $$\mu$$ is the expected value. In our case, I believe the "expected value" is the mean of our data and the "standard deviation" is the standard deviation of our data.

About Poisson Distributions
The Poisson probability mass function (analogous to the probability density function, but for discrete values) is

$$f(k;\lambda )=\frac{e^{-\lambda } \lambda^k}{k!},\,\!$$

Where $$\lambda$$ is the number of events per time interval and k is the number of events.

MATLAB Script Summary and Output

 * I first calculated the standard deviations of the number of events we recorded (separately for each run of different dwell times, of course.) These standard deviations will later be compared to the standard deviations of a Poisson distribution with parameters found from our data (&lambda;, or number of events per time interval).
 * For calculating standard deviation, I used the equation $$s = \left( \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \right) ^{\frac{1}{2}}$$, where s is the standard deviation, n is the number of items in the sample, $$x_i$$ is the ith item in the sample x, and $$\bar{x}$$ is the mean of the sample x.
 * I then plotted our data. I created 3 figures with 4 subplots and 1 figure with 3 subplots.  The subplots show something similar to what was displayed by the MCA software: a plot of the number of events vs. the channel.
 * These figures are shown on this page as Figures 1-4.
 * In order to compare the distribution of our data to the a Poisson distribution and Gaussian distribution, I did the following:
 * I estimated the lambda for each run. These numbers can be seen in Table 1, and I made of plot of these numbers vs. dwell time (see Figure 5).
 * In order to estimate the lambda, I used the equation $$\lambda_{MLE} = \frac{1}{n} \sum_{i=1}^n x_i$$, where $$\lambda_{MLE}$$ is the "Maximum Likelihood Estimate" of lambda, n is the number of items in the sample, and $$x_i$$ is the ith item in sample x. I tend to use $$\lambda$$ in place of $$\lambda_{MLE}$$ in my figures, so keep that in mind.  If my method of determining the $$\lambda_{MLE}$$ produces an inaccurate estimate of $$\lambda$$, most of my figures will be affected.
 * I used the lambda estimated above to create vectors of the probability of seeing k number of events using a Poisson probability density function, and to create vectors of the probability of seeing k number of events using a Gaussian probability density function.
 * In order to estimate the probability of seeing k number of events using a Poisson MDF with $$\lambda$$ events per time interval, I used $$y = \frac{\lambda^k}{k!}e^{-k}$$, where $$\lambda$$ is $$\lambda_{MLE}$$ determined earlier and k is the number of events I wish to evaluate at.
 * In order to estimate the probability of seeing k number of events using a Gaussian PDF with $$\lambda$$ as the mean and the standard deviation I calculated earlier, I used $$y = \frac{1}{\sigma \sqrt{2\pi}} e^{\frac{-(k-\mu)^2}{2\sigma ^2}}$$, where $$\mu$$ is the mean (also, interestingly, $$\lambda_{MLE}$$), $$\sigma$$ is the standard deviation, and k is the number of events I wish to evaluate at.
 * I created a histogram vector from our data: the probability of seeing k number of events. To calculate the "probability" of seeing k number of events, I divided the frequency of k number of events by the total number of events.  I think this makes sense, since the sum of this vector is 1.
 * I plotted the three distributions (our data pdf, Poisson pdf, and Gaussian pdf) vs. k number of events. I created separate figures for each run.  These figures can be seen on this page as Figures 6-20.
 * For each run, I found what I think is the Chi-Square goodness-of-fit of the Poisson PDF to our data, and the Chi-Square goodness-of-fit of the Gaussian PDF to our data. I had a bit of trouble figuring out how to use the Chi-Square distribution to measure the goodness-of-fit of distributions to our data, and I may have ended up doing it wrong, but heres what I did:
 * In order to calculate the Chi-Square goodness-of-fit of the two different probability functions to our data, I used $$\chi^2 = \sum \frac{(X_i - \mu_i)^2}{\sigma_i^2}$$, where $$X_i$$ is the probability of seeing k events we measured,$$\mu_i$$ is the probability 'prediction' for that number of events and $$\sigma_i$$ is the standard deviation.
 * In order to determine whether the goodness-of-fit of Poisson distribution and the goodness-of-fit of a Gaussian distribution varies with &lambda;, I created a log-log plot of Chi-Square vs. Lambda. This plot can be seen as Figure 21.
 * In order to compare the standard deviation of our data to the standard deviation of a Poisson distribution, I created a log-log plot of the standard deviation of our data and the standard deviation of a Poisson distribution vs. dwell time. This figure can be seen as Figure 22.

Conclusions and Remarks
In my purpose, I wanted to answer several questions. I'll try briefly answer them here.
 * Are the random, independent events of muons striking a scintillation detector in our laboratory described accurately by a Poisson distribution?
 * While difficult to answer conclusively, I believe the answer to this question is yes. As evidence, I present the Chi-Square goodness-of-fit of the Poisson PMD for my data.  From what I've read, if this number is less than 1 or so, the fit is good.  Since my Chi-Square ranges from 0.0001 to 0.032, I believe my fit is excellent.
 * 01:26, 10 December 2007 (CST):This is what I was hinting above. It may be that you were over-estimating your sigma or something.  I haven't spent time looking at it unfortunately.  When the distribution is normal, the reduced chi-squared (chi-squared per degree of freedom) should be about 1 for a good fit, I think.  If it is way too low, it means the errors are over-estimated.  The fact that yours tends to zero, probably means it is not quite a chi-squared parallel.  But a great method none-the-less.
 * Is the standard deviation of the number of events we measured described accurately by the standard deviation of a Poisson distribution?
 * Qualitatively, the answer is sort of. In Figure 22, I demonstrate that the standard deviation of my data is always a bit larger than the standard deviation of a Poisson distribution (the square root of lambda).  I don't know why this is; systematic error is a probable culprit, although I'm unsure what exactly would cause this.  Somehow, I think the events that the MCA counted were not completely random and independent, or maybe the discriminator didn't pass along many events it should have, or something else happened.
 * Does the goodness-of-fit of the Poisson distribution change with the anticipated number of events?
 * Yes. As lambda increases, as seen in Table 2 and Figure 21, the goodness-of-fit improves.  This is a relatively small amount compared to the Gaussian distribution's goodness-of-fit, though.
 * Does a Gaussian distribution accurately represent random, independent events? And does the goodness-of-fit of the Gaussian distribution change with the anticipated number of events?
 * If the number of anticipated events you are examining is large enough, I believe the Gaussian distribution accurately represents what you see. As Figure 22 and Table 2 demonstrate, the goodness-of-fit drastically changes as lambda changes.

I was wholly unfamiliar with the Poisson distribution before this lab, and became fairly comfortable with it by the end. I also got a bit more comfortable with the Gaussian distribution and using it.

I did struggle with Matlab at first, but I think I got a handle on it. And I got some nice figures out of Matlab, with a bit of work, that I'm pleased with.

I was a little unclear about how to use the Poisson PMF and Gaussian PDF to estimate the number of events one would expect to see, and still am slightly troubled that I didn't ever integrate anything.

=Links to other entries=