Poisson Statistics

experimentalists: me (princess bradley) and Nikipoo

word of the week: DATUM

Goal

By measuring random events that occur very rarely, I hope to analyze how well a Poisson and a Gaussian can fit the data. I believe the random events we are measuring to be cosmic radiation, but I am not sure. We are measuring whatever radiation can make its way through a "house" of lead bricks and then activate a NaI scintillator in the physics building at UNM.

Theory

The Poisson and Gaussian distributions are probability distributions. I will assume you know what probability distributions are. Whatever radiation we are measuring occurs randomly, which is why I have chosen to analyze the data with the Poisson and the Gaussian (these distributions result from counting random events).

Poisson Distribution

When counting random events, the Poisson distribution is often used when the random events have a low probability of occurring. It is given by

[math]\displaystyle{ P(x)=e^{-a}\frac{a^x}{x!} }[/math],

where [math]\displaystyle{ a }[/math] is the mean, and [math]\displaystyle{ e^{-a} }[/math] is the normalization coefficient so that the sum of P(x) for every non-negative integer x is 1. Notice that the Poisson distribution is only defined for non-negative integers, so it is not continuous. The standard deviation of the Poisson is

[math]\displaystyle{ \sigma=\sqrt{a} }[/math].

According to the method of maximum likelihood, the best fit of a Poisson to data is obtained by taking the mean of the data to be [math]\displaystyle{ a }[/math]. That is, the mean of the best-fit Poisson distribution is the mean of the data.
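
To make this concrete, here is a minimal Python sketch (not the code we actually used; it assumes NumPy and SciPy are available, and the example counts are made up) of fitting a Poisson to counting data by taking the sample mean as [math]\displaystyle{ a }[/math]:

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical example counts (e.g., counts per dwell-time bin)
counts = np.array([0, 1, 0, 2, 1, 0, 0, 1, 3, 0])

# Maximum-likelihood fit: the Poisson mean "a" is just the sample mean
a = counts.mean()

# Best-fit Poisson probabilities P(x) = e^{-a} a^x / x! for x = 0..max(counts)
x = np.arange(counts.max() + 1)
P = poisson.pmf(x, a)

print("a =", a)
print("sigma = sqrt(a) =", np.sqrt(a))  # Poisson standard deviation
print("P(x):", P)
```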

Gaussian Distribution

When counting random events, the Gaussian distribution is often used when the expected number of events per counting interval is large. It is given by

[math]\displaystyle{ G(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{\left(x-a\right)^2}{2\sigma^2}} }[/math],

where [math]\displaystyle{ a }[/math] is the mean, [math]\displaystyle{ \sigma }[/math] is the standard deviation, and [math]\displaystyle{ \frac{1}{\sqrt{2\pi\sigma^2}} }[/math] is the normalization coefficient so that the integral over all [math]\displaystyle{ x }[/math] (from [math]\displaystyle{ -\infty }[/math] to [math]\displaystyle{ \infty }[/math]) is 1. The Gaussian distribution is continuous, so it is called a "probability density function" (pdf).

According to the method of maximum likelihood, the best fit of a Gaussian to data is obtained by taking the mean of the data to be [math]\displaystyle{ a }[/math] and the standard deviation of the data to be [math]\displaystyle{ \sigma }[/math]. That is, the mean and standard deviation of the best-fit Gaussian distribution are the mean and standard deviation of the data.
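
A similar sketch for the Gaussian fit (again assuming NumPy and SciPy; the example counts are made up):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical example counts with a larger mean, where the Gaussian is appropriate
counts = np.array([65, 72, 70, 68, 74, 69, 71, 66, 73, 70])

# Maximum-likelihood fit: use the sample mean and sample standard deviation
a = counts.mean()
sigma = counts.std()  # ML estimate divides by N, not N-1

# Evaluate the best-fit Gaussian density over the observed range
x = np.arange(counts.min(), counts.max() + 1)
G = norm.pdf(x, loc=a, scale=sigma)

print("a =", a, " sigma =", sigma)
print("G(x):", G)
```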

Equipment

  • photomultiplier tube (PMT) with NaI scintillator
  • coaxial cables with BNC connectors
  • about 40 lead bricks (needed to build a house for the PMT)
  • high voltage power supply for the PMT (1000 V should do the trick)
  • a means of acquiring data from the PMT so that frequency can be measured accurately (this is the trickiest part!)

I have given a somewhat-general equipment-needed list since the specific equipment I am using will not affect the result of counting the signals from the PMT.

As for how we acquired data from the PMT: we used an amplifier to amplify the signal, which we connected to some chip in a computer; that chip works with really shitty software which, after we changed many settings, can measure frequency.

Our setup

  • We plugged in the high voltage power supply to the power outlet and then to the PMT using coaxial cables.
  • Without using any radioactive source, we built a lead house around the PMT using the bricks (I think we did this to prevent local radioactive sources from altering the data, but I'm not sure).
  • We then used a really weird contraption that connected the power supply, through several other pieces of equipment, to an amplifier; we connected the amplifier, with coaxial cables, to the PMT and to the computer chip.
  • We then changed just about every setting that exists in the software (the shitty graphical interface made this difficult) to allow it to measure frequency (counts during a predetermined unit of time) and to put each frequency DATUM into a "bin." The program will fill many of these bins over a long period of time before stopping.

Procedure

The procedure was wonderfully easy since it is automated by software, but it took some time.

The data software has what is called a "dwell time," which is the predetermined unit of time associated with one bin. For smaller dwell times, the frequency data in the bins should become smaller, making the Poisson a better choice. To study how well the Poisson and Gaussian fit the data depending on dwell time, we used a wide range of dwell times: 10ms, 100ms, 1s, 10s, and 100s.

For dwell times 10ms, 100ms, and 100s, we had 4096 bins. For dwell times 1s and 10s, we only had 256 bins since more bins would have taken more time than we had (we could do 4096 bins for the 100s dwell time since we let the experiment run over the weekend).

More bins is better, since more bins will "smooth out the bumps" on the probability distribution from the data according to the law of large numbers (the formula for the standard error of the mean also reveals that more bins give a smaller error).
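
To illustrate this claim, here is a small simulation sketch (assuming NumPy; the mean of 0.68 counts per bin is just an illustrative number, roughly our 100ms case) that compares how much the estimated mean scatters with 256 bins versus 4096 bins:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 0.68   # illustrative counts per bin
trials = 1000

for n_bins in (256, 4096):
    # Estimate the mean from n_bins simulated Poisson bins, repeated many times
    means = rng.poisson(true_mean, size=(trials, n_bins)).mean(axis=1)
    print(n_bins, "bins: scatter of the estimated mean =", means.std())
    # The scatter shrinks roughly like 1/sqrt(n_bins), as the SE formula predicts
```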

Data and Results

In this section, I will give raw data plots and the calculations of mean and standard deviation of that data.

To denote the uncertainty of my mean, I will write the mean this way:

[math]\displaystyle{ a=a\pm SE }[/math]

where

[math]\displaystyle{ SE=standard\ error\ of\ the\ mean=\frac{\sigma}{\sqrt{number\ of\ bins}} }[/math].
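
A minimal sketch of how these numbers could be computed for one dwell time (assuming NumPy; the file name is hypothetical, and counts holds the frequency DATUM from each bin):

```python
import numpy as np

counts = np.loadtxt("dwell_10s_counts.txt")   # hypothetical file of per-bin counts

a = counts.mean()                   # mean frequency
sigma = counts.std()                # standard deviation
SE = sigma / np.sqrt(len(counts))   # standard error of the mean

print(f"a = {a:.4f} +/- {SE:.4f}, sigma = {sigma:.4f}")
```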


10ms

  • [math]\displaystyle{ a=0.07666\pm0.00527\, }[/math]
  • [math]\displaystyle{ \sigma=0.3373\, }[/math]

100ms

  • [math]\displaystyle{ a=0.6775\pm0.0157\, }[/math]
  • [math]\displaystyle{ \sigma=1.006\, }[/math]

1s

  • [math]\displaystyle{ a=6.766\pm0.191\, }[/math]
  • [math]\displaystyle{ \sigma=3.063\, }[/math]

10s

  • [math]\displaystyle{ a=69.23\pm0.69\, }[/math]
  • [math]\displaystyle{ \sigma=11.07\, }[/math]

100s

  • [math]\displaystyle{ a=\pm\, }[/math]
  • [math]\displaystyle{ \sigma=\, }[/math]

Error

Systematic Error

The least-squares linear fit to the 10s data has a slope of -0.035. That is, each successive frequency DATUM of the 256 data points is expected to be 0.035 lower than the previous one on average, which amounts to a decrease of about 8.96 over all 256 data points. I speculate that, since the earth could turn significantly in the 42.7 minutes it took to obtain the 10s data, some of the radiation being detected is cosmic and that this radiation decreased over the run. This decrease of 8.96 is significant compared to the next-largest trend, the overall increase of 0.336 over all 256 data points of the 1s data.
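
Here is a sketch of this drift check (assuming NumPy; the file name is hypothetical, and the bins are assumed to be in time order):

```python
import numpy as np

counts = np.loadtxt("dwell_10s_counts.txt")   # hypothetical file, bins in time order
bin_index = np.arange(len(counts))

# Least-squares linear fit: the slope tells how much each successive bin drifts on average
slope, intercept = np.polyfit(bin_index, counts, 1)

print("slope per bin =", slope)
print("total drift over all bins =", slope * (len(counts) - 1))
```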

I would expect this error to make the Poisson distribution a poorer fit, since its standard deviation is fixed at [math]\displaystyle{ \sqrt{a} }[/math] by its mean, whereas the Gaussian distribution can adjust both its mean and its standard deviation.

Random Error

Random error is the focus of this entire experiment. Without random error, there would be no distributions; we would always achieve the mean instead. However, the probability gods thought that the universe would be boring without random error, so they implemented it, and now I am studying it.

A source of random error is that I cannot use infinite bins since this would take too long. I will never be able to perfectly know how well the Poisson or Gaussian distributions fit data.

Fitting Gaussian and Poisson distributions to the data

In this section, I will plot three probability distributions: the distribution of the data, the best-fit Poisson, and the best-fit Gaussian. To find the distribution of the data, I simply count the number of times each specific frequency occurs and then normalize these counts by the number of bins in the original data. To fit the Poisson and Gaussian, I simply plug the mean and standard deviation of the data into the distributions, which gives the best fit according to the method of maximum likelihood.

I will also calculate "Error," which is essentially the root-mean-square difference between my data's distribution and the Poisson or Gaussian fit.

[math]\displaystyle{ Error=\sqrt{\frac{1}{N}\sum_{x=0}^N\left(Distribution(x)-DataDistribution(x)\right)^2} }[/math],

where "Distribution(x)" is either the Poisson or Gaussian, and [math]\displaystyle{ N }[/math] is the max frequency of the data (the last [math]\displaystyle{ x }[/math] with a [math]\displaystyle{ y }[/math] value in the following graphs).

10ms

  • [math]\displaystyle{ Error_{Poisson}=0.0241\, }[/math]
  • [math]\displaystyle{ Error_{Gaussian}=0.1211\, }[/math]

100ms

  • [math]\displaystyle{ Error_{Poisson}=0.0615\, }[/math]
  • [math]\displaystyle{ Error_{Gaussian}=0.1186\, }[/math]

1s

  • [math]\displaystyle{ Error_{Poisson}=0.0179\, }[/math]
  • [math]\displaystyle{ Error_{Gaussian}=0.0124\, }[/math]

10s

  • [math]\displaystyle{ Error_{Poisson}=0.0066\, }[/math]
  • [math]\displaystyle{ Error_{Gaussian}=0.0058\, }[/math]

100s

  • [math]\displaystyle{ Error_{Poisson}=\, }[/math]
  • [math]\displaystyle{ Error_{Gaussian}=\, }[/math]

Conclusion

blah blah blah about error

The Poisson distribution approaches the Gaussian distribution when 1) counting random events and 2) the frequency becomes large. The Poisson is therefore always the more accurate choice. I was wondering why the Gaussian distribution would ever be used for measuring random frequencies, but Dr. Koch and MATLAB helped me understand that the Poisson is difficult to use for high frequencies since it is not continuous and since the factorial provides some computational challenges.
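
As an illustration of the factorial problem (a sketch in Python, not what I did in MATLAB): the naive Poisson formula cannot be evaluated directly in floating point for large frequencies, but computing it in log space with the log-gamma function works fine.

```python
import math

def poisson_pmf_stable(x, a):
    # P(x) = e^{-a} a^x / x!, computed in log space to avoid huge intermediate numbers:
    # log P = -a + x*log(a) - log(x!), where log(x!) = lgamma(x+1)
    return math.exp(-a + x * math.log(a) - math.lgamma(x + 1))

a = 1000.0                           # a large mean, e.g., counts in a long dwell time
print(poisson_pmf_stable(1000, a))   # ~0.0126, no trouble

# The naive formula fails here: e^{-1000} underflows to 0 and 1000.0**1000
# overflows, so e^{-a} * a^x / x! cannot be evaluated directly in floating point.
try:
    print(math.exp(-a) * a**1000 / math.factorial(1000))
except OverflowError as err:
    print("naive formula overflows:", err)
```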