# Physics307L F08:People/Joseph/Notebook/071017


# Poisson Statistics

Experimentalists: Nikolai Joseph and Bradley Knockel

SJK 00:51, 19 November 2007 (CST)

## Objective

By recording counts that form seemingly random data sets, we hope to show that under certain circumstances the data fits a Poisson distribution. We also want to analyze how the data is distributed in non-Poisson situations. We are recording events that we believe to be of cosmic origin over various time intervals per bin; for instance, 256 bins of 2 seconds each. Recording a large number of events over bins of varying size should give us a wide variety of distributions.

## Theory

When collecting large amounts of data it is wise to look at the probability distribution of that data. Both the Gaussian and Poisson distributions can be derived from the **binomial distribution** as limiting cases.

### The Binomial Distribution

When analyzing a process made up of $N$ independent trials, each of which produces a count with probability $p$, the probability of observing exactly $x$ counts is given by the binomial distribution:

$$B(x)=\frac{N!}{x!\,(N-x)!}\,p^{x}q^{N-x}$$

with a standard deviation of

$$\sigma=\sqrt{pN(1-p)}=\sqrt{pNq}$$

and a mean of

$$a=pN.$$

Here $N$ is the number of trials, $p$ is the probability of a count occurring on a given trial, and $q$ is the probability of a count not occurring. Since each trial either produces a count or it doesn't, $p+q=1$ in all instances. In the context of our experiment, $N$ is very large and $p$ is very small. In that limit, with the mean $a=pN$ held fixed, the binomial distribution approaches the Poisson distribution. More information can be found here.
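The limit described above can be checked numerically. The sketch below is our own illustration, not part of the original analysis: it holds the mean fixed at a hypothetical $a = pN = 2$, lets $N$ grow while $p = a/N$ shrinks, and reports the largest difference between the binomial and Poisson probabilities over $x = 0,\dots,10$.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """B(x) = N! / (x! (N - x)!) * p^x * q^(N - x), with q = 1 - p."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)


def poisson_pmf(x: int, a: float) -> float:
    """P(x) = e^(-a) * a^x / x!."""
    return math.exp(-a) * a**x / math.factorial(x)


a = 2.0  # hypothetical mean number of counts per bin, held fixed
for n in (10, 100, 10_000):
    p = a / n  # probability per trial shrinks as the number of trials grows
    worst = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, a)) for x in range(11))
    print(f"N = {n:>6}, p = {p:.4f}, max |B(x) - P(x)| = {worst:.1e}")
```

The maximum difference shrinks steadily as $N$ grows, which is the approximation we rely on for counting data with large $N$ and small $p$.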

### The Gaussian Distribution

When analyzing a situation in which the expected number of occurrences is large (large $N$ and large $pN$), the binomial distribution is well approximated by the Gaussian distribution:

$$G(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-a)^2}{2\sigma^2}},$$

with $a$ = the mean and $\sigma$ = the standard deviation.
The Gaussian distribution is often used to model measured data. It is completely determined by its mean and standard deviation, so if those two parameters are chosen well (for example, estimated from the data), the mean and spread of the predicted curve match those of the measurements. A very good tool for understanding the Gaussian distribution can be found here.
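As a sketch of how the two Gaussian parameters can be taken from a data set, the snippet below estimates $a$ and $\sigma$ from a short list of counts per bin and then uses $G(x)$ to predict how many bins should contain $x$ counts. The counts are made up for illustration and are not our measured data; estimating $a$ and $\sigma$ as the sample mean and sample standard deviation is one simple choice (a method-of-moments fit), not necessarily the one used for our plots.

```python
import math
import statistics


def gaussian(x: float, a: float, sigma: float) -> float:
    """G(x) = exp(-(x - a)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-((x - a) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


# Hypothetical counts per bin, for illustration only (not measured data).
counts = [48, 52, 50, 47, 55, 51, 49, 53, 46, 50]

a = statistics.mean(counts)       # estimated mean
sigma = statistics.stdev(counts)  # estimated (sample) standard deviation

# Predicted number of bins containing exactly x counts, if the data were Gaussian.
for x in range(44, 58, 2):
    predicted = len(counts) * gaussian(x, a, sigma)
    print(f"x = {x}: predicted bins = {predicted:.2f}")
```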

### The Poisson Distribution

When analyzing a random situation in which each individual event has a very low probability of occurring (large $N$ and small $p$), the Poisson distribution is used:

$$P(x)=e^{-a}\frac{a^x}{x!}$$

with a standard deviation of

$$\sigma=\sqrt{a},$$

where $a$ is the mean. The Poisson distribution is defined only for non-negative integers $x=0,1,2,\dots$ (as is the binomial distribution), whereas the Gaussian is defined for all real $x$. For a small mean it is visibly lopsided and piled up near zero; one can picture it as a skewed Gaussian that cannot extend below zero, with a mean greater than zero (but not too much greater). As $a$ grows, the Poisson distribution becomes more symmetric and approaches a Gaussian with $\sigma=\sqrt{a}$. A good tool for getting comfortable with Poisson distributions can be found here.
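The properties above can be checked numerically. The sketch below, using means we chose arbitrarily, sums $x\,P(x)$ and $(x-a)^2 P(x)$ to confirm that the mean and the variance are both $a$ (so $\sigma=\sqrt{a}$), and measures the largest gap between $P(x)$ and a Gaussian with the same mean and $\sigma=\sqrt{a}$. The choice of means and the summation cutoff are our own assumptions.

```python
import math


def poisson_pmf(x: int, a: float) -> float:
    """P(x) = e^(-a) a^x / x!, defined for x = 0, 1, 2, ..."""
    return math.exp(-a) * a**x / math.factorial(x)


def gaussian(x: float, a: float, sigma: float) -> float:
    """G(x) with mean a and standard deviation sigma."""
    return math.exp(-((x - a) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)


def summarize(a: float) -> tuple[float, float, float]:
    """Return (mean, variance, max |P(x) - G(x)|) for a Poisson distribution with mean a."""
    # Truncate the sums far into the tail, where P(x) is negligible.
    xs = range(int(a + 10 * math.sqrt(a)) + 10)
    probs = [poisson_pmf(x, a) for x in xs]
    mean = sum(x * p for x, p in zip(xs, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(xs, probs))
    gap = max(abs(p - gaussian(x, a, math.sqrt(a))) for x, p in zip(xs, probs))
    return mean, var, gap


for a in (0.5, 3.0, 30.0):
    mean, var, gap = summarize(a)
    print(f"a = {a:>4}: mean = {mean:.4f}, variance = {var:.4f}, max |P - G| = {gap:.4f}")
```

For the smallest mean the Poisson distribution is clearly lopsided and the Gaussian fits it poorly; the gap shrinks as $a$ grows.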

## Experiment

### Setup and Equipment

Our setup consists of a photomultiplier tube attached to a NaI scintillator, both housed in a structure of lead bricks. The photomultiplier tube and scintillator are connected by coaxial cables to a high-voltage power supply (1000 volts), which we connected to a bridge of some sort, and from the bridge to a data acquisition board in the computer. The computer runs a program called PCAIII, which handles the data acquisition. Some extraneous cables came off the data acquisition board, but we had no need to use them.

### Procedure

Once all cables were secured and the power supply and bridge had warmed up, we started taking data. In PCAIII we configured how many bins of data to take and how much time each bin would get (the "dwell time"). We used 256 bins for dwell times of 1 s, 2 s, and 10 s, and 4096 bins for dwell times of 10 ms, 100 ms, and 100 s. From experimenting with the bins, we found that the number of bins acts as a 'resolution' of sorts: the more bins, the smoother the data became. We used a high number in some runs and a low number in others; the choice was mostly preference.
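To see how the number of bins and the dwell time shape the resulting distribution, the sketch below simulates a detector whose counts arrive independently at a constant average rate, which is the situation in which Poisson statistics are expected to hold. The rate of 5 counts per second is a hypothetical value, not our measured rate, and `simulate_bins` is an illustrative helper, not part of PCAIII.

```python
import random
from collections import Counter


def simulate_bins(rate: float, dwell: float, n_bins: int, seed: int = 0) -> list[int]:
    """Return counts per bin for randomly arriving events.

    rate   -- hypothetical mean count rate in counts per second (an assumption)
    dwell  -- dwell time per bin in seconds
    n_bins -- number of bins, e.g. 256 or 4096 as in our runs
    """
    rng = random.Random(seed)
    counts = [0] * n_bins
    total_time = n_bins * dwell
    t = rng.expovariate(rate)  # waiting time until the first event
    while t < total_time:
        # Guard against floating-point rounding at the very end of the run.
        index = min(int(t // dwell), n_bins - 1)
        counts[index] += 1
        t += rng.expovariate(rate)  # independent, exponentially distributed gaps
    return counts


for n_bins, dwell in ((4096, 0.01), (4096, 0.1), (256, 1.0), (256, 10.0)):
    counts = simulate_bins(rate=5.0, dwell=dwell, n_bins=n_bins)
    histogram = sorted(Counter(counts).items())  # (counts in a bin, number of such bins)
    mean = sum(counts) / n_bins
    print(f"{n_bins} bins x {dwell} s: mean = {mean:.2f}, histogram = {histogram[:8]}")
```

Short dwell times give a mean well below one count per bin and a lopsided histogram piled up near zero, while long dwell times give a large mean and a nearly symmetric histogram.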

### Data and Discussion

I couldn't figure out a good way to show uncertainty in the mean, so I borrowed from Bradley the idea of doing it as follows:

where

We concluded that the uncertainty is relevant less to the data itself than to the **process**, and therefore should be considered. Since this data was gathered over several days from a completely uncontrollable source, some fluctuation is expected that is not necessarily error.
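The sketch below does not reproduce the method borrowed from Bradley; as a stand-in it uses a different, common estimate, the standard error of the mean $s/\sqrt{n}$, where $s$ is the sample standard deviation of the $n$ bin counts. For Poisson-distributed counts $s\approx\sqrt{a}$, which gives a second estimate, $\sqrt{a/n}$. Both function names and the example counts are hypothetical.

```python
import math
import statistics


def standard_error(counts: list[int]) -> tuple[float, float]:
    """Return (mean, s / sqrt(n)) using the sample standard deviation s."""
    n = len(counts)
    return statistics.mean(counts), statistics.stdev(counts) / math.sqrt(n)


def poisson_standard_error(counts: list[int]) -> tuple[float, float]:
    """Return (mean, sqrt(a / n)), assuming Poisson counts so that sigma = sqrt(a)."""
    n = len(counts)
    a = statistics.mean(counts)
    return a, math.sqrt(a / n)


# Hypothetical counts per bin, for illustration only.
counts = [3, 5, 4, 6, 2, 5, 4, 3, 7, 5]
print(standard_error(counts))
print(poisson_standard_error(counts))
```

If the two estimates disagree strongly, that may hint that the counts are not Poisson distributed.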

The data is as follows.

- 10 ms

- 100 ms

- 1 second

- 2 seconds

- 10 seconds

- 100 seconds

Notice the odd spike in the 100 second data.

### Error

This lab is really based on the notion of random error. The whole objective is to record events and to show that they are distributed randomly. We presume the source(s) of our data to be of cosmic origin, but not of any specific origin that we can identify; we cannot say with any confidence what exactly is producing our data. Look at the graph of the 100 second data and notice the large, sharp spike, for which we have no explanation. Using our lead shielding and minimizing disturbances is really the best that can be done to control this experiment. With Bradley's help I was able to plot the Gaussian and Poisson distributions against our data, and you can see that even though we have a good system for making predictions, the data does not always fit. On the longer time scales the data fits fine, but those are not the cases we are most concerned with. Beside each distribution below I give the error of the data relative to either the Gaussian or the Poisson prediction.

- 10 ms

- 100 ms

- 1 second

- 2 second

- 10 second: $\mathrm{Error}_{\mathrm{Gaussian}}=0.0028$

- 100 second: $\mathrm{Error}_{\mathrm{Poisson}}$ = not a real number!, $\mathrm{Error}_{\mathrm{Gaussian}}=0.002$

Notice how quickly the Gaussian error drops off after 100 ms, and notice also that the Poisson error is highest on the most Poisson-looking distribution!
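We did not record why the Poisson error for the 100 second data came out as "not a real number". One plausible cause, which we cannot confirm, is numerical overflow: with a large mean $a$, the factors $a^x$ and $x!$ in $P(x)$ are individually too large for double precision even though their ratio is small. The sketch below, with a hypothetical mean of 1000 counts per bin, shows the direct formula failing and the equivalent log-space form $\ln P(x) = -a + x\ln a - \ln x!$ (using the log-gamma function for $\ln x!$) succeeding.

```python
import math


def poisson_pmf_naive(x: int, a: float) -> float:
    """Direct evaluation of P(x) = e^(-a) a^x / x!; overflows for large a and x."""
    return math.exp(-a) * a**x / math.factorial(x)


def poisson_pmf_log(x: int, a: float) -> float:
    """Same P(x), computed as exp(-a + x ln a - ln x!) with ln x! = lgamma(x + 1); needs a > 0."""
    return math.exp(-a + x * math.log(a) - math.lgamma(x + 1))


a = 1000.0  # hypothetical large mean count per bin
x = 1000
try:
    print("naive:", poisson_pmf_naive(x, a))
except OverflowError as err:
    print("naive formula failed:", err)
print("log-space:", poisson_pmf_log(x, a))
```

The log-space form gives a finite probability even when $a$ and $x$ are in the hundreds or thousands, the regime of large counts per bin.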

## Conclusions

In situations where a Poisson distribution fits the data well, look for a low mean count per bin and therefore a low standard deviation ($\sigma=\sqrt{a}$). As the number of counts per bin climbs, the Poisson distribution won't be as useful and the Gaussian takes over. Unless it is your goal and you have a question to answer with it, avoid taking data from space.