# User:Paul V Klimov/Notebook/JuniorLab307L/2008/11/24

SJK: Incomplete Feedback Notice
My feedback is incomplete on this page for two reasons. First, the value of the feedback to the students is low, given that the course is over. Second, I'm running out of time to finish grading!

# Poisson Statistics

Data by Paul Klimov and Garrett McMath

## Introduction

SJK 17:42, 17 December 2008 (EST)
You can also derive the Poisson distribution from an exponentially distributed PDF for event times (e.g. exponentially decaying probability) and then integrating expectation values. I did this once 10 years ago; I should try it again, but now I am old.

Poisson statistics are crucial to our understanding of many phenomena. The famous distribution gives the probability that a number of events will happen, given that the event of interest occurs at a fixed average rate. The Poisson distribution is derived directly from the binomial distribution, which gives the probability of k successes in n trials, each of which succeeds with probability p. The distribution is given as follows:

$Pr(k,n,p) = {n\choose k}p^k(1-p)^{n-k}, {n\choose k}=\frac{n!}{k!(n-k)!}$

The expected number of successes is given by the product of the number of trials and the probability of success. I will denote this quantity by the parameter lambda, which will prove convenient later:

$\lambda = n\cdot p$

Using the new parameter, the binomial distribution can be re-written as follows:

$Pr(k,n,\lambda) = \frac{n!}{k!(n-k)!}\left(\frac{\lambda}{n}\right)^{k}\left(1-\frac{\lambda}{n}\right)^{n-k}$

In the limit of small probability and many trials (low p and large n), the above distribution approaches the following distribution:

$Pr(k,\lambda) =\frac{\lambda^{k}e^{-\lambda}}{k!}$
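The limit just described can be checked numerically. This is an illustrative Python sketch (the notebook's own analysis was done in MATLAB), with lambda held fixed at an assumed value of 4 while n grows and p shrinks:

```python
# Numerical sketch of the binomial -> Poisson limit (illustrative values):
# with lambda = n*p fixed, the binomial pmf approaches the Poisson pmf as n grows.
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 4.0
for n in (10, 100, 10_000):
    p = lam / n
    worst = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(10))
    print(f"n = {n:6d}: max pmf difference = {worst:.2e}")
```

The printed difference shrinks as n increases, which is exactly the limit used above.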

This is the Poisson distribution. To learn more about it, we must discuss several important statistical quantities. The mean can be computed as follows:

$\bar{k}=\sum_{k=0}^{\infty} k\,\frac{\lambda^{k}e^{-\lambda}}{k!} =\lambda$

The variance is then computed as $\sigma^{2}=\overline{k^{2}}-\bar{k}^{2}$ (the sums run over all values of k starting from 0):

$\sigma^{2} = \sum_{k=0}^{\infty} k^{2}\,\frac{\lambda^{k}e^{-\lambda}}{k!} - \left(\sum_{k=0}^{\infty} k\,\frac{\lambda^{k}e^{-\lambda}}{k!}\right)^{2} = \lambda$

Clearly the Poisson distribution has a very interesting feature: its mean and its variance are the same value, the expectation parameter lambda. I will be exploiting this fact throughout this lab, and my data analysis is based directly on this feature. Also worth mentioning is that the Poisson distribution merges with the normal (Gaussian) distribution for large expectation values (large lambda). The famous Gaussian distribution is given by the following formula, where sigma is the standard deviation and mu is the mean:

$Pr(x)=\frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$
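The mean-equals-variance property can be demonstrated by sampling. This Python sketch (illustrative; lambda = 4 is an assumed value, and the actual lab analysis was done in MATLAB) draws Poisson-distributed counts with Knuth's multiplication method:

```python
# Draw Poisson samples and confirm that the sample mean and sample
# variance both estimate lambda (Knuth's method; fine for small lambda).
import math
import random

def poisson_sample(lam, rng):
    # multiply uniforms until the running product drops below e^(-lambda)
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

rng = random.Random(0)
lam = 4.0
samples = [poisson_sample(lam, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}")  # both close to 4
```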

In this lab we will be studying the rate of cosmic ray bombardment, a phenomenon that should be described by the Poisson distribution. The rays will be detected with a NaI scintillation counter: as rays strike the detector, they ionize a compound that fluoresces upon ionization. The fluorescence is then picked up by a photomultiplier tube (PMT), which amplifies the signal and gives us a reading. The reading is interpreted by the computer operator.

## Equipment

• Gateway E-4200
• MCA card
• Harshaw NaI(TI) Integral Line Scintillation Detector 1231
• E&G Ortec 4001C NIM BIN
• Hydra cable
• BNC cables
• Bertan Associates, Inc. Model 313B HV Power Supply
• Harshaw NaI(TI) Integral Line Scintillation Detector 12312

## Procedure

Everything was wired according to Professor Gold's manual. Although the wiring looked a bit complicated, it amounted to hooking the NaI counter up to a computer, where our data could be interpreted by the computer operator. All settings from there were adjusted in the computer operator. The software, although a bit outdated, actually worked quite well once we figured out how to use it.

The first important setting in the operator is the channel setting. Each channel (also called a bin) acts, in a sense, as a separate detector. After the operator starts acquiring data from the detector, it counts events into the first bin only. After a set dwell time has elapsed, the operator switches to the next bin and counts for the same length of time. The process continues until the operator reaches the last bin, at which point it either repeats the process or stops taking data. For each trial we had the operator set to use 256 channels. Dwell times were varied, as were the numbers of passes.
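The binning scheme described above can be sketched as a small simulation. This is illustrative Python (the event rate is made up, and the actual apparatus and analysis used MATLAB): each channel counts events for `dwell` seconds per pass, and counts accumulate over repeated passes.

```python
# Multichannel-scaler sketch: Poisson events generated via exponential
# inter-arrival times, binned into channels over repeated passes.
import math
import random

def run_mcs(rate, dwell, passes, channels=256, seed=0):
    rng = random.Random(seed)
    counts = [0] * channels
    for _ in range(passes):
        for ch in range(channels):
            t = 0.0
            while True:
                # exponential inter-arrival times for a constant-rate process
                t += -math.log(1.0 - rng.random()) / rate
                if t > dwell:
                    break
                counts[ch] += 1
    return counts

# 40 events/s, 10 ms dwell, 10 passes -> each channel expects 40*0.01*10 = 4 counts
counts = run_mcs(rate=40.0, dwell=0.01, passes=10)
print(sum(counts) / len(counts))
```

The expected counts per channel scale with the dwell-pass product, which is why those two settings matter below.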

## Data

Due to the huge data outputs, I will have to link to the data. Although all original files were in ASCII format, everything was converted to Excel. During the first few trials we were still getting used to the software, so unfortunately we did not record the dwell times or numbers of passes. Because we lack that information for those trials, I will not use them for any large conclusions.

## Possible Sources of Error

• Since we are detecting cosmic rays, we must worry about the motion of the earth. This could cause trouble in longer experiments, where the rate of cosmic bombardment could change over the course of the experiment. Although I do not know for sure, it seems reasonable that there would be more 'active' bombardment periods in each day: as the earth rotates, the rate of bombardment could change. However, given that our experiments were all taken over shorter time periods, I do not expect this to be a problem.

## Post Experimental Analysis

Figure 1: Day 1. The dwell times, passes, means (denoted by lambda) and variances are given in the figure titles. The theoretical Poissonians and Gaussians are plotted alongside the real data, which is presented as a histogram.
Figure 2: Day 2. The dwell times, passes, means (denoted by lambda) and variances are given in the figure titles. The theoretical Poissonians and Gaussians are plotted alongside the real data, which is presented as a histogram.
Figure 3: The three plots show the percent deviations of our data from the Poisson as functions of the dwell time, the number of passes, and the product of the two. There are clear trends in the first two plots and a perhaps subtler one in the final plot. This suggests that our data fits the Poisson best for lower dwell times and higher numbers of passes.
Figure 4: Stem plot of the percent deviation as a function of both the dwell time and the number of passes. Clearly the combination of small dwell times and a high number of passes produced distributions in close agreement with the Poisson distribution.
Figure 5: Deviation from the Poissonian as a function of the mean (parameter lambda). There is clearly a downward trend, suggesting that the higher the mean, the better the fit to the theoretical Poissonian.

To analyze our data, I have decided to compare the means and variances of our 'event' distributions, which are equal for a Poisson distribution. In addition, I will quantify how far the variance deviates from the mean by computing the relative error between these two quantities. This deviation will make it easier to see how well our data fits the Poissonian.

I have produced plots of our data along with the theoretical Poissonians and Gaussians, which can be seen in Figure 1 and Figure 2. Each theoretical Gaussian was produced with the standard deviation and mean of the respective data; likewise, each theoretical Poissonian was produced from the mean of the respective data. Both theoretical distributions were normalized to the total number of counts, so each distribution sums to the total number of counts for the given data set. In addition, I produced semi-logarithmic (natural log) plots of the deviations versus the dwell times and the numbers of passes (see Figure 3). Semi-logarithmic plots were used to 'slow down' the relative growth rates of each variable, since both the dwell times and the numbers of passes spanned a broad range of magnitudes. Plotting this way gives a detailed picture of the data at smaller values.
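The normalization just described can be sketched as follows. This is illustrative Python (the histogram values are hypothetical, and the original analysis was MATLAB): the theoretical Poisson probabilities are scaled by the total number of counts so the curve and the data histogram cover the same number of events.

```python
# Scale theoretical Poisson probabilities to the total event count,
# using the sample mean of the histogram as lambda.
import math

def scaled_poisson(lam, total_counts, k_max):
    return [total_counts * lam**k * math.exp(-lam) / math.factorial(k)
            for k in range(k_max + 1)]

hist = [3, 15, 29, 40, 38, 31, 20, 12, 6, 2]          # hypothetical counts per k
total = sum(hist)
lam = sum(k * h for k, h in enumerate(hist)) / total  # sample mean as lambda
theory = scaled_poisson(lam, total, len(hist) - 1)
print(f"lambda = {lam:.3f}; theory sums to {sum(theory):.1f} of {total} counts")
```

The scaled theoretical curve sums to slightly less than the total only because the tail beyond the last bin is truncated.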

As we can see in the figures, some of the theoretical plots have 'sharp' edges. These edges are the result of mapping from a discrete natural-number domain (Pr: N->R). However, I could have used a continuous variable as well, with some slight modifications. While the Gaussian, in the form presented in the introduction, maps the reals to the reals (G: R->R), the Poisson does not. The problem is that it is not immediately obvious how to take the factorial of a non-integer real number. To bypass this, we must invoke the gamma function, which extends the factorial beyond the integers:

$\Gamma(n) = \int_0^\infty t^{n-1} e^{-t}\,dt, \quad n > 0$

The factorial can be expressed like so using the gamma function:

$n! = \Gamma(n+1)$

Therefore, the Poisson distribution over a continuous variable would presumably look like this:

$Pr(k,\lambda) =\frac{\lambda^{k}e^{-\lambda}}{\Gamma(k+1)}$
$k \ge 0,\ \lambda > 0$

Of course this is just a 'smoothing tool': non-integer k makes absolutely no sense in a physical system, because you cannot have a non-integer number of events. However, it does produce nicer plots!
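The smoothing trick amounts to two lines of code. This Python sketch uses the standard-library `math.gamma` (the original work was in MATLAB; this is purely a plotting aid, as noted above):

```python
# Continuous extension of the Poisson pmf via the gamma function.
import math

def poisson_continuous(k, lam):
    return lam**k * math.exp(-lam) / math.gamma(k + 1)

print(poisson_continuous(3, 4.0))    # matches the discrete pmf: 4^3 e^-4 / 3!
print(poisson_continuous(3.5, 4.0))  # interpolates smoothly between integers
```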

Note: Unfortunately I did not end up using this, because it made accurate normalization of our data unnecessarily hard: it would require fitting our data to a smooth function, which would introduce inaccuracies in the scaling of each distribution. But hopefully the plots don't look horrible.

The settings and statistics for each trial are tabulated below; the deviation is defined as $100\,|\lambda-\sigma^{2}|\,/\,\lambda$.

| Trial | Dwell | Passes | Mean (λ) | Variance (σ²) | Deviation (%) |
|-------|--------|--------|----------|----------------|----------------|
| 1 | ? | ? | 4.099 | 4.506 | 9.9 |
| 2 | ? | ? | 0.2908 | 0.2391 | 18 |
| 3 | ? | ? | 1.116 | 1.367 | 23 |
| 4 | 10 ms | 120 | 4.394 | 5.056 | 15 |
| 5 | 400 ms | 1 | 1.422 | 0.973 | 32 |
| 6 | 8 ms | 300 | 8.761 | 8.439 | 3.7 |
| 7 | 200 ms | 20 | 13.06 | 8.565 | 34 |
| 8 | 100 ms | 20 | 6.733 | 5.700 | 15 |
| 9 | 1 s | 1 | 3.462 | 1.666 | 52 |
| 10 | 400 µs | 1800 | 2.609 | 3.215 | 23 |
| 11 | 1 ms | 600 | 2.223 | 2.102 | 5.4 |
| 12 | 1 ms | 1200 | 4.514 | 4.419 | 2.1 |
| 13 | 1 ms | 2400 | 9.032 | 8.439 | 6.6 |
| 14 | 1 ms | 4800 | 18.23 | 18.39 | 0.85 |
| 15 | 200 ms | 3 | 2.331 | 1.774 | 24 |
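The deviation column can be spot-checked directly from the tabulated means and variances (a quick Python check; the values are copied from the table above):

```python
# Recompute the deviation metric 100*|lambda - sigma^2| / lambda.
def deviation(lam, var):
    return 100.0 * abs(lam - var) / lam

print(round(deviation(4.099, 4.506), 1))  # trial 1  -> 9.9
print(round(deviation(4.514, 4.419), 1))  # trial 12 -> 2.1
```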
SJK 17:48, 17 December 2008 (EST)
I notice that the 400 microsecond point is actually a higher variance than expected, and thus your line would cross through the x-axis. Therefore, I think a better criterion would be the fractional difference without the absolute value.

## Discussion

Clearly our method of data gathering (i.e., the settings in the operator) had a huge impact on our results. Therefore, it will be beneficial to look at the deviations from the Poisson distribution as functions of dwell time and the number of passes. This will help us better understand our apparatus and how it might affect our final data.

Initially, we believed that the dwell-pass (d-p) product would determine the magnitude of deviation from the Poissonian. This seemed reasonable because it is this product that determines the total time spent on each bin. However, after observing the third plot in Figure 3, I am forced to re-evaluate that initial belief. While the last plot shows a downward trend, we cannot be sure how accurately the trend represents the actual system, because we have fewer data points with large d-p products. This could skew the data in favor of the first points, which have overwhelming density. In fact this effect is understated on logarithmic plots, because the relative resolution of small and large increments is exaggerated.

Let us now examine the deviation as a function of the dwell time (first plot in Figure 3). Here we see a striking upward trend that spans deviations from only a few percent up to around 40 percent. The most interesting feature of this plot is the bundle of 4 points at the left (all corresponding to a 1 ms dwell time). At first glance, this accumulation of points suggests that dwell times have a large impact on our data. And given that this set of points spans from 600 to 4800 passes, it seems to reaffirm that the d-p product might have little effect on the outcome of the experiment.

Likewise, we should evaluate the second plot in Figure 3, which plots deviation versus the number of passes. This time there is a downward trend, suggesting that the larger the number of passes, the smaller the deviation from the Poisson distribution. This is expected: as the number of passes increases, each bin has a larger effective active time, which should make the data less susceptible to 'freak occurrences'. The number of passes thus also has a very noticeable effect on the outcome of the experiment.

SJK 17:51, 17 December 2008 (EST)
I agree with you here: I think the number of passes effect is just the simple effect you note (more precision with more data)

To combine the above discussion, it might be useful to look at the deviations as a function of the dwell time and the number of passes simultaneously. I have provided this in Figure 4, in the form of a stem plot, where the length of each stem is the deviation from the Poissonian. Clearly, the data that best matches the Poissonian comes from a very narrow range of settings: short dwell times and many passes. And while there are some unexplored corners in the figure, a very clear trend is evident already.

Lastly, we should discuss deviations from the Poisson distribution as a function of the mean (the expectation parameter lambda), which is plotted in Figure 5. Clearly there is a downward trend, suggesting that for larger means our distributions approach the theoretical Poissonian. This is quite interesting, because it seems to imply that this happens, on average, regardless of the operator settings. There also appear to be 2 outliers in this plot. To determine whether they truly are outliers, we can invoke Chauvenet's criterion.

NOTE: I will be skeptical of the results no matter what they come out to be, mainly because of the trend in the data. I suspect there is a better way to look for outliers when a trend is present, because it seems to me that the mean and the standard deviation are blurry quantities in this context. However, applying the criterion is probably a useful exercise anyway. I have noted 'NOT TO BE TAKEN AS CONCLUSIONS' because this data cannot summarize everything, and it is not to be taken as concluding data of any sort.

• Deviations from Poisson Data (NOT TO BE TAKEN AS CONCLUSIONS):
Mean Deviation: 17.5%
Standard Deviation of the Deviation: 14.2%
• Suspect Outlier 1:
Lambda: 17.5
Poisson Deviation: 52.7
Standard Deviations Away: ~2.5
Probability of Obtaining Pt. : ~.01
(0.01)(15)<0.5
Conclusion: Point is an outlier according to Chauvenet's criterion.
• Suspect Outlier 2:
Lambda: 13.1
Poisson Deviation: 34.5
Standard Deviations Away: ~1.2
Probability of Obtaining Pt. : ~.3
(0.3)(15)>>0.5
Conclusion: Point is not an outlier according to Chauvenet's criterion.
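The test applied above can be expressed compactly. This Python sketch (illustrative; the numbers are the mean, standard deviation, and suspect deviations quoted in the lists above) rejects a point when N times the two-sided normal tail probability falls below 0.5:

```python
# Chauvenet's criterion: flag a point if N * P(|x - mean| > observed) < 0.5.
import math

def chauvenet_reject(value, mean, std, n_points):
    z = abs(value - mean) / std
    tail_prob = math.erfc(z / math.sqrt(2))  # two-sided normal tail probability
    return n_points * tail_prob < 0.5

print(chauvenet_reject(52.7, 17.5, 14.2, 15))  # suspect 1: True (outlier)
print(chauvenet_reject(34.5, 17.5, 14.2, 15))  # suspect 2: False (not an outlier)
```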

Based on this analysis, only one of the points can be classified as an outlier. However, as I stated earlier, I am skeptical of the Chauvenet approach here because of the trend -- in fact, it seems that the point that cannot be classified as an outlier is effectively further from the trend than the point that can be classified as an outlier. Regardless of these two potential outliers, the overwhelming density that forms the trend is clear, and the implications are important.

SJK 17:53, 17 December 2008 (EST)
I too am skeptical, but I think you did perform the exercise correctly and it's a good learning exercise.

## Some Conclusions

From the above discussion, we see that our experimental setup played a huge role in our data, which leaves lots of room for interpretation. Furthermore, I believe no single distribution could explain our results perfectly. In the end, I would argue that the normal distribution is the better overall fit, because it has two free parameters, compared to one for the Poisson distribution. I think Figure 5 supports this claim: deviations from the Poisson distribution become smaller (with the exception of at least one blatant outlier) as the mean increases, which is expected of a Gaussian. However, there are also aspects of our data that cannot be fit well by a Gaussian, in particular the distributions with smaller means. Even qualitatively, we can see from some trials in Figures 1 and 2 that the Gaussian becomes a rather poor fit when the mean is low.

However, at this point it might be reasonable to argue that at low means we have taken too little data to resolve a distribution well enough to identify it. The justification for this argument is that a low mean is likely the result of too few passes, which opens the possibility of unlikely occurrences heavily affecting our data.

## Improvements for Future

SJK 17:55, 17 December 2008 (EST)
A conclusion I didn't expect! I think we also really need a radioactive source, so we can be sure of a Poisson process that we can study. The investigation you two created was very well done, and I learned a lot.

I really think this lab could use a third class period, even though I did not realize this during the lab. With too little data one could easily be misled, since the operator settings have a huge impact on the results. Therefore, I think one would really need a huge amount of data to draw strong conclusions, which I don't believe is possible in only 6 hours.

## MATLAB code

NOTE: To run the code, you must put ALL of the Excel files presented in the data section into the same folder as the MATLAB code, and set MATLAB's 'Current Directory' to that folder. The program automatically reads the Excel files for quick data analysis!