Proofs of the Gamma Distribution
Intro
Within this post, I’ll explore the properties of the Gamma distribution. The results presented here are interesting as they ripple throughout mathematical statistics. Each result has a proof associated with it in hopes of better conveying how the result came to be. Below is a preview of the post’s contents.
 Gamma in real life
 Parameterizations of Gamma
 Definition 1: $X \sim Gamma\left({\alpha,\theta}\right), \quad f\left( x \right) = \frac{1}{ {\Gamma \left( \alpha \right){\theta ^\alpha } } }{x^{\alpha - 1} }\exp \left( { - \frac{x}{\theta } } \right), \, x > 0$
 Definition 2: $X \sim Gamma\left({\alpha,\frac{1}{\lambda} }\right), \quad f\left( x \right) = \frac{ { {\lambda ^\alpha } } }{ {\Gamma \left( \alpha \right)} }{x^{\alpha - 1} }\exp \left( { - \lambda x} \right), \, x > 0$
 Derivation of Gamma Distribution
 Examples with Gamma
 Integrating the Gamma PDF
 Exploiting the Gamma-Poisson Relationship
 Gamma as a means to an end for integration by parts.
 Gamma Function
 Proposition 1: $\Gamma \left( 1 \right) = 1$
 Proposition 2: For any $\alpha >0$, we have: $\Gamma \left( {\alpha + 1} \right) = \alpha \Gamma \left( \alpha \right)$.
 Proposition 3: For any nonnegative integer $\alpha$, we have: $\Gamma \left( {\alpha + 1} \right) = \alpha !$
 (Bonus) Proposition 4: $\Gamma \left( {\frac{1}{2} } \right) = \sqrt \pi$.
 Lemma 1: Gaussian Integral $I = \int\limits_0^\infty {\exp \left( { - {x^2} } \right)dx} = \frac{ {\sqrt \pi } }{2}$
 Verifying the Gamma PDF
 Theorem 1: If $X \sim Gamma \left({\alpha,\theta}\right)$ with $\alpha >0$ and $\theta >0$, then $\int\limits_0^\infty {\frac{1}{ {\Gamma \left( \alpha \right){\theta ^\alpha } } }{x^{\alpha - 1} }\exp \left( { - \frac{x}{\theta } } \right)dx} = 1$
 Gamma Cumulative Distribution Function (CDF): $P\left( {X \le x} \right) = \frac{ {\gamma \left( {\alpha ,\frac{x}{\theta } } \right)} }{ {\Gamma \left( \alpha \right)} }$
 Exploring the Moments
 Theorem 2: The $k$th moment of a Gamma distribution is $\left( {\alpha + k - 1} \right)\left( {\alpha + k - 2} \right) \cdots \alpha {\theta ^k}$
 Theorem 3: The Moment Generating Function of a Gamma distribution is $\frac{1}{ { { {\left( {1 - \theta t} \right)}^\alpha } } }$ if $t < \frac{1}{\theta}$.
 Theorem 4: The Characteristic Function of a Gamma distribution is $\frac{1}{ { { {\left( {1 - it\theta } \right)}^\alpha } } }$.
 Gamma Distribution Properties
 Scaling Property: $cX \sim Gamma\left({\alpha, c\theta}\right)$
 Mean: $\alpha \theta$
 Variance: $\alpha {\theta ^2}$
 Skewness: $\frac{2}{ {\sqrt \alpha } }$
 Kurtosis: $3 + \frac{6}{\alpha }$
 Excess Kurtosis: $\frac{6}{\alpha }$
 Estimators
 Maximum Likelihood Estimation: ${ {\hat \theta }_{MLE} } = \frac{ {\bar x} }{\alpha }$, no closed form for $\alpha$.
 Method of Moments Estimation: $\tilde \alpha = \frac{ {n{ {\bar X}^2} } }{ {\sum\limits_{i = 1}^n { { {\left( { {X_i} - \bar X} \right)}^2} } } },\ \tilde \theta = \frac{ {\sum\limits_{i = 1}^n { { {\left( { {X_i} - \bar X} \right)}^2} } } }{ {n\bar X} }$
 Lemma 2: $\sum\limits_{i = 1}^n {X_i^2} - n{ {\bar X}^2} = \sum\limits_{i = 1}^n { { {\left( { {X_i} - \bar X} \right)}^2} }$
 Related distributions
 If $X \sim Gamma\left({1, \frac{1}{\lambda } }\right)$, then $X$ has an exponential distribution with rate parameter $\lambda$.
 If $X \sim Gamma\left({\frac{\nu}{2}, 2}\right)$, then $X$ has a chi-squared distribution with $\nu$ degrees of freedom, $\chi ^2\left({\nu}\right)$.
 If $X_i \sim Gamma\left({\alpha_i, \theta }\right)$, then $\sum\limits_{i = 1}^n { {X_i} } \mathop \sim \limits^{iid} Gamma\left({ \sum\limits_{i = 1}^n { {\alpha_i} } , \theta }\right)$.
 If $X_i \sim Exponential\left({ {\lambda } }\right)$, then $\sum\limits_{i = 1}^n { {X_i} } \mathop \sim \limits^{iid} Gamma\left({n, \frac{1}{\lambda } }\right)$.
 Gamma in R
 Misc
 Misc 1: $0 = \sum\limits_{i = 1}^n {\left( { {X_i} - \bar X} \right)}$
Gamma in real life
Examples of the Gamma random variable are:
 The time it takes for an event to occur, e.g. waiting to check out or for assistance.
 Accumulation of a quantity over a given time period, e.g. the number of goals scored or accidents in a workplace.
Parameterizations of Gamma
We say that a random variable $X$ has a Gamma distribution with parameters $\alpha > 0$ and $\theta > 0$ if its probability density function has the form:
Definition 1: $X \sim Gamma\left({\alpha,\theta}\right), \quad f\left( x \right) = \frac{1}{ {\Gamma \left( \alpha \right){\theta ^\alpha } } }{x^{\alpha - 1} }\exp \left( { - \frac{x}{\theta } } \right), \, x > 0$
Alternatively, we can parameterize Gamma in a different way by using $\theta = \frac{1}{\lambda}$ giving the probability density function the form:
Definition 2: $X \sim Gamma\left({\alpha,\frac{1}{\lambda} }\right), \quad f\left( x \right) = \frac{ { {\lambda ^\alpha } } }{ {\Gamma \left( \alpha \right)} }{x^{\alpha - 1} }\exp \left( { - \lambda x} \right), \, x > 0$
Either of these parameterizations is okay!
We often refer to $\alpha$ as the shape parameter and $\theta$ as the scale parameter. Note $\alpha$ and $\theta$ are not necessarily integers.
Note: $\Gamma \left( \alpha \right)$ is the Gamma function!
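As a quick numerical check that the two parameterizations describe the same density, here is a SciPy sketch (the values $\alpha = 2.5$, $\theta = 3$, $x = 4$ are illustrative assumptions, not from the post):

```python
import math

from scipy.stats import gamma

# Illustrative values (assumptions for this sketch): shape alpha, scale theta.
alpha, theta = 2.5, 3.0
lam = 1.0 / theta  # rate parameterization: lambda = 1/theta
x = 4.0

# Definition 1 (scale form) evaluated directly.
pdf_scale = x ** (alpha - 1) * math.exp(-x / theta) / (math.gamma(alpha) * theta ** alpha)

# Definition 2 (rate form) evaluated directly.
pdf_rate = lam ** alpha * x ** (alpha - 1) * math.exp(-lam * x) / math.gamma(alpha)

# Both agree with SciPy's Gamma(alpha, scale=theta) density.
print(abs(pdf_scale - pdf_rate) < 1e-12)
print(abs(pdf_scale - gamma.pdf(x, a=alpha, scale=theta)) < 1e-12)
```

SciPy's `scale` argument plays the role of $\theta$; the rate form is recovered by passing `scale=1/lam`.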
Derivation of Gamma Distribution
Suppose $X \sim Poisson\left(\lambda\right)$. That is, events happen within a Poisson process at rate $\lambda$ per unit of time. So,
Now, let $T$ represent the length of time until the $\alpha$th event. Since $T$ represents time, we are working with a continuous random variable. So, the cumulative distribution function (CDF), which models the probability that there are at least $\alpha$ events by time $t$, is given by:
Note $P\left( {T > t} \right)$ is the probability of fewer than $\alpha$ events in the interval $\left[0, t\right]$. To find the distribution of $T$ we need to use the Poisson process exhibited by $X$ with an extension to time.
When speaking about an extension to time, remember that $X$ was only acting on a unit of time. However, to support $T$, $X$ must act on the interval $\left[0, t\right]$. Using the fact that the events are Poisson and independent of each other, the rate is updated to: $\lambda + \lambda + \cdots + \lambda = \lambda t$. So, the Poisson process is now $X\sim Poisson\left({\lambda t}\right)$. Thus,
With this knowledge, we can connect the CDF to the Poisson process:
In a nutshell, the above statement equates the probability of observing the $\alpha$th event after time $t$ with the probability of observing fewer than $\alpha$ events between now and time $t$.
Taking the derivative with respect to $t$ yields the probability density function of $T$:
Simplifying the series gives:
Returning:
Note: $\Gamma \left( \alpha \right) = \left( {\alpha - 1} \right)!$ is the Gamma function for integers.
Why are we using integers? Well, $\alpha$ is defined to be the number of events. Since Poisson is a discrete random variable, the events must be discrete (i.e. integers). See the Gamma function section for more information!
Now, if we set $\lambda = \frac{1}{\theta }$, then we receive the first parameterization of the Gamma Distribution:
Examples with Gamma
Integrating the Gamma PDF
Students ask questions in class according to a Poisson process with an average rate of one question per 20 minutes. Find the probability that the third question is asked during the last 10 minutes of a 50-minute class.
Using only the Gamma Distribution’s integral definition we have:
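The integral can also be checked numerically; a SciPy sketch follows. Here $T \sim Gamma\left(3, \theta = 20\right)$ since questions arrive at rate $\lambda = \frac{1}{20}$ per minute, and we want $P\left(40 < T \le 50\right)$:

```python
from scipy.stats import gamma

# T = waiting time (minutes) until the 3rd question: Gamma(alpha=3, theta=20).
alpha, theta = 3, 20

# P(40 < T <= 50): the third question lands in the last 10 minutes of class.
p = gamma.cdf(50, a=alpha, scale=theta) - gamma.cdf(40, a=alpha, scale=theta)
print(round(p, 4))  # → 0.1329
```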
Exploiting the Gamma-Poisson Relationship
Within the derivation section, the Gamma distribution was created from a Poisson process. From that process, we obtain the following shortcuts, which let us avoid evaluating the Gamma integral via integration by parts.
Therefore:
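A quick numerical check of this shortcut (a SciPy sketch; the values of $\alpha$, $\lambda$, and $t$ are illustrative assumptions): the Gamma survival probability equals the Poisson probability of fewer than $\alpha$ events.

```python
from scipy.stats import gamma, poisson

# Illustrative values: waiting for the alpha-th event at rate lam, horizon t.
alpha, lam, t = 3, 0.05, 40

# Gamma tail: waiting time to the alpha-th event exceeds t ...
gamma_tail = gamma.sf(t, a=alpha, scale=1 / lam)

# ... equals the Poisson probability of fewer than alpha events in [0, t].
poisson_cdf = poisson.cdf(alpha - 1, mu=lam * t)
print(abs(gamma_tail - poisson_cdf) < 1e-12)
```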
Gamma as a means to an end for integration by parts.
When doing integration by parts, if the integral is set up as follows, we are able to coerce the integral to be one using the definition of the Gamma distribution, thereby solving it!
General method for solving the integral:
To see the verification of the Gamma distribution integrating to 1, see Theorem 1. Here’s an example of using the integral against itself:
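As one illustrative instance of this trick (the specific integral is chosen for this sketch, not necessarily the post's original example), match $\int_0^\infty x^3 e^{-x/2}\,dx$ to a $Gamma\left(4, 2\right)$ density:

```latex
\int_0^\infty x^3 \exp\left( -\frac{x}{2} \right) dx
= \Gamma(4)\, 2^4 \underbrace{\int_0^\infty \frac{1}{\Gamma(4)\, 2^4}\, x^{4-1} \exp\left( -\frac{x}{2} \right) dx}_{=\,1}
= 3! \cdot 16 = 96.
```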
Gamma Function
Before we begin, note that there is a function called the Gamma function, which is defined as:
Definition 3:
From here we have:
Proposition 1: $\Gamma \left( 1 \right) = 1$
Proof:
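A sketch of the computation from the integral definition of the Gamma function:

```latex
\Gamma(1) = \int_0^\infty t^{1-1} e^{-t}\, dt
          = \int_0^\infty e^{-t}\, dt
          = \left[ -e^{-t} \right]_0^\infty
          = 0 - (-1) = 1.
```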
Proposition 2: For any $\alpha >0$, we have: $\Gamma \left( {\alpha + 1} \right) = \alpha \Gamma \left( \alpha \right)$.
Proof:
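A sketch via integration by parts, with $u = t^{\alpha}$ and $dv = e^{-t}\,dt$ (the boundary term vanishes for $\alpha > 0$):

```latex
\Gamma(\alpha + 1) = \int_0^\infty t^{\alpha} e^{-t}\, dt
= \left[ -t^{\alpha} e^{-t} \right]_0^\infty + \alpha \int_0^\infty t^{\alpha - 1} e^{-t}\, dt
= 0 + \alpha\, \Gamma(\alpha).
```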
Proposition 3: For any nonnegative integer $\alpha$, we have: $\Gamma \left( {\alpha + 1} \right) = \alpha !$
Using the results from Propositions 1 and 2 in addition to the assumption that $\alpha$ is an integer, we have:
, which is the factorial function.
Proof:
(Bonus) Proposition 4: $\Gamma \left( {\frac{1}{2} } \right) = \sqrt \pi$.
For half-integers, the Gamma function admits the same recursive process. In this case, the base value for the Gamma function is given by: $\Gamma \left( {\frac{1}{2} } \right) = \sqrt \pi$.
Before we delve into the proof, we will need to prove a lemma regarding the Gaussian Integral.
Lemma 1: Gaussian Integral $I = \int\limits_0^\infty {\exp \left( { - {x^2} } \right)dx} = \frac{ {\sqrt \pi } }{2}$
Using polar coordinates we have:
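A sketch of the polar-coordinate step (the first quadrant corresponds to $\theta \in \left[0, \frac{\pi}{2}\right]$):

```latex
I^2 = \int_0^\infty \int_0^\infty e^{-(x^2 + y^2)}\, dx\, dy
    = \int_0^{\pi/2} \int_0^\infty e^{-r^2}\, r\, dr\, d\theta
    = \frac{\pi}{2} \left[ -\frac{1}{2} e^{-r^2} \right]_0^\infty
    = \frac{\pi}{2} \cdot \frac{1}{2} = \frac{\pi}{4},
\qquad \text{so } I = \frac{\sqrt{\pi}}{2}.
```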
Proof for (Bonus) Proposition 4:
Verifying the Gamma PDF
Theorem 1: If $X \sim Gamma \left({\alpha,\theta}\right)$ with $\alpha >0$ and $\theta >0$, then $\int\limits_0^\infty {\frac{1}{ {\Gamma \left( \alpha \right){\theta ^\alpha } } }{x^{\alpha - 1} }\exp \left( { - \frac{x}{\theta } } \right)dx} = 1$
Proof:
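A sketch via the substitution $u = \frac{x}{\theta}$, which reduces the integral to the Gamma function itself:

```latex
\int_0^\infty \frac{1}{\Gamma(\alpha)\, \theta^\alpha}\, x^{\alpha - 1} e^{-x/\theta}\, dx
= \frac{1}{\Gamma(\alpha)\, \theta^\alpha} \int_0^\infty (\theta u)^{\alpha - 1} e^{-u}\, \theta\, du
= \frac{1}{\Gamma(\alpha)} \int_0^\infty u^{\alpha - 1} e^{-u}\, du
= \frac{\Gamma(\alpha)}{\Gamma(\alpha)} = 1.
```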
Theorem 1 is very powerful for using Gamma integrals against themselves as you will see…
Gamma Cumulative Distribution Function (CDF): $P\left( {X \le x} \right) = \frac{ {\gamma \left( {\alpha ,\frac{x}{\theta } } \right)} }{ {\Gamma \left( \alpha \right)} }$
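Numerically, the regularized lower incomplete gamma function gives this CDF directly. A quick SciPy check (the values are illustrative assumptions):

```python
from scipy.special import gammainc  # regularized: gammainc(a, x) = γ(a, x) / Γ(a)
from scipy.stats import gamma

alpha, theta, x = 2.5, 3.0, 4.0  # illustrative values

# CDF via the regularized lower incomplete gamma function.
cdf_incomplete = gammainc(alpha, x / theta)
print(abs(cdf_incomplete - gamma.cdf(x, a=alpha, scale=theta)) < 1e-12)
```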
Exploring the Moments
Theorem 2: The $k$th moment of a Gamma distribution is $\left( {\alpha + k  1} \right)\left( {\alpha + k  2} \right) \cdots \alpha {\theta ^k}$
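A sketch of the proof using Theorem 1 (the rebuilt integrand is a $Gamma\left(\alpha + k, \theta\right)$ density, so it integrates to one):

```latex
E\left[ X^k \right]
= \int_0^\infty x^k\, \frac{x^{\alpha - 1} e^{-x/\theta}}{\Gamma(\alpha)\, \theta^\alpha}\, dx
= \frac{\Gamma(\alpha + k)\, \theta^{\alpha + k}}{\Gamma(\alpha)\, \theta^\alpha}
  \underbrace{\int_0^\infty \frac{x^{\alpha + k - 1} e^{-x/\theta}}{\Gamma(\alpha + k)\, \theta^{\alpha + k}}\, dx}_{=\,1}
= \frac{\Gamma(\alpha + k)}{\Gamma(\alpha)}\, \theta^k
= (\alpha + k - 1)(\alpha + k - 2) \cdots \alpha\, \theta^k.
```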
Theorem 3: The Moment Generating Function of a Gamma distribution is $\frac{1}{ { { {\left( {1  \theta t} \right)}^\alpha } } }$ if $t < \frac{1}{\theta}$.
The restriction on $t$ keeps $1 - \theta t$ positive, ensuring the integral defining the MGF converges.
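A sketch of the derivation, recognizing a $Gamma\left(\alpha, \frac{\theta}{1 - \theta t}\right)$ kernel inside the expectation:

```latex
M_X(t) = E\left[ e^{tX} \right]
= \int_0^\infty \frac{x^{\alpha - 1}}{\Gamma(\alpha)\, \theta^\alpha}\,
  \exp\left( -x \left( \frac{1}{\theta} - t \right) \right) dx
= \frac{1}{\Gamma(\alpha)\, \theta^\alpha}\, \Gamma(\alpha)
  \left( \frac{\theta}{1 - \theta t} \right)^{\alpha}
= \frac{1}{(1 - \theta t)^{\alpha}}, \qquad t < \frac{1}{\theta}.
```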
Theorem 4: The Characteristic Function of a Gamma distribution is $\frac{1}{ { { {\left( {1  it\theta } \right)}^\alpha } } }$.
Gamma Distribution Properties
Scaling Property: $cX \sim Gamma\left({\alpha, c\theta}\right)$
Consider the standard Gamma Distribution, $X\sim Gamma\left( {\alpha ,\theta } \right)$,
We are looking to perform a transformation such that: $Y = g\left({X}\right) = cX$. To apply the transformation formula:
We need to find $X = g^{-1}\left({Y}\right)$ and the Jacobian.
Therefore,
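The transformation steps above can be sketched as follows (assuming $c > 0$):

```latex
g^{-1}(y) = \frac{y}{c}, \qquad
\left| \frac{d}{dy}\, g^{-1}(y) \right| = \frac{1}{c},
\qquad \text{so} \qquad
f_Y(y) = f_X\!\left( \frac{y}{c} \right) \frac{1}{c}
       = \frac{1}{\Gamma(\alpha)\, \theta^\alpha}
         \left( \frac{y}{c} \right)^{\alpha - 1}
         \exp\left( -\frac{y}{c\theta} \right) \frac{1}{c}
       = \frac{1}{\Gamma(\alpha)\, (c\theta)^\alpha}\,
         y^{\alpha - 1} \exp\left( -\frac{y}{c\theta} \right),
```

which is the $Gamma\left(\alpha, c\theta\right)$ density.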
Mean: $\alpha \theta$
Using the $k$th moment given by Theorem 2, we have the mean as:
Alternatively,
Using the moment generating function given by Theorem 3 we have the mean as:
Variance: $\alpha {\theta ^2}$
Using the $k$th moment given by Theorem 2, we have the second moment as: and, using the result for the mean from the $k$th moment, we have:
Alternatively,
Using the moment generating function given by Theorem 3 we have:
and using the result from the calculation of the mean from the moment generating function we have:
Skewness: $\frac{2}{ {\sqrt \alpha } }$
To obtain a distribution’s skewness, we need to find the third standardized moment:
The moment can be rewritten as:
Focusing on the centralized third moment given by:
Using Theorem 2, we obtain the moments for $k = 1$, $2$, and $3$:
Therefore,
Note, the variance of a gamma random variable is given by $Var\left( X \right) = \alpha {\theta ^2}$. Therefore, the standard deviation for gamma is: $SD\left( X \right) = \theta \sqrt{\alpha}$
Returning:
Kurtosis: $3 + \frac{6}{\alpha }$
To obtain a distribution’s Kurtosis:
First, we focus on the centralized fourth moment:
Using Theorem 2, we obtain the moments for $k = 1$, $2$, $3$, and $4$:
Therefore,
Note, the variance of a gamma random variable is given by $Var\left( X \right) = \alpha {\theta ^2}$.
Returning,
Excess Kurtosis: $\frac{6}{\alpha }$
Excess Kurtosis is a slight modification of Kurtosis.
The notable difference between Kurtosis and Excess Kurtosis is that the latter has “minus 3” appended to the end of its formula. The correction stems from the fact that the kurtosis of a Normal Distribution is 3; subtracting 3 makes the Normal's kurtosis equal to zero.
Hence, the Gamma Distribution’s Excess Kurtosis is:
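All four properties above can be checked numerically against SciPy (the values $\alpha = 2.5$, $\theta = 3$ are illustrative assumptions):

```python
from scipy.stats import gamma

alpha, theta = 2.5, 3.0  # illustrative shape and scale

# mean, variance, skewness, and *excess* kurtosis in one call.
mean, var, skew, ex_kurt = gamma.stats(a=alpha, scale=theta, moments='mvsk')

print(abs(mean - alpha * theta) < 1e-12)        # mean = alpha * theta
print(abs(var - alpha * theta**2) < 1e-12)      # variance = alpha * theta^2
print(abs(skew - 2 / alpha**0.5) < 1e-12)       # skewness = 2 / sqrt(alpha)
print(abs(ex_kurt - 6 / alpha) < 1e-12)         # excess kurtosis = 6 / alpha
```

Note that SciPy's `'k'` moment is already the excess kurtosis, so no "minus 3" correction is needed on our side.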
Estimators
Maximum Likelihood Estimation: ${ {\hat \theta }_{MLE} } = \frac{ {\bar x} }{\alpha }$, no closed form for $\alpha$.
1. Obtain the likelihood function:
2. Obtain the log-likelihood function:
3. Find the maximum value for $\theta$:
4. Substitute the maximum value for $\theta$ into the likelihood function:
Note:
5. Find the maximum value for $\alpha$.
There is no closed-form solution for maximizing $\alpha$. =(
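Since no closed form exists, $\alpha$ is found numerically in practice. A minimal sketch using SciPy's built-in fitter (the "true" parameters and sample size are illustrative assumptions); fixing `floc=0` matches the two-parameter Gamma used throughout this post:

```python
import numpy as np
from scipy.stats import gamma

# Simulated data from assumed "true" parameters, just to illustrate the fit.
rng = np.random.default_rng(42)
alpha_true, theta_true = 2.5, 3.0
data = gamma.rvs(a=alpha_true, scale=theta_true, size=20_000, random_state=rng)

# SciPy solves for alpha numerically; loc is pinned to 0.
alpha_hat, _, theta_hat = gamma.fit(data, floc=0)

# Given alpha_hat, the scale MLE is x-bar / alpha_hat, as derived above.
print(abs(theta_hat - data.mean() / alpha_hat) < 1e-6)
```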
Method of Moments Estimation: $\tilde \alpha = \frac{ {n{ {\bar X}^2} } }{ {\sum\limits_{i = 1}^n { { {\left( { {X_i} - \bar X} \right)}^2} } } },\ \tilde \theta = \frac{ {\sum\limits_{i = 1}^n { { {\left( { {X_i} - \bar X} \right)}^2} } } }{ {n\bar X} }$
First, obtain the 1st and 2nd theoretical moments using Theorem 2:
Second, obtain the 1st and 2nd sample moments:
Third, equate the theoretical moments with their respective sample moments:
Fourth, solve for the most straightforward parameter. In this case, solve for $\theta$.
Fifth, substitute the parameter into the second equation and solve:
Sixth, return to the initial parameter and remove any unknown parameters:
Note the other form, $\tilde \theta = \frac{ {\frac{1}{n}\sum\limits_{i = 1}^n {X_i^2} - { {\bar X}^2} } }{ {\bar X} }$, is also okay. However, the preferred form is the one built from the differences $\left( X_i - \bar X \right)$, since it reduces floating-point error (i.e. numerical instability).
Therefore, the method of moments estimators for Gamma are:
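The estimators are easy to compute directly; a short sketch (the sample is simulated from assumed parameters purely for illustration):

```python
import numpy as np

# Illustrative sample (assumed parameters); any positive data works here.
rng = np.random.default_rng(7)
x = rng.gamma(shape=2.0, scale=1.5, size=10_000)

n, xbar = x.size, x.mean()
ss = np.sum((x - xbar) ** 2)  # the numerically stable "difference" form

alpha_mom = n * xbar**2 / ss   # method-of-moments shape estimate
theta_mom = ss / (n * xbar)    # method-of-moments scale estimate

# By construction, the estimators reproduce the sample mean exactly.
print(abs(alpha_mom * theta_mom - xbar) < 1e-10)
```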
Lemma 2: $\sum\limits_{i = 1}^n {X_i^2} - n{ {\bar X}^2} = \sum\limits_{i = 1}^n { { {\left( { {X_i} - \bar X} \right)}^2} }$
Proof:
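A sketch of the expansion (using $\sum_{i=1}^n X_i = n\bar X$):

```latex
\sum_{i=1}^n \left( X_i - \bar X \right)^2
= \sum_{i=1}^n \left( X_i^2 - 2 \bar X X_i + \bar X^2 \right)
= \sum_{i=1}^n X_i^2 - 2 \bar X \cdot n \bar X + n \bar X^2
= \sum_{i=1}^n X_i^2 - n \bar X^2.
```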
Related distributions
If $X \sim Gamma\left({1, \frac{1}{\lambda } }\right)$, then $X$ has an exponential distribution with rate parameter $\lambda$.
Proof
First, note that the alternative parameterization of Gamma gives:
Then by setting $\alpha = 1$, we have:
If $X \sim Gamma\left({\frac{\nu}{2}, 2}\right)$, then $X$ has a chi-squared distribution with $\nu$ degrees of freedom, $\chi ^2\left({\nu}\right)$.
Proof
Setting $\alpha = \frac{\nu}{2}$ and $\theta = 2$ gives:
If $X_i \sim Gamma\left({\alpha_i, \theta }\right)$, then $\sum\limits_{i = 1}^n { {X_i} } \mathop \sim \limits^{iid} Gamma\left({ \sum\limits_{i = 1}^n { {\alpha_i} } , \theta }\right)$.
Proof
Proof by Induction using convolution theorem
Basis: $X _1$ is a single gamma random variable. Therefore, $X _1$ has a gamma distribution of $Gamma\left({ \alpha_1, \theta }\right)$.
Induction: Suppose that $S_n = \sum\limits_{i = 1}^n { {X_i} }$ is a sum of independent gamma random variables sharing the scale $\theta$, so that it has the distribution:
Let ${X_{n + 1} }$ be a gamma random variable, independent of $S _n$, with the same scale $\theta$. Then we expect the combined distribution of $S _n + X _{n+1}$ to be:
Proof
Proof by Induction using MGFs
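A sketch of the key induction step using MGFs (independence lets the MGFs multiply, and Theorem 3 supplies each factor):

```latex
M_{S_n + X_{n+1}}(t) = M_{S_n}(t)\, M_{X_{n+1}}(t)
= (1 - \theta t)^{-\sum_{i=1}^{n} \alpha_i}\, (1 - \theta t)^{-\alpha_{n+1}}
= (1 - \theta t)^{-\sum_{i=1}^{n+1} \alpha_i},
```

which is the MGF of $Gamma\left( \sum_{i=1}^{n+1} \alpha_i, \theta \right)$.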
If $X_i \sim Exponential\left({ {\lambda } }\right)$, then $\sum\limits_{i = 1}^n { {X_i} } \mathop \sim \limits^{iid} Gamma\left({n, \frac{1}{\lambda } }\right)$.
Corollary
Gamma in R
| Command | Result |
|---------|--------|
| `dgamma(x, shape=alpha, scale=theta)` | $f\left( x \right)$ |
| `pgamma(q, shape=alpha, scale=theta)` | $P\left( {X \le x} \right)$ |
| `qgamma(p, shape=alpha, scale=theta)` | ${\phi _p} \mathrel\backepsilon P\left( {X \le {\phi _p} } \right) = p$ |
| `rgamma(n, shape=alpha, scale=theta)` | ${X_1}, \ldots ,{X_n}\sim Gamma\left( {\alpha ,\theta } \right)$ |
Misc
Misc 1: $0 = \sum\limits_{i = 1}^n {\left( { {X_i} - \bar X} \right)}$
Proof:
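A one-line sketch (again using $\sum_{i=1}^n X_i = n\bar X$):

```latex
\sum_{i=1}^n \left( X_i - \bar X \right)
= \sum_{i=1}^n X_i - n \bar X
= n \bar X - n \bar X = 0.
```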