A.2 An example of variational calculus

The problem to solve in addendum A.22.1 provides a simple example of variational calculus.

The problem can be summarized as follows. The following expression is given for the net energy of a system:

$\parbox{400pt}{\hspace{11pt}\hfill$\displaystyle
E = \frac{\epsilon_1}{2}\int (\nabla\varphi)^2 {\,\rm d}^3{\skew0\vec r}
- \int \sigma_{\rm{p}} \varphi{\,\rm d}^3{\skew0\vec r}
$\hfill(1)}$
Here the operator $\nabla$ is defined as

\begin{displaymath}
\nabla \equiv {\hat\imath}\frac{\partial}{\partial x} +
{\hat\jmath}\frac{\partial}{\partial y} +
{\hat k}\frac{\partial}{\partial z}
\end{displaymath}

The integrals are over all space, or over some other given region. Further $\epsilon_1$ is assumed to be a given positive constant and $\sigma_{\rm {p}} = \sigma_{\rm {p}}({\skew0\vec r})$ is a given function of the position ${\skew0\vec r}$. The function $\varphi = \varphi({\skew0\vec r})$ will be called the potential and is not given. Obviously the energy depends on what this potential is. Mathematicians would say that $E$ is a “functional,” a number that depends on what a function is.

The energy $E$ will be minimal for some specific potential $\varphi_{\rm {min}}$. The objective is now to find an equation for this potential $\varphi_{\rm {min}}$ using variational calculus.
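If you want to see concretely how a number $E$ comes out of a function $\varphi$, the sketch below evaluates a one-dimensional analogue of (1) on a grid. The interval, the value of $\epsilon_1$, and the function $\sigma_{\rm{p}}$ are made-up example data, not part of the original problem; only the structure of (1) is taken from the text.

\begin{verbatim}
import numpy as np

# One-dimensional analogue of the energy (1) on the interval [0, 1]:
#   E = (eps1/2) * integral (dphi/dx)^2 dx  -  integral sigma_p * phi dx
# The grid, eps1, and sigma_p are made-up example data.
eps1 = 2.0
x = np.linspace(0.0, 1.0, 401)
dx = x[1] - x[0]
sigma_p = np.sin(np.pi * x)            # assumed charge density, for illustration

def energy(phi):
    """Discretized version of the functional E[phi] in (1)."""
    dphi_dx = np.gradient(phi, dx)     # approximates the derivative of phi
    return (eps1 / 2) * np.sum(dphi_dx**2) * dx - np.sum(sigma_p * phi) * dx

# Different trial potentials (all zero at the limits of integration)
# give different numbers E: the energy is a "functional" of phi.
print(energy(np.sin(np.pi * x)))
print(energy(x * (1 - x)))
\end{verbatim}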

To do so, the basic idea is the following: imagine that you start at $\varphi_{\rm {min}}$ and then make an infinitesimally small change ${\rm d}\varphi$ to it. In that case there should be no change ${\rm d}{E}$ in energy. After all, if there were a negative change in $E$, then $E$ would decrease. That would contradict the given fact that $\varphi_{\rm {min}}$ produces the lowest energy of all. And if there were a positive infinitesimal change in $E$, then a change in potential of the opposite sign would give a negative change in $E$. Again that would contradict what is given.

The typical physicist would now work out the details as follows. The slightly perturbed potential is written as

\begin{displaymath}
\varphi({\skew0\vec r}) = \varphi_{\rm {min}}({\skew0\vec r}) + \delta\varphi({\skew0\vec r})
\end{displaymath}

Note that the ${\rm d}$ in ${\rm d}\varphi$ has been renotated as $\delta$. That is because everyone does so in variational calculus. The symbol makes no difference; the idea remains the same. Note also that $\delta\varphi$ is a function of position; the change away from $\varphi_{\rm {min}}$ is normally different at different locations. You are in fact allowed to choose anything you like for the function $\delta\varphi$, as long as it is sufficiently small and it is zero at the limits of integration.

Now just take differentials like you typically do in calculus or physics. If in calculus you had some expression like $f^2$, you would say ${\rm d}f^2 = 2f\,{\rm d}f$. (For example, if $f$ is a function of a variable $t$, then ${\rm d}f^2/{\rm d}t = 2f\,{\rm d}f/{\rm d}t$. But physicists usually do not bother with the ${\rm d}t$; then they do not have to worry about what exactly $f$ is a function of.) Similarly

\begin{displaymath}
\delta (\nabla\varphi)^2 = 2 (\nabla\varphi)\cdot \delta (\nabla\varphi)
\end{displaymath}

where

\begin{displaymath}
\delta \nabla\varphi =
\nabla(\varphi_{\rm {min}} + \delta\varphi) - \nabla(\varphi_{\rm {min}})
= \nabla\delta\varphi
\end{displaymath}

so

\begin{displaymath}
\delta (\nabla\varphi)^2 = 2 (\nabla\varphi)\cdot(\nabla\delta\varphi)
\end{displaymath}

For a change starting from $\varphi_{\rm {min}}$:

\begin{displaymath}
\delta (\nabla\varphi)^2 =
2 (\nabla\varphi_{\rm {min}})\cdot (\nabla\delta\varphi)
\end{displaymath}

(Note that $\varphi$ by itself gets approximated as $\varphi_{\rm {min}}$, but $\delta\varphi$ is the completely arbitrary change that can be anything.) Also,

\begin{displaymath}
\delta(\sigma_{\rm {p}}\varphi) = \sigma_{\rm {p}}\delta\varphi
\end{displaymath}

because $\sigma_{\rm {p}}$ is given; at each position it is a fixed number that does not change when the potential is changed.
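For those who like to check such manipulations, here is a quick symbolic verification of the two differentials above in one dimension, using SymPy. The names and the small parameter $\epsilon$ below are for the example only; the change $\delta\varphi$ is written as $\epsilon$ times an arbitrary function, and the first-order coefficient in $\epsilon$ is extracted.

\begin{verbatim}
import sympy as sp

x, eps = sp.symbols('x epsilon')
phi_min = sp.Function('phi_min')(x)
dphi = sp.Function('deltaphi')(x)       # the arbitrary change delta phi
sigma = sp.Function('sigma_p')(x)

phi = phi_min + eps * dphi              # the perturbed potential

# First-order change of (phi')^2: the coefficient of eps should be
# 2 * phi_min' * deltaphi', the 1D version of 2 (grad phi_min).(grad delta phi).
change_sq = sp.expand(sp.diff(phi, x)**2 - sp.diff(phi_min, x)**2)
print(change_sq.coeff(eps, 1))

# First-order change of sigma_p * phi: should be sigma_p * deltaphi.
change_lin = sp.expand(sigma * phi - sigma * phi_min)
print(change_lin.coeff(eps, 1))
\end{verbatim}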

In total, the change in energy, which must be zero, becomes

$\parbox{400pt}{\hspace{11pt}\hfill$\displaystyle
0 = \delta E = \epsilon_1 \int (\nabla\varphi_{\rm{min}})\cdot(\nabla\delta\varphi)
{\,\rm d}^3{\skew0\vec r}
- \int \sigma_{\rm{p}} \delta\varphi{\,\rm d}^3{\skew0\vec r}
$\hfill(2)}$

A conscientious mathematician would shudder at the above manipulations. And for good reason. Small changes are not good mathematical concepts. There is no such thing as small in mathematics. There are just limits where things go to zero. What a mathematician would do instead is write the change in potential as some multiple $\lambda$ of a chosen function $\varphi_{\rm {c}}$. So the changed potential is written as

\begin{displaymath}
\varphi({\skew0\vec r}) = \varphi_{\rm {min}}({\skew0\vec r}) + \lambda \varphi_{\rm {c}}({\skew0\vec r})
\end{displaymath}

The chosen function $\varphi_{\rm {c}}$ can still be anything you want, as long as it vanishes at the limits of integration. But it is not assumed to be small. So now no mathematical nonsense is written. The energy for this changed potential is

\begin{displaymath}
E = \frac{\epsilon_1}{2} \int
[\nabla(\varphi_{\rm {min}} + \lambda\varphi_{\rm {c}})]^2 {\,\rm d}^3{\skew0\vec r}
- \int \sigma_{\rm {p}} (\varphi_{\rm {min}} + \lambda\varphi_{\rm {c}}) {\,\rm d}^3{\skew0\vec r}
\end{displaymath}

Now this energy is a function of the multiple $\lambda$. And that is a simple numerical variable. The energy must be smallest at $\lambda=0$, because $\varphi_{\rm {min}}$ gives the minimum energy. So the above function of $\lambda$ must have a minimum at $\lambda=0$. That means that it must have a zero derivative at $\lambda=0$. So just differentiate the expression with respect to $\lambda$. (You can differentiate as is, or simplify first and bring $\lambda$ outside the integrals.) Then set the derivative to zero at $\lambda=0$. That gives the same result (2) as derived by physicists, except that $\varphi_{\rm {c}}$ takes the place of $\delta\varphi$. The result is the same, but this derivation is nowhere fishy.
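The sketch below carries out this procedure numerically in the same made-up one-dimensional setting as before. The base potential and the chosen function $\varphi_{\rm{c}}$ are arbitrary examples; the point is only that the derivative of $E$ with respect to the ordinary variable $\lambda$ at $\lambda=0$ equals the integral expression of the form (2), with $\varphi_{\rm{c}}$ in place of $\delta\varphi$. (That identity holds for any base potential; the derivative is zero only if the base potential is $\varphi_{\rm{min}}$.)

\begin{verbatim}
import numpy as np

# Same made-up one-dimensional setting as in the earlier sketch.
eps1 = 2.0
x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]
sigma_p = np.sin(np.pi * x)

def energy(phi):
    dphi_dx = np.gradient(phi, dx)
    return (eps1 / 2) * np.sum(dphi_dx**2) * dx - np.sum(sigma_p * phi) * dx

phi0 = x * (1 - x)                 # some base potential (not necessarily phi_min)
phi_c = x**2 * (1 - x)             # a chosen function, zero at the ends

# dE/dlambda at lambda = 0, by differencing in the ordinary variable lambda ...
lam = 1e-6
dE_dlam = (energy(phi0 + lam * phi_c) - energy(phi0 - lam * phi_c)) / (2 * lam)

# ... agrees with the integral expression of the form (2),
# with phi_c taking the place of delta phi:
rhs = eps1 * np.sum(np.gradient(phi0, dx) * np.gradient(phi_c, dx)) * dx \
      - np.sum(sigma_p * phi_c) * dx
print(dE_dlam, rhs)                # the two numbers agree closely
\end{verbatim}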

This derivation will now return to the notation of physicists. The next step is to get rid of the derivatives on $\delta\varphi$. Note that

\begin{displaymath}
\int (\nabla\varphi_{\rm {min}})\cdot(\nabla\delta\varphi) {\,\rm d}^3{\skew0\vec r}
= \int
\frac{\partial\varphi_{\rm {min}}}{\partial x}\frac{\partial\delta\varphi}{\partial x}
+ \frac{\partial\varphi_{\rm {min}}}{\partial y}\frac{\partial\delta\varphi}{\partial y}
+ \frac{\partial\varphi_{\rm {min}}}{\partial z}\frac{\partial\delta\varphi}{\partial z}
{\,\rm d}x {\rm d}y {\rm d}z
\end{displaymath}

The way to get rid of the derivatives on $\delta\varphi$ is integration by parts. Integration by parts pushes a derivative from one factor onto the other. Here you see the real reason why the changes in potential must vanish at the limits of integration. If they did not, the integrations by parts would bring in contributions from the limits of integration. That would be a mess.

Integration by parts of the three terms in the integral, in the $x$, $y$, and $z$ directions respectively, produces

\begin{displaymath}
\int (\nabla\varphi_{\rm {min}})\cdot(\nabla\delta\varphi) {\,\rm d}^3{\skew0\vec r}
= - \int
\frac{\partial^2\varphi_{\rm {min}}}{\partial x^2} \delta\varphi
+ \frac{\partial^2\varphi_{\rm {min}}}{\partial y^2} \delta\varphi
+ \frac{\partial^2\varphi_{\rm {min}}}{\partial z^2} \delta\varphi
{\,\rm d}x {\rm d}y {\rm d}z
\end{displaymath}

In vector notation, that becomes

\begin{displaymath}
\int (\nabla\varphi_{\rm {min}})\cdot(\nabla\delta\varphi) {\,\rm d}^3{\skew0\vec r}
= - \int (\nabla^2\varphi_{\rm {min}})\delta\varphi{\,\rm d}^3{\skew0\vec r}
\end{displaymath}
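Here is a symbolic spot check of the one-dimensional version of this integration by parts, with concrete example functions chosen so that the change vanishes at the limits of integration. The particular functions are made up for the check.

\begin{verbatim}
import sympy as sp

x = sp.symbols('x')
# Example functions on [0, 1]; deltaphi is zero at both ends,
# as the changes in potential are required to be.
phi_min = sp.exp(x)
deltaphi = x * (1 - x)

lhs = sp.integrate(sp.diff(phi_min, x) * sp.diff(deltaphi, x), (x, 0, 1))
rhs = -sp.integrate(sp.diff(phi_min, x, 2) * deltaphi, (x, 0, 1))
print(lhs, rhs, sp.simplify(lhs - rhs))   # lhs equals rhs; the boundary terms dropped out
\end{verbatim}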

Substituting that in the change of energy (2) gives

\begin{displaymath}
0 = \delta E =
\int (-\epsilon_1\nabla^2\varphi_{\rm {min}}-\sigma_{\rm {p}})
\delta\varphi{\,\rm d}^3{\skew0\vec r}
\end{displaymath}

The final step is to say that this can only be true for every possible change $\delta\varphi$ if the parenthetical expression is zero everywhere. That gives the looked-for equation for $\varphi_{\rm {min}}$:
$\parbox{400pt}{\hspace{11pt}\hfill$\displaystyle
-\epsilon_1\nabla^2\varphi_{\rm{min}}-\sigma_{\rm{p}} = 0
$\hfill(3)}$
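To see equation (3) at work, the sketch below solves its one-dimensional analogue $-\epsilon_1\varphi_{\rm{min}}''=\sigma_{\rm{p}}$ with finite differences, for the same made-up $\epsilon_1$ and $\sigma_{\rm{p}}$ as in the earlier sketches, and then checks that admissible perturbations of the computed potential do not lower the energy (1).

\begin{verbatim}
import numpy as np

eps1 = 2.0
n = 401
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
sigma_p = np.sin(np.pi * x)

def energy(phi):
    dphi_dx = np.gradient(phi, dx)
    return (eps1 / 2) * np.sum(dphi_dx**2) * dx - np.sum(sigma_p * phi) * dx

# Standard second-difference matrix for -eps1 * d^2/dx^2 on the interior points,
# with phi = 0 at both ends of the interval.
main = 2.0 * np.ones(n - 2)
off = -np.ones(n - 3)
A = eps1 * (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / dx**2
phi_min = np.zeros(n)
phi_min[1:-1] = np.linalg.solve(A, sigma_p[1:-1])

# Any admissible perturbation (zero at the ends) should not lower the energy.
rng = np.random.default_rng(0)
for _ in range(3):
    bump = 0.05 * rng.standard_normal(n)
    bump[0] = bump[-1] = 0.0
    print(energy(phi_min + bump) >= energy(phi_min))   # expected: True
\end{verbatim}

For this assumed $\sigma_{\rm{p}}$ the exact solution of the one-dimensional analogue is $\sin(\pi x)/(\epsilon_1\pi^2)$, which the computed potential approaches as the grid is refined.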

To justify the above final step, call the parenthetical expression $f$ for short. Then the variational statement above is of the form

\begin{displaymath}
\int f \delta\varphi {\,\rm d}^3{\skew0\vec r}= 0
\end{displaymath}

where $\delta\varphi$ can be arbitrarily chosen as long as it is zero at the limits of integration. It is now to be shown that this implies that $f$ is everywhere zero inside the region of integration.

(Note here that whatever function $f$ is, it should not contain $\delta\varphi$. And there should not be any derivatives of $\delta\varphi$ anywhere at all. Otherwise the above statement is not valid.)

The best way to see that $f$ must be zero everywhere is to first assume the opposite. Assume that $f$ is nonzero at some point P. In that case select a function $\delta\varphi$ that is zero everywhere except in a small vicinity of P, where it is positive. (Make sure the vicinity is small enough that $f$ does not change sign in it.) Then the integral above is nonzero; in particular, it has the same sign as $f$ at P. But that is a contradiction, since the integral must be zero. So the function $f$ cannot be nonzero at any point P; it must be zero everywhere.

(There are more sophisticated ways to do this. You could take $\delta\varphi$ as a positive multiple of $f$ that fades away to zero away from point P. In that case the integral will be positive unless $f$ is everywhere zero. And sign changes in $f$ are no longer a problem.)
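A small numerical illustration of both arguments follows; the function $f$, the point P, and the bump are made-up examples.

\begin{verbatim}
import numpy as np

x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]

f = x - 0.3                            # example f, nonzero at the point P = 0.6

# A change that is zero everywhere except in a small vicinity of P,
# where it is positive (small enough that f does not change sign in it).
P, width = 0.6, 0.05
bump = np.where(np.abs(x - P) < width,
                np.cos(np.pi * (x - P) / (2 * width))**2, 0.0)

# First argument: the integral is nonzero, with the same sign as f at P.
print(np.sum(f * bump) * dx)

# Second argument: take deltaphi = f * bump; then the integral of f * deltaphi
# is an integral of f^2 * bump, positive unless f is zero wherever the bump is.
print(np.sum(f * (f * bump)) * dx)
\end{verbatim}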