One of the reasons why I love my job as a researcher in experimental physics is that every day brings along a new problem to solve, and through decades of practice I have become quick at solving them, so I typically enjoy doing it. And it does not matter whether the problem at hand is an entirely new, challenging one or a textbook thing that has been solved a million times before. It is your problem, and it deserves your attention and dedication.
The other day I had a collaborative coding session with a post-doc (who also has a blog here, where he writes interesting stuff on machine learning and neural networks), Giles. He writes excellent python code, in a principled, tidy way that physicists usually are incapable to match, so I am learning a lot by working with him. At some point we found ourselves looking at the problem of finding the equation of a straight line in 3D which best fits a set of points. 
"Wait -" I can hear some of you say, "are you telling me you didn't just look it up and get the solution from f***ing googling it?" Yeah, that's exactly it. We knew this "problem" has tabulated solutions, but it is fun to do it yourself, and that's precisely what I did: as Giles was busy with python, I took paper and pencil (really!) and proceeded to solve it by maximizing a likelihood function. How weird is that? It reminds me a short story by (I think) Isaac Asimov telling the tale of a starship losing its main computer, and nobody is capable of doing additions and multiplications any more because they've used computers for ages to do that. There's only one guy who can do it, and to the disbelief of all the other passengers he finds the navigation parameters required to bring the vessel back home.
But let's see what the problem is, and if you can follow the very simple math, and solve it again with my guidance. You have a set of "detector hits" - readouts of the position a particle passed through - in 3D space. Call them x_i, y_i, z_i, with i=1...N. We assume that z_i are known perfectly (because the detector is a planar device, and well positioned at a certain z coordinate, while x_i and y_i come in with some uncertainties, which for now we assume all equal to s. We search for an equation linking x to z and y to z separately, because the x and y coordinates of each point are measured fully independently, and a 2D parametrization of a line is easier.
We start by writing these equations down: for a line that is not orthogonal to the z axis we may write in full generality
x(z) = x* + az
y(z) = y* + bz
Now, if the ones above are the equations that connect the value of x,y, and z of the particle trajectory, we may write the probability of finding the particle at a position x,y if its z position is z_i, and given a corresponding set of readouts x_i, y_i, and uncertainties s, as
P_i (x,y) = exp[-(x-x_i)^2 / (2s^2)] exp [-(y-y_i)^2 / (2s^2)]
Since we have a set of N readouts, we need to combine these probabilities, such that we can find the parameters x*, y*, a, b which we need to define our particle path. But this is easy: we are talking of independent measurements, so we just multiply probabilities:
P = P_1 P_2 P_3 .... P_N
Now, the likelihood of the parameters x*,y*,a,b can be expressed as that product P above, when we interpret P as a function of those four parameters - which we can do, if we have specified our data x_i,y_i,z_i. It is then even easier if we write directly the logarithm of the likelihood, because products become sums:
log L(x*,y*,a,b) = log P = log P_1 + log P_2 + log P_3 + ... + log P_N
Given the expression we have written above for the P_i's, we thus find
log L = -1/(2s^2) { SUM_i [(x-x_i)^2 + (y-y_i)^2] }
where SUM_i is the sum over the N sets of data points. We can now substitute the mathematical formulation of our model for the particle trajectory - a line - into the log L expression, by formulating x and y as connected to x*,a, y*,b and get
log L = -1/(2s^2) { SUM_i [(x*+az_i - x_i)^2 + (y*+bz_i - y_i)^2] }
"Fine, very good, let's put this into a minimizer routine and we're done", I can almost hear some of the computer geek readers here. No, we're going on with paper and pencil, as the fun is about to start! In fact, now what we just need to do is to ask that the likelihood be maximized by the four parameters, and the thing will solve itself! We thus write:
d log L(x*,y*,a,b) / dx* = 0
d log L(x*,y*,a,b) / dy* = 0
d log L(x*,y*,a,b) / da   = 0
d log L(x*,y*,a,b) / db   = 0
(where the symbols "d logL / dx", etc, mean taking the derivative of the logL function with respect to the variable x) and we'll obtain a set of four equations, from which we will be able to obtain the wanted values of the four parameters describing the particle trajectory. This is because when the likelihood is maximized its derivative with respect to each parameter will become zero - it has an extremum. Let's see what happens: e.g. the first equation becomes
-1/(2s)^2 SUM_i [ 2x* + 2az_i - 2x_i] = 0
If you consider that x* and a are not depending on the i indices over which the sum runs, we can simplify this equation (also getting rid of the constant term 1/(2s)^2) as
x* = SUM_i [x_i - a z_i] / N
which we can write, calling x_m the average of the x_i and z_m the average of the z_i,
x* = x_m - a z_m
Simple, ain't it? Similar calculations will easily bring you to get similar formulas for the other three parameters:
y* = y_m - b z_m
a = [(xz)_m -x* z_m ] / (z^2)_m
b ) [(yz)_m - y*z_m] / (z^2)_m
where again the suffix _m indicates that the expressions it applies to are averages over the N measurements; for xz, yz, z^2 this also applies, provided that you first multiply and then sum the components when you average them!
We have gotten to a four-equation system of four unknowns. They can be nasty in some cases, but this one is is actually very straightforward to solve. Take the first equation for x* and substitute in it the expression for a, and you immediately get
x* = [x_m - z_m (xz)_m / (z^2)_m ] / [1 - (z_m)^2 / (z^2)_m]
Similarly, because of the identical formulation of the problem on the two planes xz, yz, you get the corresponding expression for y* by just substituting y to x and b to a:
y* = [y_m - z_m (yz)_m / (z^2)_m] / [1 - (z_m)^2 / (z^2)_m]
For a and b the solution is instead
a = [(xz)_m - x* z_m] / (z^2)_m
b = [(yz)_m - y* z_m] / (z^2)_m
which we may keep in a compact form exploiting the already given expressions for x* and y* above.
So, this is it. As you see, the solution takes roughly five minutes with paper and pencil. Ok ok, let's make it ten minutes, so that we go slowly and avoid mistakes... But what happens if the s parameters are actually different for the N measurements, and different for the x_i and y_i ? Do we have to go back to the start and solve much harder equations? NO! The beauty of the Gaussian approximation of uncertainties, and the symmetry of the problem as written out above, is that we can substitute the arithmetic averages of the x_i, y_i, (x_i z_i), etc factors with weighted averages, where the weights are the inverse variances sx_i, sy_i, and we are done!
What do I mean with weighted averages? Well, that is the solution of another likelihood maximization problem, but one which you should know by heart: the weighted average x_w of measurements x_i each carrying an uncertainty sx_i is
x_w = [SUM_i x_i / sx_i^2] / SUM_i [ 1 / sx_i^2]
and similarly for all other factors.
Now let us stop for a second and think at what we have done. We have started with a set of points in the space, (x_i, y_i, z_i), and we have postulated that they were the result of detection hits of a particle zipping through on a straight line. Under that assumption, we can solve the problem with simple math... But we should never forget that a few other hidden assumptions are also taken for granted in solving the problem with a max likelihood as above. What we have coded implies that:
1) each x_i, y_i measurement carries uncertainties sx_i, sy_i which are completely independent on i, and between the two;
2) the uncertainties sx_i, sy_i are all that is needed to describe the probability that a measurement of a position x returns a value x_i. More precisely, we have assumed that the x_i have values which are Gaussian-distributed around the unknown true value x, with a sigma equal to sx_i. 
3) we have assumed, for simplicity, no uncertainty on the positions z_i. Although this is never true in practice, it is often a good approximation.
Assumption 2) in particular is a quite strong one, which in practice is never met exactly; on the other hand, a number of theorems in statistics reassure us that it is a pretty robust assumption in most cases, in the sense that a departure from that probability distribution will not harm our likelihood estimates much. Anyway, it is something to keep in mind!
So, I hope you enjoyed going through this very simple problem. You can learn a lot by solving similar problems with paper and pencil, but you need to keep your mind awake, and always know what are the assumptions you have taken in extracting a solution!
---
Tommaso Dorigo (see his personal web page here) is an experimental particle physicist who works for the INFN and the University of Padova, and collaborates with the CMS experiment at the CERN LHC. He coordinates the MODE Collaboration, a group of physicists and computer scientists from eight institutions in Europe and the US who aim to enable end-to-end optimization of detector design with differentiable programming. Dorigo is an editor of the journals Reviews in Physics and Physics Open. In 2016 Dorigo published the book "Anomaly! Collider Physics and the Quest for New Phenomena at Fermilab", an insider view of the sociology of big particle physics experiments. You can get a copy of the book on Amazon, or contact him to get a free pdf copy if you have limited financial means.
 

 
 
