Some connections between things, which I have not seen elsewhere. Maybe they mean something?
1. The Baseless Logarithm
Normally one writes a logarithm with a base, \(\log_b (x)\), to mean
\[y = \log_b (x) \Lra b^y = x\]
And then you can change the base of the logarithm with
\[\log_b (x) = \frac{\log_a (x)}{\log_a(b)}\]
Which follows from rearranging \(\log_a (x) = \log_a (b^{\log_b x}) = \log_b (x) \times \log_a (b)\).
One way of thinking about what this formula does is that it is a change of units, akin to writing \(2 \text{ km} = 2000 \text{ m} / \frac{1000 \text{ m}}{1 \text{ km}}\) or \(5 \text{ bytes} = 40 \text{ bits}/\frac{8 \text{ bits}}{1\text{ byte}}\). It says: how many copies of \(b\) are in \(x\)? It’s the number of copies of \(a\) in \(x\), divided by the number of copies of \(a\) that are in \(b\).
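A quick numeric sanity check of the "change of units" reading, as a minimal sketch. The helper name `change_of_base` and the sample values are my own choices for illustration, not anything standard. The point is only that the intermediate base \(a\) behaves like an arbitrary unit: whichever one you pick, it cancels in the ratio.

```python
import math

def change_of_base(x, b, a):
    """log_b(x), computed as (copies of a in x) / (copies of a in b)."""
    copies_of_a_in_x = math.log(x, a)
    copies_of_a_in_b = math.log(b, a)
    return copies_of_a_in_x / copies_of_a_in_b

x, b = 4096.0, 8.0  # 8**4 == 4096, so log_8(4096) should be 4

# Try several intermediate "units" a; the answer should not depend on them.
results = [change_of_base(x, b, a) for a in (2.0, math.e, 10.0, 7.5)]
for value in results:
    print(value)
```

Every printed value should agree with 4 up to floating-point rounding.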
This is perfectly simple, but for some reason it’s hard to think about logarithms that way. The notation kind of… obfuscates things? Specifically it is hard to read \(\log_b x\) as “how many copies of \(b\) are in \(x\)”, because that English expression should correspond to the notation \(x/b\), not \(\log_b x\). “How many factors of \(b\) are in \(x\)” is a bit better, but it still feels off.
I found a way of thinking about logarithms which I think makes this clearer, but you have to allow a sort of odd object that I call the baseless logarithm. It is simply a logarithm without a base:
\[\log N\]
which we regard as an abstract object, not a number. Then we write our normal “based” logarithm as a ratio of two of these baseless logarithms:
\[\log_2 N = \frac{\log N}{\log 2}\]
Note, this is already a thing people do colloquially, e.g. leaving out the base of logarithms in asymptotic formulas. But I do not mean it as a shorthand; it is more useful to regard it as an actual algebraic object.
We interpret \(\log 2\) as being the unit “bits”. To write \(\log N\) in bits is to factor it as a multiple of \(\log 2\):
\[\log N = \frac{\log N}{\log 2} \log 2 = \log_2 (N) \log 2 = \log_2 (N) \text{ bits}\]
Then the change-of-base for logarithms follows from just writing the same geometric quantity in different units. For example \(\log e\) as a unit is sometimes called “nats”:
\[\begin{aligned} \log N = \frac{\log N}{\log 2} \log 2 = \log_2 (N) \text{ bits} = \frac{\log N}{\log e} \log e = \ln (N) \text{ nats} \end{aligned}\]
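Here is one way the idea could be sketched in code, assuming we model \(\log N\) as an object that refuses to be a number until it is divided by another such object. The class name `BaselessLog`, the method `in_units_of`, and the `hidden_base` parameter are all hypothetical names made up for this illustration. The `hidden_base` is whatever base the arithmetic happens to use internally; it cancels in every ratio.

```python
import math

class BaselessLog:
    """Stands for the abstract object log N. It deliberately has no float value."""

    def __init__(self, n):
        if n <= 0:
            raise ValueError("N must be positive")
        self.n = n

    def in_units_of(self, unit, hidden_base=math.e):
        """Return log N / log(unit.n): log N measured in copies of `unit`.

        hidden_base is only the base used for the internal arithmetic;
        it cancels in the ratio, so any valid base gives the same answer.
        """
        return math.log(self.n, hidden_base) / math.log(unit.n, hidden_base)

    # Dividing two baseless logarithms gives an ordinary number.
    __truediv__ = in_units_of

BITS = BaselessLog(2)        # the unit log 2
NATS = BaselessLog(math.e)   # the unit log e

log_N = BaselessLog(1024)
in_bits = log_N / BITS       # log_2(1024)
in_nats = log_N / NATS       # ln(1024)

# Change of units: bits = (nats per bit) * nats, i.e. log_2 N = log_2(e) * ln N
nats_per_bit = NATS / BITS
print(in_bits, in_nats, nats_per_bit * in_nats)
```

The first and third printed numbers should match (both are \(\log_2 1024\)), which is the change-of-base formula read as a unit conversion.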
The baseless \(\log N\) is sort of the multiplicative version of an object that might be familiar from discussions of vectors. It is common with vectors to distinguish between points and displacements: a displacement vector \(\b{v}\) is given by the difference of two points \(\b{v} = (b) - (a)\). When we think of points as having coordinates, this involves an explicit choice of origin \(\O\), such that \(\b{a} \equiv (a) - \O\) and \(\b{b} \equiv (b) - \O\). Then a displacement vector is constructed by subtracting off the factors of \(\O\), \(\b{v} = \b{b} - \b{a} = ((b) - \O) - ((a) - \O) = (b) - (a)\). The baseless logarithm implements the same thing but with multiplication: the value \(\log N\) may be thought of as \(\log N / \log \O\) for an unspecified choice of origin; turning it into an actual numeric value involves dividing two such logarithms to cancel out the origin, \(\log_M N = \log N / \log M = (\log N / \log \O) / (\log M / \log \O)\). I think of \(\log N\) as the point corresponding to \(N\) and \(\log N / \log \O\) as its corresponding displacement vector once you pick a coordinate system. The point version is more fundamental.
You might ask: if we have a baseless logarithm \(\log N\), do we also have a “baseless exponential”? Normally \(b^{\log_b N}\) can be written as something like \(b^{\log_b N} = b^{\ln N / \ln b} = e^{\ln N} = N\); is there any way to do this without actually choosing a base, like \((\ast)^{\log N}\) or something? I think the answer has to be “no”, because I can’t think of a way to make it mean anything. All we can say is that we have split the one object, a logarithm \(\log_b N\) which is the solution of \(b^y = N\), into two objects, \(\log N\) and \(\log b\), each of which on their own are without “units” and so have no numerical meaning.
So logarithms act kinda like multiplicative vectors, in the sense that they have to be defined relative to an ‘origin’, a choice of base. In fact there are many surprising similarities between logarithms and vectors, which I had fun expositing about:
2. Logarithms are Vectors
When doing vector algebra and differential geometry in a properly covariant way, we distinguish between abstract vectors and vectors in a particular coordinate system.
My personal convention for this is to refer to the abstract vectors as “geometric” vectors and always write them in bold, \(\v\), whereas “coordinate” vectors, tuples of their values in coordinates, are written with an arrow over them like \(\vec{v} = (v_x, v_y, v_z)\). Boldface geometric vectors are always coordinate-free, whereas coordinate vectors are just collections of numbers or other objects. The geometric vector \(\b{v}\) can be written as a dot product of its coordinates with a ‘frame’ \(X = (\x, \y, \z)\) of basis vectors
\[\b{v} = \vec{v} \cdot X = (v_x, v_y, v_z) \cdot (\x, \y, \z) = v_x \x + v_y \y + v_z \z\]
The projection of \(\v\) onto a basis vector \(\x\) is then given by ‘measuring’ the vector against the basis vector (which does not have to be of unit length). I like to write this as division because it acts a lot like division (although it’s technically pseudodivision instead):
\[\frac{\v}{\x} = v_x\]
That is my own very nonstandard notation1 for vector division. The more common way to write this is to project a component of a differential \(df = f_x dx + f_y dy + f_z dz\) with a partial derivative, which is also the pseudodivision operation (which is incidentally the sense in which partial derivatives kinda work like division but not really):
\[\frac{\p f}{\p x} = f_x\]
I will write things in both forms to make it easy to translate between them; I do prefer my vector-division version because it avoids bringing in the irrelevant notations of differential calculus, but since the latter is actually standard I ought to include it for comparison.
Suppose \(\b{v}\) is one-dimensional, \(\b{v} = v_x \x\). Then the projection onto a ‘measuring stick’ \(\b{m} = m \x\) measures its length in terms of multiples of \(m\):
\[\frac{\v}{\b{m}} = \frac{v_x \x}{m \x} = \frac{v_x}{m}\]
Multiplying by \(\b{m}\) again is what we mean by “writing \(\b{v}\) in units of \(\b{m}\)”:
\[\frac{\b{v}}{\b{m}} \b{m} = (\frac{v_x}{m}) (m \x)\]
Here \(m\) is the unit “meters” and \(v_x/m\) is the value of \(v_x\) written in meters. Of course to actually compute \(v_x/m\) you have to have it in units in the first place, but clearly it’s the same kind of thing as in the logarithm case, where you can think of \(\b{v}\) and \(\b{m}\) as “unitless” concepts that are compared geometrically, and then \(v_x/m\) as their projections into an arbitrary coordinate system.2
The baseless logarithm is performing the same operation on logarithms, where \(\log N\) is filling the role of the geometric vector \(\v\) and \(\log 2 = \text{bits}\) is the unit vector or measuring stick, which takes the role of \(\x\).
\[\begin{aligned} \frac{\log N}{\log 2} &= \log_2 N \\ \frac{\log N}{\log 2} \log 2 &= \log_2 N \text{ bits} \end{aligned}\]
In this sense baseless logarithms write numbers in coordinates in exactly the same way that measuring sticks write vectors in coordinates.
The equivalence of logarithms in different units
\[\begin{aligned} \log N &= \frac{\log N}{\log 2} \log 2 = \log_2 (N) \text{ bits} \\ &= \frac{\log N}{\log e} \log e = \ln (N) \text{ nats} \end{aligned}\]
is the same as the equivalence of geometric vectors in different units
\[\begin{aligned} \v &= \frac{\v}{\x} \x = v_x \x \\[1em] &= \frac{\v}{\x'} \x' = v_{x'} \x' \end{aligned}\]
or
\[\begin{aligned} df &= \frac{\p f}{\p x} dx = f_x dx \\ &= \frac{\p f}{\p x'} dx' = f_{x'} dx' \end{aligned}\]
And the change of base formula that computes a ratio of logarithms in different bases
\[\begin{aligned} \log_2 N \text{ bits}&= \ln N \text{ nats} \\ \log_2 N &= \frac{\text{nats}}{\text{bits}} \ln N\\ &= \frac{\log e}{\log 2} \ln N \\ &= \log_2 (e) \ln N \end{aligned}\]
is exactly like the change of coordinates for a vector, where \(\x\) and \(\x'\) are two units for the same quantity.
\[\begin{aligned} v_x \x &= v_{x'} \x' \\ v_x &= \frac{\x'}{\x} v_{x'} \end{aligned}\]
or3
\[\begin{aligned} f_x dx &= f_{x'} dx' \\ f_x &= \frac{dx'}{dx} f_{x'} \end{aligned}\]
What logarithms don’t allow that vector division and differential notations easily do is to talk about a partial projection operation or a partial derivative in isolation. For example, if \(N = 2^a 3^b\), you can only talk about the “total” logarithm, the ratio with respect to a single unit \(\log 2\)
\[\frac{\log N}{\log 2} = a \frac{\log 2}{\log 2} + b \frac{\log 3}{\log 2} = a + b \log_2 3\]
which is equivalent to writing a vector as a multiple of a single basis vector (like in Clifford/geometric algebra)
\[\frac{\v}{\x} = v_x + v_y \frac{\y}{\x}\]
or to a total derivative
\[\frac{df}{dx} = f_x + f_y \frac{dy}{dx}\]
But there is no equivalent of the operation of partial differentiation, a “partial logarithm”, which would let you factor a number like
\[\log N \? (\log_{\p 2} N) \log 2 + (\log_{\p 3} N) \log 3\]
However, I keep finding that people have gone and invented the projection / partial derivative operation on logarithms anyway. For example, the p-adic valuation in number theory
\[\nu_p (n) = \max \{ k \in \bb{N} \mid p^k \mid n \}\]
corresponds to extracting the coefficient of \(\log p\) of a natural number in a logarithmic basis
\[\begin{aligned} \log n &= \log 2^{n_2} 3^{n_3} 5^{n_5} \cdots \\ &= n_2 \log 2 + n_3 \log 3 + n_5 \log 5 + \ldots \\ \nu_p (n) &= n_p \end{aligned}\]
Each coefficient is a non-negative integer, and \(\nu_p\) just takes the component corresponding to \(\log p\). Clearly \(\log n\) acts like a vector (although since the coefficients are in \(\bb{N}\) it is technically a commutative monoid instead of a vector space… nevertheless, it has the familiar structure of a vector). Since \(\nu_p\) is a ‘projection’ out of this logarithm, it still obeys logarithmic identities like \(\nu_p(m/n) = \nu_p(m) - \nu_p(n)\). But there is not really a good notation for actually expressing it as a projection, so sadly it gets a whole separate nomenclature that you have to learn.4
The same thing also works for rational \(n\) or radical \(n\) (meaning it is the product of radicals of prime factors), in which case the coefficients become integers or rationals. (As a bonus the resulting objects live in an actual vector space.)
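To make the "projection" picture concrete, here is a small sketch that stores \(\log q\) for a positive rational \(q\) as a dictionary of coordinates in the basis \(\{\log 2, \log 3, \log 5, \ldots\}\), and then reads off \(\nu_p\) as a single component. The function names `log_vector` and `nu` are hypothetical, and the factoring is naive trial division, so this is only meant for small inputs.

```python
from fractions import Fraction
from collections import Counter

def log_vector(q):
    """Coordinates of log q in the basis {log 2, log 3, ...}, as {prime: exponent}."""
    q = Fraction(q)
    if q <= 0:
        raise ValueError("q must be a positive rational")
    coords = Counter()
    # Numerator primes count positively, denominator primes negatively.
    for part, sign in ((q.numerator, 1), (q.denominator, -1)):
        p = 2
        while part > 1:
            while part % p == 0:
                coords[p] += sign
                part //= p
            p += 1  # composites never divide here: their prime factors are already removed
    return {p: k for p, k in coords.items() if k != 0}

def nu(p, q):
    """p-adic valuation as a projection: the coefficient of log p in log q."""
    return log_vector(q).get(p, 0)

m, n = 360, 84  # 360 = 2^3 * 3^2 * 5 and 84 = 2^2 * 3 * 7
print(log_vector(m), log_vector(n), log_vector(Fraction(m, n)))

# The logarithmic identity nu_p(m/n) = nu_p(m) - nu_p(n), checked per component.
for p in (2, 3, 5, 7):
    assert nu(p, Fraction(m, n)) == nu(p, m) - nu(p, n)
```

Subtracting the two dictionaries component by component is exactly vector subtraction of \(\log m - \log n\), which is why \(\nu_p\) inherits the logarithm rules.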
Another example of these logarithmic projections: in complex analysis the “order of vanishing” \(\text{ord}_a f(z)\) of a meromorphic function \(f(z)\) at a point \(z=a\) is the order of the pole or zero at a point (where zeroes are like negative poles). That is, it is the degree \(-n\) of the lowest-degree term in the Laurent series of the function around the point \(z=a\),
\[f(z) = f_{-n} (z-a)^{-n} + f_{-n+1} (z-a)^{-n+1} + \cdots + f_{-1} (z-a)^{-1} + f_0 + f_1 (z-a) + \cdots\]
(that is, \(n\) is the smallest integer such that \((z-a)^n f(z)\) is holomorphic around \(a\)). This is extracted with a logarithm:
\[\text{ord}_a f(z) = \lim_{z \ra a} \frac{\log f(z)}{\log (z-a)} = -n\]
since for \(z \approx a\), \(f(z) \sim f_{-n} (z-a)^{-n}\) which dominates the other terms that blow up less quickly. If we write \(g(z)\) for the rest of \(f(z)\) which has \(\text{ord}_a (g(z)) > -n\):
\[\begin{aligned} \lim_{z \ra a} \frac{\log f(z)}{\log (z-a)} &= \lim_{z \ra a} \frac{\log (f_{-n} (z-a)^{-n} + g(z))}{\log (z-a)}\\ &= \lim_{z \ra a} \frac{\log f_{-n} (z-a)^{-n} (1 + \frac{g(z)}{f_{-n}} (z-a)^n)}{\log (z-a)} \\ &= \lim_{z \ra a} \frac{\log f_{-n}}{\log (z-a)} -n \frac{\log (z-a)}{\log (z-a)} + \frac{\log (1 + c (z-a))}{\log (z-a)} \\ &= -n \end{aligned}\]
So this is a very similar operation: the limit \(\lim_{z \ra a} \log (z-b)/\log(z-a) = 1_{a=b}\) serves to cancel out the rest of the terms, like how \(\p_j dx^i \sim (\p x^i)/(\p x^j) = 1_{i=j}\) serves to cancel out the terms in a partial derivative, extracting the \(dx\) component of \(df = f_x dx + f_y dy + \ldots\).
(I’m not very good at complex analysis so that’s all I’m going to say about that. Still, it seems clear that this is basically the same operation.)
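The limit can at least be poked at numerically. Below is a rough sketch using a made-up function with a double pole, written directly in terms of \(\epsilon = z - a\) for real \(z\) slightly above \(a\). The function `f` and its coefficients are assumptions chosen for illustration. Note the convergence is slow: the leftover \(\log f_{-n} / \log(z-a)\) term only decays like \(1/\log \epsilon\).

```python
import math

def f(eps):
    """A hypothetical function with a double pole at z = a, where eps = z - a."""
    return 3.0 * eps**-2 + eps**-1 + 5.0

def log_ratio(eps):
    """log f(z) / log(z - a), evaluated at real z = a + eps with eps > 0."""
    return math.log(f(eps)) / math.log(eps)

# As eps shrinks the ratio should creep toward -2, the order of the pole.
for eps in (1e-2, 1e-10, 1e-50, 1e-100):
    print(eps, log_ratio(eps))
```

The leading coefficient 3 is what keeps the ratio from landing on \(-2\) exactly at any finite \(\epsilon\); setting it to 1 removes most of the error.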
We see that the baseless logarithm \(\log n\) works a lot like a vector \(\v\) or differential \(df\), and then expressing a logarithm in a base like \(\log_2 n = \log n / \log 2\) is a lot like a total derivative \(df/dx\) or Clifford division \(\v \ast \b{x}^{-1}\). What is missing is some equivalent of the partial derivative / projection operator that projects only onto that component… but various fields have gone and found a way to invent that anyway, either in the form of a partial derivative \(\p f/\p x\), or just by making up the \(p\)-adic valuation \(\nu_p\), or by the limits \(\lim_{z\ra a} \log f(z) / \log (z-a)\) in complex analysis. The similarities are all suspicious, though, and I can’t help but think there is some unifying theory here that ties all this together… but I can’t see what it is yet.
One thing that we might try in order to invent a \(\log_2 N\) that acts like \(\p_x f\) or \(\b{v}/\x\) is to somehow restrict the values of the logarithms to certain spaces, e.g. integers or rationals. Since the \(\{\log p_i\}\) are linearly independent over \(\bb{Q}\) (which is essentially equivalent to prime factorizations being unique), you would end up with objects like \(\log_2 3 = \log 3/\log 2\) which have no value in \(\bb{Q}\); “zeroing” those out then gives something that acts like a partial derivative. But I don’t know if that’s useful. Certainly it doesn’t help in any numeric context.
Anyway, onto more things that are logarithms.
3. Vectors are also Logarithms?
In differential geometry one interprets vectors like \(\v = v_x \x + v_y \y\) as being written in a basis of partial derivative operators, \(\v = v_x \p_x + v_y \p_y\). These can then be used to create discrete translations which move around in the various coordinates,
\[T^{\v} = e^{\v} = e^{v_x \p_x + v_y \p_y }\]
The partial derivatives are here in order to make it operate on functions
\[e^{v_x \p_x + v_y \p_y} f(x,y) = f(x + v_x, y + v_y)\]
which is true at the level of Taylor expansions as well. I often find it easier to dispense with the partial derivatives and just think of these as translation operators on the space \((x,y)\) directly
\[e^{v_x \p_x + v_y \p_y} (x, y) = (x + v_x, y + v_y)\]
(You can think of this acting on the function \(f(x,y) = (x,y)\) also, but that feels like overkill.)
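The claim that \(e^{v \p_x}\) translates functions can be checked exactly on polynomials, where the exponential series terminates. This is a one-dimensional sketch under that assumption; the helper names `derivative`, `evaluate`, and `translate` are made up for the example, and polynomials are stored as coefficient lists `[c0, c1, c2, ...]`.

```python
from fractions import Fraction
from math import factorial

def derivative(coeffs):
    """d/dx of a polynomial given as [c0, c1, c2, ...], where c_k multiplies x**k."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Value of the polynomial at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))

def translate(coeffs, v):
    """Apply e^{v d/dx} = sum_k (v d/dx)^k / k! to the polynomial."""
    result = [Fraction(0)] * len(coeffs)
    term = [Fraction(c) for c in coeffs]  # holds the k-th derivative of the input
    k = 0
    while term:  # repeated derivatives of a polynomial eventually vanish
        scale = Fraction(v) ** k / factorial(k)
        for i, c in enumerate(term):
            result[i] += scale * c
        term = derivative(term)
        k += 1
    return result

p = [1, -2, 0, 4]          # p(x) = 1 - 2x + 4x^3
v = 3
shifted = translate(p, v)  # should be the coefficients of p(x + 3)

for x in (-2, 0, 1, 5):
    assert evaluate(shifted, x) == evaluate(p, x + v)
print(shifted)
```

Using `Fraction` keeps the \(1/k!\) factors exact, so the comparison with \(p(x + v)\) is an equality rather than a floating-point approximation.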
In any case, all this is really doing (in flat space, at least) is rewriting the additive vector \(\b{v}\) into a multiplicative form \(T^{\b{v}}\) which corresponds to the same operation. Things are just being written differently: its terms are multiplied instead of added, and scalar coefficients are applied via exponentiation instead of multiplication. A basis for the vector space now consists of translation operators in each coordinate:5
\[T^{\v} = e^{v_x \p_x} e^{v_y \p_y} = T_x^{v_x} T_y^{v_y}\]
(In non-flat space this is not so simple because the translations in different coordinates may not commute; you can still write it in this form but it’s a lot more complicated.)
What this means for us is: look, vectors are logarithms too!
\[\begin{aligned} \ln T^{\v} &= \ln T_x^{v_x} T_y^{v_y} \\ &= v_x \ln T_x + v_y \ln T_y \\ &= v_x \p_x + v_y \p_y \end{aligned}\]
I can’t exactly say why, but it seems preferable to have this written in terms of baseless logarithms also. We do this by realizing that \(T_x = e^{\p_x} = T^{\p_x}\) and thinking of this symbol \(T\) as a sort of ‘generic’ base for translations, absent the numeric meaning of the symbol \(e\), which has \(\log T_x = \log T^{\p_x} = \p_x \log T\). Then
\[\log T^{\v} = \v \log T = v_x \p_x \log T + v_y \p_y \log T\]
And then we can write \(\v = \log_T T^{\v} = \log T^{\v} / \log T\). This is equivalent to the natural log version but it avoids explicitly depending on the numeric value of \(e\): any choice of base for the logarithm \(T\) gives the same concept of a vector, written in terms of the exponentiation of \(T\), but now we make explicit that the ‘units’ on \(\v\) come in part from the units on \(\log T\) itself.
So vectors in differential geometry may also be thought of as logarithms, specifically, the logarithms of translation operators.
Regular multiplication can even be viewed as an example of this. A product like \(xa\) can be rewritten as “translation” in the \(\ln a\) coordinate:
\[xa = e^{\ln x} e^{\ln a} = e^{(\ln x) \p_{\, \ln a}} a = x^{\p_{\, \ln a}} a\]
I mention this because it’s cute, but I can’t imagine how it would ever be useful.
4. Logarithms are Derivatives?
This part doesn’t really connect to the rest; I just thought I would mention it so that this article contains every fun fact about logarithms that I know.
One way of defining the natural logarithm is
\[\ln x = \lim_{a \ra 0} \frac{x^a - 1}{a}\]
Which can be found by rewriting \(x^a = e^{a \ln x}\) and then Taylor expanding: