An abstract algebraic approach to homogeneous linear differential equations with constant coefficients, Part I

Section 0: Introduction

Upon learning about the Weyl algebra and the notion of D-modules, the algebraically minded person may have the thought, “how much calculus and differential equation theory can I do with this?” For me, this mild curiosity grew into a weird obsession over the years (despite my not actually doing much with it mathematically), due to the compelling case it makes about homogeneous linear ordinary differential equations with constant coefficients.

Much of my mathematical interest, I would say, lies at the intersection of algebra and analysis. Unfortunately, as in the case here, I find myself separating things out between the two, in this case showing that certain seemingly analytic results are wholly algebraic. But I think there is significance to that idea: to know when, in analysis, we are engaging in analytic methods and when we are engaging in algebraic methods (or when we are using both in a fascinating way). Topology is left out here, though not intentionally. Sorry, topologists; maybe one day I’ll talk about your stuff.

Recall a standard argument used with homogeneous second-order linear ordinary differential equations with constant coefficients. We assume that solutions of the form y = e^{ax} exist and, applying the chain rule, obtain a polynomial called the characteristic polynomial (for linear-algebraic reasons). Roots of the polynomial, provided they are distinct, yield two linearly independent solutions, whose linear span forms a two-dimensional subspace of the solution space; by the existence and uniqueness theorem, this subspace is the whole solution space.
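To make the standard argument concrete, here is a worked instance (the equation is chosen purely for illustration). For y'' - 3y' + 2y = 0, substituting y = e^{ax} gives

a^2 e^{ax} - 3a e^{ax} + 2 e^{ax} = (a^2 - 3a + 2) e^{ax} = 0,

and since e^{ax} never vanishes, a must satisfy the characteristic polynomial a^2 - 3a + 2 = (a - 1)(a - 2) = 0. The distinct roots a = 1 and a = 2 yield the linearly independent solutions e^x and e^{2x}, so the general solution is y = c_1 e^x + c_2 e^{2x}.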

This is a logically precise proof, and I have even taught it with this method, but I have issues with it. The big one is that the assumption of solutions of a particular form (which, as smarter people let me know, is called an ansatz) bugs me. A proof that assumes less is, in my opinion, cleaner than a proof that makes seemingly magical assumptions (see that one xkcd comic). Now, people will say that this assumption is magical but completely reasonable. Which, maybe? But it does demand a non-rigorous “intuition” that I personally think should be avoided when possible. My main issue is this: the assumption on the form of the solution is only an excuse to obtain the polynomial in a precise manner. If we think about things algebraically (the more used path here is the linear-algebraic one, for good reason), then we see that the polynomial is either the characteristic polynomial of a matrix or a polynomial in an operator (the approach advocated here).

Now let’s talk about an alternative. Somehow, I have a psychological issue with calling this a proof, even though my claim is that in the end it is one. By treating the differential equation as a polynomial in the differential operator, we can factor it. The factors correspond to their own first-order differential equations, which are easily solvable. Then we can combine the solutions by taking linear combinations. Use uniqueness as above and we’re done.
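Here is the same illustrative equation treated this way. The equation y'' - 3y' + 2y = 0 becomes (D^2 - 3D + 2)y = 0, where D denotes differentiation, and the operator factors:

(D - 1)(D - 2)y = 0.

Each factor gives a first-order equation, (D - 1)y = 0 and (D - 2)y = 0, i.e. y' = y and y' = 2y, with solutions c_1 e^x and c_2 e^{2x}; linear combinations c_1 e^x + c_2 e^{2x} then solve the original equation.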

There are a few issues here too. One, are we allowed to treat differential equations as polynomials in an operator? (Yes, we are.) Two, are we allowed to factor? What’s multiplication here? (Also yes. Multiplication is composition of operators, which is why the second derivative is the “square” of the operator.) This seems like abstract nonsense? (Please, I promised myself not to get angry today.) This doesn’t seem super precise? (That’s not a question, but you’re right.) Why do we take sums of solutions when we’re looking at multiplicative factors? (Um… Do you know ring theory? No? … Magic?)

As I said before, I want to put that latter argument on firmer mathematical footing. But for that I will need a little bit of (supposedly elementary) ring theory. I should emphasize that I’m not trying to say you need ring theory to do this. Rather, if you know ring theory fairly well, this becomes a nice application of algebraic ideas, showing that this rather easy result about differential equations lies in the domain of algebra, with the transcendental functions only showing up at the end.

Section 1: Definitions and Motivation

The first Weyl algebra is defined as the quotient of the free algebra (over the complex numbers) generated by two elements M and D by the ideal generated by DM - MD - 1. As a first encounter, this probably seems a bit strange. What does this have to do with calculus? In this case, D represents the differentiation operator (we are currently only working in single-variable calculus) and M represents the operator of multiplication by the function f(x) = x. The relation arises from the product rule, by noticing that, when acting on some suitable function space,

DM(f(x)) = D(xf(x)) = xf'(x) + f(x) = (MD + 1)(f(x)).

(Some people will appropriately be angry at me for my notation conflating a function and its evaluation. Please forgive me, I did it for clarity and brevity.) The (first) Weyl algebra is an interesting algebra, introduced to me because it is an example of a simple ring which isn’t a matrix ring. But we will be unconcerned with these aspects (at least for now).
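If you want to see the defining relation in action, here is a small symbolic sanity check in Python with SymPy (just an illustration on a generic smooth function, not part of the argument):

import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')(x)  # a generic "suitable" function

# D acts by differentiation, M by multiplication by x.
DM_f = sp.diff(x * f, x)             # (DM)(f) = D(x f)
MD_plus_1_f = x * sp.diff(f, x) + f  # (MD + 1)(f)

print(sp.simplify(DM_f - MD_plus_1_f))  # prints 0, confirming DM = MD + 1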

As noted in our motivation above, we are concerned with the actions of the Weyl algebra, or in other words (left) modules over the Weyl algebra. Considering these modules allows us to follow in the footsteps of D-module theory, but with far less substantial concerns. There are many natural candidates for a preferred module: the polynomials (maybe too small), the formal power series (maybe too big), and the set of infinitely differentiable functions on the real line (maybe too much analysis involved). Our approach is to be as vague as possible about what module we are considering and to make the assumptions we need on the fly. Note that, despite the fact that all our examples are rings, not every interesting example of a function space is a ring, and we will not assume that our module is a ring. It is, I think, safe to say that most useful function spaces are vector spaces, since scalar multiples of “nice” functions are similarly “nice.” So we may make the tacit assumption that the module under consideration is a vector space.

Now let a homogeneous linear ordinary differential equation with constant coefficients be given by:

y^{(n)} + a_{n-1} y^{(n-1)} + ... + a_1 y' + a_0 y = 0.

Let p(X) = X^n + a_{n-1}X^{n-1} + ... + a_1 X + a_0. Then the differential equation can be written in the form p(D)y = 0, or, to use more module language, solutions of the differential equation are the p(D)-torsion elements of the module. Because I can’t seem to find notation for what I want (please comment with a reference if you have one!), given an element r of a ring R and a left R-module N, I will refer to the subgroup of r-torsion elements (if R is commutative, it is a submodule) by \mathrm{Tor}(r), or more set-theoretically:

\mathrm{Tor}(r) = \{ n \in N \colon r.n = 0 \}.
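As a toy illustration of the definition (a SymPy sketch; the polynomial is chosen only for illustration), we can verify that e^{2x} is a p(D)-torsion element for p(X) = X^2 - 3X + 2:

import sympy as sp

x = sp.symbols('x')
y = sp.exp(2 * x)

# Apply p(D) = D^2 - 3D + 2 to y, i.e. compute y'' - 3y' + 2y.
p_D_y = sp.diff(y, x, 2) - 3 * sp.diff(y, x) + 2 * y

print(sp.simplify(p_D_y))  # prints 0, so exp(2x) lies in Tor(p(D))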

I claim that understanding the structure of the subgroup \mathrm{Tor}(p(D)) will give us the result about solutions to the associated differential equation. But first we need some basic results about these torsion subgroups.

Section 2: r-Torsion Subgroups

Lemma 2.1. Let R be a ring, N a left R-module. Let r and s be elements of R such that r and s commute, i.e. rs = sr. Then

\mathrm{Tor}(r) + \mathrm{Tor}(s) \subseteq \mathrm{Tor}(rs).

The proof is left as an exercise to the reader. (It’s easy; use commutativity.)

Lemma 2.2. Let R be a ring, N a left R-module. Let r and s be elements of R such that the left ideal generated by r and s is R,  i.e. Rr + Rs = R. Then

\mathrm{Tor}(r) \cap \mathrm{Tor}(s) = \{0\}.

Consequently, the sum \mathrm{Tor}(r) + \mathrm{Tor}(s) is an internal direct sum.

Proof: Let a and b be elements of R such that ar + bs = 1. Let n \in \mathrm{Tor}(r) \cap \mathrm{Tor}(s), so that r.n = s.n = 0. Then n = 1.n = (ar + bs).n = a.(r.n) + b.(s.n) = 0. Therefore, \mathrm{Tor}(r) \cap \mathrm{Tor}(s) = \{0\}.
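A toy example far from differential equations may help ground the lemma: take R = \mathbb{Z} and N = \mathbb{Z}/6\mathbb{Z} with r = 2 and s = 3, so that (-1)\cdot 2 + 1\cdot 3 = 1. Then \mathrm{Tor}(2) = \{0, 3\} and \mathrm{Tor}(3) = \{0, 2, 4\}, which indeed intersect only in \{0\}; in fact \mathbb{Z}/6\mathbb{Z} = \mathrm{Tor}(2) \oplus \mathrm{Tor}(3), illustrating the direct-sum consequence.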

The condition on the following lemma is a bit awkward.

Lemma 2.3. Let R be a ring, N a left R-module. Let r and s be elements of R such that r and s commute. If there exist elements a and b of R that commute with both r and s such that ar + bs = 1, then

\mathrm{Tor}(rs) = \mathrm{Tor}(r) \oplus \mathrm{Tor}(s).

Proof: From the previous lemmata,

\mathrm{Tor}(r) + \mathrm{Tor}(s) \subseteq \mathrm{Tor}(rs),

where the sum on the left-hand side is an internal direct sum (by Lemma 2.2, since ar + bs = 1). It suffices to show

\mathrm{Tor}(rs) \subseteq \mathrm{Tor}(r) \oplus \mathrm{Tor}(s).

Let a and b be elements of R as in the hypothesis, so that ar + bs = 1. Let n \in \mathrm{Tor}(rs). Set n_1 = (bs).n and n_2 = (ar).n. Notice that

(bs).n + (ar).n = (ar + bs).n = 1.n = n.

And since r.n_1 = r(bs).n = b(rs).n = b.0 = 0 and s.n_2 = s(ar).n = a(rs).n = a.0 = 0 by commutativity, we have n_1 \in \mathrm{Tor}(r) and n_2 \in \mathrm{Tor}(s). Hence n = n_1 + n_2 \in \mathrm{Tor}(r) \oplus \mathrm{Tor}(s), as desired.

The previous lemma isn’t quite pretty, but there are some things to say in its defense. One, the condition is not so unreasonable in light of the previous lemma, which uses the same formula ar + bs = 1. Two, the commutativity conditions will not play a role in our particular case, since the polynomials in D form a commutative subring within which we will be operating. The proof does include a moment where a formula is seemingly pulled out of a hat, but the main thrust of the decomposition comes from writing 1 as a combination of sorts; from there the decomposition is clearer, as the part that will be annihilated by r is the term missing r, with a similar situation for s.
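To see the decomposition at work in our intended setting (an illustration; the operators are chosen to match the running example), take r = D - 1 and s = D - 2, so that a = 1 and b = -1 give ar + bs = (D - 1) - (D - 2) = 1. For n = c_1 e^x + c_2 e^{2x} \in \mathrm{Tor}(rs), the recipe of the proof produces

n_1 = (bs).n = -(D - 2).(c_1 e^x + c_2 e^{2x}) = c_1 e^x \in \mathrm{Tor}(D - 1),

n_2 = (ar).n = (D - 1).(c_1 e^x + c_2 e^{2x}) = c_2 e^{2x} \in \mathrm{Tor}(D - 2).

So the Bézout identity literally projects a solution onto its exponential summands.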

Barring this one hiccup, the lemmata we proved hopefully seem like quite reasonable general algebraic results that might be seen in an undergraduate algebra course. With these tools, we return our attention to the specific torsion subgroup related to our differential equation.

Section 3: The Structure of the Subgroup Associated to a Differential Equation

We saw in Section 1 that the solution space of a homogeneous linear ordinary differential equation with constant coefficients is the subgroup \mathrm{Tor}(p(D)), where p(X) is the polynomial given by the differential equation.

From Lemma 2.3, we have the following result:

Lemma 3.1. Let q(X) and r(X) be two coprime polynomials. Then

\mathrm{Tor}(q(D)r(D)) = \mathrm{Tor}(q(D)) \oplus \mathrm{Tor}(r(D)).

Consequently, if p(X) = (X - \lambda_1)^{m_1}(X - \lambda_2)^{m_2} ... (X - \lambda_k)^{m_k} with the \lambda_i distinct, then

\mathrm{Tor}(p(D)) = \mathrm{Tor}((D-\lambda_1)^{m_1}) \oplus \mathrm{Tor}((D-\lambda_2)^{m_2}) \oplus ... \oplus \mathrm{Tor}((D - \lambda_k)^{m_k}).

To simplify matters we can work in the subalgebra generated by D, consisting of polynomials in D. This makes the commutativity conditions vacuous, since we are in a commutative algebra. Since polynomials with complex coefficients form a Euclidean domain, when q(X) and r(X) are coprime, there exist polynomials a(X) and b(X) such that a(X)q(X) + b(X)r(X) = 1. So the ideal generated by q(D) and r(D) is the whole subalgebra.
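The Bézout coefficients are computable via the extended Euclidean algorithm; here is a quick SymPy check (the factors are illustrative):

import sympy as sp

X = sp.symbols('X')

# Extended Euclidean algorithm: returns (a, b, g) with a*q + b*r = g = gcd(q, r).
a, b, g = sp.gcdex(X - 1, X - 2, X)

print(a, b, g)  # prints 1 -1 1, i.e. 1*(X - 1) + (-1)*(X - 2) = 1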

When p(X) has no multiple roots (so every m_i = 1), we see that

\mathrm{Tor}(p(D)) = \mathrm{Tor}(D-\lambda_1) \oplus \mathrm{Tor}(D-\lambda_2) \oplus ... \oplus \mathrm{Tor}(D - \lambda_k).

So the solution space of the differential equation is the direct sum of the solution spaces of first-order differential equations. Taking a look at the summands and returning to analysis, we see that \mathrm{Tor}(D-\lambda_i) is the solution space of the differential equation y' = \lambda_i y, which consists of the scalar multiples of the exponential e^{\lambda_i x}. So we conclude with a proof of the fact stated at the top: the general solution of a homogeneous linear ordinary differential equation with constant coefficients (and distinct characteristic roots) consists of the linear combinations of exponentials whose parameters are the roots of a polynomial.
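As a final sanity check (a SymPy sketch; the equation is again illustrative), a computer algebra system recovers exactly this form. For p(X) = (X - 1)(X - 2)(X - 3) = X^3 - 6X^2 + 11X - 6:

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

# p(D)y = y''' - 6y'' + 11y' - 6y = 0
ode = y(x).diff(x, 3) - 6 * y(x).diff(x, 2) + 11 * y(x).diff(x) - 6 * y(x)

print(sp.dsolve(ode, y(x)))
# y(x) = C1*exp(x) + C2*exp(2*x) + C3*exp(3*x) (possibly in an equivalent factored form)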

In my opinion, this proof and its main ideas make the result clear. One side of the differential equation literally is a polynomial in the differential operator, and the solution spaces of its factors are the summands of the general solution space.

The algebraist may find one (more) complaint, which is that this last bit of argument relied on our knowing about the exponential function. The algebraist may hope that we can somehow axiomatize the exponential function as a generator of \mathrm{Tor}(D-1). The main issue seems to be how to make the connection between the single exponential (i.e. e^x) and all the other closely related exponential functions e^{\lambda x}. Without a notion of (pre-)composition, I don’t know how this can be done. The alternative, defining infinitely many a priori unrelated exponentials, seems bad. Any thoughts on this matter would be appreciated.

Finally, in both our discussion of the method from differential equation theory and the module perspective, we did not address the case of roots with higher multiplicity. I would like to return to that subject, but it requires some more tools, and I haven’t quite fleshed out the motivation. I will leave this to a Part II, if I can muster up the strength to write it.
