LU decomposition#
Commonly we will have to solve \(A \vec{x} = \vec{b}\) repeatedly for many different \(\vec{b}\)'s. Performing Gaussian Elimination from scratch for each \(\vec{b}\) would be grossly inefficient. If you knew all of the \(\vec{b}\)'s in advance you could handle them all at once by forming the augmented matrix:
\([A|b_1 \ b_2 \ b_3 \ ...]\)
but this is seldom the case.
It is much more efficient to decompose the matrix \(A\) into a form that is easier to solve.
There are other reasons to do this for special matrix types and distributed computing which we will discuss later.
We have actually already seen this efficiency boost with back-substitution: the triangular system \(U \vec{x} = \vec{b}\) solves in \(O(n^2)\) operations.
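To make the \(O(n^2)\) cost concrete, here is a minimal back-substitution sketch (the upper triangular matrix is the one that appears in the worked example below; the right-hand side is arbitrary):

import numpy as np

def back_substitution(U, b):
    # Solve U x = b for upper-triangular U, working upward from the last row: O(n^2)
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

U = np.array([[4.,  3., -5.],
              [0., -2.5, 2.5],
              [0.,  0., 12.]])
b = np.array([1., 2., 3.])       # arbitrary right-hand side
print(back_substitution(U, b))
print(np.linalg.solve(U, b))     # check against the general solver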
Any square matrix can be decomposed (possibly after reordering its rows, i.e. pivoting, as we will see below),
\(A = LU\)
where:
\(L\) is a lower triangular matrix
\(U\) is an upper triangular matrix
Now, the linear system becomes:
\(L U \vec{x} = \vec{b}\)
Now let \(\vec{y} = U \vec{x}\), such that
\(L \vec{y} = \vec{b} \quad \text{and then} \quad U \vec{x} = \vec{y},\)
both of which solve in \(O(n^2)\).
NOTE: L and U are generally not unique.
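As a minimal sketch of this two-step solve, using scipy.linalg.solve_triangular for the forward and back substitutions (the factors are the ones from the worked example below; the right-hand side is arbitrary):

import numpy as np
from scipy.linalg import solve_triangular

L = np.array([[ 1.,   0.,  0.],
              [-0.5,  1.,  0.],
              [ 2.,  -0.8, 1.]])
U = np.array([[ 4.,  3.,  -5. ],
              [ 0., -2.5,  2.5],
              [ 0.,  0.,  12. ]])
b = np.array([1., 0., 2.])                # arbitrary right-hand side

y = solve_triangular(L, b, lower=True)    # forward substitution: L y = b
x = solve_triangular(U, y, lower=False)   # back substitution:    U x = y
print(x)
print(np.linalg.solve(L @ U, b))          # check against solving A x = b directly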
Example: Return to the previous example,
\(A = \begin{bmatrix} 4 & 3 & -5 \\ -2 & -4 & 5 \\ 8 & 8 & 0 \end{bmatrix}.\)
Through Gaussian Elimination, we found
\(U = \begin{bmatrix} 4 & 3 & -5 \\ 0 & -2.5 & 2.5 \\ 0 & 0 & 12 \end{bmatrix}\)
by clearing the first column (subtracting \(-0.5\) times the first row from the second row, and \(2\) times the first row from the third) and then the second column (subtracting \(-0.8\) times the second row from the third). These multipliers turn out to be the elements of the \(L\) matrix (with 1’s along the diagonal)!
Let’s verify:
import numpy as np

# Define the matrix A
A = np.array([[ 4,  3, -5],
              [-2, -4,  5],
              [ 8,  8,  0]])
print("The original matrix A is:\n", A, "\n")

# The factors found by hand via Gaussian Elimination above
L = np.array([[ 1,    0,   0],
              [-0.5,  1,   0],
              [ 2,   -0.8, 1]])
U = np.array([[ 4,  3,   -5 ],
              [ 0, -2.5,  2.5],
              [ 0,  0,   12 ]])

print("The reconstructed matrix is:\n", L @ U)
The original matrix A is:
[[ 4 3 -5]
[-2 -4 5]
[ 8 8 0]]
The reconstructed matrix is:
[[ 4. 3. -5.]
[-2. -4. 5.]
[ 8. 8. 0.]]
Let’s check the package decomposition!
# Calculate the LU decomposition with scipy
from scipy.linalg import lu
P, L, U = lu(A)
print("Permutation Matrix (P):\n", P)
print("Lower Triangular Matrix (L):\n", L)
print("Upper Triangular Matrix (U):\n", U)
print("\nMultiply L and U:\n", L@U, "\nwhich is correct but pivoted!")
print("\nMultiply PLU:\n", P@L@U, "\nwhich is the original matrix!")
Permutation Matrix (P):
[[0. 0. 1.]
[0. 1. 0.]
[1. 0. 0.]]
Lower Triangular Matrix (L):
[[ 1. 0. 0. ]
[-0.25 1. 0. ]
[ 0.5 0.5 1. ]]
Upper Triangular Matrix (U):
[[ 8. 8. 0. ]
[ 0. -2. 5. ]
[ 0. 0. -7.5]]
Multiply L and U:
[[ 8. 8. 0.]
[-2. -4. 5.]
[ 4. 3. -5.]]
which is correct but pivoted!
Multiply PLU:
[[ 4. 3. -5.]
[-2. -4. 5.]
[ 8. 8. 0.]]
which is the original matrix!
NB: \(P\) in the above is the permutation matrix that recovers the original matrix when it multiplies \(LU\), i.e. \(A = PLU\). It is not the row-pivoting operation that is done internally (although that matrix is easily obtained: for a permutation matrix the inverse is just the transpose, so \(P^{T}A = LU\)).
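A quick check of that last point, reusing A, P, L, and U from the cells above:

# The internally pivoted matrix is recovered with the transpose of P
print(P.T @ A)                      # equals L @ U, the "pivoted" product above
print(np.allclose(P.T @ A, L @ U))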
Computational complexity#
Although forward and back substitution both occur in \(O(n^2)\), the decomposition itself is still \(O(n^3)\). However, there are a number of reasons why this is the preferred approach:
Decomposition only needs to occur once, and the system can then be solved efficiently for multiple \(\vec{b}\) (see the sketch after this list).
There are parallelized algorithms for distributed computing which can reduce the wall time.
\(L\) and \(U\) are useful in other calculations (determinants, inverses, etc.).
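As a sketch of the first point, scipy.linalg.lu_factor and lu_solve factor once and then reuse the factors for each new right-hand side (the right-hand sides below are made up for illustration):

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[ 4.,  3., -5.],
              [-2., -4.,  5.],
              [ 8.,  8.,  0.]])

lu_piv = lu_factor(A)                # O(n^3) factorization, done once

# Reuse the factors for several right-hand sides (arbitrary values)
for b in (np.array([1., 0., 2.]), np.array([3., -1., 4.])):
    x = lu_solve(lu_piv, b)          # O(n^2) per solve
    print(x, " residual:", np.linalg.norm(A @ x - b))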
Dr. Mike’s Tips!#
Direct solvers are your ‘black box’ for most of your needs.
They are the most robust for ill-conditioned systems.
They scale terribly (both in system size and parallelization)
If you use them, start with a small system and work upwards.
Generally speaking you won’t see a speedup with parallelization until you get a large # of nodes
Warning: Some implementations (e.g., numpy) are sophisticated enough to handle singular matrices as well as non-singular ones (be careful with the answer!)
Sparse matrices are your saving grace! Do your best to protect their sparsity (hence store the LU factors, not the inverse!)
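A minimal sketch of that last tip, assuming a SciPy sparse matrix (the tridiagonal system below is made up for illustration):

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# A made-up sparse (tridiagonal) system in CSC format
n = 1000
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")

lu = splu(A)                      # sparse LU factorization, done once
b = np.ones(n)                    # arbitrary right-hand side
x = lu.solve(b)                   # reuse the factorization for each new b

print(np.linalg.norm(A @ x - b))  # residual check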