mathematicstheory

GNFA State Elimination: The Algorithm That Made Automata Theory Click

Dec 10, 20246 min readUtso Sarkar

I joined IIT Roorkee in August 2024. By the end of my first semester, automata theory had a reputation: either you loved the elegance or you failed the midterm and questioned your life choices. I was in the second camp until GNFA state elimination clicked. Not clicked like “I got a good grade.” Clicked like “I can derive the regex for this NFA on a whiteboard without praying to the textbook gods.”

The course covered NFAs, DFAs, regular expressions, pumping lemmas, and the guilt-inducing feeling that everyone else seemed to see the isomorphism while you memorized constructions. State elimination on Generalized NFAs (GNFAs) was the bridge that made regex and automata feel like the same object wearing different hats.

The Problem: NFA to Regular Expression

Given an NFA, produce an equivalent regular expression. The textbook Thompson construction goes regex to NFA. Going backward feels harder because NFAs have parallel paths, epsilon transitions, and the combinatorial explosion of naive state subset methods.

State elimination gives a direct, if tedious, path:

Convert NFA to a GNFA with structured start and end states
Repeatedly eliminate internal states
Read the regex label on the single edge from start to accept

The magic is not magic. It is accounting. Every elimination updates edge labels with regex algebra that captures all paths that routed through the removed state.

What Is a GNFA?

A Generalized NFA allows each transition to be labeled with a regular expression, not just a single symbol. We assume:

One start state with no incoming edges
One accept state with no outgoing edges
Every other state has edges to every other state (use empty set label if missing)

That last constraint sounds ugly. It makes elimination uniform. You never special-case missing transitions because “missing” is the empty regex.

The Elimination Rule

To eliminate state k, for every pair of states (i, j) update the label on edge (i, j):

R_ij = R_ij + R_ik · (R_kk)* · R_kj

Where · is concatenation, + is union, and * is Kleene star. Read it as: old paths from i to j, plus paths that go from i to k, loop zero or more times at k, then go from k to j.

If you have ever written a Floyd-Warshall loop and felt smug, this is the regex version with more symbols and fewer integers.

Before and After One Elimination Step

flowchart LR
    subgraph before [Before Eliminating State k]
        direction LR
        S((s)) -->|A| I((i))
        I -->|B| K((k))
        K -->|C| J((j))
        I -->|D| J
        K -->|E| K
        J -->|F| T((t))
        I -->|G| T
    end

    subgraph after [After Eliminating k]
        direction LR
        S2((s)) -->|A| I2((i))
        I2 -->|D + B·E*·C| J2((j))
        J2 -->|F| T2((t))
        I2 -->|G + B·E*·F| T2
    end

State k vanishes. Its looping behavior compresses into (E)*. Paths that used k as a relay fold into updated labels on (i, j) and direct shortcuts to accept.

Repeat until only s and t remain. The label on s → t is your regex. Simplify with algebra if you enjoy pain.

Elimination Order Matters for Human Sanity

The algorithm works for any elimination order of internal states. Different orders produce different regexes, all equivalent but some look like a cat walked on your keyboard.

Heuristics that saved me on exams:

Eliminate states with few incident edges first when possible
Eliminate sink-like states that only loop or exit early
Leave high fan-in/fan-out hubs for later when labels are already compound

For course assignments, brute elimination order is fine if your simplification is disciplined. For humans, order is the difference between (a+b)* and a seventeen-parenthesis monster.

Worked Intuition: Epsilon Transitions

Real NFAs have epsilons. Convert them away first or fold them into GNFA labels during preprocessing. Epsilon closure on transitions becomes union of labels. I wasted a problem set trying to eliminate epsilons mid-algorithm. Do not be me.

Standard preprocessing:

Add fresh start and accept with epsilons as needed
Remove epsilon transitions by label composition
Ensure GNFA completeness with empty labels on missing edges

Then eliminate. The preprocessing is boring. The elimination is mechanical. The algebra simplification is where grades go to die.

Implementation Sketch (What I Actually Coded)

For a CP contest helper script, I represented edge labels as strings and trusted a simplification pass… lightly. Production of a correct regex for small NFAs:

def eliminate(gnfa, k):
    for i in gnfa.states:
        if i == k:
            continue
        for j in gnfa.states:
            if j == k:
                continue
            rik = gnfa.edge(i, k)
            rkk = gnfa.edge(k, k)
            rkj = gnfa.edge(k, j)
            rij = gnfa.edge(i, j)
            loop = kleene_star(rkk)
            gnfa.set_edge(i, j, union(rij, concat(rik, concat(loop, rkj))))
    gnfa.remove_state(k)

union, concat, and kleene_star should canonicalize: drop empty unions, flatten nested stars where safe, remove trivial epsilons. I did not solve regex minimization. I solved “TA can read it.”

Connection to Broader Theory

State elimination proves NFAs and regexes are equivalent without appealing to minimal DFAs and partition refinement. Pedagogically, that matters. It is constructive in the direction students fear.

It also connects to:

Path expressions in graph theory
Algebraic automata theory where languages are solutions to matrix equations
Kleene’s theorem in a form you can touch

Once I saw elimination as “Gaussian elimination but regex,” the course stopped feeling like a bag of unrelated tricks.

Exam War Stories

First semester automata exams loved to ask for regex from a diagram with eight states and one troll epsilon loop. My strategy:

Draw GNFA mentally, five minutes
Pick elimination order, write it down, stick to it
Eliminate one state per page for partial credit
Simplify only at the end unless intermediate labels exceed half a page

Partial credit saved my grade more than elegance. The algorithm rewards showing work.

When Not to Use This

In real compilers and regex engines, you do not convert NFA to regex for fun. You build NFAs from regex and simulate. State elimination is exponential in size in the worst case for the resulting expression. It is a theory tool, not a production pipeline.

But for learning, interviews, and the automata problem your friend swore was “just theory,” it is the algorithm that made the subject click.

The Takeaway

GNFA state elimination is not pretty. It is honest. You watch paths consolidate into algebra until only start and accept remain. No hidden oracle, no pumping lemma hand-waving, just repeated application of one update rule.

Automata theory stopped being memorization for me when I could eliminate a state on a whiteboard and explain every term in the updated label. If you are stuck in the NFA-regex wilderness, learn this algorithm, pick your elimination order carefully, and carry a spare whiteboard marker. You will need it.

GFNA. State elimination. Regex. Finally the same language.

--claps