\def\title{The Algorithm of Hierholzer}
\let\K=\leftarrow
\let\E==
% DIN A4 paper
\pagewidth=210mm\advance\pagewidth by -2in
\pageheight=297mm\advance\pageheight by -2.3in
\fullpageheight=297mm\advance\fullpageheight by -2in
\setpage
\ifpdf
\pdfpagewidth=210mm
\pdfpageheight=297mm
\else
\special{papersize=210mm,297mm}
\fi
@* Introduction.
This program was written in May~2016 by Robert
Clausecker\footnote*{\tt Robert Clausecker } for the
{\it Digitale Systeme} class at Humboldt University of Berlin. The
program is written in the {\it Literate Programming} idiom using the
\.{CWEB} system for structured documentation of Donald E. Knuth at
Stanford University. From the source file \.{hierholzer.w}, a \CEE/
program \.{hierholzer.c} is created using the \.{CTANGLE} program.
At the same time, the program can be rendered into a plain \TEX/
document using \.{CWEAVE}.
@^Clausecker, Robert@>
@^Knuth, Donald Ervin@>
@^Literate Programming@>
@.CWEB@>
@:TeX}{\TEX/@>
Please be careful to compile the program as {\mc C99}. Correct
function cannot be ensured when compiled in other dialects like
{\mc GNU89}. On a {\mc POSIX} system, the invocation~(1) is
suitable.
$$\hbox{\tt c99 -o hierholzer hierholzer.c}\eqno{(1)}$$
@:POSIX}{\mc POSIX@>
@ The program computes Eulerian paths in undirected graphs. A path, if
it exists, is computed using the algorithm of Hierholzer. The
implementation admits parallel edges.
@^Hierholzer, Carl@>
@^Euler, Leonhard@>
The program's complexity is linear in the sum of the number of edges and
the declared number of vertices in the input. The dependency on the
number of vertices comes from the need to create the graph structure in
memory. This term could be eliminated by choosing a more sophisticated
data structure for the graph but it does not matter except for
degenerate cases where not all vertices are connected by edges.
@^complexity@>
@ Input is provided as a list of decimal integers separated by white
space. The first number is |node_count|, the number of vertices in the
graph. After that, the edges of the graph are described as pairs of
integers from |0| to |node_count-1|.
If an Eulerian path can be found it is printed as a white space
separated list of the path's nodes terminated with a newline. If no
path was found, |"-1\n"| is printed instead.
@ The source code loosely follows the stylistic guidelines of the
{\mc OpenBSD} kernel, described in manual page \&{style}(9). For
readability, the |edge|, |node| and |allocator| types are emphasized and
some operators are displayed differently from how they would appear in
\CEE/.
\medskip
\centerline{\tabskip 2em\vbox{\halign{\hfil\.{#}&#\hfil\cr
\omit\hfil in \CEE/&\omit in \.{CWEB}\hfil\cr
\noalign{\smallskip}
A = B, A == B, A != B&|A=B,A==B,A!=B|\cr
A < B, A > B, A <= B, A >= B&|A**B,A<=B,A>=B|\cr
A.B, A->B, A[B]&|A.B, A->B, A[B]|\cr
A \&\& B, A \char'174\char'174\hskip0.5em B, A \& B, A \char'174\hskip0.5em B, A \^ B&$A\W B,A\V B,A\AND B,A\OR B,A\XOR B$\cr
A << B, A >> B&|A<****>B|\cr
A + B, A - B, A * B, A / B, A \% B&|A+B,A-B,A*B,A/B,A%B|\cr
+A, -A, \~A, !A, ++A, --A&|+A,-A,~A,!A,++A,--A|\cr
NULL&|NULL|\cr}}}
\medskip
@:OpenBSD}{\mc OpenBSD@>
@^style@>
@^operators@>
@.CWEB@>
@.NULL@>
@:lambda}{$\NULL$ (\.{NULL})@>
@:style(9)}{\&{style}(9)@>
@f edge int
@f node int
@f uintptr_t int
@f allocator int
@* Program Structure.
Apart from custom functions, only the \CEE/ standard library is used.
If an error occurs in one of the steps, the program is terminated
prematurely. Great care has been taken to ensure that all resources are
released regardless of where the program terminates.
@c
#include
#include
#include @#
@@;
@@;
extern int
main(int argc, char *argv[])
{
FILE *input_file = NULL;
struct allocator alloc = {0, BUCKET_SIZE, NULL};
struct edge path = {NULL, NULL};
node *graph = NULL;
size_t node_count = 0;
int status = EXIT_FAILURE;
@@;
@@;
status = EXIT_SUCCESS;
@@;
@@;
@@;
@@;@#
cleanup:
@@;
return (status);
}
@ The graph is represented as an adjacency list. The list is an array
of vertices, each of which has a linked list of edges. During
the algorithm, edges are gradually removed from the edge lists and added
to |path|. The vertices are identified through their
position in the |graph| array using the |node_num| macro.
@^vertex@>
@d node_num(n) n - graph
@=
typedef struct edge *node;
@ Each edge is a pair of |struct edge|. Initially, |partner| points to
the other node connected to the edge and |next| points to the next edge
in the edge list. When an edge is removed from the graph, one half is
inserted into the Eulerian cycle with |next| pointing to the next edge
in the cycle. The other half is marked as ``removed'' by setting
|partner = NULL|. The Eulerian path is represented by a dummy-edge
|path| whose |next| is the first edge and |partner| is the node the path
starts out from.
A pair of |struct edge| are always allocated together. Some times, we
need to find the other half of a pair of |struct edge|. To do this we
use a trick: Let |s=sizeof (struct edge)|, then |s| is of the form
$a\cdot2^b$ and |@t$2^b$@>==s&-s|. Thus in a pair of |struct edge| one
member is aligned to $2^{b+1}$ while the other is aligned only to $2^b$.
When we set things up such that the first member is always aligned to
$2^{b+1}$, we can find if it is the first in a pair by checking if bit
$b$ is set.
@d EDGE_ALIGNMENT sizeof (struct edge) & -sizeof (struct edge)
@d is_second_half(e) ((uintptr_t)e & EDGE_ALIGNMENT)
@d other_half(e) is_second_half(e) ? e - 1 : e + 1
@=
struct edge {
struct edge *next;
node *partner;
};
@* Memory Management.
In order to quickly allocate and release many edges and to ensure the
alignment constraints of |struct edge| we use a custom allocator
|alloc|. The allocator allocates edges in buckets of |BUCKET_SIZE|
each, keeping a table of all allocated buckets in |buckets|. The
members |bucket_count| and |fill| keep track of how many buckets we
have and how full the last bucket is, respectively.
@d BUCKET_SIZE 0200000 /* $64\cdot1024$ entries per bucket */
@=
struct allocator {
size_t bucket_count, fill;
struct edge **buckets;
};
@ If the first entry in the bucket is not suitably aligned, throw it
away. By construction, the second entry will.
@=
{
struct edge **new_buckets = realloc(alloc.buckets, ++alloc.bucket_count * sizeof *alloc.buckets);
if (new_buckets != NULL)
alloc.buckets = new_buckets;
else {
perror("allocation error");
goto cleanup;
}
if (alloc.buckets[alloc.bucket_count - 1] = malloc(BUCKET_SIZE * sizeof **alloc.buckets), alloc.buckets[alloc.bucket_count - 1] == NULL) {
perror("allocation error");
goto cleanup;
}
alloc.fill = is_second_half(alloc.buckets[alloc.bucket_count - 1]) ? 1 : 0;
}
@ The variable |epair| is assumed to be declared outside. Note that
|alloc.fill| starts out at |BUCKET_SIZE| to do the right thing on the
first edge we allocate.
@=
if (alloc.fill > BUCKET_SIZE - 2)
@@;
epair = alloc.buckets[alloc.bucket_count - 1] + alloc.fill;
alloc.fill += 2;
@ Walk |alloc.buckets|, then free |alloc.buckets|.
@=
{
size_t i;
for (i = 0; i < alloc.bucket_count; i++)
free(alloc.buckets[i]);
free(alloc.buckets);
}
@ A label |cleanup| is placed just before this section so we can go
there to end the program early.
@=
if (input_file != NULL) fclose(input_file);
@@;
free(graph);
@* Input and Output.
The following sections implement input parsing and formatted output.
Before input can be parsed, the input file must be opened. If no input
file was provided a helpful message is printed instead.
@=
if (argc != 2) {
fprintf(stderr, "usage: %s file\n", argv[0]);
goto cleanup;
}
if (input_file = fopen(argv[1], "r"), input_file == NULL) {
perror("cannot open input file");
goto cleanup;
}
@ We read the input as described earlier and construct the graph in
memory. The number of nodes in the graph is stored in |node_count|.
@=
{
size_t from, to;
int items;
if (fscanf(input_file, "%zu\n", &node_count) != 1) {
if (ferror(input_file))
perror("input error");
else
fprintf(stderr, "input malformed\n");
goto cleanup;
}
if (graph = calloc((size_t)node_count, sizeof *graph), graph == NULL) {
perror("allocation error");
goto cleanup;
}
while (items = fscanf(input_file, "%zu %zu\n", &from, &to), items == 2) {
if (from >= node_count || to >= node_count) {
fprintf(stderr, "input malformed\n");
goto cleanup;
}
@@;
}
if (items != EOF) { /* $\\{items}\in\{0,1\}$ if |fscanf| did not parse both |from| and |to| */
fprintf(stderr, "input malformed\n");
goto cleanup;
}
if (ferror(input_file)) {
perror("input error");
goto cleanup;
}
}
@ To output the path we follow the |next| pointers of |path| and print
the vertices we encounter.
@=
{
struct edge *edg;
printf("%tu", node_num(path.partner));
for (edg = path.next; edg != NULL; edg = edg->next)
printf(" %tu", node_num(edg->partner));
putchar('\n');
}
@* Graph Algorithms.
Both parts of the edges are hooked into the beginnings of their
edge lists.
@=
{
struct edge *epair;
@@;
epair[0].next = graph[from];
epair[0].partner = graph + to;
graph[from] = epair + 0;@#
epair[1].next = graph[to];
epair[1].partner = graph + from;
graph[to] = epair + 1;@#
}
@ Only graphs with at most two vertices of odd degree can have Eulerian
paths. The converse is not true: A disconnected graph might not have
vertices of odd degree but it cannot have an Eulerian path.
This snippet finds out how many nodes of odd degree exist,
assigns |path.partner| to one of the two odd nodes if exactly two nodes
have odd degree or prints |"-1\n"| for ``no Eulerian path'' if more
than two nodes have odd degree. If no nodes of odd degree are found,
a random node is picked for |path.partner|. If no node exists, the path
is empty and the program terminates prematurely.
@=
{
struct edge *edg;
size_t i, degree, odd_count = 0;
for (i = 0; i < node_count; i++) {
for (degree = 0, edg = graph[i]; edg != NULL; edg = edg->next)
degree++;
if (degree % 2 != 0) {
if (++odd_count <= 2)
path.partner = graph + i;
else {
printf("-1\n");
goto cleanup;
}
}
}
if (path.partner == NULL)
path.partner = nonempty_node(graph, node_count);
if (path.partner == NULL) {
printf("\n");
goto cleanup;
}
}
@ Find a node with at least one edge, return |NULL| if none exists.
@=
static node *nonempty_node(node *graph, size_t node_count)
{
size_t i;
for (i = 0; i < node_count; i++)
if (graph[i] != NULL)
return (graph + i);
return (NULL);
}
@ This is the rough structure of Hierholzer's algorithm. The algorithm
alternately expands the current path by inserting a new subcycle between
|chaser| and |chaser->next| and ``chases'' the path towards its end by
advancing |chaser| until a node with edges remaining or the end is
reached. When |chaser| reaches the end of the path the algorithm is
complete.
@^Hierholzer, Carl@>
@=
{
struct edge *chaser = &path;
do {
@next|@>@;
@@;
} while (chaser != NULL);
}
@ In this snippet a new path is assembled beginning from
|chaser->partner| until it gets stuck somewhere. This can only happen
if we are back were we came from or when this is the first path we add
in which case |chaser_next| is |NULL|.
@next|@>=
{
struct edge *leader, *chaser_next = chaser->next;
for (leader = chaser; *leader->partner != NULL; leader = leader->next) {
leader->next = *leader->partner;
@next| from the graph@>@;
}
leader->next = chaser_next;
}
@ Unhook |leader->next| from its edge list and mark
|other_half(leader->next)| as removed by setting |partner = NULL|. No
memory is deallocated. This function assumes that |leader->next| is the
first edge in its edge list which always holds when this snippet runs.
The |normalize| macro advances |n|'s edge list to skip removed edges so
that |n==NULL| when the node is empty or points to an actual edge.
@d normalize(n) while(n != NULL && n->partner == NULL) n = n->next
@next| from the graph@>=
{
struct edge *back = other_half(leader->next);
*back->partner = leader->next->next;
normalize((*back->partner));
back->partner = NULL;
normalize((*leader->next->partner));
}
@ If we reached |NULL| and have not seen a vertex with at least one
edge, the path is complete.
@=
while (chaser != NULL && *chaser->partner == NULL)
chaser = chaser->next;
@ Even if all vertices have even degree the node may not have an
Eulerian path if it is not connected. In this case an Eulerian path
is seemingly found but some edges remain in the graph.
@=
if (nonempty_node(graph, node_count) != NULL) {
printf("-1\n");
goto cleanup;
}
@* References and Index.
\def\bitem{\item{$\bullet$}}
\bitem Carl Hierholzer, {\it \"Uber die M\"oglichkeit, einen Linienzug
ohne Wiederholung und ohne Unterbrechung zu umfahren,} in
{\it Mathematische Annalen,} vol.~VI., no.~1, p.~30--32, March~1873.
\bitem The {\mc OpenBSD} Kernel Developer's Manual, {\it Kernel source
file style guide,} {\bf style}(9).
\bitem {\mc ISO/IEC JTC \rm 2/\mc SC \rm 22: \mc ISO/IEC \rm 9899:2011,
\it ``Programming Languages -- C''}, The International Organization
for Standardization.
@!@^Hierholzer, Carl@>
@!@:ISO 9899:2011}{{\mc ISO} 9899:2011@>
@!@:OpenBSD}{\mc OpenBSD@>
@ This index contains all identifiers as well as some other keywords.
The numbers refer to sections, not pages. The section with the
definition of an identifier is underlined.
**