Processing math: 100%

Friday, January 4, 2013

This Week's Coding Practice: Linked Lists

Linked lists are elementary data structures that any programmer worth their salt should know back to front.  In contrast to the contiguous array, linked lists naturally grow or contract as the data stored in them increases or decreases, respectively.  Furthermore, the memory occupied by the data does not need to be allocated contiguously.  However, this flexibility comes at a price.  Accessing random elements in constant time is no longer possible -- with a linked list, accessing the nth element takes O(n) time.

There are many ways to skin a cat, and there are also many ways to implement linked lists.  Some important decisions to make along the way include determining the degree of abstraction; picking between singly and doubly-linked lists; and picking between sentinel nodes and sentinel values.  I'll discuss each decision briefly below.

At a high-level, a linked list can be defined recursively: a linked list is a node followed by a linked list.  You have at most two entities to worry about -- the Node, which contains the data you want to store and a pointer to the rest of the linked list, and the List itself.  Ideally, you want to model the list as a "black box" and abstract away the fact that it consists of a chain of Nodes.  This requires defining and implementing the List and Node explicitly as separate data types (as separate structs, for example).  Frameworks, like the C++ STL's vector or Python's list(), do this for you.  Home-brew implementations of the linked list often avoid defining an explicit data type for the list itself, and model the list implicitly as a chain of nodes (see how I implemented Bins in a hash table a few weeks ago).  This saves the developer the effort of having to define separate List and Node data types, at the expense of reduced abstraction.  If the particular linked list implementation is only used internally (as opposed to being part of a framework), then the loss of abstraction is not a big deal.

List linking comes in two flavors: single and double.  A node of singly-linked lists contains a pointer to the next node only.  A node of a doubly-linked list contains pointers to both the previous and the next nodes.  Singly-linked lists are the simplest and most flexible -- so flexible, in fact, that they can be "abused" by chaining lists together in a tree-like structure.  Again, this flexibility comes at a price.  Updating (inserting and removing elements) singly-linked lists can be a pain in the ass, since such operations require access to the node that is before the update location.  Since singly-linked lists can only be traversed in one direction (from the head to the tail), locating this previous node requires a full traversal from the head of the list -- which can cost O(n).  Doubly-linked lists allow the previous node to be determined directly, at the expense of increased storage overhead, due to the extra pointer in each node, and reduced flexibility -- abusing the list as a tree is no longer possible.

All good things come to an end, and linked lists are no exception (OK, with the exception of circular lists...).  Primarily, there are two ways of indicating the end of a list: sentinel values and sentinel nodes.  Sentinel values are special values like NULL that indicate that the previous or next node doesn't exist.  It's a simple idea, but requires additional conditional code to handle search and retrieval operations near the head and tail of the list.  Sentinel nodes simplify the code by removing such conditionals, at the expense of adding nodes that do not hold any real data.

How one goes about making the above-mentioned three decisions depends on the application -- what will you be using the linked list for?  This week, the tasks were to implement a simple singly-linked list and:
  1. Write a function to remove a list's 3rd from last element. Challenge: can you do it in a single list traversal?
  2. Write a function to remove all duplicates from a linked list. Challenge: can you do it without storing any extra data?
  3. Write a function to detect a loop in a linked list.
For the abstraction level, I decided to roll with abstracting away the Node and exposing only the top-level List.  This looks a bit awkward at first, since all the List is is just a pointer to the first Node.  However, it does simplify the representation of border cases, such as empty lists.  I also decided not to use sentinels this time, as they would be more trouble than they are worth in some cases.  Specifically, removing duplicates from a list without storing any extra data requires pre-sorting the list with an algorithm such as merge-sort.  If we're using sentinels, then the divide step of merge-sort would require us to create additional sentinel nodes, and the merge step would require us to delete those additional sentinel nodes.

Finally, here's the code for this week:

#include <stdlib.h>
#include <assert.h>
#include <stdio.h>
struct Node
{
int value;
struct Node *next;
};
struct List
{
struct Node *head;
};
struct List *
list_new()
{
struct List *list = calloc(sizeof(struct List), 1);
assert(list);
return list;
}
struct Node *
node_new(int value)
{
struct Node *node = calloc(sizeof(struct Node), 1);
assert(node);
node->value = value;
return node;
}
void
list_append(struct List *list, int value)
{
assert(list);
struct Node *node = node_new(value);
assert(node);
if (!list->head)
{
list->head = node;
}
else
{
struct Node *current = list->head;
for (; current->next != NULL; current = current->next);
current->next = node;
}
}
void
list_delete(struct List *list)
{
assert(list);
struct Node *current = list->head;
while (current)
{
struct Node *tmp = current;
current = current->next;
free(tmp);
}
free(list);
}
int
list_remove(struct List *list, int value)
{
assert(list);
if (!list->head)
return 0; /* empty list */
if (list->head->value == value)
{
list->head = list->head->next;
return 1;
}
else
{
struct Node *current = list->head->next;
struct Node *prev = list->head;
while (current)
{
if (current->value == value)
{
prev->next = current->next;
free(current);
return 1;
}
current = current->next;
}
return 0;
}
}
int
remove_third_last(struct List *list)
{
assert(list);
struct Node *lead = list->head, *follow = list->head, *prev = NULL;
int i;
for (i = 0; i < 3 && lead; ++i)
lead = lead->next;
if (i != 3)
return 0; /* list is shorter than 3 elements */
while (lead) /* go forward until lead reaches the end of the list */
{
lead = lead->next;
prev = follow;
follow = follow->next;
} /* follow now points at the element we want to delete */
if (follow == list->head)
list->head = follow->next;
else
prev->next = follow->next;
free(follow);
return 1;
}
int
detect_loop(struct List *list)
{
assert(list);
struct Node *fast = list->head, *slow = list->head;
if (fast)
fast = fast->next;
else
return 0; /* empty list */
while (fast && slow)
{
if (fast == slow)
return 1; /* loop detected */
fast = fast->next;
if (!fast)
break;
else
fast = fast->next;
slow = slow->next;
}
return 0;
}
void
merge_sort_helper(struct Node **head)
{
if (*head == NULL)
return; /* empty list */
/* break list into left and right sublists */
struct Node *fast = (*head)->next;
struct Node *slow = *head;
while (fast && fast->next)
{
fast = fast->next->next;
slow = slow->next;
}
struct Node *left = *head;
struct Node *right = slow->next;
if (!right)
return; /* list of size 1 */
slow->next = NULL;
/* recursive step */
merge_sort_helper(&left);
merge_sort_helper(&right);
/* merge step */
struct Node *new_head = NULL;
struct Node *tail = NULL;
for (;;)
{
/* determine next element to merge in, if any are left */
struct Node *next = NULL;
if (left && right)
next = left->value < right->value ? left : right;
else if (left)
next = left;
else if (right)
next = right;
else
break;
assert(next);
/* append to linked list */
if (new_head)
tail->next = next;
else
new_head = next;
/* advance pointers */
if (next == left)
left = left->next;
else
right = right->next;
tail = next;
tail->next = NULL;
}
*head = new_head;
}
/* in-place merge sort of the list */
void
merge_sort(struct List *list)
{
merge_sort_helper(&list->head);
}
void
remove_duplicates(struct List *list)
{
assert(list);
merge_sort(list);
struct Node *current = list->head;
struct Node *prev = NULL;
while (current)
{
if (prev && prev->value == current->value)
{
prev->next = current->next;
free(current);
}
prev = current;
current = current->next;
}
}
void
print_list(struct List *list)
{
assert(list);
struct Node *current = list->head;
printf("[");
for (; current; current = current->next)
printf("%d, ", current->value);
printf("]\n");
}
int
main(void)
{
struct List *list = list_new();
int i;
for (i = 0; i < 10; ++i)
list_append(list, i);
print_list(list);
remove_third_last(list);
print_list(list);
printf("detect_loop: %d\n", detect_loop(list));
list->head->next->next->next = list->head;
printf("detect_loop: %d\n", detect_loop(list));
/* there's a loop in that list so we can't reliably delete it... */
//list_delete(list);
list = list_new();
for (i = 0; i < 10; ++i)
list_append(list, 10-i);
print_list(list);
merge_sort(list);
print_list(list);
list_delete(list);
list = list_new();
for (i = 0; i < 10; ++i)
{
list_append(list, 10-i);
list_append(list, i);
}
print_list(list);
remove_duplicates(list);
print_list(list);
return 0;
}
view raw linkedlist.c hosted with ❤ by GitHub
CFLAGS=-ggdb -Wall
all: linkedlist.out
# $< the dependencies
# $@ the target
linkedlist.out: linkedlist.o
gcc -Wall -ggdb < -o @
clean:
rm -rf *.out *.o
view raw Makefile hosted with ❤ by GitHub

No comments:

Post a Comment