Data structure abstractions

Introduction

Object Icon inherits builtin lists, tables and sets from Icon. These are suitable for almost all purposes, but occasionally something different is needed. The datastruct package provides some object-oriented abstractions which can help.

Fundamental classes

The three most important classes are List, Table and Set. Each is an abstract class with methods which reflect the operations of its builtin counterpart.

Capabilities

Each data structure instance has zero or more of several capabilities, reflected by an integer flag value returned by a get_mode() method. This is the same technique used by io.Stream to report whether a Stream is readable, writable, etc. But in this case the capabilities reflect how (if at all) the particular data structure may be altered. Sets and tables (but not lists) additionally have an ORDER capability, meaning entries are stored in order, and thus will be generated ready-sorted during iteration.

Wrapping builtin instances

Each of the three types of builtin datastructure has two corresponding datastruct wrapper classes. One is a general purpose wrapper providing full access to the underlying data via the particular class’s methods. The other is rather more useful, and provides just a read-only view of the underlying data. For lists, the classes are datastruct.BuiltinList and datastruct.UnmodifiableBuiltinList respectively, with the other classes being similarly named.

To see how the read-only wrapper could be useful, consider a class which has a private builtin list member variable, and wishes to expose its content to callers. Exposing the private variable would of course allow the list’s content to be changed by the caller. Providing a copy of the list might be expensive. Returning a read-only wrapper avoids these problems. For example :-

class Something()
   private l     # A list.

   ...
   public get_data()
      return UnmodifiableBuiltinList(l)
   end
   ...
end

# A user of the above
x := Something()
...
# Examine the elements
l := x.get_data()
every e := l.gen() do {
    ...
}
l.put("Junk")   # Runtime error

One might legitimately object to the overhead of creating many instances of UnmodifiableBuiltinList. At the cost of an extra instance variable this could be avoided :-

class Something()
   private const
      l,     # A list.
      ul     # Its wrapper
   ...
   public get_data()
      return ul
   end

   public new()
      ul := UnmodifiableBuiltinList(l := [])
      ...
      return
   end
end

Note that the use of const ensures that ul and l cannot get out of synch after the instance is instantiated.

Lists

Linked lists

datastruct.LinkedList, implements a Lisp-style singly linked list, so that distinct lists can share tail portions of their data :-

l1 := LinkedList(5, 4, 57, 12, 14)
# The tail list from element 4
l2 := l1.from(4)
# Now l2 is LinkedList(12,14)
# Add an element on the front and back
l2.push(1000)
l2.put(99)
# Now l2 is LinkedList(1000,12,14,99)
# and l1 is LinkedList(5,4,57,12,14,99)

More useful is datastruct.DoublyLinkedList, with several operations possible in constant time, which would take linear time using other types of list. For example, given two lists of arbitrary length, the following splices all of the nodes of the second list onto the end of the first, in constant time, leaving the second list empty :-

l1 := DoublyLinkedList( ... )
l2 := DoublyLinkedList( ... )
l1.splice(0, l2)

The opposite of splicing is “extracting”, which removes part of a list, returning it as a new list.

l1 := DoublyLinkedList( ... )
n1 := l1.node(100); n2 := l1.node(-100)
l2 := DoublyLinkedList.unsafe_extract_nodes(n1, n2)

The last operation is again constant time. Note that the caller has to ensure that the nodes form a valid sub-list, with n2 coming after n1, otherwise both source and output lists will be nonsensical; hence the name of the method. Note also that this is a static method (rather than an instance method of the list); list nodes can in fact be manipulated independently of their enclosing list structure.

One final point to mention about DoublyLinkedList is that, like LinkedList, getting the size is an “O(n)” operation.

Sub-list views

datastruct.SubList is a view of part of another list; it stores no data itself :-

l1 := BuiltinList( [5,4,57,12,14,99,200] )
# A view of l1[4:7]
l2 := SubList(l1, 4, 7)
# Change l1
l1.push(1)
l1.at(5) := 101
# l1 is now BuiltinList(1,5,4,57,101,14,99,200)
# l2 is now SubList(57,101,14)
# Now modify l2 to change l1
l2.at(1) := 300
# l1 is now BuiltinList(1,5,4,300,101,14,99,200)
# l2 is now SubList(300,101,14)

Tables

SortTable

datastruct.SortTable is a Table which implements the ORDER capability described above, using a red-black binary tree :-

t := SortTable()
t.insert("Zebra", 120)
t.insert("Human", 10)
t.insert("Aardvark", 34)
# t is now SortTable("Aardvark"->34;"Human"->10;"Zebra"->120)

Rank

A SortTable can also quickly obtain the “rank” of a particular key, which means its position in the sorted key order. For example

t.rank("Human")

gives 2, since “Human” is the second key in order. Conversely, the select method gives the key for a particular rank :-

t.select(2)

gives “Human”. select uses standard Icon indexing, so t.select(-1) will give the highest ordered key (“Zebra”).

Tree nodes

SortTable has several methods which return references to the nodes in the red-black tree. Each node contains a (read-only) key and a value, which may be freely edited at any time to update the table entry’s value.

A node may be obtained when a new entry is added to the table (datastruct.SortTable.new_node()), by key datastruct.SortTable.find_node(), or by rank datastruct.SortTable.node().

The node may be removed from the tree with datastruct.SortTableNode.unlink().

Iteration

SortTable and SortTableNode provide methods to iterate forward or backward over keys, values and nodes. These iterators are robust against any changes to the underlying tree structure may change, with one exception: the last entry returned must not be deleted before the next entry is generated. For example, this is not allowed :-

   t := SortTable(,, "Cat", 12, "Mouse", 2, "Gerbil", 1)
   # Generate the nodes (in ascending sort order)
   e := create t.nodes()
   n := @e
   # n is now the SortTableNode(key="Cat", val=12)
   n.unlink()
   # t now has two remaining elements
   n := @e       # Runtime error: Node has been unlinked

Since it is often useful to iterate and remove elements in the same loop, the method datastruct.SortTableNode.move_unlink() may be used instead, together with a while loop. For example, the following procedure will iterate over a table and delete even-numbered keys :-

procedure del(t)
   local n
   # Get the first node by rank
   n := t.node(1)
   while
      n := if n.key % 2 = 0 then
         n.move_unlink(1)
      else
         n.get_next()
end

EqTable

datastruct.EqTable uses the lang.equals procedure (and its cousin, lang.hash) to determine table membership rather than the usual identity operator (===), which is quite often useful. For example :-

# A table with two entries.
t := EqTable(,
             [1], 1,
             [2], 2)
# Modify the second entry's value.
t.insert([2], 200)
# Now we still have two entries ( [1]->1 and [2]->200 )
# (with a regular table we would have 3).

WeakrefTable

datastruct.WeakrefTable wraps its keys in weak references. Thus on a garbage collection, the keys may “disappear”, and the size of the table reduce :-

# Set up some keys.
k1 := []; k2 := []; k3 := []; k4 := []
# Create a table with four entries.
t := WeakrefTable(, k1, 1,
                    k2, 2,
                    k3, 3,
                    k4, 4)
# Make two of the keys only referenced by the table.
k2 := k4 := &null
# Do a garbage collection
collect()
# Now t is reduced
t.size()     # 2 (k1->1 and k3->3)

Sets

The functionality of any Table can easily be extended into Set form using datastruct.TableSet, which provides a set view of the keys. datastruct.EqSet is a particularly convenient subclass of TableSet which wraps the EqTable described above :-

# A three-element set
s := EqSet( [1], [2], [3] )
# Insert an element
s.insert([2])
# The size is still 3 (not 4, as a regular set would have).

Contents