Generating directory entries

Introduction

Here is a seemingly simple programming problem that actually throws up several interesting issues: write a library procedure that recursively generates the entries in a given directory.

First attempt

This is a simple first attempt. The enclosing main program takes two optional parameters; namely the directory to list, and the maximum number of results to print.

Download dir1.icn

import io

procedure dir_recurse(f)
   local p, f2, l, s
   write("Opening ", f)
   s := DirStream(f) | {
      write("Failed to open ", f, ": ", &why)
      fail
   }
   suspend f
   p := FilePath(f)
   repeat {
      l := s.read_line() | break
      if /l then
         break
      if Files.is_relative_dir(l) then
         next
      f2 := p.child(l).str() 
      if Files.is_directory(f2) then
         suspend dir_recurse(f2)
      else
         suspend f2
   }
   write("Closing ", f)
   s.close()
end

procedure main(a)
   local f, d, n

   d := a[1] | "/tmp"
   n := integer(a[2]) | 100

   every f := dir_recurse(d) \ n do
      write("   Got ", f)

   write("Exiting")
end

This procedure has three problems. Firstly, unless the generator is run to exhaustion, not all of the io.DirStream objects used to read the directories will be closed. This is because the last line of dir_recurse, which closes the DirStream, s, won’t be reached if dir_recurse suspends, but isn’t resumed.

To illustrate this problem, try creating a test directory with some entries as follows :-

mkdir -p /tmp/here/is/a/deep/directory
touch /tmp/here/entry1 /tmp/here/is/a/entry2 /tmp/here/is/a/entry3 /tmp/here/is/a/deep/entry4
chmod -r /tmp/here/is/a/deep/directory

Note that the last command makes that directory unreadable. If we run the command

oit -s ./dir1.icn -x /tmp/here 5

we get the output :-

Opening /tmp/here
   Got /tmp/here
   Got /tmp/here/entry1
Opening /tmp/here/is
   Got /tmp/here/is
Opening /tmp/here/is/a
   Got /tmp/here/is/a
   Got /tmp/here/is/a/entry3
Exiting

Note how none of the three opened DirStreams has a corresponding close.

The second problem is that dir_recurse doesn’t handle errors well. If it can’t open a directory, it just prints a message, and if it encounters an error reading a directory (indicated by s.read_line() failing), it exits its loop. In either case, there is no way for the caller of dir_recurse to tell that an error occurred.

To illustrate this, try running :-

oit -s ./dir1.icn -x /tmp/here

Here we see that the error opening the unreadable directory goes unnoticed by the higher level invocations, which just carry on regardless :-

Opening /tmp/here
   Got /tmp/here
   Got /tmp/here/entry1
Opening /tmp/here/is
   Got /tmp/here/is
Opening /tmp/here/is/a
   Got /tmp/here/is/a
   Got /tmp/here/is/a/entry3
Opening /tmp/here/is/a/deep
   Got /tmp/here/is/a/deep
Opening /tmp/here/is/a/deep/directory
Failed to open /tmp/here/is/a/deep/directory: Permission denied (errno=13)
   Got /tmp/here/is/a/deep/entry4
Closing /tmp/here/is/a/deep
   Got /tmp/here/is/a/entry2
Closing /tmp/here/is/a
Closing /tmp/here/is
Closing /tmp/here
Exiting

The third problem is that the procedure is potentially inefficient if it is handling a very deep directory structure. This is because every result from the deepest directories has to “bubble up” through all the higher level calls. Then, after it is received by the top-level caller, the whole chain of calls has to be resumed to get the next result. This is illustrated in the following diagram, showing what happens when a directory under “/usr/lib” is being iterated over. Every result generated involves a complete circuit of the loop shown by the green and orange arrows.

Second attempt

The second version tries to address the last of the above problems, by using coroutines, instead of nested suspension.

Download dir2.icn

import io, ipl.pdco

procedure dir_recurse1(f)
   local s, p, f2, l
   write("Opening ", f)
   s := DirStream(f) | {
      write("Failed to open ", f, ": ", &why)
      fail
   }

   coact(f)
   p := FilePath(f)
   repeat {
      l := s.read_line() | fail
      if /l then
         break
      if Files.is_relative_dir(l) then
         next
      f2 := p.child(l).str() 
      if Files.is_directory(f2) then
         dir_recurse1(f2)
      else
         coact(f2)
   }

   write("Closing ", f)
   s.close()
end

procedure dir_recurse(f)
   suspend Seq{ dir_recurse1(f) }
end

procedure main(a)
   local f, d, n

   d := a[1] | "/tmp"
   n := integer(a[2]) | 100

   every f := dir_recurse(d) \ n do
      write("   Got ", f)

   write("Exiting")
end

Now dir_recurse() starts off a new coroutine, which recursively descends the directory tree in dir_recurse1(). As each result is encountered, it is sent back to dir_recurse() via a co-expression activation. dir_recurse() then suspends it to the main procedure. It makes use of the ipl.pdco.Seq library procedure, which is a very simple one-liner which turns a co-expression’s activation results back into a results sequence. The result is that the main procedure sees the same results sequence as in the first version.

Here is a diagram showing what is happening. The results now flow along the blue lines, bypassing the lengthy green and orange loop.

Having said all this, a directory would have to be very deep indeed, and have a large number of entries, before the relatively small cost of procedure resumption becomes noticeable against the backdrop of the expensive I/O calls. In fact, for shallow directories, the above technique will probably be slightly slower than the first version (although in this case, the cost difference will almost certainly be irrelevant). For more on the performance advantage of the technique, please see this page.

Third attempt

The next version addresses the problem of ensuring that the DirStream instances are properly closed, even if the results aren’t generated until exhaustion.

Download dir3.icn

import io, util

record Msg(file, revert)

procedure dir_recurse2(s, f, revert)
   local p, f2, l
   coact(Msg(f, &current), revert)
   p := FilePath(f)
   repeat {
      l := s.read_line() | fail
      if /l then
         return
      if Files.is_relative_dir(l) then
         next
      f2 := p.child(l).str() 
      if Files.is_directory(f2) then
         dir_recurse1(f2, revert)
      else
         coact(Msg(f2, &current), revert)
   }
end

procedure dir_recurse1(f, revert)
   local s
   use {
      {
         write("Opening ", f)
         s := DirStream(f)
      },
      dir_recurse2(s, f, revert),
      {
         write("Closing ", f)
         s.close()
      }
   } | write("Problem accessing ", f, ": ", &why)
end

procedure dir_recurse(f)
   local v, here, e
   here := &current
   e := create dir_recurse1(f, here)
   v := @e | fail
   repeat {
      suspend v.file
      v := coact(, v.revert) | fail
   }
end

procedure main(a)
   local f, d, n

   d := a[1] | "/tmp"
   n := integer(a[2]) | 100
   clean {
      every f := dir_recurse(d) \ n do
         write("   Got ", f)
   }

   write("Exiting")
end

It achieves this by using the util.use library procedure. This procedure takes three co-expressions: a setup expression which opens a particular resource (such as a file); an action expression which uses that resource, and finally a cleanup expression which closes the resource. Normally use invokes the cleanup expression itself, after the action expression completes. But if the action expression never does this, as may happen in this case if we don’t generate all the directory entries to exhaustion, then we can use another procedure, util.clean. This works in conjunction with use, by managing a global list of cleanup expressions. use adds each cleanup expression to the list, and removes them if and when they are run. Finally, clean will run any cleanup expressions that use hasn’t.

The call to use is in dir_recurse1 which then calls dir_recurse2 to make use of the DirStream resource. The procedure dir_recurse is still the top-level interface, suspending the sequence of file names to the client program. You will see that it is rather more involved than in the previous version. This is because unfortunately use also happens to introduce a complication. Before, we only had two co-expressions, namely the main co-expression and the one descending the directory tree. This means we didn’t have to keep track of which co-expression each one should activate or revert to; the descending co-routine just reverted to its activator when it had a result (this is what coact doees by default). But now each time we call use we are creating three co-expressions. One of those is activated and calls dir_recurse2, so that procedure can no longer simply revert to its activator to send a result. So instead we have to pass a parameter (revert) down to dir_recurse2, giving it the co-expression we wish to activate when we have a result. Similarly, with each result we have to include the co-expression we wish the top level to resume when we want another result. So rather than just giving a file as a result, we give a record instance, Msg, containing a file and the co-expression to revert to for the next result.

Try running this version on the test directory mentioned before, producing just the first few results :-

oit -s ./dir3.icn -x /tmp/here 5

which produces :-

Opening /tmp/here
   Got /tmp/here
   Got /tmp/here/entry1
Opening /tmp/here/is
   Got /tmp/here/is
Opening /tmp/here/is/a
   Got /tmp/here/is/a
   Got /tmp/here/is/a/entry3
Closing /tmp/here/is/a
Closing /tmp/here/is
Closing /tmp/here
Exiting

In contrast to the first version, all the DirStreams are closed properly.

Final version

This just leaves the problem of error notification to be addressed. The problem is that, since dir_recurse generates a sequence of results, it can’t use failure to indicate an error. One possibility would be to use some kind of flag to indicate an error. But a better way is to use the library procedure exception.throw to raise an exception. This has the added advantage that it is easy to abandon a deeply nested series of calls, which is what we have in this case.

Here is the final version of the program.

Download dir4.icn

import io,  util, exception

record Msg(file, revert)

procedure dir_recurse2(s, f, revert)
   local p, f2, l
   coact(Msg(f, &current), revert)
   p := FilePath(f)
   repeat {
      l := s.read_line() | fail
      if /l then
         return
      if Files.is_relative_dir(l) then
         next
      f2 := p.child(l).str() 
      if Files.is_directory(f2) then
         dir_recurse1(f2, revert)
      else
         coact(Msg(f2, &current), revert)
   }
end


procedure dir_recurse1(f, revert)
   local s
   use {
      {
         write("Opening ", f)
         s := DirStream(f)
      },
      dir_recurse2(s, f, revert),
      {
         write("Closing ", f)
         s.close()
      }
   } | throw("Problem accessing " || f || ": " || &why)
end

procedure dir_recurse(f)
   local v, here, e
   here := &current
   e := create dir_recurse1(f, here)
   v := @e | fail
   repeat {
      suspend v.file
      v := coact(, v.revert) | fail
   }
end

procedure main(a)
   local f, d, n

   d := a[1] | "/tmp"
   n := integer(a[2]) | 100
   clean { try1 {
      every f := dir_recurse(d) \ n do
         write("   Got ", f)
   }} | write(&why)

   write("Exiting")
end

The exception mechanism is actually very simple. At the top level, a procedure exception.try1 is used to wrap the call to dir_recurse. All this does in effect is save the current co-expression (ie, &current) in a global variable, throw_handler. The expression passed to try1 is now invoked. If that expression wants to raise an exception, all it does is call throw with an arbitrary exception message (an error message string in this case). throw simply sets some global variables to save the message and other information, and invokes the handler.

Here is a diagram showing what happens when an exception is raised. try1 invokes the body coroutine passed to it. It then descends several levels of calls to dir_recurse1. Finally, on encountering an error, throw is called, and the handler (set by try1, and stored in throw_handler) is invoked.

This diagram rather simplifies the activities of the descending body coroutine. dir_recurse2 isn’t shown, and, since we are now using use to invoke dir_recurse2, each level of recursion actually invovles a co-expression activation, as well as an ordinary procedure invocation.

There is a similarity between this diagram and the one above showing how the traversal coroutine sends its results to the main coroutine, bypassing the call chain. Indeed both are using a similar technique, and it would be possible to avoid using the exception library and instead send an error notification as part of the Msg record.

The call to try1 is in the main procedure, inside the call to clean, ensuring that all the DirStreams opened by the calls to use are closed in the case of an exception.

To test this final version of the program, try running :-

oit -s ./dir4.icn -x /tmp/here

The output should be :-

Opening /tmp/here
   Got /tmp/here
Opening /tmp/here/is
   Got /tmp/here/is
Opening /tmp/here/is/a
   Got /tmp/here/is/a
Opening /tmp/here/is/a/deep
   Got /tmp/here/is/a/deep
Opening /tmp/here/is/a/deep/directory
Closing /tmp/here/is/a/deep
Closing /tmp/here/is/a
Closing /tmp/here/is
Closing /tmp/here
Problem accessing /tmp/here/is/a/deep/directory: Permission denied (errno=13)
Exiting

Note how the program stops when an error is encountered, and that all DirStreams are properly closed.

Contents