After re-reading some of the papers from Bell Labs, something clicked in my
mind, and I’m hooked. I’m now reading “The UNIX Programming Environment”, by
Kernighan and Pike. It’s got that fun style you’re probably familiar with, if
you’ve read K&R or the blue book. One of the first exercise questions, in the
chapter on file systems:
(harder) How does the pwd command operate?
Seems like a fun one. My first guess is that it uses $PWD from the
environment. Let’s test that.
~ % PWD=/usr/local pwd
/home/gg
There’s a -L flag, specified by POSIX, that seems to use $PWD. Maybe that one
would do?
~ % PWD=/usr/local pwd -L
/home/gg
Not that either. Wait a second, what’s pwd again?
~ % type pwd
pwd is a shell builtin
Hah, so I was calling the wrong one. So I replace pwd with /bin/pwd in my
queries above, but the results are the same.
My next hypothesis is that it would somehow expand . to an absolute path. I’m
not aware of a UNIX command that performs such an expansion, so I man -k some
keywords. Nothing.
Maybe pwd(1)? It’s not terribly descriptive (it’s such a simple utility, after
all). It doesn’t explain the implementation at all, but it links me to
getcwd(3). Alright, let’s just look at the source.
OpenBSD implementation
int
main(int argc, char *argv[])
{
    int ch, lFlag = 0;
    const char *p;

    /* pledge(), parse flags... */

    if (lFlag)
        p = getcwd_logical();
    else
        p = NULL;
    if (p == NULL)
        p = getcwd(NULL, 0);
    if (p == NULL)
        err(EXIT_FAILURE, NULL);
    puts(p);
    exit(EXIT_SUCCESS);
}
Unless -L is passed, it just calls getcwd. Let’s see what that “logical”
function does:
static char *
getcwd_logical(void)
{
    char *pwd, *p;
    struct stat s_pwd, s_dot;

    /* Check $PWD -- if it's right, it's fast. */
    pwd = getenv("PWD");
    if (pwd == NULL)
        return NULL;
    if (pwd[0] != '/')
        return NULL;

    /* check for . or .. components, including trailing ones */
    for (p = pwd; *p != '\0'; p++)
        if (p[0] == '/' && p[1] == '.') {
            if (p[2] == '.')
                p++;
            if (p[2] == '\0' || p[2] == '/')
                return NULL;
        }

    if (stat(pwd, &s_pwd) == -1 || stat(".", &s_dot) == -1)
        return NULL;
    if (s_pwd.st_dev != s_dot.st_dev || s_pwd.st_ino != s_dot.st_ino)
        return NULL;
    return pwd;
}
So -L does check $PWD, but only returns it if it’s an absolute path with no .
or .. components that points to the same inode, on the same device, as the
current directory. You can’t just manually override it to be anything you
want. If any of those checks fails, it falls back to the libc call to getcwd.
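The identity check at the end of getcwd_logical is easy to try in isolation.
Here is a minimal sketch of it; same_file is a name I made up, not something
from the OpenBSD tree, and it only assumes a POSIX stat(2).

```c
#include <sys/stat.h>

/*
 * Hypothetical helper, not part of the OpenBSD source. Returns 1 when
 * both paths resolve to the same file (same device and same inode),
 * 0 when they resolve to different files, and -1 when either path
 * cannot be stat'ed.
 */
int
same_file(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

With this, the -L decision boils down to whether same_file(pwd, ".") returns
1, where pwd is the value of $PWD after the NULL and leading-slash checks.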
Makes me wonder what use this -L flag is in the first place. Maybe it has to
do with symlinks?
/tmp % mkdir one
/tmp % ln -s one two
/tmp % cd one
/tmp/one % /bin/pwd
/tmp/one
/tmp/one % cd ../two
/tmp/two % /bin/pwd
/tmp/one
/tmp/two % /bin/pwd -L
/tmp/two
Makes sense. Anyway, that’s not a very satisfying answer. I doubt the authors'
intended answer would have been “defer to the libc”.
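For what it’s worth, the traditional way to do this without any help from libc
is to climb the tree through .. one level at a time: stat the current
directory, scan the parent for the entry with the same device and inode,
prepend that name, and stop when a directory turns out to be its own parent.
I can’t say this is the answer the authors had in mind, but it is a plausible
one. A rough sketch, assuming a POSIX system; walk_up_cwd and same_node are
names I made up, and error handling is kept to the bare minimum:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical helper: do two stat results describe the same file? */
static int
same_node(const struct stat *a, const struct stat *b)
{
	return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/*
 * Hypothetical function, not code from any system quoted here. Stores
 * the absolute path of the current directory in buf and returns buf,
 * or returns NULL on any error (including a path that does not fit).
 */
char *
walk_up_cwd(char *buf, size_t size)
{
	char up[PATH_MAX] = ".";	/* grows: "./..", "./../..", ... */
	char path[PATH_MAX] = "";	/* result, built right to left */
	char tmp[PATH_MAX], probe[PATH_MAX];
	struct stat cur, parent, ent;
	struct dirent *de;
	DIR *dir;
	int n;

	if (stat(".", &cur) == -1)
		return NULL;
	for (;;) {
		if (strlen(up) + sizeof("/..") > sizeof(up))
			return NULL;
		strcat(up, "/..");
		if (stat(up, &parent) == -1)
			return NULL;
		/* At the root, ".." refers back to the directory itself. */
		if (same_node(&cur, &parent))
			break;
		if ((dir = opendir(up)) == NULL)
			return NULL;
		/* Look for the parent's entry that is the directory below. */
		while ((de = readdir(dir)) != NULL) {
			if (strcmp(de->d_name, ".") == 0 ||
			    strcmp(de->d_name, "..") == 0)
				continue;
			n = snprintf(probe, sizeof(probe), "%s/%s", up, de->d_name);
			if (n < 0 || (size_t)n >= sizeof(probe))
				continue;
			if (lstat(probe, &ent) == 0 && same_node(&ent, &cur))
				break;
		}
		if (de == NULL) {
			closedir(dir);
			return NULL;
		}
		/* Prepend "/name" to what has been collected so far. */
		n = snprintf(tmp, sizeof(tmp), "/%s%s", de->d_name, path);
		closedir(dir);
		if (n < 0 || (size_t)n >= sizeof(tmp))
			return NULL;
		strcpy(path, tmp);
		cur = parent;
	}
	if (path[0] == '\0')
		strcpy(path, "/");	/* we started in the root directory */
	if (strlen(path) >= size)
		return NULL;
	return strcpy(buf, path);
}
```

Calling walk_up_cwd(buf, sizeof(buf)) and printing buf gives a bare-bones
physical pwd. The sketch uses lstat on each entry instead of trusting the
inode number in the directory entry, so that a mount point in the parent is
compared by the identity of the mounted directory. It needs read permission
on every ancestor directory, which getcwd on a modern kernel usually does not.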
Plan9
Ok, the OpenBSD source didn’t help. But Plan9 is Unicibus ipsis Unicior, so
maybe we can find the answer there. Let’s inspect pwd(1):
DESCRIPTION
Pwd prints the path name of the working (current) directory.
Pwd is guaranteed to return the same path that was used to
enter the directory. If, however, the name space has
changed, or directory names have been changed, this path
name may no longer be valid. (See fd2path(2) for a descrip-
tion of pwd's mechanism.)
Hah, that was helpful! Now, from fd2path(2):
As an example, getwd(2) is implemented by opening . and exe-
cuting fd2path on the resulting file descriptor.
By the way, it turns out that fd2path(2) is a fascinating topic in its own
right (cf. “Lexical File Names in Plan 9 or: ‘Getting Dot-Dot Right’”).
So my hypothesis above was correct, at least when it comes to Plan9. Another
cool thing about Plan9 is that it lets me inspect a folder (“everything is a
file”, right?). I can then run foo through hexdump and see what’s in there.
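The “open . and ask for the path of the descriptor” idea from fd2path(2) has a
rough analogue on Linux, where /proc/self/fd/N is a symlink to whatever
descriptor N refers to. The sketch below is not Plan 9 code: it assumes Linux
with procfs mounted, and fd_getwd is a name I made up.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical Linux-only analogue of "open . and call fd2path on it".
 * /proc/self/fd is Linux procfs, not a Plan 9 interface. Stores the
 * path in buf and returns buf, or returns NULL on error. Note that
 * readlink silently truncates when buf is too small.
 */
char *
fd_getwd(char *buf, size_t size)
{
	char link[64];
	ssize_t n;
	int fd;

	if (size == 0)
		return NULL;
	fd = open(".", O_RDONLY);
	if (fd == -1)
		return NULL;
	snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
	n = readlink(link, buf, size - 1);
	close(fd);
	if (n == -1)
		return NULL;
	buf[n] = '\0';		/* readlink does not terminate the string */
	return buf;
}
```

Like getcwd, this gives the kernel’s view of the directory, so it prints
/tmp/one in the symlink experiment above, not /tmp/two.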
GNU
Let’s see how coreutils implements it… nope. Just nope.
Wrapping up
So that was it, a brief excursion into different implementations of a simple
command in UNIX. The difference in complexity is palpable. The Plan9
documentation is fun to read, and so is the code.