Discussion:
how to portably and atomically write a file?
Julieta Shem
2024-01-20 10:13:25 UTC
I need multiple processes to write files to the same directory, so I
need to atomically write. I found uiop's call-with-temporary-file and I
wrote the following.

(defun save-file (data)
  (let ((tmp (uiop:call-with-temporary-file
              (lambda (s p)
                (write-sequence data s)
                p)
              :directory (merge-pathnames ".")
              :keep t)))
    (rename-file tmp "final-name.txt")))

That's just a prototype. The final-name will be calculated at run time.
The problem I have is that rename-file overwrites an existing file. I
need it not to do that so I can try again with a different name if the
choice of final name was already taken.

How do you guys do this? Thank you.
Spiros Bousbouras
2024-01-20 15:57:24 UTC
On Sat, 20 Jan 2024 07:13:25 -0300
Post by Julieta Shem
I need multiple processes to write files to the same directory, so I
need to atomically write. I found uiop's call-with-temporary-file and I
wrote the following.
(defun save-file (data)
  (let ((tmp (uiop:call-with-temporary-file
              (lambda (s p)
                (write-sequence data s)
                p)
              :directory (merge-pathnames ".")
              :keep t)))
    (rename-file tmp "final-name.txt")))
That's just a prototype. The final-name will be calculated at run time.
The problem I have is that rename-file overwrites an existing file.
I couldn't see anything in the CLHS which covers what happens if the file
already exists.
Post by Julieta Shem
I
need it not to do that so I can try again with a different name if the
choice of final name was already taken.
How do you guys do this? Thank you.
Does "portably" mean without using anything outside Common Lisp ? If yes ,
then the only way I can think of is to get from somewhere a source of
random bytes (/dev/urandom on Linux) and each process will use this to
calculate a unique file name. It is not 100% certain that you will get
a unique file name but if you read enough bytes , then the probability
of getting twice the same sequence of bytes is so small that for practical
reasons you can treat it as 0.
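
For illustration, the random-name idea could be sketched as follows. This is only a sketch, assuming a Unix-like system where /dev/urandom is readable; RANDOM-FILE-NAME, its keyword arguments and the "article-" prefix are hypothetical names chosen for the example.

```lisp
;; Sketch only: RANDOM-FILE-NAME is a hypothetical helper, not a library
;; function.  It assumes /dev/urandom exists and can be opened as a
;; binary input file.
(defun random-file-name (&key (nbytes 16) (prefix "article-"))
  "Return PREFIX followed by NBYTES random bytes written as hex digits."
  (with-open-file (in "/dev/urandom" :element-type '(unsigned-byte 8))
    (let ((bytes (make-array nbytes :element-type '(unsigned-byte 8))))
      (read-sequence bytes in)
      ;; ~2,'0X prints each byte as two zero-padded hex digits.
      (format nil "~A~{~2,'0X~}" prefix (coerce bytes 'list)))))
```

With 16 bytes the name carries 128 random bits, so two processes picking the same name is unlikely enough to ignore in practice, although nothing here checks for a collision.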

If you are willing to use the FFI then you do it in the same way(s) you would
do it using C for your operating system.

On Linux/Unix I can think of 2 ways :

1. Each process atomically creates a lock file with a fixed name under the
directory , then calculates a file name , verifies that the directory does
not already have a file with this name , creates the file and then removes
the lock file.

2. Append to some fixed file name (pattern) the process ID which the
operating system guarantees is unique.

A possible 3rd way depends on how the processes get launched. If they all get
launched by the same process then that process first calculates a unique file
name for each process it launches and then passes it as a command line
argument or an environment variable or something.
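
The lock-file approach in option 1 could be sketched in plain Common Lisp as below. This is a sketch under assumptions: CALL-WITH-LOCK-FILE and its RETRY-DELAY argument are hypothetical, and it relies on OPEN with :IF-EXISTS NIL doing the existence check and the creation as one atomic step, which the standard does not promise, so check your implementation.

```lisp
;; Sketch only: CALL-WITH-LOCK-FILE is a hypothetical helper.
;; Assumption: the implementation creates the file exclusively (in the
;; manner of O_CREAT|O_EXCL) when :IF-EXISTS is NIL.
(defun call-with-lock-file (lock-path thunk &key (retry-delay 0.05))
  "Create LOCK-PATH exclusively, call THUNK, then delete LOCK-PATH.
Wait and retry while another process holds the lock."
  (loop
    (let ((lock (open lock-path :direction :output
                                :if-exists nil   ; NIL instead of an error
                                :if-does-not-exist :create)))
      (when lock
        (close lock)
        (return)))
    (sleep retry-delay))
  ;; The lock is removed even if THUNK exits non-locally.
  (unwind-protect (funcall thunk)
    (delete-file lock-path)))
```

The work of picking a name, checking the directory and creating the file would go inside THUNK. A process that dies while holding the lock leaves the lock file behind, so a real deployment needs some way to detect and clear a stale lock.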
--
vlaho.ninja/menu
Julieta Shem
2024-01-20 17:27:24 UTC
Post by Spiros Bousbouras
On Sat, 20 Jan 2024 07:13:25 -0300
Post by Julieta Shem
I need multiple processes to write files to the same directory, so I
need to atomically write. I found uiop's call-with-temporary-file and I
wrote the following.
(defun save-file (data)
  (let ((tmp (uiop:call-with-temporary-file
              (lambda (s p)
                (write-sequence data s)
                p)
              :directory (merge-pathnames ".")
              :keep t)))
    (rename-file tmp "final-name.txt")))
That's just a prototype. The final-name will be calculated at run time.
The problem I have is that rename-file overwrites an existing file.
I couldn't see anything in the CLHS which covers what happens if the file
already exists.
Post by Julieta Shem
I need it not to do that so I can try again with a different name if
the choice of final name was already taken.
How do you guys do this? Thank you.
Does "portably" mean without using anything outside Common Lisp ?
I just meant that ideally it would work on typical Unix systems and
also on Windows. I'm okay with a solution that only works on the
typical Unix system.
Post by Spiros Bousbouras
If yes , then the only way I can think of is to get from somewhere a
source of random bytes (/dev/urandom on Linux) and each process will
use this to calculate a unique file name.
[...]

I didn't express myself properly. What I need is an atomic operation to
let me know if a certain name is already taken and, if it is not taken,
take it. As far as I know on Unix systems, this operation is rename. It
is atomic, but it must be stopped if it's already taken. (I'm afraid
rename is not atomic on Windows. But I'm totally willing to let Windows
go.)

I'm glad to know I can write a replacement for rename-file, but it
sounds strange to me that nobody writing Common Lisp ever needed this
before. Perhaps I'm writing C in Common Lisp.
Lawrence D'Oliveiro
2024-01-20 21:22:18 UTC
Post by Julieta Shem
What I need is an atomic operation to
let me know if a certain name is already taken and, if it is not taken,
take it.
Open with O_EXCL <https://manpages.debian.org/2/open.2.html>.
Julieta Shem
2024-01-20 22:13:59 UTC
Post by Lawrence D'Oliveiro
Post by Julieta Shem
What I need is an atomic operation to
let me know if a certain name is already taken and, if it is not taken,
take it.
Open with O_EXCL <https://manpages.debian.org/2/open.2.html>.
I don't understand. You mean I should write it in C?
Alan Bawden
2024-01-20 22:48:22 UTC
Post by Lawrence D'Oliveiro
Post by Julieta Shem
What I need is an atomic operation to
let me know if a certain name is already taken and, if it is not taken,
take it.
Open with O_EXCL <https://manpages.debian.org/2/open.2.html>.
I don't understand. You mean I should write it in C?

If POSIX open() with O_EXCL really is a solution to your problem, then
you can do the same thing in pure Common Lisp by calling OPEN with
:IF-EXISTS :ERROR.
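
A minimal sketch of that idea follows, using the :IF-EXISTS NIL variant so that a taken name shows up as a NIL return value rather than as an error to handle. TRY-CLAIM-FILE is a hypothetical helper, and whether the check-and-create is truly atomic depends on the implementation, since the standard does not say.

```lisp
;; Sketch only: TRY-CLAIM-FILE is a hypothetical helper, not a library
;; function.  With :IF-EXISTS NIL, OPEN returns NIL instead of a stream
;; when the file already exists, and WITH-OPEN-FILE then binds OUT to NIL.
(defun try-claim-file (pathname data)
  "Create PATHNAME and write the string DATA to it.
Return T on success, or NIL if a file with that name already exists."
  (with-open-file (out pathname
                       :direction :output
                       :if-exists nil
                       :if-does-not-exist :create)
    (when out
      (write-string data out)
      t)))
```

A caller can then loop over candidate names until TRY-CLAIM-FILE returns T. Note that the file is visible under its final name while it is still being written.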

But you originally asked for a version of RENAME-FILE that renames
atomically but fails if the target already exists. If that's _really_
what you want, then even POSIX won't help you since POSIX rename()
always deletes the target if it exists.

If we knew what problem you were really trying to solve, then we might
be able to help you, but as it is, we're all just throwing ideas at you
in the hopes that something will stick.

- Alan
Lawrence D'Oliveiro
2024-01-20 23:00:02 UTC
... even POSIX won't help you since POSIX rename()
always deletes the target if it exists.
Notice also the new POSIX calls openat, renameat etc, which help to guard
against certain kinds of TOCTOU attacks using symlinks.

Linux also adds renameat2 <https://manpages.debian.org/2/rename.2.html>,
which lets you do some extra clever things, like exchanging file names,
and returning an error if a file with the new name already exists.
Julieta Shem
2024-01-20 23:25:57 UTC
Post by Julieta Shem
Post by Lawrence D'Oliveiro
Post by Julieta Shem
What I need is an atomic operation to
let me know if a certain name is already taken and, if it is not taken,
take it.
Open with O_EXCL <https://manpages.debian.org/2/open.2.html>.
I don't understand. You mean I should write it in C?
If POSIX open() with O_EXCL really is a solution to your problem, then
you can do the same thing in pure Common Lisp by calling OPEN with
:IF-EXISTS :ERROR.
But you originally asked for a version of RENAME-FILE that renames
atomically but fails if the target already exists. If that's _really_
what you want, then even POSIX won't help you since POSIX rename()
always deletes the target if it exists.
If we knew what problem you were really trying to solve, then we might
be able to help you, but as it is, we're all just throwing ideas at you
in the hopes that something will stick.
I'm sorry if I didn't express myself properly. I'm writing an NNTP
service that stores articles in a directory. Each user is served by a
different service-process. When a user POSTs an article (POST is a verb
in the NNTP protocol), the service will write a new file in a directory
relative to the group. Two or more users could be posting at the same
time, so the strategy I came up with is to atomically write the article
by way of using a temporary file and renaming to the final name.

The final name is a string-integer that reflects the internal id of the
article---it's the greatest article number currently present plus one.
If two posts happen near in time, one will take the next id ahead of the
other, so the renaming of the second one would fail (or so I wished)
because a file with that name already exists, so the process can react
by increasing the file number once again, repeating that as many times
as necessary until it finds the next free id. (I suppose
this might not be called an /algorithm/, but I guess in practice it
would work.)

I'm interested in any solution. Thank you.

I think the open-with-:IF-EXISTS-:ERROR strategy would not be good: that
implies I'd create the file with its final name, but then the service
could end up serving that not-yet-written article to another user who is
reading the group.
Alan Bawden
2024-01-21 00:34:14 UTC
Sounds like you're trying to get the file system to solve too many
problems for you at the same time. I'd try separating the assignment of
sequential article numbers from the problem of choosing file names.

I've seen many systems that use file system directories as queues like
this, and I _think_ that I've never seen one that doesn't eventually
wind up using some kind of locking scheme to make something work.
Perhaps the presence of a magic file named "LOCK" means that the
directory is write-locked while some sensitive operation takes place.
Or perhaps something using POSIX's lockf().

Also don't write off the possibility that an actual database like SQLite
can help you out. SQLite is so ubiquitous that there is probably an
interface for it from whatever Common Lisp implementation you are using.

- Alan
Lawrence D'Oliveiro
2024-01-21 00:35:35 UTC
Post by Alan Bawden
Also don't write off the possibility that an actual database like SQLite
can help you out.
I was going to suggest a DBMS, too. But I think a multi-user one might be
more appropriate in this case.
Julieta Shem
2024-01-21 14:15:00 UTC
Post by Alan Bawden
Sounds like you're trying to get the file system to solve too many
problems for you at the same time. I'd try separating the assignment of
sequential article numbers from the problem of choosing file names.
I've seen many systems that use file system directories as queues like
this, and I _think_ that I've never seen one that doesn't eventually
wind up using some kind of locking scheme to make something work.
Perhaps the presence of a magic file named "LOCK" means that the
directory is write-locked while some sensitive operation takes place.
Or perhaps something using POSIX's lockf().
I think there's no escape from some locking because we have multiple
processes that need to write to a central database. I did not think
much when I decided on this---I'm writing this mainly for learning
Common Lisp. Initially I thought the atomic-via-rename would just do
it. I did not see that the serial-numbering of articles would be a
further obstacle.

But you gave me the following idea that I think will work. Let's use
open with :IF-EXISTS :ERROR with the addition that when the server
delivers articles to a client, it ignores files whose names end with
.tmp. Those are articles not completely written.
Post by Alan Bawden
Also don't write off the possibility that an actual database like SQLite
can help you out. SQLite is so ubiquitous that there is probably an
interface for it from whatever Common Lisp implementation you are using.
I'm using SBCL. There's /sqlite/ in Quicklisp. As a second
version---or even a first if I end up facing more obstacles---, it's
probably a very good idea to use SQLite.

Joerg Mertens
2024-01-20 21:51:26 UTC
Post by Julieta Shem
That's just a prototype. The final-name will be calculated at run time.
The problem I have is that rename-file overwrites an existing file. I
need it not to do that so I can try again with a different name if the
choice of final name was already taken.
Don't know if this helps but GNU clisp throws an error if the target
file already exists.
Julieta Shem
2024-01-20 22:14:41 UTC
Post by Joerg Mertens
Post by Julieta Shem
That's just a prototype. The final-name will be calculated at run time.
The problem I have is that rename-file overwrites an existing file. I
need it not to do that so I can try again with a different name if the
choice of final name was already taken.
Don't know if this helps but GNU clisp throws an error if the target
file already exists.
Interesting. It's one implementation where this would be possible.