Discussion:
CMUCL, green threads & SBCL
Thibault Langlois
2020-04-05 12:29:30 UTC
Hello, once upon a time I was using mainly CMUCL. I don't know if it is still the case, but one of the differences between CMUCL and SBCL at that time was that CMUCL had an implementation of green threads while SBCL used OS threads.
Now I have switched to SBCL. A friend of mine is addicted to Erlang/Elixir and advertises the agility of these languages regarding the creation of (green) threads.
Are there folks here still using CMUCL?
How do CMUCL green threads compare to Erlang's?
How do they compare to green-thread implementations for SBCL like https://github.com/thezerobit/green-threads (both in terms of functionality and performance)?

T.
Madhu
2020-04-07 12:14:13 UTC
Post by Thibault Langlois
Hello, once upon a time I was using mainly CMUCL. I don't know if it
is still the case but one of the differences between CMUCL at that
time was that CMUCL had an implementation of green threads while SBCL
used OS threads. Now I switched to SBCL. A friend of mine is addicted
to erlang/elixir and advertise the agility of these languages
regarding the creation of (green) threads. Are there folks here using
CMUCL ?
I stopped using CMUCL around the time the developer started working for
google. And I wasn't able to use my existing version to bootstrap a
glibc-based change that another developer (also at google) put in.

I would occasionally run into problems with CMUCL's threads when running
CL-HTTP and never got to the bottom of those. The CLIM-SYS interface to
processes had a lot of advantages when debugging multiprocess programs
(e.g. without-interrupts actually worked - only one thread is running
at any time) - all those niceties are gone with pthreads. I think
green-threads vs. pthreads is a 90s issue now. There may have been a
certain advantage in using the "thread" model without having to handle
select yourself, but I don't think it is relevant these days. I don't
think the thread implementation has been tested "industrially" -
industrial uses of CMUCL seem to use the fork model to handle multiple
processes.

PS. ping pcostanza (in case you have an answer to my other post on mop -
i'd appreciate a note - thanks)
Paul Rubin
2020-04-07 21:55:24 UTC
I think green-threads vs. pthreads is a 90s issue now. There may have
been a certain advantage in using the "thread" model without having to
handle select yourself but I don't think it is relevant in this day.
What do you mean by this? Am I behind the times? I'm used to the idea
that OS threads are quite heavyweight and switching between them is
comparatively slow. So Erlang and GHC both use green threads under the
hood, though they abstract that away from the user. See:

https://dl.acm.org/doi/10.1145/2578854.2503790

for how GHC supports a level of concurrency (millions of simultaneous
network connections using green threads) that I think isn't feasible
with pthreads. In fact the Linux kernel recently got a new feature
(file open operations in io_uring) to help support this, I believe.
I don't think the thread implemetation been tested "industrially" -
industrial uses of CMUCL seem to use the fork model to handle
multiprocesses.
Yeah threads and shared data are a pain. Erlang simulates multiple
processes (no shared data) and GHC relies somewhat on immutability to
avoid some of the hazards, plus it has an implementation of
transactional memory (STM).
Madhu
2020-04-08 04:32:35 UTC
Post by Paul Rubin
I think green-threads vs. pthreads is a 90s issue now. There may have
been a certain advantage in using the "thread" model without having to
handle select yourself but I don't think it is relevant in this day.
[Far be it from me to decide what is or is not relevant]
Post by Paul Rubin
What do you mean by this? Am I behind the times? I'm used to the idea
that OS threads are quite heavyweight and switching between them is
comparatively slow.
I frequently come across material that spins the other way: that the cost
of switching is not very different between native threads and OS processes
("extremely cheap", "implemented identically" in the linux kernel).

[Also there is a terminology trap. LWPs (lightweight processes) in linux
literature are actually the native kernel threads, which happen to share
an address space (a "process"). I think erlang and GHC literature tend
to use LWP differently.]

There is an overhead for stack-group switching in green threads as
well - which is not there if you call select (or now epoll) on
millions of file descriptors yourself and process the results in a
single main loop running in a single process. (I think the ircds of the
90s nailed this scalability down early on.) CMUCL multiprocessing
(green threads) is of course based on x86 stack groups, contributed by
DTC in 1997.

I understood the big narrative from the mid 90s onwards was to sell
multicores - which meant the cores were to be kept busy at any cost -
which skewed the requirements quite a bit. In a green-thread scenario
multiple "user processes" still needed only one processor; the other
processors would be idle.
Post by Paul Rubin
So Erlang and GHC both use green threads under the
https://dl.acm.org/doi/10.1145/2578854.2503790
for how GHC supports a level of concurrency (millions of simultaneous
network connections using green threads) that I think isn't feasible
with pthreads.
But it would be implemented using pthreads on linux. 40+ pthreads would
be running simultaneously on the 40+ cores and each would individually
handle the millions of "GHC user threads" in its processing loop. The
problem addressed in this paper would be that of scheduling the GHC user
threads across the 40+ pthreads. (That's what I got from the abstract,
which I may have poorly comprehended - if I still have access to my
university account I'll try to get a copy of the article next week.)
Post by Paul Rubin
In fact the Linux kernel recently got a new feature
(file open operations in io_uring) to help support this, I believe.
I don't think the thread implemetation been tested "industrially" -
industrial uses of CMUCL seem to use the fork model to handle
multiprocesses.
Yeah threads and shared data are a pain. Erlang simulates multiple
processes (no shared data) and GHC relies somewhat on immutability to
avoid some of the hazards, plus it has an implementation of
transactional memory (STM).
[I've seen some CL projects which implement STM based on CAS but haven't
understood how they could be used beyond teaching cs101 data structures.]
Madhu
2020-04-08 04:50:31 UTC
I forgot to mention: as for the languages that are being invested in
by the kingdom these days (javascript etc.), the programmer is not given
the opportunity to deal with threads at all. They all seem to use the
glib-2.0 model. Network/async IO is handled uniformly in
continuation style in the APIs - pass a function which will be called
when the operation eventually completes. The threading decisions are
handled by the language runtime and are opaque to the user. With the
numbers behind it, this would tend to skew the PL requirements landscape.
Paul Rubin
2020-04-08 08:34:21 UTC
Post by Madhu
I'm used to the idea that OS threads are quite heavyweight
I frequently come across material that spin the other side: that cost of
switching is not very different between native threads and OS processes
("extremely cheap", "implemented identically" in the linux kernel)
Yes, in Linux, threads and processes are basically the same thing, and
both are fairly expensive compared to green threads, or what Windows
would call fibers.
Post by Madhu
There is a an overhead for stack group switching in green threads as
well
I was unfamiliar with the idea of stack groups but found this with web
search:

https://sourceforge.net/p/sbcl/mailman/message/5492153/

It complains that CMUCL (at least in 2001) copied the stacks around way
too much, instead of flipping a few pointers. I haven't studied the low
level details of how GHC or Erlang do it, but traditional Forth
multitaskers only had to move a few words of info to do a task switch.
Post by Madhu
In a green thread scenario multiple "user processes" still needed only
one processor. the other processors would be idle.
GHC and Erlang both use multicores. The GHC I/O manager has a separate
event loop on each OS thread, which normally corresponds to a hardware
thread. I'm not sure how Erlang's works, but I should find out.
Post by Madhu
https://dl.acm.org/doi/10.1145/2578854.2503790
The problems addressed in this paper would be that of schedule the GHC
user threads across the 40+ pthreads.
Yes.
Post by Madhu
if i still have access to my university account I'll try to get a copy
of the article next week.)
It is open access and you should be able to download the pdf without a
university account. Try:

https://dl.acm.org/doi/pdf/10.1145/2578854.2503790?download=true

If that doesn't work, I can find you another url, or put it on my own
server or email it to you or whatever. A search engine should also find
copies around the web.
Post by Madhu
[I've seen some CL projects which implement STM based on CAS but haven't
undestood how they could be used beyond teaching cs101 data structures.]
STM in GHC is very nice. Here's sort of a user guide:

http://book.realworldhaskell.org/read/software-transactional-memory.html
Kaz Kylheku
2020-04-08 16:03:15 UTC
Post by Paul Rubin
Post by Madhu
I'm used to the idea that OS threads are quite heavyweight
I frequently come across material that spin the other side: that cost of
switching is not very different between native threads and OS processes
("extremely cheap", "implemented identically" in the linux kernel)
Yes, in Linux, threads and processes are basically the same thing, and
both are fairly expensive compared to green threads, or what Windows
would call fibers.
The most expensive thing about threads is probably the stack use. By
default, threads on Glibc/Linux chew up 2 megabytes of VM. If you spawn
a thousand threads without trimming the stack size, you need 2 gigs of
VM. The conservative stack size is needed because threads can call
anything: any API, any library function. You never know how deep the
recursion will go, or what size stack frames may be involved.

Some 14 years ago, I was working in an organization dealing with a heavily
threaded, large, embedded application that was sporting a 5G footprint
(64-bit MIPS). I put in an experimental patch to trim thread stacks to
64 KB. It mostly worked, except for a mysterious segfault: a thread
jumped the 4096 byte stack guard page and ended up in another thread's
stack. The culprit was third-party code: a routing stack whose
debugging/tracing macros contained hidden { char buf[8192]; ... }
declarations for formatting messages. When just a few of these functions
nested, they easily blew past the 64K stack, and thanks to the size of
the buffer, it was able to jump over a guard page in one stride.

But green threads cannot magically escape the stack use problem.
Post by Paul Rubin
Post by Madhu
There is a an overhead for stack group switching in green threads as
well
I was unfamiliar with the idea of stack groups but found this with web
https://sourceforge.net/p/sbcl/mailman/message/5492153/
It complains that CMUCL (at least in 2001) copied the stacks around way
too much, instead of flipping a few pointers. I haven't studied the low
level details of how GHC or Erlang do it, but traditional Forth
multitaskers only had to move a few words of info to do a task switch.
I have implemented delimited continuations by copying stacks.
Because of the delimitation, you don't have to copy an entire stack, just
a small portion, whose size is indirectly controlled by how
the application sets up the delimitation.

Implementing *threading* by stack copying is a bit of a fail. However,
one good attribute is that if we switch among threads by copying their
stack into the stack area of the main thread of the process, then
library functions which are invoked from our threads do not get
surprised by the stack address.

Under normal threading, thread stacks are supposed to be independent.
It is possible for one thread to allocate a variable on its stack,
and then pass a pointer to that into another thread. For instance,
under send/receive/reply IPC. A thread declares a message variable
on the stack, and fills it in. It then sticks the address of this into
a message queue and waits. Another thread picks up the message from the
queue, processes it, sticks a reply into the message and wakes up the
original thread.

If threads are dispatched by stack copying, that's going to be
difficult. Thread B cannot poke around in thread A's message, because
the stack memory at that address now belongs to some other thread,
perhaps B itself.

Basically, the stack of a suspended thread has to stay where it is.

This problem could be avoided if threads never refer to objects on each
other's stacks. If thread A's message is a heap allocated object,
to which thread A just has a stacked reference, that isn't a problem.
That's likely why stack copying could work in a Lisp, if reasonable
care is taken with external libraries.
Post by Paul Rubin
Post by Madhu
In a green thread scenario multiple "user processes" still needed only
one processor. the other processors would be idle.
GHC and Erlang both use multicores. The GHC I/O manager has a separate
event loop on each OS thread, which normally corresponds to a hardware
thread. I'm not sure how Erlang's works, but I should find out.
It's all old hat. As an undergrad, I hacked on a green threading library
that used per-CPU forked processes as the basis for multiprocessing.

The pthreads implementations from some Unix vendors have historically
taken the same approach. I think in Solaris, M threads are mapped to
N LWP's (light-weight processes), N being chosen based on some
heuristic related to the number of CPU's available or something like
that.

Kids are bored today so they are resurrecting old hacks.

I mean, why would a new generation be denied the fun of hacking on yet
another C threading library, just because it was done before and laid to
rest by their predecessors.
Robert L.
2020-04-08 16:57:09 UTC
Post by Kaz Kylheku
a bit of a fail.
Correction:

a bit of a failure.

I recommend getting a sixth-grade education.
--
The report card by the American Society of Civil Engineers showed the national
infrastructure a single grade above failure, a step from declining to the point
where everyday things simply stop working the way people expect them to.
http://archive.org/details/nolies
Madhu
2020-04-09 07:28:37 UTC
Post by Paul Rubin
It is open access and you should be able to download the pdf without a
https://dl.acm.org/doi/pdf/10.1145/2578854.2503790?download=true
Thanks - I think I was just missing cookies on the acm site in my
browser; it worked with wget.
Post by Paul Rubin
If that doesn't work, I can find you another url, or put it on my own
server or email it to you or whatever. Search engine should also find
copies around the web.
Kaz Kylheku
2020-04-08 15:45:03 UTC
Post by Paul Rubin
I think green-threads vs. pthreads is a 90s issue now. There may have
been a certain advantage in using the "thread" model without having to
handle select yourself but I don't think it is relevant in this day.
What do you mean by this? Am I behind the times? I'm used to the idea
that OS threads are quite heavyweight and switching between them is
comparatively slow. So Erlang and GHC both use green threads under the
There seems to be a kind of revival of green context juggling hacks.

For instance, C++20 has coroutines.

I'm strongly opposed to this nonsense myself.

As far as I'm concerned, delimited continuations are the ultimate
refinement of green context juggling.

Coroutines and such should be regarded as legacy hacks that were
relevant in old operating systems, whose application-visible
virtual machine models didn't expose sufficient multiprocessing
and asynchrony support.
albert
2020-07-31 12:40:50 UTC
Post by Kaz Kylheku
Post by Paul Rubin
I think green-threads vs. pthreads is a 90s issue now. There may have
been a certain advantage in using the "thread" model without having to
handle select yourself but I don't think it is relevant in this day.
What do you mean by this? Am I behind the times? I'm used to the idea
that OS threads are quite heavyweight and switching between them is
comparatively slow. So Erlang and GHC both use green threads under the
There seems to be a kind of revival of green context juggling hacks.
For instance, C++20 has coroutines.
I'm strongly opposed to this nonsense myself.
As far as I'm concerned, delimited continuations are the ultimate
refinement of green context juggling.
Coroutines and such should be regarded as legacy hacks that were
relevant in old operating systems, whose application-visible
virtual machine models didn't expose sufficient multiprocessing
and asynchrony support.
In my Forth I've got CO, which is a coroutine call.
It comes in handy if I want to decorate BAD-FUNCTION with a
function that prints the stack before and after.
{ } is lina's idea of a lambda.
'BAD-FUNCTION { .S CO .S } decorated

The decorator passes control to BAD-FUNCTION then gets it back.

(As you have guessed, during debugging I now see what the bad
function works with and what it returns. With some luck
there is a smoking gun: an exception just before the return
values are printed. )

So an all too categorical dismissal of coroutines doesn't
resonate with me.

More at
https://github.com/albertvanderhorst/lina

Groetjes Albert
--
This is the first day of the end of your life.
It may not kill you, but it does make your weaker.
If you can't beat them, too bad.
***@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
a***@math.uni.wroc.pl
2020-04-09 19:02:41 UTC
Post by Paul Rubin
I think green-threads vs. pthreads is a 90s issue now. There may have
been a certain advantage in using the "thread" model without having to
handle select yourself but I don't think it is relevant in this day.
What do you mean by this? Am I behind the times? I'm used to the idea
that OS threads are quite heavyweight and switching between them is
comparatively slow.
When NPTL came out its developers benchmarked several alternatives.
Results: NPTL was one of the fastest (or maybe the fastest).
If you think about it, any "pure" implementation with a per-thread
stack is doing essentially the same job. Kernel threads have the
additional overhead of crossing a protection boundary and
dispatching to the actual threading function, which on modern
Linux is reasonably low. User-level threads save on this,
but have the additional cost of catching the various events which
should cause a switch (a large fraction of those events go
through the kernel, so they are available in the kernel at no extra
cost). Mixed implementations, with several user threads
running inside a single kernel thread, were the slowest ones.

Anyway, IIRC switching times were on the order of a single
microsecond. If you want to treat threads as a programming
language construct, then this is very heavy. If you
consider that switching threads involves switching the
working set (registers, return stack, various context
items), NPTL switching time was surprisingly low.

Now, there were situations in which Linux kernel threads
performed poorly. So the Erlang and GHC folks may have
some point (I have no time to look for details now).
But I have also met a lot of bogus benchmarks, so
it would take some solid evidence (and enough
time on my side to verify said evidence) to convince
me that Erlang or GHC offer better performance than
Linux kernel threads.
--
Waldek Hebisch
Po Lu
2020-04-18 08:35:50 UTC
Post by Thibault Langlois
Now I switched to SBCL.
A friend of mine is addicted to erlang/elixir and advertise
the agility of these languages regarding the creation of (green) threads.
Are there folks here using CMUCL ?
CMUCL is no longer relevant at this point; everyone who wants a decent
free-as-in-freedom-and-price CL implementation now uses SBCL.
Post by Thibault Langlois
How do CMUCL green thread compare to erlang's ?
Erlang doesn't actually use "green threads", if you go by a strict
definition. Erlang *processes* "share" data by copying them between the
various different processes, while CMUCL threads share the same heap
space.
Post by Thibault Langlois
How do they compare to GT implementation for SBCL like
https://github.com/thezerobit/green-threads ?
(both in terms of functionality and perf).
Performance? In my experience the CMU implementation was rather ugly and
slow. I haven't tried out this implementation, but it could potentially
be better.
Paul Rubin
2020-04-18 08:40:56 UTC
Post by Po Lu
Erlang doesn't actually use "green threads", if you go by a strict
definition. Erlang *processes* "share" data by copying them between the
various different processes, while CMUCL threads share the same heap
space.
This is mostly a matter of the language semantics. Some objects like
large strings are in fact shared between Erlang processes, but they are
immutable, so the user program can't tell that they aren't actually
being copied when you pass them from one process to another.
Po Lu
2020-04-18 09:00:02 UTC
Post by Paul Rubin
Post by Po Lu
Erlang doesn't actually use "green threads", if you go by a strict
definition. Erlang *processes* "share" data by copying them between the
various different processes, while CMUCL threads share the same heap
space.
This is mostly a matter of the language semantics. Some objects like
large strings are in fact shared between Erlang processes, but they are
immutable, so the user program can't tell that they aren't actually
being copied when you pass them from one process to another.
I'm not too familiar with Erlang implementation details, but yeah,
you're right. The point I was trying to make, however, was that comparing
CMU green threads (or green threads in general in most languages) with
Erlang processes isn't entirely accurate, as you can effectively treat
Erlang processes as being completely isolated from each other.

Plus, with Erlang, you get a lot of goodies (i.e. Concurrent ML-style
message passing, etc.) that make processes very nice to work with, while
with plain CMU green threads you get nothing.

IIRC, the Erlang VM also spawns a native thread for each CPU core. Am I
correct?