AI’s “streaming text” UIs: a how-to

November 27th, 2024. Tagged: AI, JavaScript, php


You've seen the UIs in recent AI tools that stream text as it's generated, right?

I peeked under the hood of ChatGPT and meta.ai to figure out how they work.

Server-sent events

Server-sent events (SSE) seem like the right tool for the job. A server-side script flushes out content whenever it's ready. The browser listens to the content as it's coming down the wire with the help of EventSource() and updates the UI.

(aside:) PHP on the server

Sadly, I couldn't make the PHP code work server-side on this here blog, even after consulting Dreamhost's support. I never got the "chunked" response to flush progressively from the server; I always get the whole response once it's ready. It's not impossible though: it worked for me with a local PHP server (like $ php -S localhost:8000), and I'm pretty sure it used to work on Dreamhost before they switched to FastCGI.

If you want to make flush()-ing work in PHP, here are some pointers to try in .htaccess:

<FilesMatch "\.php$">
    SetEnv no-gzip 1
    Header always set Cache-Control "no-cache, no-store, must-revalidate"
    SetEnv chunked yes
    SetEnv FcgidOutputBufferSize 0
    SetEnv OutputBufferSize 0
</FilesMatch>

And a test page to tell the time every second:

<?php
header('Cache-Control: no-cache');

@ob_end_clean();

$go = 5;
while ($go) {
    $go--;
    // Send a message
    echo sprintf(
      "It's %s o'clock on my server.\n\n", 
      date('H:i:s', time()),
    );
    flush();
    sleep(1);
}

In this repo stoyan/vexedbyalazyox you can find two PHP scripts that worked for me.

BTW, server-side partial responses and flushing are pretty old as web performance techniques go.

A bit about the server-sent messages

(I'll keep using PHP to illustrate for just a bit more and then switch to Node.js)

In their simplest form, server-sent events (or messages) are pretty sparse. All you do is:

echo "data: I am a message\n\n";
flush();

And now the client can receive "I am a message".

The events can have event names, anything you make up, like:

echo "event: start\n";
echo "data: Hi!\n\n";
flush();

More on the message fields is available on MDN. But all in all, the stuff you spit out on the server can be really simple:

event: start
data:

data: hello

data: foo

event: end
data:

Events can be named anything, "start" and "end" are just examples. And they are optional too.

data: is not optional, even if all you need is to send an event with no data.

When event: is omitted, it's assumed to be event: message.
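The rules above fit in a tiny helper. Here's a minimal sketch in JavaScript that turns an optional event name and a data string into the text the server writes out. The name formatSSE is made up for illustration and isn't part of any library. Multi-line data gets one data: line per line of text, which is what the format expects.

```javascript
// Hypothetical helper (not from any library): serialize one server-sent event.
function formatSSE({ event, data = "" } = {}) {
  let out = "";
  if (event) {
    out += `event: ${event}\n`; // optional; clients assume "message" without it
  }
  // data: is always sent; each line of the payload gets its own data: field
  for (const line of String(data).split("\n")) {
    out += `data: ${line}\n`;
  }
  return out + "\n"; // a blank line ends the message
}

// Print the raw text with the newlines made visible
console.log(JSON.stringify(formatSSE({ data: "I am a message" })));
console.log(JSON.stringify(formatSSE({ event: "start", data: "Hi!" })));
```

On a Node server you'd pass the result straight to res.write().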

The client's JavaScript

To get started you need an EventSource object pointed to the server-side script:

const evtSource = new EventSource(
  'https://pebble-capricious-bearberry.glitch.me/',
);

Then you just listen to events (messages) and update the UI:

evtSource.onmessage = (e) => {
  msg.textContent += e.data;
};

And that's all! You have optional event handlers should you need them:

evtSource.onopen = () => {};
evtSource.onerror = () => {};

Additionally, you can listen to any events with names you decide. For example, I want the server to signal to the client that the response is over. So I have the server send this message:

event: imouttahere
data:

And then the client can listen to the imouttahere event:

evtSource.addEventListener('imouttahere', () => {
  console.info('Server calls it done');
  evtSource.close();
});
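Putting the client pieces together, here's a rough sketch of one function that wires up both listeners. streamInto and its parameters are names I made up for illustration. It accepts anything that has addEventListener() and close(), which an EventSource does, so it can also be tried outside the browser with a stand-in object.

```javascript
// Hypothetical helper: append every message to `el` and close the
// connection when the server sends the custom "imouttahere" event.
function streamInto(source, el) {
  source.addEventListener("message", (e) => {
    el.textContent += e.data;
  });
  source.addEventListener("imouttahere", () => {
    source.close();
  });
  return source;
}

// In the browser (assuming the page has an element with id="msg"):
// streamInto(
//   new EventSource("https://pebble-capricious-bearberry.glitch.me/"),
//   document.querySelector("#msg"),
// );
```

Using addEventListener("message", ...) instead of onmessage is just a choice here; both receive the same events.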

Demo time

OK, demo time! The server-side script takes a paragraph of text and spits out every word after a random delay:

$txt = "The zebra jumps quickly over a fence, vexed by...";
$words = explode(" ", $txt);
foreach ($words as $word) {
    echo "data: $word \n\n";
    usleep(rand(90000, 200000)); // Random delay
    flush();
}

The client side sets up EventSource and, on every message, updates the text on the page. When the server is done (event: imouttahere), the client closes the connection.

Try it here in action. View source for the complete code. Note: if nothing happens initially, that's because the Glitch server has gone to sleep and needs to wake up.

One cool Chrome devtools feature is the list of events under an EventStream tab in the Network panel:

Now, what happens if the server is done and doesn't send a special message (such as imouttahere)? Well, the browser thinks something went wrong and re-requests the same URL and the whole thing repeats. This is probably desired behavior in many cases, but here I don't want it.

Try the case of a non-terminating client.

The re-request will look like the following... note the error and the repeat request:
re-requesting
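If the server can't be changed to send a closing event, another option is to close the connection from the client on the first error. This is only a sketch, and a blunt one: it also gives up the automatic reconnect after a genuine network hiccup. closeOnError is a made-up name.

```javascript
// Hypothetical helper: stop the automatic re-request by closing
// the connection as soon as the browser reports an error.
function closeOnError(source, onDone = () => {}) {
  source.addEventListener("error", () => {
    source.close(); // no reconnect attempts after this
    onDone();
  });
  return source;
}

// In the browser, with the evtSource from earlier:
// closeOnError(evtSource, () => console.info("Stream over"));
```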

Alrighty, that just about clarifies SSE (Server-Sent Events) and provides a small demo to get you started.

In fact, this is the type of "streaming" ChatGPT uses when giving answers. Take a look:

ChatGPT's EventSource

In the EventStream tab you can see the messages passing through. The server sends stuff like:

event: delta
data: {json: here}

This should look familiar now, except the chosen event name is "delta" (not the default, optional "message") and the data is JSON-encoded.

And at the end, the server switches back to "message" and the data is "[DONE]", a way to signal to the client that the answer is complete and the UI can be updated accordingly, e.g. turn the STOP button back into SEND (the arrow pointing up).
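A client for this kind of stream could look roughly like the sketch below. It relies only on what's visible in the EventStream tab: "delta" events carrying JSON, and a final "message" event with [DONE]. I can't vouch for the shape of the JSON, so the text property here is purely an assumption, and the function names are made up too.

```javascript
// Sketch of a consumer for a "delta ... [DONE]" style stream.
// ASSUMPTION: each delta's JSON has a `text` property. The real payload
// is different; this only illustrates the accumulate-until-done idea.
function createAnswerCollector(onDone) {
  let answer = "";
  return {
    // call for every `event: delta`
    delta(data) {
      const payload = JSON.parse(data);
      answer += payload.text ?? "";
      return answer;
    },
    // call for every default `message` event
    message(data) {
      if (data === "[DONE]") {
        onDone(answer); // e.g. flip STOP back to SEND
      }
    },
  };
}

// Wiring it to an EventSource (browser only):
// const collector = createAnswerCollector((text) => console.log("complete:", text));
// evtSource.addEventListener("delta", (e) => collector.delta(e.data));
// evtSource.onmessage = (e) => collector.message(e.data);
```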

OK, cool story ChatGPT, let's take a gander at what the competition is doing over at meta.ai

XMLHttpRequest

Asking meta.ai a question, I don't see an EventStream tab, so it must be something else. Looking at the Performance panel for UI updates, I see:

meta.ai updates overview

All of these pinkish, purplish vertical almost-lines are updates. Zooming in on one:

meta.ai updates zoomed

Here we can see XHR readyState change. Aha! Our old friend XMLHttpRequest, the source of all things Ajax!

Looks like with similar server-side flushes meta.ai is streaming the answer. On every readyState change, the client can inspect the current state of the response and grab data from it.

Here's our version of the XHR boilerplate:

const xhr = new XMLHttpRequest();
xhr.open(
  'GET',
  'https://pebble-capricious-bearberry.glitch.me/xhr',
  true,
);
xhr.send(null);

Now the only thing left is to listen to onprogress:

xhr.onprogress = () => {
  console.log('LOADING', xhr.readyState);
  msg.textContent = xhr.responseText;
};
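One thing to keep in mind: xhr.responseText holds everything received so far, not just the latest chunk, which is why the handler above assigns rather than appends. If you only want the new part (say, to append it or animate it), you can remember how much you've already seen. A sketch, with made-up names:

```javascript
// Hypothetical helper: returns a function that, given the full
// responseText so far, hands back only the part not seen before.
function createChunkReader() {
  let seen = 0;
  return (responseText) => {
    const chunk = responseText.slice(seen);
    seen = responseText.length;
    return chunk;
  };
}

// In the browser, next to the XHR code above:
// const nextChunk = createChunkReader();
// xhr.onprogress = () => {
//   msg.textContent += nextChunk(xhr.responseText);
// };
```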

Like before, for a test page, the server just flushes the next chunk of text after a random delay:

$txt = "The zebra jumps quickly over a fence, vexed ...";
$words = explode(" ", $txt);
foreach ($words as $word) {
    echo "$word ";
    usleep(rand(20000, 200000)); // Random delay
    flush();
}

XHR client demo page

Differences between XHR and SSE

First, the HTTP header:

# XHR
Content-Type: text/plain
# SSE
Content-Type: text/event-stream

Second, message format. SSE requires a (however simple) format of "event:" and "data:" where data can be JSON-encoded or however you wish. Maybe even XML if you're feeling cheeky. XHR responses are a complete free-for-all: no formatting is imposed, and even XML is not required despite the unfortunate name.

And lastly, and most importantly IMO, SSE can be interrupted by the client. In my examples I have a "close" button:

document.querySelector('#close').onclick = function () {
  console.log('Connection closed');
  evtSource.close();
};

Here close() tells the server that's enough and the server takes a breath. With XHR the closest equivalent is xhr.abort(), but inspecting meta.ai you can see that even though the user can click "stop generating", the server keeps sending the response until it completes.

Node.js on the server

Finally, here's the Node.js code I used for the demos. Since I couldn't get Dreamhost to flush() in PHP, I went to Glitch for free Node hosting of just this one script.

The code handles requests to / for SSE and to /xhr for XHR. And there are a few ifs based on XHR vs SSE:

const http = require("http");

const server = http.createServer((req, res) => {
  if (req.url === "/" || req.url === "/xhr") {
    const xhr = req.url === "/xhr";
    res.writeHead(200, {
      "Content-Type": xhr ? "text/plain" : "text/event-stream",
      "Cache-Control": "no-cache",
      "Access-Control-Allow-Origin": "*",
    });

    if (xhr) {
      res.write(" ".repeat(1024)); // for Chrome
    }
    res.write("\n\n");

    const txt = "The zebra jumps quickly over a fence, vexed ...";
    const words = txt.split(" ");
    const timers = []; // pending writes, so they can be cancelled
    let to = 0;
    for (const word of words) {
      // Schedule each word a random 80-279ms after the previous one
      to += Math.floor(Math.random() * 200) + 80;
      timers.push(
        setTimeout(() => {
          res.write(xhr ? `${word} ` : `data: ${word} \n\n`);
        }, to),
      );
    }

    // One second after the last word, finish the response.
    // SSE clients also get the custom event that tells them to close.
    timers.push(
      setTimeout(() => {
        if (!xhr) {
          res.write("event: imouttahere\n");
          res.write("data:\n\n");
        }
        res.end();
      }, to + 1000),
    );

    // Connection gone (client left or response finished): cancel pending writes
    res.on("close", () => {
      timers.forEach(clearTimeout);
    });
  } else {
    res.writeHead(404);
    res.end("Not Found\n");
  }
});

const port = 8080;
server.listen(port, () => {
  console.log(`Server started on port ${port}`);
});

Note the weird-looking line:

res.write(" ".repeat(1024)); // for Chrome

In the world of flushing, there are many foes that want to buffer the output. Apache, PHP, mod_gzip, you name it. Even the browser. Sometimes it's required to flush out some emptiness (in this case 1K of spaces). I was actually pleasantly surprised that not too much of it was needed. In my testing this 1K buffer was needed only in the XHR case and only in Chrome.

That's all folks!

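If you want to inspect the endpoints, here they are: https://pebble-capricious-bearberry.glitch.me/ (SSE) and https://pebble-capricious-bearberry.glitch.me/xhr (XHR).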

Once again, the repo stoyan/vexedbyalazyox has all the code from this blog and some more too.

And the demos one more time:

Small update: honorable mention for Web Sockets

Web Sockets are yet another alternative for streaming content. Probably the most complex of the three in terms of implementation. Perplexity.ai and MS Copilot seem to have gone this route:

Perplexity

Copilot

Comments? Find me on BlueSky, Mastodon, LinkedIn, Threads, Twitter