Simon Willison

Open source developer building tools to help journalists, archivists, librarians and others analyze, explore and publish their data.

@synx508 one of the areas I'd most like to see more research into is how non-experts' mental models of what these things can do evolve over time

What effect does it have the first time it clearly lies to them or makes an obvious and egregious mistake?

You do have to maintain a very critical eye at all times - it's so easy to develop an incorrect mental model, like thinking ChatGPT can retrieve and read a URL because it generates a very convincing-looking (though actually purely hallucinated) summary when you paste one in

One of the big challenges in learning to use LLMs (Large Language Models like ChatGPT) is overcoming the very human urge to generalize and quickly form assumptions

"I tried X and the result was terrible - hence LLMs are terrible at X"

That might be true! They're bad at lots of things. But making a snap judgement based on a single anecdotal example is not a good way to explore this very weird new space

@tafadzwa @hasanahmsd @jeffjarvis @leo Bard is pretty embarrassingly far behind Bing and ChatGPT/GPT4 right now, so I imagine they are rushing this out as fast as they can to regain an edge

On the one hand this is just an amusing thing to get a language model to do

But it's actually pretty useful too! It turns out I'm much more likely to remember concepts if they've been illustrated by analogies involving otters running a kayak shop (or armadillos running a cactus nursery)

You don't even have to remember the names of the books - vague description works just as well

"Now have someone read that book that got really popular about how wishing for things becomes real - I think it was called the Power or the Gift or something like that" (GPT figured out it was the Secret)

Where it gets fun is when you start expanding their reading list with follow-up prompts:

"Now, one of the otters gets hold of a copy of the Communist Manifesto and takes it to heart. What happens next at the sea kayak company?"

My current favourite GPT prompt:

"A group of otters work together running a sea kayak rental shop. One day the manager otter reads the book "High Output Management". Write a story about how he applies what he learns at work"

@electricarchaeo I am very interested in learning about your museum catalogue project!

@SnoopJ yeah thankfully the debug tools are good and make it very easy to spot the problem (if not so obvious to understand the cause and the solution)

@MattHodges a few of them are protected - eg the instacart one

But... you can fish them out of the browser developer tools too!

I have a good theory on the hallucination now: I think it happens when the previous SQL query returns JSON that's too long for the token limit - ChatGPT appears to silently truncate it and then hallucinate data to make up for the gap!

More notes on that in the issue thread:

@MattHodges I think I've figured out what's going wrong: I think it's about content length. If the SQL query returns data that exceeds the token limit, ChatGPT ignores most of the data and makes everything up
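To make the theory concrete, here's a minimal sketch of the kind of guard that would avoid the silent truncation - this is my assumption about a possible fix, not the plugin's actual code: drop whole trailing rows until the JSON result fits a rough token budget, so the model never sees a response cut off mid-JSON. The `max_tokens` budget and the 4-characters-per-token heuristic are both illustrative guesses.

```python
import json

def truncate_rows(rows, max_tokens=3000, chars_per_token=4):
    """Drop trailing rows until the serialized JSON fits a rough
    token budget, instead of letting the response be cut mid-JSON.

    Uses a crude characters-per-token estimate; a real implementation
    would count tokens with the model's actual tokenizer.
    """
    kept = list(rows)
    while kept and len(json.dumps(kept)) / chars_per_token > max_tokens:
        kept.pop()  # remove whole rows from the end, keeping valid JSON
    return kept
```

Returning fewer complete rows (ideally with a "truncated" flag in the payload) gives the model something honest to summarize, rather than a gap it fills with invented data.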

@kherge It can totally do that too - I've even managed to get it to look up data in Datasette and then send it to Wolfram Alpha in order to plot it

@ppcqua I registered, but then I think someone at OpenAI who knows me bumped me up the list

@ryansingel Mainly that it's EASY - for you as a developer, and for your users. Being able to add functionality like this into the ChatGPT interface that you are already using is a huge win

... and here's the bad news. It can hallucinate, inventing data that's entirely independent of the data that came back from the SQL query! I have an open issue about that here, including some examples:

I built a ChatGPT plugin to answer questions about data hosted in Datasette

I get their desire to keep costs down and onboard more users faster... but it's VERY clear to anyone who's comparing these models that Bard is significantly behind ChatGPT 3.5, ChatGPT 4 and Bing

Here's a solid comparison of the big five current public models that echoes my own experiences pretty well:

I've been puzzled at why Google Bard seems to be much less capable than Bing - for example it seems much more prone to hallucination when it should be using facts it pulls in from Google search

Turns out it's built on Google's cheaper, less powerful model!

> We’re releasing it initially with our lightweight model version of LaMDA. This much smaller model requires significantly less computing power, enabling us to scale to more users, allowing for more feedback

Weeknotes: AI won’t slow down, a new newsletter and a huge Datasette refactor

I just landed the single biggest backwards-incompatible change to Datasette ever, in preparation for the 1.0 release

It's a change to the default JSON format from the Datasette API - the new format is much slimmer, as seen on

You can request extra keys with ?_extra= - e.g.
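As a sketch of what that looks like (the instance URL, database, table and the `count` extra key here are my illustrative assumptions - only the `?_extra=` parameter name comes from the post):

```python
from urllib.parse import urlencode

# Hypothetical Datasette instance and table, for illustration only.
base = "https://example-datasette.io/mydb/mytable.json"

# Opt back into an extra key that the slimmer default format omits.
url = base + "?" + urlencode({"_extra": "count"})
print(url)
```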

I'm desperately keen on getting feedback on this change! I started an issue for that here:

@lawnerdbarak Sure, it's fantastic at writing text for that kind of prompt - and it's really good at translating text into other languages too

That's a different use-case from trying to use it to summarize an article based on a URL, which is something it cannot do because it doesn't have the ability to fetch from URLs

@jamies That one doesn't look like a hallucination to me - I tried it myself and it picked up details like April 10th which weren't in the URL at all. I think it's looking at the latest cached version of the page from the Google search crawler, like Bing does