What role can generative AI play in search? A few months ago, I set out to learn what was possible using the current generation of AI models. I found some surprises, both good and bad. In this 3-part series, I’ll delve into the lessons I learned along the way.
Part 1: The All-Purpose Shopping Assistant
How do you combine a traditional structured database with generative AI capabilities? We have a lot of exposure to large language models as standalone chat agents, but the truly remarkable steps forward will involve combining the models with other specific data sources in new and interesting ways.
Leading into the holiday shopping season, I started with a domain I know well: retail product databases. These typically contain well-structured information: a product name, price, brand, description, color/size, etc. Search across product databases has been well solved and commoditized over the past few decades: faceted filtering combined with good old-fashioned keyword matching. Does generative AI bring anything new to the table?
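To make that baseline concrete, here’s a toy illustration of the commodity approach – facet filters plus keyword matching over structured records. The field names and catalog are invented for the example:

```python
# A toy version of commodity product search: exact-match facet filters
# combined with keyword matching. Field names and data are made up.
PRODUCTS = [
    {"name": "trail running sneaker", "brand": "Nike", "color": "black", "size": 12},
    {"name": "court tennis shoe", "brand": "Adidas", "color": "white", "size": 12},
]

def search(keywords: str, **facets) -> list[dict]:
    """Keyword match on the product name, exact match on any facet field."""
    words = set(keywords.lower().split())
    return [
        p for p in PRODUCTS
        if words & set(p["name"].split())
        and all(p.get(field) == value for field, value in facets.items())
    ]

print(search("black running sneakers", color="black", size=12))
```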
Go to ChatGPT and ask for product recommendations based on a set of criteria, and you generally get some useful results that are store-agnostic. Here’s an example from ChatGPT:
Me: What are good sneakers for someone who is an avid runner and also plays a lot of tennis, and that likes bright, bold colors?
ChatGPT: When looking for sneakers for someone who is an avid runner and plays a lot of tennis, it’s important to consider factors like cushioning, support, durability, and traction. Additionally, if they prefer bright, bold colors, there are many options available. Here are some recommendations that balance performance and style:
Nike Zoom Pegasus: The Nike Pegasus line is known for its comfort and versatility. It offers good cushioning and support for running, and the bold color options make it a fashionable choice.
But what if you run a store, and only have a limited set of products? I made myself an ‘imaginary’ store based on top-selling products on Amazon. Can a chatbot help people find good matches inside my database of products, specifically?
One approach that’s gotten a lot of academic and R&D attention is to build or fine-tune your own model, using the product database as training data. This is possible, but also a very intensive solution – it requires a lot of time, effort, and investment. It also doesn’t adapt well to dynamic data: re-training the model every time a product goes out of stock or the catalog changes simply isn’t viable.
A better approach to problems like this is to use the database live. The pattern for this is called retrieval-augmented generation (RAG). Rather than having the user chat directly with the AI interface, you add an extra step. Simplifying, it looks something like this:
What the user sees:
USER: Do you have any cigarette lighters that have pictures of Bart Simpson on them?
AI: Yes, we have a zippo lighter with a picture of Bart Simpson on a skateboard.
What really happens:
USER: Do you have any cigarette lighters that have pictures of Bart Simpson on them?
SYSTEM: Database Agent, the User has asked, “Do you have any cigarette lighters that have pictures of Bart Simpson on them?”. Can you retrieve any relevant product records from our database?
DATABASE AGENT: Here are 20 relevant product records: […..]
SYSTEM: AI, the User has asked, “Do you have any cigarette lighters that have pictures of Bart Simpson on them?”. Given the following 20 product records from our system, […..], what is an appropriate response?
AI: Yes, we have a zippo lighter with a picture of Bart Simpson on a skateboard.
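Here’s a minimal sketch of that flow in Python, assuming the OpenAI SDK for the chat model; the toy catalog, the model name, and the retrieve_products() helper are stand-ins for whatever database agent you actually run:

```python
# A minimal RAG loop: retrieve records first, then ask the model to answer
# from them. Assumes the OpenAI Python SDK; catalog and model are placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CATALOG = [
    {"name": "Zippo lighter, Bart Simpson on a skateboard", "price": 29.99},
    {"name": "Butane torch lighter, plain silver", "price": 9.99},
]

def retrieve_products(query: str, limit: int = 20) -> list[dict]:
    """Stand-in database agent: naive keyword match over the toy catalog."""
    words = set(query.lower().split())
    return [p for p in CATALOG if words & set(p["name"].lower().split())][:limit]

def answer(user_question: str) -> str:
    records = retrieve_products(user_question)
    prompt = (
        f'The User has asked: "{user_question}"\n'
        f"Given the following {len(records)} product records from our system:\n"
        f"{json.dumps(records)}\n"
        "What is an appropriate response?"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any chat-capable model works
        messages=[
            {"role": "system", "content": "You are a helpful store shopping assistant."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content
```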
It turns out the engineering to make this happen is quite simple: I was able to get a functional chatbot up within a couple of days. The challenging part wasn’t the code writing – it was the prompt writing!
Problem 1: The chat text isn’t enough.
The most common pattern for basic RAG is to look at the semantic concepts in the recent chat and find database records that touch on similar concepts. This can be done with semantic search or lexical search: in layman’s terms, lexical search looks for matching words, and semantic search looks for matching concepts. However, both come with a catch.
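To make that distinction concrete, here’s a minimal sketch of both kinds of matching. It assumes the open-source sentence-transformers package, and the model name is just one common choice:

```python
# Lexical vs. semantic matching in miniature. Assumes the open-source
# sentence-transformers package; the model is one common default choice.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def lexical_score(query: str, doc: str) -> int:
    """Crude keyword matching: count overlapping words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def semantic_score(query: str, doc: str) -> float:
    """Concept matching: cosine similarity between embeddings."""
    q_vec, d_vec = model.encode([query, doc])
    return float(util.cos_sim(q_vec, d_vec))

query = "What should I get if I'm always on my feet for an 8-hour shift?"
doc = "Gel-insole work clogs with arch support for all-day comfort"
print(lexical_score(query, doc))   # near zero -- almost no shared words
print(semantic_score(query, doc))  # typically higher -- related concepts
```

Lexical scoring rewards exact word overlap; embedding similarity rewards related meaning even when few words are shared. That difference matters for the examples below.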
This approach works quite well when the user is asking a fairly directed question. For example:
USER: “Do you have any size 12 sneakers in black?”
Either semantic or lexical search does a fine job surfacing the most relevant items in the database, based on ‘sneakers’, ‘size 12’, ‘black’. However, what about when the user asks something like this?
USER: “What should I get if I’m always on my feet for an 8-hour shift?”
Neither the words *nor* the concepts in this question line up with any information in the product database! The large language model is capable of building a nuanced response to the second question, but it needs a relevant set of retrieved products to work from. The likelihood that our database agent cherry-picked the right products for the request? Very low.
Notably, traditional text search also handles the first scenario well, and handles the second scenario poorly. So out of the gate, we’ve got a working chatbot – but one that only does well with the kinds of questions that were *already* well-handled by traditional search! Not so groundbreaking – at least, not yet.
Solution 1: A world of pure imagination.
I found a much better approach was to engage the ‘creative’ side of the LLMs at this stage. Rather than going straight to database retrieval, I injected a new step:
SYSTEM: “Based on what the user has asked, imagine what good matching products might look like.”
AI: “Sneakers with great arch support, with heavy-duty soles for heavy wear.”
By asking the AI to first imagine what good matching results from the database agent might look like, I was then able to feed that answer into the data retrieval step. While this was slower, the results were sometimes remarkable. I threw in some very vague, open-ended requests, and it came back with creative, appropriate matches. In many cases, it did far better than I could have at coming up with ideas for fulfilling them. Some highlights:
“My apartment is way too bare — how can I liven it up?”
“My kid plays Fortnite a lot – what other games might they like?”
“Can you find me a birthday gift for a teen who is buried in their phone on social media, like, all the time?”
I now had a chat agent capable of doing something traditional search retrieval couldn’t handle – extrapolating from a goal to suitable items that might fulfill that goal. OK, this feels… new.
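Mechanically, the change is small: one extra model call before retrieval, with the database search run against the imagined description instead of the raw chat text. Here’s a sketch building on the earlier answer() flow (the names are still assumptions); readers familiar with the retrieval literature may recognize the spirit of HyDE, hypothetical document embeddings:

```python
# Imagine-first retrieval: ask the model what ideal matches would look like,
# then search the database against that description. Reuses the client and
# retrieve_products() from the earlier RAG sketch; names are placeholders.
def imagine_then_retrieve(user_question: str) -> list[dict]:
    imagined = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Based on what the user has asked, imagine what good "
                       f'matching products might look like: "{user_question}"',
        }],
    ).choices[0].message.content
    # Retrieve against the imagined description, not the raw question.
    return retrieve_products(imagined)
```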
Problem 2: Follow-up questions
When people were given a chat-style interface, I quickly saw them do what they’d naturally do with a sales agent in a physical store – ask follow-up questions. Here is where I started to run up against one of the core limitations of the technology: “hallucinations”. While generative LLMs seem quite sophisticated, they are also prone to ‘making s#!& up’. This has been highlighted by a number of high-profile news stories in recent months, where people learned the hard way that cutting and pasting from ChatGPT without fact-checking is a recipe for disaster.
RAG approaches can certainly limit this. Feeding the AI a list of product descriptions and instructing it to answer only from those descriptions leads to fewer hallucinations. But it doesn’t lead to none.
This was especially apparent when users were given the freedom to ask any follow-up question about a product. Some examples of follow-up questions that continued to make it misbehave:
“Will this water filter work with my model of refrigerator? I have a GE DFG1927492.”
“Is this jacket water-proof?”
“Can this e-reader also act as a remote car starter?”
When the answers were found in the product description, things were fairly safe. When they weren’t – well, let’s just say the agent always tried to make the sale, even when ‘alternate facts’ were needed.
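One partial mitigation is to make the grounding instruction explicit and give the model a sanctioned way to say “I don’t know.” A minimal sketch, again reusing the earlier flow – the exact wording is an assumption and, as noted above, it reduces the problem rather than eliminating it:

```python
# Grounded answering: constrain the model to the retrieved records and give
# it explicit permission to decline. Prompt wording is illustrative only.
# Reuses the client and retrieve_products() from the earlier RAG sketch.
GROUNDING_RULES = (
    "Answer ONLY from the product records provided. "
    "If the records do not contain the answer, say you don't know rather "
    "than guessing. Never invent specifications or compatibility claims."
)

def grounded_answer(user_question: str) -> str:
    records = retrieve_products(user_question)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": GROUNDING_RULES},
            {"role": "user",
             "content": f"{user_question}\n\nProduct records: {records}"},
        ],
    )
    return response.choices[0].message.content
```

Even with rules like this in place, the model can still improvise when a question falls outside the records.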
This is a core danger with many applications of this technology. For your product, you have to ask the question, “Is it OK if the bot fabricates something for the user?” There are plenty of times when the answer is “yes”, but when it’s expected to be factually accurate all of the time, free-form chat may pose hidden dangers. What’s OK for a creative assistant may not be OK for a shopping helper, and is definitely not OK for anything providing mission-critical answers.
In the next segment, I’ll explore the step I took next: evolving the chatbot from a general-purpose assistant to a more specific use case. How can I maximize the value of imaginative retrieval while minimizing the problems of hallucination?

