Agentic trip matcher

Introduction

At TourHero, matching a traveler to the right itinerary is harder than it looks. Users come with nuanced preferences: a family that wants adventure without anything too strenuous, or a couple looking for romance away from the tourist crowds. Traditional keyword search falls short here because it cannot interpret intent or weigh trade-offs, so we needed a smarter approach.

Building on Maya, our AI-powered agentic chatbot (covered in my previous post), we extended its capability to collect traveler preferences through conversation and use them to match itineraries intelligently.

Matching Approach

Before extending the chat feature, we had already built a foundation: using embeddings combined with an LLM to score itineraries across multiple relevance dimensions tailored to our trip catalog.

Embeddings alone were not sufficient. They capture semantic similarity between texts, but they lack broader world knowledge. A trip described as “cultural” might suit a foodie or a history enthusiast, and telling those cases apart requires reasoning beyond text similarity. That is where GPT-based scoring added real value: given an itinerary, the model could infer richer attributes and score them against our defined dimensions far more accurately than embeddings alone.
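A minimal sketch of the GPT-based scoring step, assuming the model is asked to return a JSON object with one integer score (0 to 10) per dimension. The dimension names, the prompt wording, and the `call_llm` parameter are hypothetical stand-ins rather than the production prompt; the point is that the model's output is validated against a fixed set of dimensions before it is stored.

```python
import json
from typing import Callable

# Hypothetical relevance dimensions; the real catalog-specific set is not given in the post.
DIMENSIONS = ["adventure", "culture", "food", "relaxation", "luxury"]


def build_scoring_prompt(itinerary_text: str) -> str:
    """Ask the model to infer attributes and score each dimension from 0 to 10."""
    return (
        "Score this itinerary from 0 to 10 on each of these dimensions: "
        + ", ".join(DIMENSIONS)
        + ". Reply only with a JSON object mapping each dimension to an integer.\n\n"
        + "Itinerary:\n"
        + itinerary_text
    )


def parse_scores(raw: str) -> dict[str, int]:
    """Validate the model's JSON: every dimension must be present; scores are clamped to 0..10."""
    data = json.loads(raw)
    scores = {}
    for dim in DIMENSIONS:
        if dim not in data:
            raise ValueError(f"model omitted dimension: {dim}")
        scores[dim] = max(0, min(10, int(data[dim])))
    return scores


def score_itinerary(itinerary_text: str, call_llm: Callable[[str], str]) -> dict[str, int]:
    """call_llm is a hypothetical stand-in for the GPT request; it takes a prompt, returns text."""
    return parse_scores(call_llm(build_scoring_prompt(itinerary_text)))
```

In use, `call_llm` would wrap the real API request. A stub returning a fixed JSON string is enough to exercise the validation, which clamps an out-of-range score such as 12 down to 10 and rejects a response that skips a dimension.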

Implementation

Since we were extending Maya’s chat feature, and because information collection naturally breaks into discrete stages (destination, duration, group type, interests, budget), we structured the flow around those stages. This made the conversation logic much clearer than in the open-ended trip composer.

We implemented a custom memory layer to track which stages had already been addressed. A main AI agent then decided which sub-agent should handle the next turn — whether to ask a follow-up question, answer a general TourHero query, or move on to matching.
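The stage memory and the routing decision can be sketched as plain data plus one function. This assumes the main agent's LLM call has already classified the user's latest message into an intent label; the stage names come from the stages listed above, while `StageMemory`, the intent labels, and the sub-agent names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Stages in the order the conversation collects them.
STAGES = ["destination", "duration", "group_type", "interests", "budget"]


@dataclass
class StageMemory:
    """Tracks which preference stages have been answered in this conversation."""
    answers: dict[str, str] = field(default_factory=dict)

    def record(self, stage: str, answer: str) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.answers[stage] = answer

    def next_stage(self) -> Optional[str]:
        """First stage without an answer, or None once every stage is collected."""
        for stage in STAGES:
            if stage not in self.answers:
                return stage
        return None


def route(memory: StageMemory, intent: str) -> str:
    """Pick the sub-agent for the next turn.

    `intent` is assumed to come from the main agent's classification of the
    user's message, e.g. "general_question" or "preference_answer".
    """
    if intent == "general_question":
        return "knowledge_agent"      # answer a general TourHero query
    if memory.next_stage() is None:
        return "matching_agent"       # all stages collected: run matching
    return "follow_up_agent"          # ask about memory.next_stage()
```

A user who has only named a destination is routed to the follow-up agent, which asks about `duration` next; a general question goes to the knowledge agent without discarding the answers already recorded.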

We also added a retrieval-augmented knowledge layer so Maya could answer general questions about TourHero — not just preference-collection questions — which made the conversation feel more natural and less like filling in a form.
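Retrieval-augmented answering boils down to fetching the most relevant knowledge chunks and placing them in the prompt. A minimal sketch, assuming the knowledge base has already been split into chunks and embedded; the toy three-dimensional vectors and placeholder chunk texts below stand in for real embedding-model output and real TourHero content, and the prompt wording is hypothetical.

```python
import math

Chunk = tuple[str, list[float]]  # (chunk text, embedding vector)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], chunks: list[Chunk], k: int = 2) -> list[str]:
    """Return the texts of the k chunks most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, chunk[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_grounded_prompt(question: str, context: list[str]) -> str:
    """Put the retrieved knowledge in front of the user's question."""
    bullets = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the TourHero information below.\n{bullets}\n\nQuestion: {question}"
```

Grounding the answer in retrieved text, rather than the model's general knowledge, is what lets the same chat handle company-specific questions without breaking out of the conversation.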

One significant challenge was latency. Every turn depended on the OpenAI API, which added noticeable delay. I investigated low-level improvements, including Faraday with persistent HTTP connections, but the gains were marginal. The team then found more impactful solutions: capping the maximum output tokens to reduce generation time, and offloading non-blocking inference work to background jobs. Together these cut response times by 30-50%.
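The two changes that paid off can be sketched together: the reply path makes only the capped call the user is actually waiting for, and everything else is queued. A Python thread and an in-process queue stand in here for the real background job system, which the post does not name; `generate_reply`, `MAX_OUTPUT_TOKENS`, and the summary job are hypothetical. With OpenAI's Chat Completions API, the cap would be passed as the `max_tokens` (or newer `max_completion_tokens`) request parameter.

```python
import queue
import threading

MAX_OUTPUT_TOKENS = 300  # hypothetical cap; a real request would pass it as the API's output-token limit

jobs: "queue.Queue[tuple[str, str]]" = queue.Queue()
summaries: dict[str, str] = {}


def generate_reply(message: str, max_output_tokens: int = MAX_OUTPUT_TOKENS) -> str:
    """Stub for the blocking, token-capped LLM call that produces the user-facing reply."""
    return f"Got it: {message}"


def background_worker() -> None:
    """Drain the queue; stands in for a background job processor."""
    while True:
        conversation_id, message = jobs.get()
        # Inference the reply does not depend on (here, a toy summary) runs off the request path.
        summaries[conversation_id] = f"summary of: {message}"
        jobs.task_done()


threading.Thread(target=background_worker, daemon=True).start()


def handle_turn(conversation_id: str, message: str) -> str:
    """Enqueue non-blocking work, then return the reply without waiting for it."""
    jobs.put((conversation_id, message))
    return generate_reply(message)
```

Capping output tokens helps because generation time grows with the number of tokens produced, while the queue removes secondary work from the path the user waits on entirely.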

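For the matching algorithm itself, we implemented Reciprocal Rank Fusion (RRF) to combine vector similarity search with full-text search. RRF merges the two ranked result lists by rank position rather than raw score, so the incompatible score scales of the two searches never need to be normalized against each other. This hybrid approach significantly improved retrieval quality compared to either signal alone.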
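RRF scores each candidate by summing 1 / (k + rank) over every ranked list it appears in; k = 60 is the constant used in the original RRF paper and a common default. The sketch below assumes each search returns an ordered list of itinerary IDs. The IDs are made up, and the post does not say which k TourHero uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: score(d) = sum of 1 / (k + rank) over lists, ranks starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order because sorted() is stable.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


vector_hits = ["bali", "kyoto", "peru"]      # from embedding similarity search
fulltext_hits = ["peru", "bali", "iceland"]  # from full-text search
print(reciprocal_rank_fusion([vector_hits, fulltext_hits]))  # ['bali', 'peru', 'kyoto', 'iceland']
```

Here `bali` and `peru` rise to the top because both searches rank them, while `kyoto` and `iceland` each appear in only one list. A large k flattens the difference between adjacent ranks, so agreement between the two searches matters more than winning either one outright.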

My Contributions

My main contribution was the post-match recommendation layer.

After an itinerary is matched to a user’s request, it rarely satisfies every preference perfectly. The user might want 5-star hotels but the matched itinerary uses 4-star; they might want hiking but the itinerary leans cultural. Without guidance, users are left to figure out the gap themselves.

I built a system that uses an LLM to compare the matched itinerary against the user’s stated preferences and generate structured, friendly recommendations — for example, “Ask Maya to upgrade to 5-star hotels in Bali” or “Ask Maya about adventurous hiking options in Peru.” Each recommendation also carries the underlying reason (which preference was unmatched), so the output is not just a list of suggestions but a transparent explanation users can act on.
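Structured output is what lets each suggestion carry its reason. A minimal sketch, assuming the LLM is asked for a JSON array of objects with `message`, `unmatched_preference`, and `reason` fields; those field names are hypothetical, and entries missing a field are dropped rather than shown to the user.

```python
import json
from dataclasses import dataclass

REQUIRED_FIELDS = ("message", "unmatched_preference", "reason")


@dataclass(frozen=True)
class Recommendation:
    message: str               # user-facing suggestion, e.g. "Ask Maya to upgrade to 5-star hotels in Bali"
    unmatched_preference: str  # which stated preference the itinerary misses
    reason: str                # short explanation of the gap


def parse_recommendations(raw: str) -> list[Recommendation]:
    """Parse the LLM's JSON array, keeping only entries with every field as a non-empty string."""
    recs = []
    for item in json.loads(raw):
        if isinstance(item, dict) and all(
            isinstance(item.get(f), str) and item[f].strip() for f in REQUIRED_FIELDS
        ):
            recs.append(Recommendation(**{f: item[f] for f in REQUIRED_FIELDS}))
    return recs
```

Validating on the way in means the UI can always render a suggestion together with its reason, even when the model occasionally returns an incomplete entry.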

The core challenge was speed. Running a full LLM analysis of the itinerary after every match would have left users waiting on yet another slow API call. My solution was to pre-score all itineraries in a scheduled background job, so the dimension-level scoring was already available at match time. The post-match step then only needed a lightweight prompt to compare scores and generate user-facing messages, keeping the interaction responsive.
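The split between the scheduled job and the match-time step might look like this. A dict stands in for wherever the pre-computed scores are stored, the 0 to 10 scale and the `min_gap` threshold are assumptions, and pre-filtering to the gaps before prompting is one hypothetical way to keep the prompt short; the LLM still writes the user-facing messages.

```python
# Written by the scheduled pre-scoring job; a dict stands in for the real store.
PRECOMPUTED_SCORES: dict[str, dict[str, int]] = {
    "bali-retreat": {"luxury": 6, "adventure": 3, "culture": 7},
}


def find_gaps(scores: dict[str, int], wanted: dict[str, int], min_gap: int = 3) -> dict[str, tuple[int, int]]:
    """Dimensions where the itinerary falls at least min_gap below the user's target."""
    gaps = {}
    for dim, target in wanted.items():
        actual = scores.get(dim, 0)
        if target - actual >= min_gap:
            gaps[dim] = (target, actual)
    return gaps


def build_recommendation_prompt(itinerary_id: str, wanted: dict[str, int]) -> str:
    """Cheap at match time: a lookup plus string formatting, with no re-scoring of the itinerary."""
    gaps = find_gaps(PRECOMPUTED_SCORES[itinerary_id], wanted)
    lines = [f"- {dim}: user wants {t}/10, itinerary scores {a}/10" for dim, (t, a) in gaps.items()]
    return (
        "For each gap, write one friendly 'Ask Maya ...' suggestion and name the preference it addresses.\n"
        + "\n".join(lines)
    )
```

Because the expensive reasoning over the full itinerary already happened offline, the match-time prompt carries only a few short lines, which keeps both its input and its output small.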

Outcome

The combination of improved matching, a more natural conversational flow, and transparent post-match recommendations raised user satisfaction above 90%. The latency optimizations (capped output tokens and background job offloading) cut response times by 30-50%, which meaningfully reduced drop-off during the conversation.