Make Your Users Happy - Build an NLP Thesaurus
In any organization of size, the same core business concept can be called a dozen different things. The finance team talks about "Revenue." The sales team might say "Bookings." The e-commerce team calls it "Gross Sales." The European division reports "Turnover," and the legacy system still uses the acronym "GMV" (Gross Merchandise Volume). To your dbt model, it's a single, rigorously defined column: net_revenue.
If your NLP layer can't handle this reality, it will fail. A user asking, "What were our bookings last week?" will be met with a frustrating "I don't understand" unless you've explicitly taught the system that "bookings" means net_revenue.
This is where building a dynamic, maintainable business thesaurus becomes the most critical component of your NLP strategy. It acts as the Rosetta Stone between human language and your data model.
How to Build and Manage the Thesaurus
Here’s a practical approach to handling multiple terms for the same field, applicable to both the Integrated BI and Custom Application paths.
1. The Centralized Mapping Table (The "Source of Truth")
The most robust method is to create a simple, maintainable table that maps business terms to their technical counterparts. This can be a YAML file, a Google Sheet, or a table in your database.
Example Structure of a Thesaurus Table:
business_term | technical_field | model_name | context_notes |
revenue | net_revenue | mart_finance__monthly_pnl | Primary, global term |
bookings | net_revenue | mart_finance__monthly_pnl | Used by Sales department |
sales | net_revenue | mart_finance__monthly_pnl | Used by E-commerce team |
turnover | net_revenue | mart_finance__monthly_pnl | Used by EU division |
gmv | gross_revenue | mart_finance__monthly_pnl | Legacy term, pre-discounts |
mqls | marketing_qualified_leads | mart_marketing__funnel | |
marketing leads | marketing_qualified_leads | mart_marketing__funnel | Informal term |
signups | new_customer_count | mart_sales__customers | |
new users | new_customer_count | mart_sales__customers | |
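Whatever format the table lives in, the application usually wants it as a lookup keyed by the normalized business term. The following is a minimal sketch of that step, assuming the table has been exported as CSV; the `THESAURUS_CSV` constant and the `load_thesaurus` helper are hypothetical names used only for illustration.

```python
import csv
import io

# Hypothetical CSV export of the thesaurus table (an excerpt of the rows above).
THESAURUS_CSV = """business_term,technical_field,model_name,context_notes
revenue,net_revenue,mart_finance__monthly_pnl,"Primary, global term"
bookings,net_revenue,mart_finance__monthly_pnl,Used by Sales department
gmv,gross_revenue,mart_finance__monthly_pnl,"Legacy term, pre-discounts"
mqls,marketing_qualified_leads,mart_marketing__funnel,
"""


def load_thesaurus(csv_text):
    """Return {normalized business term: row dict}, rejecting duplicate terms."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        term = row["business_term"].strip().lower()
        if term in lookup:
            raise ValueError(f"duplicate business term: {term!r}")
        lookup[term] = row
    return lookup


thesaurus = load_thesaurus(THESAURUS_CSV)
print(thesaurus["bookings"]["technical_field"])  # net_revenue
```

Failing loudly on a duplicate term is a deliberate choice in this sketch: a silent overwrite would make the answer depend on row order.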
2. Implementing the Thesaurus in Your NLP Layer
For Path 1 (Integrated BI Tools):
Tools like Power BI and Tableau have built-in functionality for this, though it can be manual.
Power BI Q&A: In the Q&A setup dialog, the Field synonyms section lets you add multiple synonyms for a single column (the same list can be edited through the Synonyms property in Model view). You would manually enter "revenue," "bookings," "sales," etc., all for the net_revenue column.
Pro: Easy to set up for a small number of terms.
Con: Doesn't scale well. Managing this across dozens of tables and hundreds of columns becomes unwieldy.
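Looker: LookML lets you declare this kind of metadata in version-controlled code, right next to the dimension it describes. The snippet below is illustrative: confirm the exact parameter name and syntax in the LookML reference for your Looker version, because the built-in alias parameter exists to keep references to renamed fields working, not to hold business synonyms.
```lookml
dimension: net_revenue {
  type: number
  sql: ${TABLE}.net_revenue ;;
  # Illustrative parameter name; verify it against the LookML reference.
  aliases: ["bookings", "sales", "turnover", "revenue"]
}
```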
For Path 2 (Custom Application):
This is where a custom solution offers superior power and flexibility. You can programmatically integrate the thesaurus.
- Step 1: Ingest the Thesaurus. Load your central mapping table (e.g., from a Google Sheet or database) when your application starts.
- Step 2: Pre-process the User's Question. Before sending the question to the LLM (like GPT-4), scan it for known business terms from your thesaurus and "translate" them.
- Original User Question: "Show me last month's bookings and mqls by region."
- Pre-processed Question for LLM: "Show me last month's net_revenue and marketing_qualified_leads by region."
- Step 3: Enhanced Prompting with the Thesaurus. Alternatively, you can pass the thesaurus as context to the LLM in the system prompt, making the LLM itself responsible for the translation. This is often more elegant.
```python
import json

# Hypothetical excerpt; in practice, load this from your central mapping table.
business_thesaurus = {"bookings": "net_revenue", "mqls": "marketing_qualified_leads"}

system_prompt = f"""
You are a data analyst AI. Your job is to convert natural language questions into SQL queries.
You have access to the following data models in BigQuery. When a user uses a business term, you MUST map it to the correct technical field using the following thesaurus:
{json.dumps(business_thesaurus)}
Example:
User: "What were our bookings last quarter?"
You: [Interpret "bookings" as "net_revenue"]
Now, answer the user's question.
"""
```
3. Advanced: Context-Aware Disambiguation
Sometimes, the same word can mean different things. "Conversions" for the marketing team is marketing_qualified_leads, but for the product team, it's user_signups. A sophisticated system can handle this.
User Profiling: Tag the thesaurus terms with departments or user groups. When a user from the "marketing" security group asks about "conversions," the system defaults to the MQL definition.
Contextual Clarification: If the system detects ambiguity it can't resolve (e.g., a general user asks for "conversions"), it can be programmed to ask a clarifying question: "Did you mean marketing-qualified leads or product sign-ups?"
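Both ideas can be combined in one resolution step: try the user's groups first, and fall back to a clarifying question when they do not single out one meaning. The sketch below assumes a context-tagged thesaurus shaped as a nested dictionary; `CONTEXT_THESAURUS`, `resolve_term`, and the "*" wildcard convention are all hypothetical.

```python
# Hypothetical context-tagged entries: term -> {user group: technical field}.
CONTEXT_THESAURUS = {
    "conversions": {
        "marketing": "marketing_qualified_leads",
        "product": "user_signups",
    },
    "bookings": {"*": "net_revenue"},  # "*" marks a term with one meaning for everyone
}


def resolve_term(term, user_groups):
    """Return ("field", name), ("clarify", question), or ("unknown", term)."""
    meanings = CONTEXT_THESAURUS.get(term.lower())
    if meanings is None:
        return ("unknown", term)
    if "*" in meanings:
        return ("field", meanings["*"])
    # Collect the distinct fields implied by the groups this user belongs to.
    matches = sorted({meanings[g] for g in user_groups if g in meanings})
    if len(matches) == 1:
        return ("field", matches[0])
    # Zero or several candidate meanings: ask instead of guessing.
    options = " or ".join(sorted(set(meanings.values())))
    return ("clarify", f'Did you mean {options} when you said "{term}"?')


print(resolve_term("conversions", ["marketing"]))
print(resolve_term("conversions", ["finance"]))
```

A user who belongs to both the marketing and product groups also gets the clarifying question, since the sketch never picks between two equally valid meanings.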
Governance: Who Owns the Thesaurus?
This is not just a technical problem; it's a governance one. The process must be:
- Collaborative: Data stewards from Finance, Sales, and Marketing should be able to propose new terms.
- Reviewed: The analytics/BI team must vet and technically map these terms to ensure accuracy.
- Maintained: The thesaurus must be a living document, updated as business jargon evolves.
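Part of the review step can be automated before a human looks at a proposal. As a rough sketch, the check below flags proposed rows that point at a column that does not exist or that contradict an earlier mapping for the same term; `KNOWN_COLUMNS` and `review_proposals` are hypothetical, and in practice the column list would come from your warehouse or dbt metadata.

```python
# Hypothetical inventory of the columns each model exposes.
KNOWN_COLUMNS = {
    "mart_finance__monthly_pnl": {"net_revenue", "gross_revenue"},
    "mart_marketing__funnel": {"marketing_qualified_leads"},
}


def review_proposals(rows, known_columns):
    """Return a list of human-readable problems; an empty list means the rows pass."""
    problems = []
    seen = {}
    for term, field, model in rows:
        key = term.strip().lower()
        # The target column must exist in the named model.
        if field not in known_columns.get(model, set()):
            problems.append(f"{key}: {model}.{field} does not exist")
        # The same term must not map to two different targets.
        if key in seen and seen[key] != (field, model):
            old_field, old_model = seen[key]
            problems.append(f"{key}: conflicts with existing mapping to {old_model}.{old_field}")
        seen.setdefault(key, (field, model))
    return problems


proposals = [
    ("revenue", "net_revenue", "mart_finance__monthly_pnl"),
    ("Revenue", "gross_revenue", "mart_finance__monthly_pnl"),
    ("mqls", "mql_count", "mart_marketing__funnel"),
]
for problem in review_proposals(proposals, KNOWN_COLUMNS):
    print(problem)
```

A check like this does not replace the analytics team's judgment about whether a term means what the proposer thinks it means; it only catches the mechanical errors.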
By investing in this business thesaurus, you do more than just make an NLP tool work; you build a foundational piece of data governance that bridges the gap between the technical implementation of your data stack and the living, breathing language of your business. It ensures that when ten people ask the same question in ten different ways, they all get the same, single source of truth as an answer.