Description
Create a custom business AI agent that speaks, sees, listens and replies to your customers.
🚀 What this workflow does
- Receives any inbound WhatsApp message via a Wassenger Trigger
- Detects the medium – text, voice note, image or document (PDF)
- Processes accordingly:
  - Text → straight to the AI brain
  - Voice notes → download ➜ Whisper transcription
  - Images → download ➜ GPT-4o Vision analysis
  - PDFs only → download ➜ text extraction
- Feeds the cleaned input + short-term memory buffer (20 turns) to an OpenAI Chat Agent (GPT-4o-mini by default)
- Sends the answer back through Wassenger:
  - If the user sent audio, the bot replies in audio (OpenAI TTS ➜ saves mp3 to Google Drive ➜ returns the public link).
  - Otherwise, returns plain text.
- Gracefully rejects anything that isn’t text, image, audio or a PDF (“Sorry, you can only send …”)
Result: a polite, context-aware concierge that can read your contract, describe your cat photo, or summarize a 3-minute rant into a single line—without ever leaving WhatsApp.
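The routing described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the workflow's actual Switch node configuration: the `Inbound` class, its `kind` and `mime` fields and the branch names are hypothetical, and the rejection text is left as the elided placeholder from the description.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder only: the description elides the full rejection wording.
REJECTION = "Sorry, you can only send …"


@dataclass
class Inbound:
    kind: str                   # hypothetical: "text", "audio", "image" or "document"
    mime: Optional[str] = None  # hypothetical MIME type of any attached media


def route(msg: Inbound) -> str:
    """Return the name of the branch this message should follow."""
    if msg.kind in ("text", "audio", "image"):
        return msg.kind
    # Documents are accepted only when they are PDFs.
    if msg.kind == "document" and msg.mime == "application/pdf":
        return "document"
    return "reject"


def reply_mode(msg: Inbound) -> str:
    """Audio in, audio out; every other accepted input gets plain text."""
    return "audio" if route(msg) == "audio" else "text"
```

For example, `route(Inbound("document", "application/msword"))` falls through to `"reject"`, which is where the polite rejection message would be sent.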
🧩 Key components
| Node | Purpose |
| --- | --- |
| Wassenger Trigger / Wassenger | Receive & send WhatsApp messages |
| Switch → “Input type” | Routes to Text / Audio / Image / Document branches |
| HTTP Request | Securely downloads media from Wassenger |
| OpenAI Whisper | Turns voice notes into text |
| GPT-4o Vision | Describes images in detail |
| Extract From File | Converts PDFs to text |
| LangChain Agent | Central brain with custom system prompt |
| Memory Buffer Window | Keeps the last 20 turns per chat |
| OpenAI TTS (“Generate Audio Response”) | Converts answers to speech (voice “nova”) |
| Google Drive (Upload + Delete) | Stores the mp3, grabs a share link, cleans up |
(Sticky notes in the canvas label the four media lanes so future-you won’t get lost.)
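To make the "last 20 turns per chat" behaviour concrete, here is a minimal Python sketch of a windowed memory. It is not the Memory Buffer Window node's implementation: the `WindowMemory` class and its method names are hypothetical, and it assumes one turn means one user message plus one bot reply.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

WINDOW_TURNS = 20  # from the description: last 20 turns per chat

Turn = Tuple[str, str]  # assumption: (user message, bot reply)


class WindowMemory:
    """Keeps only the most recent turns, separately for each chat."""

    def __init__(self, max_turns: int = WINDOW_TURNS) -> None:
        # A bounded deque silently drops the oldest turn once it is full.
        self._chats: Dict[str, Deque[Turn]] = defaultdict(
            lambda: deque(maxlen=max_turns)
        )

    def add_turn(self, chat_id: str, user_text: str, bot_text: str) -> None:
        self._chats[chat_id].append((user_text, bot_text))

    def history(self, chat_id: str) -> List[Turn]:
        # .get avoids creating an empty entry for chats we have never seen.
        return list(self._chats.get(chat_id, ()))
```

Keying the buffer by chat keeps one customer's context from leaking into another's, and the fixed window bounds how much history is sent to the model on each call.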
🛠️ Prerequisites
- Wassenger device + API key
- OpenAI API key (chat, whisper, TTS, vision)
- Google Drive OAuth credentials (for audio replies)
💡 Ideas & extensions
- Pipe extracted conversation data into HubSpot or Airtable.
- Replace the default GPT-4o-mini with your on-prem model ➜ just swap the Chat node.
- Add a Sentiment node to auto-escalate angry customers.
- Expand document branch to Word, PowerPoint or spreadsheets.
⚖️ Limits & best-practice nudges
- Only PDFs are accepted for now; other file types trigger a polite rejection.
- Each inbound message starts exactly one execution, but that alone does not cap traffic; add extra guards, such as a per-sender rate limit, if you point it at a large audience.
- Delete Google Drive files after sending (already included) to keep storage tidy and costs down.
- Remember WhatsApp’s 24-hour customer-initiated window.
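Two of the nudges above, extra guards for large audiences and the 24-hour window, lend themselves to a small sketch. The Python below is one possible approach and is not part of the workflow: `PerSenderLimiter`, `within_customer_window` and the limits chosen in the usage note are hypothetical, and timestamps are passed in explicitly so the logic is easy to test.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict

CUSTOMER_WINDOW = timedelta(hours=24)


def within_customer_window(last_inbound: datetime, now: datetime) -> bool:
    """True if the customer's last message is at most 24 hours old."""
    return now - last_inbound <= CUSTOMER_WINDOW


class PerSenderLimiter:
    """Sliding-window limiter: at most max_messages per per_seconds per sender."""

    def __init__(self, max_messages: int, per_seconds: float) -> None:
        self.max_messages = max_messages
        self.per_seconds = per_seconds
        self._seen: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, sender: str, now: float) -> bool:
        stamps = self._seen[sender]
        # Forget timestamps that have aged out of the window.
        while stamps and now - stamps[0] >= self.per_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_messages:
            return False
        stamps.append(now)
        return True
```

With, say, `PerSenderLimiter(max_messages=5, per_seconds=60)`, a sender's sixth message within a minute would be refused before it reaches any paid API call, while other senders are unaffected.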
🏁 Ready, set, automate!
Import → Hit Active. Your WhatsApp number just became a futuristic, multimodal AI agent. Enjoy the peace and quiet while it handles the chatter. 😉