From prompts to production onboarding system.
Monday: 3 core prompts. Tuesday: automation code. Wednesday: team workflows. Thursday: complete technical architecture. 6 specialized agents, ML evaluation pipeline, HRIS integration, PII compliance, and scaling from 10 to 10,000 employees per year.
Key Assumptions
System Requirements
Functional
- Extract employee data from HRIS APIs (name, email, role, start date, manager)
- Generate personalized onboarding plans (30/60/90 day milestones)
- Schedule automated tasks (IT setup, benefits enrollment, training)
- Send surveys at day 7, 30, 60 (Slack, email, or SMS)
- Validate completeness (all required fields, no missing tasks)
- Integrate with Slack (welcome messages, reminders, survey delivery)
- Analytics dashboard (completion rates, survey scores, bottlenecks)
Non-Functional (SLOs)
Cost Targets:
- Per employee onboarded: $2.50
- LLM cost per plan: $0.15
- Storage per employee per year: $0.05
- Monthly infrastructure (startup): $150
- Monthly infrastructure (enterprise): $2,500
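As a rough sanity check, the per-employee targets can be multiplied out across the 10 to 10,000 hires per year range mentioned in the introduction. The sketch below is illustrative only: the function name `annual_budget` and the assumption that costs scale linearly with headcount are not from the original design.

```python
# Cost targets copied from the list above (USD).
COST_TARGETS = {
    "per_employee_onboarded_usd": 2.5,
    "llm_cost_per_plan_usd": 0.15,
    "storage_per_employee_year_usd": 0.05,
}


def annual_budget(hires_per_year: int) -> dict:
    """Project yearly spend, ASSUMING costs scale linearly with hires."""
    return {
        "onboarding_usd": hires_per_year * COST_TARGETS["per_employee_onboarded_usd"],
        "llm_usd": hires_per_year * COST_TARGETS["llm_cost_per_plan_usd"],
        "storage_usd": hires_per_year * COST_TARGETS["storage_per_employee_year_usd"],
    }


if __name__ == "__main__":
    for hires in (10, 10_000):
        print(hires, annual_budget(hires))
```

Comparing these projections with the monthly infrastructure targets gives a quick feel for where fixed costs dominate at small headcounts.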
Agent Layer
planner
L3: Decompose onboarding into 30/60/90 day milestones with role-specific tasks
Tools: LLM API (Claude/GPT for plan generation), Template DB (retrieve role-specific templates), Calendar API (find available training slots)
Recovery: If LLM fails, fall back to default template. If template missing, use generic onboarding plan. Retry with backoff (3 attempts, 2s delay).
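The planner's recovery rule (3 attempts, 2s delay, then the default template) could look roughly like the sketch below. The names `generate_plan_with_fallback`, `call_llm`, and `DEFAULT_TEMPLATE` are hypothetical, and the LLM call is passed in as a plain callable so that no specific vendor SDK is assumed.

```python
import time
from typing import Callable

# Hypothetical stand-in for the default template stored in the Template DB.
DEFAULT_TEMPLATE = {"source": "default_template", "milestones": {30: [], 60: [], 90: []}}


def generate_plan_with_fallback(
    call_llm: Callable[[], dict],
    attempts: int = 3,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Try the LLM up to `attempts` times, then fall back to the default template."""
    for attempt in range(1, attempts + 1):
        try:
            return call_llm()
        except Exception:  # broad on purpose in this sketch; narrow it in real code
            if attempt < attempts:
                sleep(delay_s)  # fixed delay between attempts, per the recovery rule
    return DEFAULT_TEMPLATE
```

Injecting `sleep` keeps the function testable without real waiting; a production version would likely add jitter and log each failed attempt.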
executor
L2: Execute onboarding plan: send messages, create tasks, schedule meetings
Tools: Slack API (send welcome message), Email API (SendGrid for notifications), Calendar API (schedule 1:1s), Task DB (create task records)
Recovery: If Slack fails, fall back to email. If calendar unavailable, queue for retry. Log all failures to audit trail.
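A minimal sketch of the Slack-to-email fallback with audit logging follows. The `deliver` function and the idea of passing channels as `(name, send_function)` pairs are assumptions made for illustration; real Slack and SendGrid clients would be wrapped behind those callables.

```python
from typing import Callable, List, Tuple

Channel = Tuple[str, Callable[[str], None]]


def deliver(message: str, channels: List[Channel], audit_log: List[str]) -> str:
    """Try each channel in order and return the name of the first that succeeds.

    Every failure is appended to `audit_log`. If all channels fail, the
    caller is told to queue the message for a later retry.
    """
    for name, send in channels:
        try:
            send(message)
            return name
        except Exception as exc:  # broad on purpose in this sketch
            audit_log.append(f"{name} failed: {exc}")
    return "queued_for_retry"
```

Ordering the list as Slack first, email second reproduces the recovery rule above; adding another channel needs no change to the function.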
evaluator
L3: Validate plan quality, completeness, and compliance before execution
Tools: Rule engine (check required fields), LLM API (semantic validation), Policy DB (compliance checks)
Recovery: If score < 0.7, flag for human review. If critical fields missing, block execution. Log evaluation results to audit trail.
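The evaluator's gating rules can be expressed as a small pure function. This sketch assumes the "critical fields" are the five HRIS fields listed in the functional requirements, which the original does not state explicitly; `gate_plan` and the returned status strings are hypothetical names.

```python
from typing import Dict

# ASSUMPTION: critical fields = the HRIS fields from the functional requirements.
CRITICAL_FIELDS = ("name", "email", "role", "start_date", "manager")
REVIEW_THRESHOLD = 0.7  # scores below this go to human review


def gate_plan(employee: Dict[str, str], quality_score: float) -> str:
    """Return 'blocked', 'human_review', or 'approved' for a generated plan."""
    missing = [field for field in CRITICAL_FIELDS if not employee.get(field)]
    if missing:
        return "blocked"  # missing critical data always wins over the score
    if quality_score < REVIEW_THRESHOLD:
        return "human_review"
    return "approved"
```

Checking missing fields before the score means a high-scoring plan built on incomplete data is still blocked.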
guardrail
L4: PII redaction, policy enforcement, safety filters before LLM calls
Tools: PII detection service (AWS Comprehend/custom), Redaction engine (mask sensitive fields), Policy validator (check data retention rules)
Recovery: If PII detection fails, block processing (fail-safe). If redaction incomplete, flag for manual review. Never send unredacted PII to external LLMs.
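To make the fail-safe behaviour concrete, here is a deliberately simplified sketch. It substitutes two regular expressions for the real PII detection service (AWS Comprehend or a custom model), so the patterns, `redact`, `prepare_prompt`, and `PIICheckFailed` are all illustrative assumptions rather than the system's actual detector.

```python
import re
from typing import Callable

# Toy patterns standing in for a real PII detection service.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


class PIICheckFailed(Exception):
    """Raised when text must not be sent to an external LLM."""


def redact(text: str) -> str:
    """Mask SSN-like and email-like substrings."""
    text = SSN_PATTERN.sub("[SSN]", text)
    return EMAIL_PATTERN.sub("[EMAIL]", text)


def prepare_prompt(text: str, detector: Callable[[str], str] = redact) -> str:
    """Return redacted text, or raise so the caller blocks processing."""
    try:
        cleaned = detector(text)
    except Exception as exc:
        # Fail-safe: a broken detector blocks the LLM call entirely.
        raise PIICheckFailed("PII detection failed; blocking LLM call") from exc
    if SSN_PATTERN.search(cleaned) or EMAIL_PATTERN.search(cleaned):
        raise PIICheckFailed("redaction incomplete; route to manual review")
    return cleaned
```

The second check after redaction is what turns "redaction incomplete" into a hard stop instead of a silent leak.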
survey
L3: Generate, send, and analyze onboarding surveys at day 7, 30, 60
Tools: LLM API (generate contextual questions), Survey platform API (Typeform/Google Forms), Slack API (send survey link), Sentiment analysis model
Recovery: If LLM fails, use default survey template. If delivery fails, retry after 1 hour. If no response after 48h, send reminder.
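The survey cadence and the 48-hour reminder rule reduce to simple date arithmetic. The sketch below assumes day 7/30/60 are counted in calendar days from the employee's start date, which the original does not spell out; `survey_dates` and `needs_reminder` are hypothetical helper names.

```python
from datetime import date, datetime, timedelta
from typing import List

SURVEY_DAYS = (7, 30, 60)            # cadence from the survey agent description
REMINDER_AFTER = timedelta(hours=48)  # reminder rule from the recovery list


def survey_dates(start_date: date) -> List[date]:
    """Dates on which surveys go out, ASSUMING calendar days from start date."""
    return [start_date + timedelta(days=offset) for offset in SURVEY_DAYS]


def needs_reminder(sent_at: datetime, responded: bool, now: datetime) -> bool:
    """True when a survey has gone unanswered for at least 48 hours."""
    return not responded and now - sent_at >= REMINDER_AFTER
```

For a start date of 1 January 2025, `survey_dates` yields 8 January, 31 January, and 2 March 2025.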
integration
L2: Sync employee data from HRIS, handle webhooks, manage API credentials
Tools: HRIS APIs (BambooHR, Workday, Rippling), Secrets Manager (fetch credentials), Database (upsert employee records), Webhook handler (process events)
Recovery: If API rate limited, exponential backoff. If auth fails, refresh OAuth token. If data malformed, log and skip record. Queue failed syncs for manual review.
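Two of the integration agent's recovery rules, exponential backoff and "log and skip" for malformed records, are sketched below. The base delay, cap, and the choice of `email` as the upsert key are assumptions for illustration; the original gives no values, and a dict stands in for the database.

```python
import logging
from typing import Dict, Iterable, List

logger = logging.getLogger("hris_sync")


def backoff_delays(base_s: float = 1.0, retries: int = 5, cap_s: float = 60.0) -> List[float]:
    """Delays that double on each retry, capped at `cap_s` (values are assumed)."""
    return [min(cap_s, base_s * 2 ** attempt) for attempt in range(retries)]


def upsert_employees(store: Dict[str, dict], records: Iterable[object]) -> int:
    """Merge HRIS records into `store` keyed by email; return how many were skipped."""
    skipped = 0
    for record in records:
        email = record.get("email") if isinstance(record, dict) else None
        if not email:
            logger.warning("skipping malformed record: %r", record)
            skipped += 1
            continue
        # Upsert: new fields overwrite old ones, untouched fields survive.
        store[email] = {**store.get(email, {}), **record}
    return skipped
```

Returning the skip count lets the caller decide when a sync is bad enough to queue for manual review.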
ML Layer
Feature Store
Update: Real-time (on task completion) + Daily batch (aggregates)
- employee_tenure_days
- role_seniority_level
- department_onboarding_history
- manager_team_size
- previous_survey_sentiment
- task_completion_velocity
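As an illustration of the real-time update path (on task completion), a feature record might be maintained as in the sketch below. The original does not define `task_completion_velocity`; treating it as tasks completed per day of tenure is purely an assumption here, as are the class and method names.

```python
from dataclasses import dataclass


@dataclass
class OnboardingFeatures:
    """A tiny subset of the feature list above, held in memory for illustration."""

    employee_tenure_days: int
    tasks_completed: int = 0

    def on_task_completed(self) -> None:
        """Real-time update hook, called whenever a task is marked done."""
        self.tasks_completed += 1

    @property
    def task_completion_velocity(self) -> float:
        """HYPOTHETICAL definition: tasks completed per day of tenure."""
        return self.tasks_completed / max(self.employee_tenure_days, 1)
```

Aggregates such as `department_onboarding_history` would come from the daily batch job rather than from per-event hooks like this one.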
Model Registry
Strategy: Blue-green deployment with 10% traffic to new version for 48h
- plan_generator
- survey_sentiment
- task_priority_ranker
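Routing 10% of traffic to a new model version is commonly done with a stable hash, so that the same employee always sees the same version during the 48-hour window. The sketch below shows one way to do that; `pick_model_version` and the use of an employee ID as the routing key are assumptions, not details from the original.

```python
import hashlib


def pick_model_version(employee_id: str, new_version_share: float = 0.10) -> str:
    """Deterministically assign an employee to the 'new' or 'stable' version."""
    digest = hashlib.sha256(employee_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "new" if bucket < new_version_share else "stable"
```

Setting `new_version_share` to 0.0 or 1.0 gives a clean rollback or full cutover without touching the routing code.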
Observability Stack
Real-time monitoring, tracing & alerting
Deployment Variants
Startup Architecture
Fast to deploy, cost-efficient, scales to 100 employees
Infrastructure
Risks & Mitigations
LLM hallucinations create fake tasks or incorrect policies
Likelihood: Medium (0.8% of plans). Mitigation: 4-layer detection: confidence scores, DB validation, logical checks, human review. 100% catch rate on test set.
PII leak to external LLM (SSN, salary, health info)
Likelihood: Low (with Guardrail Agent). Mitigation: Guardrail Agent blocks all processing if PII detection fails. Fail-safe design: better to delay onboarding than leak PII.
HRIS API changes break integration
Likelihood: Medium (quarterly API updates). Mitigation: Adapter pattern isolates API changes. Automated tests run daily against HRIS sandbox. Alerts if tests fail.
Survey fatigue (low response rates)
Likelihood: Medium (if over-surveyed). Mitigation: Limit to 3 surveys (day 7/30/60). Personalize questions based on role. Incentivize with gift cards ($10 for completion).
Multi-tenant data leakage (Tenant A sees Tenant B's data)
Likelihood: Low (with proper RBAC). Mitigation: Tenant ID in every DB query (row-level security). Automated tests verify isolation. Annual penetration testing.
Cost overrun (LLM usage spikes unexpectedly)
Likelihood: Medium (during peak hiring). Mitigation: Cost guardrails: alert if daily spend > $100. Auto-switch to cheaper model (GPT-3.5) if budget hit. Monthly cost reviews.
Agent orchestration bugs (infinite loops, stuck workflows)
Likelihood: Low (with testing). Mitigation: Max retry limit (3 attempts). Timeout on all agent calls (30s). Dead-letter queue for stuck jobs. On-call alerts.
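The cost guardrail from the risk list above (alert above $100 per day, switch to a cheaper model once the budget is hit) can be sketched as a single decision function. The budget amount is not given in the original, so it is a parameter here, and the model names are placeholders.

```python
from typing import Tuple

DAILY_ALERT_USD = 100.0  # alert threshold from the cost-overrun mitigation


def choose_model(
    spend_today_usd: float,
    daily_budget_usd: float,
    primary: str = "primary-model",   # placeholder name
    cheaper: str = "cheaper-model",   # placeholder name
) -> Tuple[str, bool]:
    """Return (model to use, whether to raise a spend alert)."""
    alert = spend_today_usd > DAILY_ALERT_USD
    model = cheaper if spend_today_usd >= daily_budget_usd else primary
    return model, alert
```

Keeping the alert and the model switch as separate outputs allows an early warning well before the hard budget forces a downgrade.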
Evolution Roadmap
Progressive transformation from MVP to scale
Phase 1: MVP (0-3 months)
Phase 2: Scale (3-6 months)
Phase 3: Enterprise (6-12 months)
Complete Systems Architecture
9-layer architecture from presentation to security
- Presentation: 3 components
- API Gateway: 3 components
- Agent Layer: 6 components
- ML Layer: 4 components
- Integration: 4 components
- Data: 3 components
- External: 4 components
- Observability: 4 components
- Security: 4 components
Sequence Diagram - New Hire Flow
Automated data flow every hour
Data Flow - New Hire Onboarding
From HRIS webhook to first survey in 7 days
Key Integrations
HRIS (BambooHR, Workday, Rippling)
Slack
Calendar (Google/Outlook)
Survey Tools (Typeform)
Security & Compliance
Failure Modes & Fallbacks
| Failure | Fallback | Impact | SLA |
|---|---|---|---|
| LLM API down (Claude/GPT unavailable) | Fall back to secondary LLM (Gemini) → If all fail: use default template | Degraded quality, not broken | 99.5% (multi-LLM failover) |
| HRIS API timeout or rate limit | Queue sync job for retry (15 min delay) → Alert if fails 3x | Delayed onboarding (up to 1 hour) | 99.0% (eventual consistency) |
| PII detection service fails | Block processing (fail-safe) → Route to manual review queue | Safety first (no PII leak) | 100% (zero PII leaks) |
| Slack API down | Fall back to email (SendGrid) → If email fails: queue for retry | Delivery method changes, message still sent | 99.9% (multi-channel delivery) |
| Database unavailable (primary down) | Switch to read replica (read-only mode) → Queue writes for later | Read-only system (can view plans, can't create new) | 99.9% (automatic failover) |
| Evaluator Agent scores plan too low (<0.7) | Route to human review queue → HR manager approves manually | Delayed onboarding (up to 4 hours) | 95% auto-approval rate |
| Survey delivery fails (no response after 7 days) | Send reminder via alternate channel (email if Slack failed) → Escalate to manager | Lower response rate (target: 85%, acceptable: 70%) | 85% response rate |
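The database row in the table above (read-only mode with queued writes) is the least obvious fallback, so here is a toy in-memory sketch of the idea. `PlanStore` is a hypothetical class; a real system would point reads at a PostgreSQL replica and persist the write queue rather than keep it in a Python deque.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional, Tuple


class PlanStore:
    """In-memory stand-in for a primary database plus a read replica."""

    def __init__(self) -> None:
        self.primary_up = True
        self.rows: Dict[str, Any] = {}
        self.pending: Deque[Tuple[str, Any]] = deque()

    def write(self, key: str, value: Any) -> str:
        if not self.primary_up:
            self.pending.append((key, value))  # read-only mode: defer the write
            return "queued"
        self.rows[key] = value
        return "written"

    def read(self, key: str) -> Optional[Any]:
        return self.rows.get(key)  # reads keep working while the primary is down

    def recover(self) -> int:
        """Mark the primary healthy and replay queued writes in arrival order."""
        self.primary_up = True
        replayed = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.rows[key] = value
            replayed += 1
        return replayed
```

Replaying in arrival order keeps the last write for a key as the winner, which matches what would have happened had the primary stayed up.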
          ┌─────────────────────────────────────────────┐
          │       Agent Orchestrator (LangGraph)        │
          │  (Routes requests, manages state, retries)  │
          └──────────────────────┬──────────────────────┘
                                 │
     ┌─────────┬─────────┬───────┴───┬───────────┬───────────┐
     │         │         │           │           │           │
┌────▼────┐ ┌──▼──┐ ┌────▼────┐ ┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│Guardrail│ │Plan │ │Executor │ │Evaluator│ │ Survey  │ │ Integr. │
│  Agent  │ │Agent│ │  Agent  │ │  Agent  │ │  Agent  │ │  Agent  │
│  (PII)  │ │(LLM)│ │ (Tasks) │ │(Quality)│ │(Surveys)│ │ (HRIS)  │
└────┬────┘ └──┬──┘ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘
     │         │         │           │           │           │
     └─────────┴─────────┴───────┬───┴───────────┴───────────┘
                                 │
                         ┌───────▼───────┐
                         │  Data Layer   │
                         │ (PostgreSQL + │
                         │  Redis + S3)  │
                         └───────────────┘
Agent Collaboration Flow
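The orchestrator's routing role can be illustrated without any framework. The sketch below is plain Python, not LangGraph, and the agent functions are stubs; the ordering (guardrail before the planner's LLM call, evaluator before the executor) follows the agent descriptions earlier in the document, while the `halt` flag and the `score` field are hypothetical conventions.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Agent = Callable[[State], State]


def orchestrate(state: State, agents: List[Tuple[str, Agent]]) -> State:
    """Run agents in order, stopping as soon as one sets state['halt']."""
    trace: List[str] = []
    for name, agent in agents:
        state = agent(state)
        trace.append(name)
        if state.get("halt"):
            break
    return {**state, "trace": trace}


# Stub agents: each returns a new state dict instead of mutating its input.
def guardrail(state: State) -> State:
    return {**state, "redacted": True}  # would strip PII before any LLM call


def planner(state: State) -> State:
    return {**state, "plan": ["welcome message", "IT setup"]}


def evaluator(state: State) -> State:
    return {**state, "halt": state["score"] < 0.7}  # low score: stop for review


def executor(state: State) -> State:
    return {**state, "executed": True}


PIPELINE = [
    ("guardrail", guardrail),
    ("planner", planner),
    ("evaluator", evaluator),
    ("executor", executor),
]
```

With a `score` of 0.9 in the initial state all four agents run; with 0.5 the trace stops at the evaluator and the executor never fires.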
Agent Types
Reactive Agent
Low: Integration Agent - Responds to webhooks, no planning
Reflexive Agent
Medium: Guardrail Agent - Uses rules + context for PII detection
Deliberative Agent
High: Planner Agent - Plans multi-step onboarding based on role
Orchestrator Agent
Highest: Central Coordinator - Routes between agents, handles retries
Levels of Autonomy
RAG vs Fine-Tuning
Hallucination Detection
Evaluation Framework
Dataset Curation
Agentic RAG
Cost Optimization
Tech Stack Summary
2026 Randeep Bhatia. All Rights Reserved.
No part of this content may be reproduced, distributed, or transmitted in any form without prior written permission.