Knowledge & Data Sources
Learn how to upload documents, connect websites, create Q&As, and manage all the content that powers your AI agent.
Overview
The Knowledge tab in the Docimal dashboard is where you manage all the content that powers your AI agent. It allows you to upload documents, add structured text snippets, crawl websites or sitemaps, create custom Q&As, and integrate with Notion. These options give you full control over the information your agent is trained on, helping ensure accurate, relevant, and up-to-date responses for your users.
Multiple Data Sources
Upload files, add text snippets, crawl websites, or integrate with Notion—all in one place
Smart Processing
Automatic text extraction, chunking, and vector embeddings for semantic search
Real-time Updates
Changes to your knowledge base are reflected in your agent after retraining
Version Control
Track when each source was added and last updated for complete visibility
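To make "chunking" concrete, the sketch below splits extracted text into overlapping, fixed-size word windows, a common approach in retrieval pipelines. It is an illustration only: Docimal does not document its chunk size, overlap, or splitting unit, so the `chunk_words` helper and its defaults are hypothetical.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping windows of at most `chunk_size` words.

    The default sizes are illustrative only; Docimal's real chunk size,
    overlap, and unit (words, tokens, or sections) are not documented.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # each new window starts this many words later
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, chunk_size=4, overlap=1):
        print(chunk)
```

With `chunk_size=4` and `overlap=1`, the ten sample words become three chunks (`w0 w1 w2 w3`, `w3 w4 w5 w6`, `w6 w7 w8 w9`). The overlap means text near a boundary appears in two neighboring chunks, which reduces the chance that a relevant passage is cut apart in a way that hurts retrieval.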
Files
The Files section allows you to upload and manage various document types to train your AI agent. This is the most common way to provide knowledge to your agent.
Supported File Types
Docimal supports the following file formats:
- .pdf — PDF Documents
- .txt — Plain Text Files
- .doc / .docx — Microsoft Word Documents
- .csv — Comma-Separated Values
- .md — Markdown Files
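If you are preparing a large folder for upload, a quick local check can flag files the dashboard will not accept. The sketch below is a hypothetical helper, not part of Docimal, that partitions paths by the extensions listed above; treating extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions listed as supported in this section. Case-insensitive matching
# (".PDF" == ".pdf") is an assumption about how the dashboard behaves.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".doc", ".docx", ".csv", ".md"}


def split_by_support(paths):
    """Partition file paths into (supported, unsupported) lists by extension."""
    supported, unsupported = [], []
    for p in map(Path, paths):
        target = supported if p.suffix.lower() in SUPPORTED_EXTENSIONS else unsupported
        target.append(p)
    return supported, unsupported


if __name__ == "__main__":
    ok, skipped = split_by_support(["handbook.pdf", "faq.MD", "logo.png", "notes.txt"])
    print("upload:", [p.name for p in ok])
    print("skip:", [p.name for p in skipped])
```

Here `handbook.pdf`, `faq.MD`, and `notes.txt` are kept for upload, while `logo.png` is set aside.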
Uploading Files
1. Click the Upload Files button in the Knowledge tab.
2. Select one or more documents from your device.
3. Files are queued and uploaded one at a time. Monitor the status of each upload in real time.
4. Once the uploads finish, click Retrain Agent to process the new content.
Preview and Metadata
After upload, you can:
- Click on any document to preview its contents directly within the dashboard
- View timestamps indicating exactly when each file was added and last updated
- See file size, type, and processing status
- Check the number of chunks/segments created from each document
File Management
To delete a single file, click the three-dot menu next to it and select Delete. To delete all files at once, select the checkbox next to "File sources" to select every document, then click the Delete button that appears.
Text Snippets
The Text Snippets feature allows you to add and manage structured content without uploading files. This is ideal for maintaining smaller, frequently updated pieces of information separate from document uploads.
Creating Text Snippets
Create multiple text snippets, each with a unique title to help you easily identify the content. This is particularly useful for segmenting information by topic, department, or use case.
Rich Text Formatting
Each snippet supports full rich text editing:
- Add headings for clarity
- Format with bold, italic, or strikethrough
- Create ordered or bullet lists
- Insert hyperlinks to external sources
- Use Markdown syntax for better structure
Common Use Cases
Company Policies
Store HR policies, code of conduct, or internal guidelines
Product Information
Maintain product specs, pricing, or feature descriptions
Quick Updates
Add temporary announcements or seasonal information
Structured Data
Store contact lists, schedules, or reference tables
Website Crawling
The Website Crawling feature enables you to train your AI agent using content directly from websites. Whether you're working with a full site, a sitemap, or individual URLs, this tool gives you flexible control over what gets included in your agent's knowledge base.
Crawling Options
1. Crawl a Full Website
Provide the homepage URL and let Docimal discover all public pages automatically
2. Submit a Sitemap
Point to an XML sitemap to fetch a structured list of URLs efficiently
3. Add Individual Links
Manually input specific URLs you want to include for precise control
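For reference, an XML sitemap follows the sitemaps.org protocol: a `urlset` of `url` entries, each with a `loc` element holding a page URL (large sites may instead publish a `sitemapindex` that points to several sitemaps). The sketch below uses Python's standard library to list the URLs a sitemap exposes, which is a quick way to check what a sitemap-based crawl will have to work with. It is not Docimal's crawler, and `parse_sitemap` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/getting-started</loc></url>
</urlset>"""


def parse_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return (kind, locations) for a sitemap document.

    kind is "urlset" for a regular sitemap (locations are page URLs) or
    "sitemapindex" for a sitemap index (locations are child sitemap URLs).
    """
    root = ET.fromstring(xml_text)
    kind = root.tag.removeprefix(f"{{{SITEMAP_NS}}}")
    locations = [
        el.text.strip()
        for el in root.iter(f"{{{SITEMAP_NS}}}loc")
        if el.text and el.text.strip()
    ]
    return kind, locations


if __name__ == "__main__":
    kind, urls = parse_sitemap(SAMPLE)
    print(kind, urls)
```

For the sample, the result is the kind `urlset` and the two page URLs; a sitemap index would instead return the URLs of its child sitemaps.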
Path Filtering
Refine your crawl using path filters:
- Include Paths: Only URLs matching these paths will be fetched (e.g., `/docs/*`, `/blog/*`)
- Exclude Paths: URLs matching these paths will be skipped (e.g., `/admin/*`, `/login`)
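The documentation does not spell out how include and exclude patterns interact. The sketch below assumes shell-style wildcards (Python's `fnmatch`), that an empty include list admits every path, and that an exclude match always wins; treat it as a way to reason about your filters, not as the crawler's actual implementation. Note that in `fnmatch` a `*` also matches `/`, so `/docs/*` covers nested pages such as `/docs/guides/setup`.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def keep_url(url: str, include: list[str], exclude: list[str]) -> bool:
    """Decide whether a crawled URL passes include/exclude path filters.

    Assumed semantics (not confirmed for Docimal): an empty include list
    allows every path, and an exclude match always overrides an include match.
    """
    path = urlparse(url).path or "/"
    # fnmatch's "*" matches any characters, including "/".
    if include and not any(fnmatchcase(path, pattern) for pattern in include):
        return False
    return not any(fnmatchcase(path, pattern) for pattern in exclude)


if __name__ == "__main__":
    include = ["/docs/*", "/blog/*"]
    exclude = ["/admin/*", "/login"]
    for url in [
        "https://example.com/docs/setup",
        "https://example.com/blog/2024/launch",
        "https://example.com/pricing",
        "https://example.com/login",
    ]:
        print(url, keep_url(url, include, exclude))
```

With these filters, `/docs/setup` and `/blog/2024/launch` are kept, while `/pricing` and `/login` are skipped because they match no include pattern.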
Link Management
Once crawling is complete:
- All links from a single domain are grouped under the homepage URL for easy management
- Click on a homepage group to view all fetched links
- Preview the content of each link by clicking on it
- Edit or exclude specific links from a group as needed
- Recrawl the website anytime to fetch new pages
Custom Q&A Training
The Custom Q&A feature lets you train your AI agent with specific question-and-answer pairs, enabling it to respond precisely to frequently asked or business-specific queries. This ensures your agent provides exact answers for critical questions.
Creating Q&As
- Each Q&A entry starts with a descriptive title for quick organization
- Add multiple question variations to improve recognition and matching
- Provide a single definitive answer that will be used when matched
- Use rich text formatting to make answers clear and scannable
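Because each Q&A has several phrasings but a single answer, a user's question only needs to resemble one phrasing closely to match. The sketch below models an entry with a hypothetical `QAEntry` class and scores questions with `difflib.SequenceMatcher`. This plain string similarity is a stand-in chosen for illustration; the matching Docimal actually uses is not documented and is likely semantic rather than character-based. The sample entry and threshold are made up.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class QAEntry:
    """Hypothetical model of a Q&A entry: a title, several phrasings, one answer."""
    title: str
    questions: list[str]
    answer: str


def normalize(text: str) -> str:
    # Lowercase, drop punctuation, and collapse whitespace.
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def best_match(user_question: str, entries: list[QAEntry], threshold: float = 0.75):
    """Return the entry with the most similar phrasing if it scores >= threshold, else None.

    difflib string similarity stands in for the product's real (undocumented,
    likely semantic) matching; the 0.75 threshold is an arbitrary illustration.
    """
    query = normalize(user_question)
    best_entry, best_score = None, 0.0
    for entry in entries:
        for phrasing in entry.questions:
            score = SequenceMatcher(None, query, normalize(phrasing)).ratio()
            if score > best_score:
                best_entry, best_score = entry, score
    return best_entry if best_score >= threshold else None


# Hypothetical example entry with three phrasings of the same question.
refunds = QAEntry(
    title="Refund policy",
    questions=[
        "What is your refund policy?",
        "Can I get my money back?",
        "How do refunds work?",
    ],
    answer="Refunds are available within the window stated in your plan terms.",
)

if __name__ == "__main__":
    match = best_match("can i get my money back", [refunds])
    print(match.title if match else "no match")
```

A question such as "can i get my money back" matches the refund entry only because that phrasing was added; an unrelated question like "What's the weather today?" scores below the threshold and returns `None`.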
Answer Priority
Usage Analytics
Click on any Q&A to open its detail view and see:
- Number of times the question has been asked by users (updated in real time)
- Last time the question was asked
- Date the Q&A was added
- Visual chart showing frequency over time
These insights help you identify which topics matter most to your users and prioritize updates accordingly.
Best Practices for Q&As
✓ Do
- Cover edge cases and specific scenarios
- Include multiple phrasings of the same question
- Keep answers concise but complete
- Update regularly based on analytics
✗ Avoid
- Overly generic questions
- Answers that frequently change
- Duplicate Q&As with slight variations
- Questions already covered well by documents
Notion Integration
Connect your Notion workspace to Docimal to enable your AI agent to access and utilize information stored in your Notion databases. This integration keeps your knowledge base synchronized with your team's documentation in Notion.
Setup Requirements
What Gets Synced
- All pages and sub-pages you grant access to
- Database entries with their properties
- Text content, headings, and formatting
- Linked pages and references
Content updates in Notion are synced automatically when using Auto Sync (available on Standard and Pro plans).
Auto Sync
Auto Sync automatically keeps your AI agent up-to-date by pulling the latest content from your data sources every 24 hours. This ensures your agent always has access to the most current information without requiring manual intervention.
Supported Data Sources
Auto Sync works with the following source types:
- Websites — Discovers newly added links and updates existing page content
- Notion — Syncs changes from your connected Notion workspace
- Remote Storage — Google Drive, Dropbox, and other cloud sources
How It Works
1. Your agent automatically fetches new content from all connected sources once every 24 hours.
2. Pages or links newly added to your website are automatically discovered and included.
3. Changes to Notion pages and databases are detected and synced.
4. No manual action is required; updates happen in the background.
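For website sources, a daily sync has to answer two questions: which links are new, and which known pages have changed. One common way to do this, sketched below under the assumption that a content hash is stored for each URL (Docimal's sync internals are not documented), is to fingerprint every fetched page and compare it with the previous run. The function names are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Fingerprint page content so unchanged pages can be skipped cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_sync(previous: dict[str, str], fetched: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare a stored {url: hash} snapshot with freshly fetched {url: text}.

    Returns (added, changed): URLs seen for the first time, and known URLs
    whose content hash differs. Hypothetical logic, not Docimal's code.
    """
    added, changed = [], []
    for url, text in sorted(fetched.items()):
        new_hash = content_hash(text)
        if url not in previous:
            added.append(url)
        elif previous[url] != new_hash:
            changed.append(url)
    return added, changed


if __name__ == "__main__":
    previous = {
        "https://example.com/": content_hash("Welcome"),
        "https://example.com/pricing": content_hash("Old prices"),
    }
    fetched = {
        "https://example.com/": "Welcome",
        "https://example.com/pricing": "New prices",
        "https://example.com/blog/launch": "We launched",
    }
    print(diff_sync(previous, fetched))
```

In this example the new blog post is reported as added, the pricing page as changed, and the unchanged homepage is skipped.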
Best Practices
Use Plain Text with Markdown
Where possible, provide content as plain text and use Markdown syntax for formatting; it is processed more accurately than complex document layouts. Avoid images that contain text; include the text itself instead.
Structure Your Content
Use clear headings, bullet points, and logical sections. Well-structured content helps the AI understand context and relationships between information, leading to better responses.
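To see why headings help, consider how a heading-aware splitter might divide a Markdown snippet: each section keeps its heading as context for the text beneath it. The sketch below is a generic illustration with a hypothetical `split_markdown_sections` helper, not Docimal's processing pipeline, and it only recognizes ATX-style `#` headings.

```python
import re

# ATX headings such as "# Shipping" or "### Returns" (one to six '#' then whitespace).
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_sections(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown into (heading, body) pairs at each ATX heading.

    Text before the first heading is returned under an empty heading.
    Simplification: '#' lines inside fenced code blocks are treated as headings too.
    """
    sections: list[tuple[str, str]] = []
    heading, body = "", []

    def flush() -> None:
        # Keep a section if it has a heading or any non-blank body text.
        text = "\n".join(body).strip()
        if heading or text:
            sections.append((heading, text))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading, body = match.group(2), []
        else:
            body.append(line)
    flush()
    return sections


if __name__ == "__main__":
    doc = "# Shipping\nWe ship worldwide.\n\n## Returns\n- Unused items only\n- Keep the receipt\n"
    for title, text in split_markdown_sections(doc):
        print(title, "->", text.replace("\n", " | "))
```

The sample yields two sections, "Shipping" and "Returns", each carrying only the text that belongs to it. Content without headings would collapse into a single undifferentiated block.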
Files Must Contain Selectable Text
When uploading PDFs, make sure they contain selectable text rather than scanned images. If a PDF is image-based, run it through an OCR tool to add a selectable text layer before uploading.
Always Retrain After Changes
Remember to click the "Retrain Agent" button after adding, deleting, or updating your knowledge sources. Changes won't be reflected in your agent until retraining is complete.
Organize by Topic
Use descriptive names for files, snippets, and Q&As. Group related content together and consider creating separate knowledge bases for different topics or departments for better organization.
Test Incrementally
Don't wait until you've uploaded all documents. Test your agent in the Playground after each major content addition to catch issues early and verify response quality.