Skip to content

PavloGor/campus-docs-assistant

Β 
Β 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

169 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

image_banner

image_rag_flow

image_install_guide

INTRODUCTION

The Campus Docs Assistant is an AI-powered platform designed to streamline access to academic and administrative information within universities. Through a user-friendly chatbot interface, students, faculty, and staff can interact naturally with an intelligent assistant capable of answering questions, retrieving official documents, and offering support grounded in institutional data. This natural language interaction simplifies complex information retrieval tasks, eliminating the need to manually navigate dense and often confusing documentation.

By leveraging state-of-the-art AI technologies, the assistant understands complex queries, performs context-aware document retrieval, and generates accurate and concise responses in real time. Its web-based interface ensures accessibility while promoting autonomy in accessing institutional knowledge. This makes the Campus Docs Assistant a valuable tool for educational institutions aiming to enhance user experience, reduce repetitive inquiries, and improve the overall efficiency of information management.


AVATARS

Avatar Usage Meaning
face User messages Indicates messages sent by the user
smart_toy Assistant Responses Standard replies generated by the assistant
mindfulness Assistant Streaming Direct Response Responses generated using the assistant's own knowledge
psychology Assistant Streaming Tool Response Responses generated with the help of integrated tools
cognition Indexing Operations Indicates that the assistant is processing documents
psychology_alt Error or Reset Messages System-level feedback such as error messages or resets

This legend provides a clear understanding of the avatars used in the application and their significance


MOTIVATION

This project was initially inspired by the specific challenges observed at the Federal University of Mato Grosso do Sul (UFMS), but the problem it addresses is common across many universities. In academic settings, students often struggle to obtain simple pieces of information due to the overwhelming complexity and volume of official documentation. Regulations, guidelines, and institutional policies are typically stored in dense, legalistic documents that are not user-friendly or easy to navigate.

In practice, students seeking a single answer β€” such as internship requirements, enrollment rules, or calendar dates β€” frequently end up reading through dozens of pages of official publications. Frustrated by this experience, many resort to contacting academic coordinators directly. However, from the administration's perspective, this creates a high volume of repetitive inquiries that could have been answered if students had easier access to the right part of the documentation.

This cycle results in inefficiency and dissatisfaction on both sides: students receive vague or delayed responses, and coordinators are overwhelmed by simple questions that require them to redirect students to existing official documents. The Campus Docs Assistant was developed to break this cycle, acting as a bridge between formal documentation and practical student needs. By enabling natural language interaction and intelligent information retrieval, it aims to reduce friction, save time, and promote autonomous access to institutional knowledge.


demo_web.mov
demo_file.mov

FEATURES

network AI-Powered Query Handling

  • The assistant processes user queries using the Maritalk large language model, which is optimized for conversational AI and advanced natural language understanding.

  • It includes a tool decision system that evaluates the context of each query to determine whether to generate a direct response or trigger external tools, ensuring intelligent and context-aware interactions.

search_engine Smart Document Retrieval

  • The assistant performs semantic search using Pinecone, a high-performance vector database, allowing retrieval of the most relevant documents based on meaning rather than keywords.

  • It uses Ollama embeddings to convert documents and user queries into vector representations, enabling fast and accurate similarity matching for academic and administrative content.

contextual Context-Aware Responses

  • Implements a retrieve and generate mechanism that blends user queries with retrieved content to produce accurate, relevant answers. Leveraging LangChain's retrieval augmented generation logic under the hood.

  • Maintains dynamic context management, keeping the conversation history clean and focused to ensure that responses remain concise and contextually accurate.

internet Web Scraping and Indexing

  • Integrates Playwright to render and scrape dynamic web pages, allowing the assistant to index and respond with external institutional content.

  • Utilizes intelligent document chunking to split large texts into digestible parts for efficient indexing and retrieval, enabling high performance even with large datasets.

forum Interactive User Interface

  • Built with Streamlit the assistant features a responsive and interactive UI where users can submit queries, view answers, and configure behaviorβ€”all within an accessible web interface.

graph Modular and Scalable Architecture

  • Employs a LangGraph based state machine, where conversational logic is handled through dynamic workflows ensuring flexibility in managing tool calls, memory and state transitions.

  • Designed with robust error handling to gracefully manage runtime issues, API failures and unexpected user input across various system components.

demo_graph.mov

GETTING STARTED

This guide outlines how to use the Campus Docs Assistant in two distinct scenarios:

  • Creating and managing a new knowledge base for Coordinators & Institutions

  • Accessing and querying an existing knowledge base for Students & Users

All users regardless of role need the following:

  • Ollama installed and running locally
  • Access to a Maritalk API key
  • Access to Pinecone credentials

person_shield Role-Based Configuration [Coordinators / Institutions]

As a coordinator or institution representative you are responsible for:

Creating and configuring the Pinecone vector database

  • Create a Pinecone index with the appropriate dimension size
  • Ensure the index dimension matches your chosen embedding model dimension

Indexing content into the knowledge base

  • Upload PDFs containing institutional content
  • Add URLs for web-based institutional resources
  • Maintain and update the knowledge base as needed

Providing access credentials to students

  • Share the Pinecone API key and index name
  • Provide Maritalk API access information
  • Communicate which embedding model students should use

person_search Role-Based Configuration [Students / Users]

As a student or end-user you will focus on using the assistant not configuring indexing:

Use existing knowledge base credentials

  • Obtain necessary credentials from your institution
  • Configure the assistant with these credentials
  • Query the knowledge base using natural language

Avoid modifying the shared knowledge base

  • While technically possible to index content to a shared knowledge base this is not recommended
  • If you need your own knowledge base, create a separate Pinecone index

SETUP GUIDELINES

To begin, it's recommended to use the nomic-embed-text model for generating embeddings, as it provides an output dimension of $768$. When setting up your Pinecone index, ensure that its dimensionality matches the output of the embedding model β€” this compatibility is crucial for proper functioning. Additionally, make sure that the Ollama runtime is running locally, as the assistant depends on it to operate. Lastly, if you intend to incorporate personal knowledge bases, avoid making changes to the shared Pinecone index. Instead, create a separate index to keep your data isolated and organized.


INSTALLATION GUIDE

Clone the Repository

$ git clone https://github.com/GiovaneIwamoto/campus-docs-assistant.git
$ cd campus-docs-assistant

Install Dependencies

$ pip install -r requirements.txt

Install Playwright

$ pip install playwright
$ playwright install

Run the Application

$ cd app
$ streamlit run app.py

CONTACT & SUPPORT

Whether you're a developer, university representative, coordinator, or student exploring or deploying the Campus Docs Assistant, I'm here to help and collaborate! Feel free to reach out for:

  • General inquiries about the project
  • Feature requests or suggestions
  • Help provisioning resources
  • Troubleshooting installation or usage issues
  • Collaboration opportunities or academic use cases

Email: giovaneiwamoto@gmail.com

You can also open an issue on GitHub for bug reports or enhancements.


LIKE THE PROJECT

If you find this project useful or believe in its potential to enhance academic processes, consider giving it a β˜… star on GitHub β€” it really helps with visibility and community support!


journal

About

πŸŽ“ Campus Docs Assistant – Built to solve a common challenge in universities, this AI-powered chatbot uses LLMs, intelligent agents and RAG to make academic documents and institutional information easier to access and understand.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 100.0%