
As a researcher, reading and understanding scientific papers has always been a crucial part of my daily routine. I still remember the tricks I learned in grad school for digesting a paper efficiently. However, with countless research papers being published every day, I felt overwhelmed trying to keep up with the latest research trends and insights. The old tricks could only help so much.
Things started to change with the recent development of large language models (LLMs). Thanks to their remarkable contextual understanding capability, LLMs can fairly accurately identify relevant information in user-provided documents and generate high-quality answers to the user’s questions about them. A myriad of document Q&A tools have been developed based on this idea, and some are designed specifically to help researchers understand complex papers within a relatively short amount of time.
Although it’s definitely a step forward, I noticed some friction points when using those tools. One of the main issues I had was prompt engineering. Since the quality of LLM responses depends heavily on the quality of my questions, I often found myself spending quite some time crafting the "perfect" question. This is especially challenging when reading papers in unfamiliar research fields: oftentimes I simply don’t know what questions to ask.
This experience got me thinking: is it possible to develop a system that can automate the process of Q&A about research papers? A system that can distill key points from a paper more efficiently and autonomously?
Previously, I worked on a project where I developed a dual-chatbot system for language learning. The concept there was simple yet effective: by letting two chatbots chat in a user-specified foreign language, the user could learn the practical usage of the language by simply observing the conversation. The success of this project led me to an interesting thought: could a similar dual-chatbot system be useful for understanding research papers as well?
So, in this blog post, we are going to bring this idea to life. Specifically, we will walk through the process of developing a dual-chatbot system that can digest research papers in an autonomous manner.
To make this journey a fun experience, we are going to approach it as a software project and run a Sprint: we will begin with "ideation", where we introduce the concept of leveraging a dual-chatbot system to tackle our problem. Then comes the "Sprint execution", during which we’ll incrementally build the features of our design. Lastly, we will show our demo in the "Sprint review" and reflect on the learnings and future opportunities in the "Sprint Retrospective".
Ready to run the Sprint? Let’s get started!
This is the 2nd blog on my series of LLM projects. The 1st one is Building an AI-Powered Language Learning App, and the 3rd one is Training Soft Skills in Data Science with Real-Life Simulations. Feel free to check them out!
Table of Contents
· 1. Concept: dual-chatbot system
· 2. Sprint Planning: what we want to build
· 3. Feature 1: Document Embedding Engine
· 4. Feature 2: Dual-Chatbot System
∘ 4.1 Abstract chatbot class
∘ 4.2 Journalist chatbot class
∘ 4.3 Author bot class
∘ 4.4 Quick test: the interview
· 5. Feature 3: User Interaction
∘ 5.1 Creating the chat environment (in Jupyter Notebook)
∘ 5.2 Implementing PDF highlighting functionality
∘ 5.3 Allowing user input for questions
∘ 5.4 Allowing downloading the generated script
· 6. Sprint Review: show the demo!
· 7. Sprint Retrospective
1. Concept: dual-chatbot system
The foundation of our solution lies in the concept of a dual-chatbot system. As its name implies, this system involves two chatbots (powered by large language models) engaging in an autonomous dialogue. By specifying a high-level task description and assigning relevant roles to the chatbots, users can guide the conversation toward their desired direction.
To give a concrete example: in my previous project, where a dual-chatbot system assists language learning, the learner (user) specifies a real-life scenario (e.g., dining at a restaurant) and assigns roles for the chatbots to play (e.g., bot 1 as the waitstaff and bot 2 as the customer). The two bots then simulate a conversation in the user’s chosen foreign language, mimicking the interaction between the assigned roles in the given scenario. This allows on-demand generation of fresh, scenario-specific language learning materials, thereby helping users better understand language usage in real-life situations.
So, how do we adapt this concept for the autonomous digestion of research papers?
The key lies in the role assignment. More specifically, one bot could take the role of a "journalist", whose main task is to conduct an interview to understand and extract key insights from a research paper. Meanwhile, the other bot could play the role of an "author", who has full access to the research paper and is tasked with providing comprehensive answers to the "journalist" bot’s queries.
When it comes to interaction, the journalist bot will initiate the dialogue and kick off the interview process. The author bot then serves as a conventional document Q&A engine, answering the journalist’s questions based on the relevant context of the research paper. The journalist bot follows up with additional questions for further clarification. Through this iterative Q&A process, the key contributions, methodology, and findings of the research paper can be automatically extracted.

The dual-chatbot system described above introduces a shift from the traditional user-chatbot interaction: instead of the user thinking about the right questions to ask the LLM, the introduced "journalist" bot automatically comes up with suitable questions on the user’s behalf. This approach bypasses the need for users to craft appropriate prompts, thus significantly reducing their cognitive load. This is especially useful when delving into unfamiliar research fields. Overall, the dual-chatbot system may constitute a more user-friendly, efficient, and engaging method for distilling complex scientific research papers.
Next up, let’s move to Sprint planning and define several user stories we would like to address in this project.
2. Sprint Planning: what we want to build
With the concept in place, it’s time to plan our current Sprint. In line with the common practice of Agile development, our Sprint planning will revolve around user stories.
In Agile development, a user story is a concise, informal description of a feature or functionality from an end-user perspective. It is a common practice for defining and communicating requirements in a way that is understandable and actionable for the development team.
- 🎯 User story 1: document embedding
"As a user, I want to input research papers in PDF format into the system, and I want the system to convert my input paper into a machine-readable format so that the dual-chatbot system can understand and analyze it efficiently." (Generated by GPT-4)
This user story focuses on data ingestion. Essentially, we need to build a data-processing pipeline that includes document loading, splitting, embedding creation, and embedding storage.
Here, "embeddings" refer to the numerical representations of the text data. By creating a numerical representation of each part of a research paper, the author bot can better understand the semantic meaning of the research paper and be able to accurately answer the journalist bot’s questions.
Additionally, we need to have a database to store the computed embeddings of the research paper. This database needs to be readily accessible by the author bot to facilitate fast and accurate answer generation.
In section 3, we will address this user story by leveraging the OpenAI Embeddings API along with Meta’s FAISS vector store.
- 🎯 User story 2: dual-chatbot
"As a user, I want to observe an autonomous conversation between two chatbots – one playing the role of a ‘journalist’ asking questions and the other playing the role of an ‘author’ answering them, derived from the contents of the research paper. This will help me understand the paper’s key points without needing to read it in its entirety or craft my own questions." (Generated by GPT-4)
This user story represents the cornerstone of our project: the development of the dual-chatbot system. As discussed in the "Concept" section, we need to construct two types of chatbot classes: one that is able to develop a series of questions to query the details of the paper (i.e., the journalist bot), and another that can leverage document embeddings to generate comprehensive answers to these questions (i.e., the author bot).
In section 4, we will focus on addressing this user story by using the LangChain framework.
- 🎯 User story 3: chat environment
"As a user, I want an intuitive chat interface where I can watch the chatbots’ conversation unfold in real-time." (Generated by GPT-4)
The goal of this user story is to build a chat environment where users can view the generated dialogue between the journalist and author bots. In the spirit of MVP (minimum viable product), we will use simple Jupyter widgets to demonstrate the chat environment in section 5.1.
- 🎯 User story 4: PDF highlighting
"As a user, I want to have the corresponding parts in the original research paper highlighted based on the chatbot’s discussion. This will help me to quickly locate the sources of the information discussed during the conversation." (Generated by GPT-4)
This user story focuses on providing users with traceability of the Q&A. For every answer generated by the author bot, users will naturally want to know precisely where in the research paper the discussed information originates. Not only does this feature enhance the transparency of our dual-chatbot system, but it also allows for a more interactive and engaging user experience.
In section 5.2, we will leverage LangChain’s conversational retrieval chain to return the sources the author bot used to generate the answers and the PyMuPDF library to highlight the corresponding texts in the original PDF.
- 🎯 User story 5: user input
"As a user, I want to be able to intervene and ask my own questions in the midst of the chatbot’s conversation, this way I can direct the conversation and extract the information I need from the paper." (Generated by GPT-4)
This user story focuses on the need for user participation. While our target dual-chatbot system is designed to be autonomous, we also need to provide the option for users to ask their own questions. This feature ensures that the conversation is not only driven in a direction set by the bots, but can also be guided by the user’s own curiosity and interests. Moreover, users may well be inspired by the first rounds of the conversation and want to ask follow-up questions or dig deeper into aspects of particular interest to them. All of this underlines the importance of user intervention.
In section 5.3, we will address this user story by upgrading our user interface in Jupyter Notebook.
- 🎯 User story 6: download scripts
"As a user, I want to be able to download a transcript of the chatbot conversation. This will allow me to review the key points offline or share the information with my colleagues." (Generated by GPT-4)
This user story focuses on the accessibility and shareability of the generated content. Although users can view the conversation in a dedicated chat environment, it is beneficial to provide users with a record of the discussion that they can review later and share with others.
In section 5.4, we will use the PDFDocument library to convert the generated script into a PDF file for users to download.
So much for the planning, time to get to work!

3. Feature 1: Document Embedding Engine
Let’s implement the first feature of our paper digesting app: the document embedding engine. Here, we will build a data-processing class with the functionality of document loading, splitting, embedding creation, and storage. This addresses our first user story:
"As a user, I want to input research papers in PDF format into the system, and I want the system to convert my input paper into a machine-readable format so that the dual-chatbot system can understand and analyze it efficiently." (Generated by GPT-4)
We start by creating an embedding_engine.py file and importing the necessary libraries:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI
from langchain.utilities import ArxivAPIWrapper
import os
We then instantiate an embedding model by using OpenAI embeddings API:
class Embedder:
    """Embedding engine to create doc embeddings."""

    def __init__(self, engine='OpenAI'):
        """Specify embedding model.

        Args:
        --------------
        engine: the embedding model.
        For a complete list of supported embedding models in LangChain,
        see https://python.langchain.com/docs/integrations/text_embedding/
        """
        if engine == 'OpenAI':
            # Reminder: need to set up openAI API key
            # (e.g., via environment variable OPENAI_API_KEY)
            self.embeddings = OpenAIEmbeddings()
        else:
            raise KeyError("Currently unsupported embedding model type!")
Next, we define the function for loading and processing PDF files:
def load_n_process_document(self, path):
    """Load and process PDF document.

    Args:
    --------------
    path: path of the paper.
    """
    # Load PDF
    loader = PyMuPDFLoader(path)
    documents = loader.load()

    # Process PDF
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    self.documents = text_splitter.split_documents(documents)
Here, we have used PyMuPDFLoader to load the PDF file, which, under the hood, leverages the PyMuPDF library to parse the file. The returned documents variable is a list of LangChain Document() objects. Each Document() object corresponds to one page of the original PDF, with the page content stored in its page_content attribute and associated metadata (e.g., page number) stored in its metadata attribute.
After parsing the loaded PDF, we used RecursiveCharacterTextSplitter from LangChain to split the original PDF into multiple smaller chunks. Since the author bot will later use relevant texts from the PDF to answer questions, creating small chunks of text not only helps the author bot focus on specific details when answering a question, but also ensures that the context provided to the author bot will not exceed the token limit of the employed LLM.
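As a quick sanity check, the resulting chunks could be inspected along the lines of the following sketch (the file path here is hypothetical):
# Illustrative only: inspect the chunks produced by the splitter
embedder = Embedder(engine='OpenAI')
embedder.load_n_process_document("../Papers/sample_paper.pdf")  # hypothetical path

print(f"Number of chunks: {len(embedder.documents)}")
first_chunk = embedder.documents[0]
print(first_chunk.page_content[:200])  # each chunk holds at most ~1000 characters
print(first_chunk.metadata)            # metadata carried over from the source page, e.g., page number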
Next, we set up the vector store to manage the text embedding vectors:
def create_vectorstore(self, store_path):
    """Create vector store for doc Q&A.
    For a complete list of vector stores supported by LangChain,
    see: https://python.langchain.com/docs/integrations/vectorstores/

    Args:
    --------------
    store_path: path of the vector store.

    Outputs:
    --------------
    vectorstore: the created vector store for holding embeddings
    """
    if not os.path.exists(store_path):
        print("Embeddings not found! Creating new ones")
        self.vectorstore = FAISS.from_documents(self.documents, self.embeddings)
        self.vectorstore.save_local(store_path)
    else:
        print("Embeddings found! Loaded the computed ones")
        self.vectorstore = FAISS.load_local(store_path, self.embeddings)

    return self.vectorstore
Here, we used the Facebook AI Similarity Search (FAISS) library to serve as our vector store, which takes the processed PDF chunks and the embedding engine as inputs. The created self.vectorstore holds the embedding vectors of the individual PDF chunks we created earlier. At query time, it will invoke the embedding engine to embed the question and then retrieve the embedding vectors that are 'most similar' to the embedded query. The texts that correspond to the most similar embedding vectors will be fed to the author bot as context to assist its answer generation. This process is known as vector search and forms the backbone of document Q&A.
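To build intuition for this retrieval step, here is a minimal sketch of querying the FAISS vector store directly (the question is made up for illustration):
# Minimal sketch of vector search against the FAISS store
query = "What is the main contribution of the paper?"  # hypothetical question
relevant_chunks = vectorstore.similarity_search(query, k=5)

for chunk in relevant_chunks:
    # Each hit is a LangChain Document whose text can serve as context for the author bot
    print(chunk.metadata.get("page"), chunk.page_content[:100])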
Finally, we create a helper function to generate a short summary of the paper. This will be useful later for setting the stage for the journalist bot.
def create_summary(self, llm_engine=None):
    """Create paper summary.
    The summary is created by using LangChain's summarize_chain.

    Args:
    --------------
    llm_engine: backbone large language model.

    Outputs:
    --------------
    summary: the summary of the paper
    """
    if llm_engine is None:
        raise KeyError("please specify a LLM engine to perform summarization.")
    elif llm_engine == 'OpenAI':
        # Reminder: need to set up openAI API key
        # (e.g., via environment variable OPENAI_API_KEY)
        llm = ChatOpenAI(
            model_name="gpt-3.5-turbo",
            temperature=0.8
        )
    else:
        raise KeyError("Currently unsupported chat model type!")

    # Use LLM to summarize the paper
    chain = load_summarize_chain(llm, chain_type="stuff")
    summary = chain.run(self.documents[:20])

    return summary
We resort to LLMs to create the summary. Technically speaking, we achieve this by using LangChain’s load_summarize_chain, which takes the LLM model and the summarization method as inputs.
In terms of the summarization method, we have used the stuff method, which simply "stuffs" all the documents into a single context and prompts the LLM to generate the summary. For other, more advanced methods, please refer to the official LangChain documentation.
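For instance, if a paper were too long to fit into a single context window, one possible variation (not used in this project) would be to switch the chain type to map_reduce, which summarizes each chunk separately and then merges the partial summaries:
# Hypothetical alternative inside create_summary(): trade extra LLM calls
# for robustness against the context-length limit
chain = load_summarize_chain(llm, chain_type="map_reduce")
summary = chain.run(self.documents)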
Great! Now that we have developed the Embedder class to handle document loading, splitting, as well as embedding creation and storage, we can move on to the core of our app: the dual-chatbot system.
4. Feature 2: Dual-Chatbot System
In this section, we address our second user story:
"As a user, I want to observe an autonomous conversation between two chatbots – one playing the role of a ‘journalist’ asking questions and the other playing the role of an ‘author’ answering them, derived from the contents of the research paper. This will help me understand the paper’s key points without needing to read it in its entirety or craft my own questions." (Generated by GPT-4)
We will start by creating an abstract base class for defining the common behaviors of the chatbots. Afterward, we will develop the individual journalist bot and author bot that inherit from this base class. We put all the class definitions in chatbot.py.
4.1 Abstract chatbot class
Since our journalist bot and author bot share a lot of similarities (as they are both role-playing bots), it is good practice to encapsulate the definition of their shared behaviors within an abstract base class:
from abc import ABC, abstractmethod
from langchain.chat_models import ChatOpenAI

class Chatbot(ABC):
    """Class definition for a single chatbot with memory, created with LangChain."""

    def __init__(self, engine):
        """Initialize the large language model and its associated memory.
        The memory can be a LangChain memory object, or a list of chat history.

        Args:
        --------------
        engine: the backbone llm-based chat model.
        """
        # Instantiate llm
        if engine == 'OpenAI':
            # Reminder: need to set up openAI API key
            # (e.g., via environment variable OPENAI_API_KEY)
            self.llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.8)
        else:
            raise KeyError("Currently unsupported chat model type!")

    @abstractmethod
    def instruct(self):
        """Determine the context of chatbot interaction."""
        pass

    @abstractmethod
    def step(self):
        """Action produced by the chatbot."""
        pass

    @abstractmethod
    def _specify_system_message(self):
        """Prompt engineering for chatbot."""
        pass
We defined three common methods:
- instruct: this method is used to set up the chatbot and attach memory to it.
- step: this method is used to feed input to the chatbot and receive the bot’s response.
- _specify_system_message: this method is used to give the chatbot specific instructions regarding how it should behave during the conversation.
With the chatbot template in place, we are ready to create two specific chatbot roles, i.e., the journalist bot and the author bot.
4.2 Journalist chatbot class
The journalist bot’s role is to interview the author bot and extract key insights from a research paper. With that in mind, let’s fill the template methods with concrete code.
from langchain.memory import ConversationBufferMemory

class JournalistBot(Chatbot):
    """Class definition for the journalist bot, created with LangChain."""

    def __init__(self, engine):
        """Setup journalist bot.

        Args:
        --------------
        engine: the backbone llm-based chat model.
        """
        # Instantiate llm
        super().__init__(engine)

        # Instantiate memory
        self.memory = ConversationBufferMemory(return_messages=True)
In the constructor method, besides specifying a backbone LLM, another important component for the journalist bot is the memory object. Memory tracks the conversation history and serves as the key to helping the journalist bot avoid repetitive or irrelevant questions and generate meaningful follow-up questions. Technically, we achieved that by using the ConversationBufferMemory provided by LangChain, which simply prepends the last few inputs/outputs to the current input of the chatbot.
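As a tiny, self-contained sketch (with made-up messages), this is roughly what the memory object does:
from langchain.memory import ConversationBufferMemory

# Illustrative only: store one exchange and read it back
memory = ConversationBufferMemory(return_messages=True)
memory.save_context({"input": "What problem does the paper tackle?"},
                    {"output": "It studies how to train the models more reliably."})

# These stored messages are what gets injected into the "history" placeholder of the prompt
print(memory.load_memory_variables({})["history"])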
Next, we set up the journalist chatbot by creating a ConversationChain, with the previously defined backbone LLM, the memory object, as well as the prompt for the chatbot. Note that we have also specified topic (the paper topic) and abstract (the paper summary), which will be used later to provide the context of the paper to the journalist bot.
from langchain.chains import ConversationChain
from langchain.prompts import (
    ChatPromptTemplate,
    MessagesPlaceholder,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate
)

def instruct(self, topic, abstract):
    """Determine the context of journalist chatbot.

    Args:
    ------
    topic: the topic of the paper
    abstract: the abstract of the paper
    """
    self.topic = topic
    self.abstract = abstract

    # Define prompt template
    prompt = ChatPromptTemplate.from_messages([
        SystemMessagePromptTemplate.from_template(self._specify_system_message()),
        MessagesPlaceholder(variable_name="history"),
        HumanMessagePromptTemplate.from_template("""{input}""")
    ])

    # Create conversation chain
    self.conversation = ConversationChain(memory=self.memory, prompt=prompt,
                                          llm=self.llm, verbose=False)
In LangChain, prompt generation and ingestion for instructing the chatbot are handled via different prompt templates. For our current application, the most critical piece is the SystemMessagePromptTemplate, as it allows us to give a high-level purpose to the journalist bot and also define its desired behaviors.
The following are the details of the instruction. Note that the instruction/prompt is generated and optimized using ChatGPT (GPT-4). This is beneficial because, in the current case, LLM-generated prompts tend to consider more nuances than human-crafted ones. Additionally, generating high-level instructions with an LLM represents a more scalable solution for adapting the system to scenarios beyond "journalist-author" interactions.
def _specify_system_message(self):
    """Specify the behavior of the journalist chatbot.
    The prompt is generated and optimized with GPT-4.

    Outputs:
    --------
    prompt: instructions for the chatbot.
    """
    prompt = f"""You are a technical journalist interested in {self.topic},
    Your task is to distill a recently published scientific paper on this topic through
    an interview with the author, which is played by another chatbot.
    Your objective is to ask comprehensive and technical questions
    so that anyone who reads the interview can understand the paper's main ideas and contributions,
    even without reading the paper itself.
    You're provided with the paper's summary to guide your initial questions.
    You must keep the following guidelines in mind:
    - Focus exclusively on the technical content of the paper.
    - Avoid general questions about {self.topic}, focusing instead on specifics related to the paper.
    - Only ask one question at a time.
    - Feel free to ask about the study's purpose, methods, results, and significance,
    and clarify any technical terms or complex concepts.
    - Your goal is to lead the conversation towards a clear and engaging summary.
    - Do not include any prefixed labels like "Interviewer:" or "Question:" in your question.

    [Abstract]: {self.abstract}"""

    return prompt
Here, we provided the journalist bot with the paper’s research domain and abstract to serve as the base for initial questions. This mirrors the real-world scenario where a journalist initially only knows a little about the paper and needs to ask questions to gather more information.
Finally, we need a step method to interact with the journalist bot:
def step(self, prompt):
    """Journalist chatbot asks question.

    Args:
    ------
    prompt: previous answer provided by the author bot.
    """
    response = self.conversation.predict(input=prompt)

    return response
In this case, the input prompt will be the author bot’s answer to the journalist bot’s previous question. If the conversation has not started yet, the input prompt will simply be "Start the conversation", to prompt the journalist bot to start the interview.
That’s it for the journalist bot. Let’s now turn to the author bot.
4.3 Author bot class
The author bot’s role is to answer questions raised by the journalist bot based on the research paper. Here is the constructor method for the author bot:
class AuthorBot(Chatbot):
    """Class definition for the author bot, created with LangChain."""

    def __init__(self, engine, vectorstore, debug=False):
        """Select backbone large language model, as well as instantiate
        the memory for creating language chain in LangChain.

        Args:
        --------------
        engine: the backbone llm-based chat model.
        vectorstore: embedding vectors of the paper.
        """
        # Instantiate llm
        super().__init__(engine)

        # Instantiate memory
        self.chat_history = []

        # Instantiate embedding index
        self.vectorstore = vectorstore

        self.debug = debug
There are two changes here. First, unlike the journalist bot, the author bot needs access to the full paper; therefore, the vector store we created earlier is provided to the constructor. Also, note that we are no longer using a memory object (e.g., ConversationBufferMemory) to track the chat history. Instead, we simply use a list to store the history and later pass it explicitly to the author bot. Each element of the list is a tuple of (query, answer). Both ways of maintaining conversation history are supported in LangChain.
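In other words, the history the author bot sees is just a plain Python list of (question, answer) tuples, along the lines of this hypothetical sketch:
# Hypothetical example of the manual chat-history format
chat_history = []
chat_history.append((
    "What is the key idea of the paper?",  # question from the journalist bot (or the user)
    "According to the paper, ..."          # author bot's answer (truncated here)
))
# The whole list is passed back to the chain on every call, so the author bot
# can resolve follow-up questions that refer to earlier answers.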
Next, we set up the conversation chain for the author bot.
from langchain.chains import ConversationalRetrievalChain

def instruct(self, topic):
    """Determine the context of author chatbot.

    Args:
    -------
    topic: the topic of the paper.
    """
    # Specify topic
    self.topic = topic

    # Define prompt template
    qa_prompt = ChatPromptTemplate.from_messages([
        SystemMessagePromptTemplate.from_template(self._specify_system_message()),
        HumanMessagePromptTemplate.from_template("{question}")
    ])

    # Create conversation chain
    self.conversation_qa = ConversationalRetrievalChain.from_llm(llm=self.llm, verbose=self.debug,
                                                                 retriever=self.vectorstore.as_retriever(
                                                                     search_kwargs={"k": 5}),
                                                                 return_source_documents=True,
                                                                 combine_docs_chain_kwargs={'prompt': qa_prompt})
Since the author bot needs to answer questions by first retrieving relevant context, we adopted a ConversationalRetrievalChain. To quote from the official documentation of LangChain:
ConversationalRetrievalChain first combines the chat history (either explicitly passed in or retrieved from the provided memory) and the query into a standalone question, then looks up relevant documents from the retriever, and finally passes those documents and the query to a question answering chain to return a response.
Therefore, in addition to the backbone LLM, we also need to supply the chain with the vector store. Note that here we specified the number of returned relevant documents (PDF chunks) via search_kwargs. In general, selecting the right number is not a trivial task and deserves careful consideration, balancing accuracy, relevance, comprehensiveness, and computational resources. Lastly, we set return_source_documents to True, which is important for ensuring transparency and traceability in the Q&A process.
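For intuition, the retrieval step the chain performs internally is roughly equivalent to calling the retriever by hand, as in this small sketch (the question is made up):
# Roughly what the chain does internally before answering (illustrative only)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
docs = retriever.get_relevant_documents("How are the model ensembles used during training?")  # hypothetical question

# These chunks fill the {context} slot of the author bot's system prompt
for doc in docs:
    print(doc.metadata.get("page"), doc.page_content[:80])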
To interact with the author bot:
def step(self, prompt):
    """Author chatbot answers question.

    Args:
    ------
    prompt: question raised by journalist bot.

    Outputs:
    ------
    answer: the author bot's answer
    source_documents: documents that author bot used to answer questions
    """
    response = self.conversation_qa({"question": prompt, "chat_history": self.chat_history})
    self.chat_history.append((prompt, response["answer"]))

    return response["answer"], response["source_documents"]
As discussed previously, we explicitly supplied the chat history (a list of previous query-answer tuples) to the conversation chain. As a result, we also need to manually append the newly obtained query-answer tuple to the chat history. For the response, we get not only the answer but also the source documents (PDF chunks) used by the author bot to generate the answer, which will be used later to highlight the corresponding texts in PDF.
Finally, we inform the author bot of its role and specify detailed instructions. As with the journalist bot, the instruction/prompt for the author bot is also generated and optimized using ChatGPT (GPT-4).
def _specify_system_message(self):
    """Specify the behavior of the author chatbot.
    The prompt is generated and optimized by GPT-4.

    Outputs:
    --------
    prompt: instructions for the chatbot.
    """
    prompt = f"""You are the author of a recently published scientific paper on {self.topic}.
    You are being interviewed by a technical journalist who is played by another chatbot and
    looking to write an article to summarize your paper.
    Your task is to provide comprehensive, clear, and accurate answers to the journalist's questions.
    Please keep the following guidelines in mind:
    - Try to explain complex concepts and technical terms in an understandable way, without sacrificing accuracy.
    - Your responses should primarily come from the relevant content of this paper,
    which will be provided to you in the following, but you can also use your broad knowledge in {self.topic} to
    provide context or clarify complex topics.
    - Remember to differentiate when you are providing information directly from the paper versus
    when you're giving additional context or interpretation. Use phrases like 'According to the paper...' for direct information,
    and 'Based on general knowledge in the field...' when you're providing additional context.
    - Only answer one question at a time. Ensure that each answer is complete before moving on to the next question.
    - Do not include any prefixed labels like "Author:", "Interviewee:", "Respond:", or "Answer:" in your answer.
    """

    prompt += """Given the following context, please answer the question.

    {context}"""

    return prompt
That’s it for constructing the author bot.
4.4 Quick test: the interview
Time to take the two bots for a spin!
To see if the developed journalist and author bots can engage in a meaningful conversation toward the goal of digesting the paper, we pick a sample scientific research paper and run the test.
As I was working on physics-informed machine learning recently, I picked an arXiv paper titled "Improved Training of Physics-Informed Neural Networks with Model Ensembles" (CC BY 4.0 license) for the test.
paper = 'Improved Training of Physics-Informed Neural Networks with Model Ensembles'

# Create embeddings
embedding = Embedder(engine='OpenAI')
embedding.load_n_process_document("../Papers/"+paper+".pdf")

# Set up vectorstore
vectorstore = embedding.create_vectorstore(store_path=paper)

# Fetch paper summary
paper_summary = embedding.create_summary(llm_engine='OpenAI')

# Instantiate journalist and author bot
journalist = JournalistBot('OpenAI')
author = AuthorBot('OpenAI', vectorstore)

# Provide instruction
journalist.instruct(topic='physics-informed machine learning', abstract=paper_summary)
author.instruct('physics-informed machine learning')

# Start conversation
for i in range(4):
    if i == 0:
        question = journalist.step('Start the conversation')
    else:
        question = journalist.step(answer)
    print("👨‍🏫 Journalist: " + question)

    answer, source = author.step(question)
    print("👩‍🎓 Author: " + answer)
The generated conversation script is shown below. Note that to save space, some of the author bot’s answers are not shown in full:

Since the author bot only passively answers questions (i.e., a conventional Q&A agent), we focus our attention on the behavior of the journalist bot to assess if it can properly steer the interview. Here we can see that the journalist bot started with a general question about the paper (the motivation), then adapted its questions to dig deeper into the methodology of the proposed strategy. Overall, the behavior of the developed journalist bot aligns with our expectations and it is capable of conducting the interview toward distilling the key points from the given paper. Not bad😃
5. Feature 3: User Interaction
In this section, we wrap our previous experiment into a proper user interface. Toward that end, we will address three user stories to incrementally build the desired features.
5.1 Creating the chat environment (in Jupyter Notebook)
Let’s start with the 3rd user story:
"As a user, I want an intuitive chat interface where I can watch the chatbots’ conversation unfold in real-time." (Generated by GPT-4)
To keep things simple, we opt for Jupyter widgets as they allow quickly building a chat environment entirely in Jupyter Notebook.
First, we set up the layout for displaying the conversation:
import ipywidgets as widgets
from IPython.display import display
# Create button
bot_ask = widgets.Button(description="Journalist Bot ask")
# Chat history
chat_log = widgets.HTML(
    value='',
    placeholder='',
    description='',
)
# Attach callbacks
bot_ask.on_click(bot_ask_clicked)
# Arrange widgets layout
first_row = widgets.HBox([bot_ask])
# Display the UI
display(chat_log, widgets.VBox([first_row]))
We created a button (bot_ask) such that when the user clicks it, a callback function bot_ask_clicked will be invoked and one round of conversation between the journalist and author bots will be generated. Afterward, we used the HTML widget to display the conversation as HTML content in the notebook.
The callback function bot_ask_clicked is defined below. Besides showing the journalist bot’s question and the author bot’s answer, we also indicate the location (i.e., page number) of the relevant source texts. This is possible because the step() method of the author bot also returns the source variable, which is a list of LangChain Document objects containing the page content and its associated metadata.
def bot_ask_clicked(b):
    if chat_log.value == '':
        # Starting conversation
        bot_question = journalist.step("Start the conversation")
        line_breaker = ""
    else:
        # Ongoing conversation
        bot_question = journalist.step(chat_log.value.split("<br><br>")[-1])
        line_breaker = "<br><br>"

    # Journalist question
    chat_log.value += line_breaker + "<b style='color:blue'>👨‍🏫 Journalist Bot:</b> " + bot_question

    # Author bot answers
    response, source = author.step(bot_question)

    # Author answer with source
    page_numbers = [str(src.metadata['page']+1) for src in source]
    unique_page_numbers = list(set(page_numbers))
    chat_log.value += "<br><b style='color:green'>👩‍🎓 Author Bot:</b> " + response + "<br>"
    chat_log.value += "(For details, please check the highlighted text on page(s): " + ', '.join(unique_page_numbers) + ")"
Putting everything together, we have the following interface:

5.2 Implementing PDF highlighting functionality
In our current UI, we only indicated on which pages the author bot looked for the answers to the journalist bot’s question. Ideally, the user would expect the relevant texts to be highlighted in the original PDF to allow quick reference. This is the motivation for the 4th user story:
"As a user, I want to have the corresponding parts in the original research paper highlighted based on the chatbot’s discussion. This will help me to quickly locate the sources of the information discussed during the conversation." (Generated by GPT-4)
To achieve this goal, we employed the PyMuPDF library to search for relevant texts and perform text highlighting:
import fitz

def highlight_PDF(file_path, phrases, output_path):
    """Search and highlight given texts in PDF.

    Args:
    --------
    file_path: PDF file path
    phrases: a list of texts (in string)
    output_path: save and output PDF
    """
    # Open PDF
    doc = fitz.open(file_path)

    # Search the doc
    for page in doc:
        for phrase in phrases:
            text_instances = page.search_for(phrase)

            # Highlight texts
            for inst in text_instances:
                highlight = page.add_highlight_annot(inst)

    # Output PDF
    doc.save(output_path, garbage=4)
In the code above, phrases is a list of strings, where each string represents one of the source texts used by the author bot to generate the answers. To highlight the texts, the code loops over each page of the PDF and checks whether each phrase appears on that page. Once a phrase is found, it is highlighted in the original PDF.
To integrate this highlighting functionality into our previously developed chat UI, we first need to update the callback function:
def create_bot_ask_callback(title):
    def bot_ask_clicked(b):
        if chat_log.value == '':
            # Starting conversation
            bot_question = journalist.step("Start the conversation")
            line_breaker = ""
        else:
            # Ongoing conversation
            bot_question = journalist.step(chat_log.value.split("<br><br>")[-1])
            line_breaker = "<br><br>"

        chat_log.value += line_breaker + "<b style='color:blue'>👨‍🏫 Journalist Bot:</b> " + bot_question

        # Author bot answers
        response, source = author.step(bot_question)

        ##### NEW: Highlight relevant text in PDF
        phrases = [src.page_content for src in source]
        paper_path = "../Papers/"+title+".pdf"
        highlight_PDF(paper_path, phrases, 'highlighted.pdf')
        ##### NEW

        page_numbers = [str(src.metadata['page']+1) for src in source]
        unique_page_numbers = list(set(page_numbers))
        chat_log.value += "<br><b style='color:green'>👩‍🎓 Author Bot:</b> " + response + "<br>"
        chat_log.value += "(For details, please check the highlighted text on page(s): " + ', '.join(unique_page_numbers) + ")"

    return bot_ask_clicked
Although the appearance of our UI stays the same:

under the hood, we would have a new PDF file, with relevant texts (on pages 1 and 10) properly highlighted:

5.3 Allowing user input for questions
Up to this point, all the conversations between the two bots have been autonomous. Ideally, users should also be able to ask their own questions if they see fit. This is exactly what we want to address with the 5th user story:
"As a user, I want to be able to intervene and ask my own questions in the midst of the chatbot’s conversation, this way I can direct the conversation and extract the information I need from the paper." (Generated by GPT-4)
To achieve that goal, we can add another button such that the user can decide if a new round of exchange should be initiated by the journalist bot or the user:
# Create "user ask" button
user_ask = widgets.Button(description="User ask")
# Define callback
def create_user_ask_callback(title):
def user_ask_clicked(b):
chat_log.value += "<br><br><b style='color:purple'>🙋 ♂️You:</b> " + user_input.value
# Author bot answers
response, source = author.step(user_input.value)
# Highlight relevant text in PDF
phrases = [src.page_content for src in source]
paper_path = "../Papers/"+title+".pdf"
highlight_PDF(paper_path, phrases, 'highlighted.pdf')
page_numbers = [str(src.metadata['page']+1) for src in source]
unique_page_numbers = list(set(page_numbers))
chat_log.value += "<br><b style='color:green'>👩 🎓 Author Bot:</b> " + response + "<br>"
chat_log.value += "(For details, please check the highlighted text on page(s): " + ', '.join(unique_page_numbers) + ")"
# Inform journalist bot about the asked questions
journalist.memory.chat_memory.add_user_message(user_input.value)
# Clear user input
user_input.value = ""
return user_ask_clicked
The above callback function is essentially the same as the callback function for defining journalist-author interaction. The only difference is that the "question" will be directly input by the user. Also, to make the interview logic consistent, we appended the user question to the journalist bot’s memory, as if the user-supplied question was raised by the journalist bot.
We updated the main UI logic accordingly:
# Chat history
chat_log = widgets.HTML(
    value='',
    placeholder='',
    description='',
)

# User input question
user_input = widgets.Text(
    value='',
    placeholder='Question',
    description='',
    disabled=False,
    layout=widgets.Layout(width="60%")
)

# Attach callbacks
bot_ask.on_click(create_bot_ask_callback(paper))
user_ask.on_click(create_user_ask_callback(paper))

# Arrange the widgets
first_row = widgets.HBox([bot_ask])
second_row = widgets.HBox([user_ask, user_input])

# Display the UI
display(chat_log, widgets.VBox([first_row, second_row]))
And this is what we got: users can now input their own questions and get answers from the author bot:

5.4 Allowing downloading the generated script
So far so good! As the last feature to implement, we want to be able to save the conversation history to our disk for later reference. This is the goal of the 6th user story:
"As a user, I want to be able to download a transcript of the chatbot conversation. This will allow me to review the key points offline or share the information with my colleagues." (Generated by GPT-4)
Toward that end, we added another button for downloading the script and attached a callback function to it. In this callback, we used PDFDocument to convert the conversation script into a PDF file:
import re
from pdfdocument.document import PDFDocument

download = widgets.Button(description="Download paper summary",
                          layout=widgets.Layout(width='auto'))

def create_download_callback(title):
    def download_clicked(b):
        pdf = PDFDocument('paper_summary.pdf')
        pdf.init_report()

        # Remove HTML tags
        chat_history = re.sub('<.*?>', '', chat_log.value)

        # Remove emojis
        chat_history = chat_history.replace('👨‍🏫 ', '')
        chat_history = chat_history.replace('👩‍🎓 ', '')
        chat_history = chat_history.replace('🙋‍♂️ ', '')

        # Add line breaks
        chat_history = chat_history.replace('Journalist Bot:', '\n\n\nJournalist: ')
        chat_history = chat_history.replace('Author Bot:', '\n\nAuthor: ')
        chat_history = chat_history.replace('You:', '\n\n\nYou: ')

        pdf.h2("Paper Summary: " + title)
        pdf.p(chat_history)
        pdf.generate()

        # Download PDF
        print('PDF generated successfully in the local folder!')

    return download_clicked
We updated the main UI logic accordingly:
# Chat history
chat_log = widgets.HTML(
    value='',
    placeholder='',
    description='',
)

# User input question
user_input = widgets.Text(
    value='',
    placeholder='Question',
    description='',
    disabled=False,
    layout=widgets.Layout(width="60%")
)

# Attach callbacks
bot_ask.on_click(create_bot_ask_callback(paper))
user_ask.on_click(create_user_ask_callback(paper))
download.on_click(create_download_callback(paper))

# Arrange the widgets
first_row = widgets.HBox([bot_ask])
second_row = widgets.HBox([user_ask, user_input])
third_row = widgets.HBox([download])

# Display the UI
display(chat_log, widgets.VBox([first_row, second_row, third_row]))
Now, we have a download button appearing in the UI. When the user clicks it, a paper summary PDF file will be automatically generated and downloaded to the local folder:

6. Sprint Review: show the demo!
It’s time to put up a demo to showcase our hard work 💪
In this demo, we showed the full functionality of our developed dual-chatbot system:
- The two bots can autonomously engage in an interview with the goal of digesting the main points from the paper.
- The user can jump into the conversation as well and ask their own questions.
- Relevant texts for the generated answers are automatically highlighted in the original PDF.
- The conversation history can be downloaded to the local folder.
We have successfully addressed all the user stories, good work 🎉 Now that the Sprint review is over, it’s time for the retrospective.
7. Sprint Retrospective
In this project, we focused on solving the problem of efficiently digesting complex research papers. Toward that end, we developed a dual-chatbot system where one bot plays the "journalist" and the other plays the "author", and the two bots engage in an interview. In doing so, the journalist bot can act on behalf of the user and query the key points of the paper. This is beneficial as it eliminates the need for users to devise their own questions – an activity that can be challenging and time-consuming, particularly when dealing with unfamiliar subjects.
The success of the devised dual-chatbot approach relies critically on the journalist bot’s ability to steer the interview and generate insightful and relevant questions. In the current implementation, we used GPT-3.5-Turbo as the backbone LLM. To further enhance the user experience, it may be necessary to employ GPT-4 to boost the journalist bot’s reasoning capability.
What’s also important is that the journalist bot needs to be capable of interpreting and understanding the technical terms and concepts used in the broader research field to which the paper belongs. Besides using a more advanced LLM, fine-tuning an existing LLM on research papers from the target domain could be a promising strategy to pursue.
Looking ahead, there are several possibilities to extend our current project:
- Better UI design. For simplicity, we have used Jupyter Notebook to showcase the main idea of the dual-chatbot system. We could certainly use more sophisticated libraries (e.g., Streamlit) to build a more user-friendly, engaging UI.
- Multimodal capability. For example, text-to-speech (TTS) techniques can be used to create audio over the generated script. This could be beneficial to users as they can keep consuming the content during a commute, exercising, or other activities where reading isn’t convenient.
- Accessing external databases. It would be great if the dual-chatbot system could have access to larger external repositories of research papers, such that the author bot could offer comparison analysis with respect to the latest developments in the fields of interest, thereby synthesizing insights across multiple papers.
- Generating literature review. Since the generated interview scripts can serve as condensed yet richer (than paper abstracts) versions of the full papers, we could first accumulate the scripts for a variety of papers in a specific research field, and then request a separate LLM to generate comprehensive reviews of that field, based on analyzing the accumulated interview scripts. This feature would be especially valuable for researchers when they are initiating a new research project or a literature review paper.
What a fruitful Sprint we had! If you find my content useful, you could buy me a coffee here 🤗 Thank you very much for your support! As always, you can find the companion notebook with full code here 💻 Looking forward to sharing with you more exciting LLM projects. Stay tuned!