Building a Q&A Bot with GPT

LangChain makes it easy to build chatbots over a specific set of documents. In this post, we'll show you how to use it over a website.
By Lilly Chen

LLMs like GPT are trained broadly on text from the internet. This makes them susceptible to pulling information “out of nowhere”. LangChain is an open-source way of combining GPT’s writing skills with existing document knowledge into a Q&A bot.

Getting Started

In this repo, there is a demo with a Notion DB as the document source. We’re going to slightly modify the intake source so that we can use this repo with a regular webpage instead of Notion.

After you configure your OPENAI_API_KEY, do not continue with the export and unzipping of Notion. Instead, find the URL of a webpage that you’d like to use. We’ll use curl to pull down the page’s HTML and pandoc to convert it into markdown, similar to the Notion export. In notion-qa/Notion_DB, run the following command:

curl --silent <html url> | pandoc --from html --to markdown_strict -o <name_of_file>.md

Here’s my example:

curl --silent | pandoc --from html --to markdown_strict -o
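If you’d rather script this step, the same pipeline can be sketched in Python. This is a rough sketch and not part of the repo: the function names `build_pandoc_command` and `webpage_to_markdown` are made up for illustration, it assumes pandoc is installed and on your PATH, and it uses the standard library’s `urllib` in place of curl.

```python
import subprocess
import urllib.request


def build_pandoc_command(output_path):
    """Return the pandoc invocation from the command above, writing to output_path."""
    return ["pandoc", "--from", "html", "--to", "markdown_strict", "-o", output_path]


def webpage_to_markdown(url, output_path):
    """Download url and convert its HTML to markdown (hypothetical helper)."""
    # Fetch the raw HTML bytes, the job curl does in the shell version.
    with urllib.request.urlopen(url) as response:
        html = response.read()
    # Feed the HTML to pandoc on stdin; check=True raises if pandoc exits with an error.
    subprocess.run(build_pandoc_command(output_path), input=html, check=True)
```

Calling `webpage_to_markdown` with your page’s URL and an output filename ending in `.md` should leave you with the same kind of file as the shell command.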

Check that your markdown file looks correct. Once it’s there, you can continue on with the repo’s instructions and run its ingestion step with python. This process takes a few minutes, depending on the file.
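Before ingesting, a small script can help with the “looks correct” check. The sketch below is only an illustration: `summarize_markdown` is a made-up helper, and it works from the assumption that pandoc’s `markdown_strict` output can keep raw HTML for elements it can’t express in plain markdown (tables, for example), so a high tag count is a hint that the page didn’t convert cleanly.

```python
import re

# Matches things that look like HTML tags, e.g. <div class="x">, </table>, <br/>.
# Markdown autolinks such as <https://example.com> are deliberately not matched.
HTML_TAG = re.compile(r"</?[a-zA-Z][a-zA-Z0-9]*(?:\s[^>]*)?/?>")


def summarize_markdown(text):
    """Return quick stats for eyeballing a converted markdown file."""
    lines = text.splitlines()
    return {
        "characters": len(text),
        "nonblank_lines": sum(1 for line in lines if line.strip()),
        "html_tags": len(HTML_TAG.findall(text)),
    }
```

An empty or nearly empty file usually means the download failed, and lots of leftover tags means you may want to open the file and clean it up by hand first.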

Querying the results

Now you can use python "How do I get something CCSGA certified?" to ask the bot questions based on the handbook you ingested.

The result should look something like this:

(base) ➜  notion-qa git:(master) ✗ python "How do I get something CCSGA certified?"

Answer:  To get something CCSGA certified, groups must apply to become a recognized CCSGA student organization and be classified as “Active”. The CCSGA Student Life Committee will review the application and external advisors and adult volunteers must follow the guidelines set out in the Student Organization Handbook and complete an HR background check, complete Title IX training, and the Volunteer Agreement Form.

Sources: Notion_DB/
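If you want to use the result from another program rather than read it in the terminal, you could split the printed text into its two parts. This is a minimal sketch, assuming the output keeps the shape shown above, with the answer on a single line starting with `Answer:` and the sources on a line starting with `Sources:`. `parse_bot_output` is a hypothetical helper, not something the repo provides.

```python
def parse_bot_output(output):
    """Split the bot's printed text into (answer, sources)."""
    answer, sources = "", []
    for line in output.splitlines():
        if line.startswith("Answer:"):
            # Drop the label and the surrounding whitespace.
            answer = line[len("Answer:"):].strip()
        elif line.startswith("Sources:"):
            # Assumes sources are separated by whitespace.
            sources = line[len("Sources:"):].split()
    return answer, sources
```

If the bot ever prints a multi-line answer, this would only keep the first line, so treat it as a starting point.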

Cool next steps (if you’re feeling fancy)

From here, the world is your oyster! You could hook up a Twilio configuration so that you can text your bot instead of using the CLI. You could set up an Autocode integration into a Discord bot that allows anyone in your server to query a community knowledge base.

Are you doing something boring, manual, and necessary with your content? Let us know - we’d love to help.