Unlocking the Future of AI: My Journey with Custom GPTs

Angshuman Gupta
3 min read · Jan 24, 2024

In the ever-evolving landscape of technology, I found myself drawn to the latest innovation reshaping our digital interactions: OpenAI's GPT Store. This fascinating world of custom GPTs beckoned me to dive in and discover its potential. My first project, PDF GPT, emerged from this exploration and marks a significant step in my journey toward harnessing the power of Generative AI. Through this project, I aim to understand what a custom GPT can do and to share what I learn, hoping to inspire others to bring this technology into their daily lives.

Generated through my custom GPT

The Vision Behind PDF GPT

Imagine a world where digital assistants not only read but also comprehend and summarize PDFs for you. That’s the essence of PDF GPT. Designed to be your document expert, PDF GPT aims to turn any uploaded PDF into a well of knowledge and answer your questions about it with precision. It operates with the utmost respect for your privacy, ensuring that your data remains confidential. Bear in mind that PDF GPT is still a learner: I keep improving how it reads PDFs, including those shared via links.

Journey So Far

Milestone 1:

First, I tried to build a basic GPT that could read and analyze an uploaded PDF, relying only on prompt engineering (custom instructions). The bot could analyze uploaded PDFs, but it couldn’t read PDFs that I shared as links.

Milestone 2:

Progressing to the next phase, I integrated the “actions” feature of GPTs. I developed an API using FastAPI and PyMuPDF and deployed it on Vercel. This allowed PDF GPT to work with both uploaded and linked PDFs. However, it still can’t handle large PDFs, because the API’s response exceeds the GPT’s context window.

Refer to the technical details:

API Implementation

JSON Schema for GPT Action

License and Policy

What Lies Ahead

The journey isn’t over yet! The next milestone is handling larger PDF files. I plan to do this by building a GPT agent with OpenAI’s API and LangChain. The agent should extract the text in chunks and index those chunks as embeddings, so that only the most relevant chunks need to be passed to the model. That should help me overcome the current context window limit.
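The chunking step of that plan can be sketched in plain Python. This is only an illustration under stated assumptions: the `chunk_text` name, the character-based sizes, and the overlap value are hypothetical, and the function stands in for a LangChain text splitter rather than reproducing one. Embedding and retrieval are not shown.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that overlap.

    The overlap repeats the tail of each chunk at the start of the next,
    so a sentence cut at a boundary still appears whole in one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)  # prints abcd, defg, ghij on separate lines
```

Each chunk would then be embedded and stored, and only the chunks closest to the user's question would be sent to the model.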

Resources and Inspirations:

Conclusion

Embarking on this journey to create PDF GPT has been both challenging and rewarding. It’s a continuous process of learning, adapting, and innovating in the face of technological hurdles. As I keep improving PDF GPT, I invite you to join this exploration of AI-driven document interaction.

If you enjoyed this, clap, share, and follow for more updates and insights into this journey. It means a lot to me!

Get in touch! LinkedIn | Instagram | Facebook
