In this blog, I will share another of my journey of learning LLMs through hands-on practice. This is the latest entry in my series where I explore the power of AI and how it can be leveraged to streamline workflows and increase productivity.
The Idea of “Working on the System”
As someone always looking to optimize my workflow and increase productivity, I recently came across the idea of “working on the system.” The concept is simple: if a task requires multiple steps, invest time upfront to build a system that automates those steps, ultimately saving time and effort in the long run.
Optimizing My Content Curation Process
One area ripe for optimization was how I handled the wealth of valuable content I discovered on LinkedIn. Often, when scrolling through my feed on my mobile device, I’d come across prompts, research references, or tool recommendations that I wanted to revisit later. My typical process involved taking a screenshot and then locating those images in my gallery when I had time to work on them.
Leveraging AI to Extract Text from Images
That’s when I learned that ChatGPT-4o, an AI assistant, could extract text from images with a simple prompt. Instead of manually copying and pasting content from screenshots, I could leverage this capability to streamline the process.
Automating the Tedious Process
As my knowledge of LLM use cases grew, I found myself taking more and more screenshots, leading to an increasing amount of work to convert images to text. It was then that I decided to create a custom app to automate this tedious task, applying what I had learned about building systems and creating custom GPTs.
The Surprisingly Simple Process
The process turned out to be remarkably straightforward. If you can do simple prompts, you can build simple custom GPTs. By navigating to the “Explore GPTs” section on ChatGPT-4o and clicking “Create,” the screen splits into two halves. On the left is a regular chat interface, and on the right is the demo version you’re creating. You can start answering questions on the left, and the system will automatically build the application based on your responses, even generating a logo image and incorporating any other requested elements.
My Custom Image-to-Text Converter
For my use case, I wanted a simple image-to-text converter that would process all uploaded images, display the results, and provide a downloadable file for the user. With a few settings and some basic backend configuration (which is surprisingly straightforward), I created my very own personal custom API.
How It Works
The process works as follows: When you initiate the custom GPT, it asks how many images you want to convert. Since there’s a limit to the number of files you can upload at once, it guides you through the upload process. Once all images are loaded, it prompts you with three questions about the type of data you’re looking for, the desired output format (e.g., Word, Excel, or another option), and which file you’d like to receive the output in.The GPT then processes the images and provides you with a downloadable file containing the extracted text.
Accessibility for Free Users
While a paid OpenAI account is required to access this feature, even free users can utilize it to a limited extent with the latest update. Here is the link to my custom GPT
Increasing Efficiency Through Automation
Building this custom GPT has significantly streamlined my workflow, allowing me to effortlessly extract valuable information from images and incorporate it into my projects. By investing time upfront to create a system, I’ve increased my overall efficiency and productivity – a valuable lesson in my ongoing journey of learning AI by doing.
Conclusion and Future Plans
In this blog, I have shared my experience of creating a custom GPT app using ChatGPT-4o and how it has improved my workflow. I will continue to explore AI and LLMs, applying what I learn to optimize my workflow and increase productivity. Its not hard at all and why don’t you try this too ?