Article

AI Code & Coffee - Using the Gemini Model

Discover how a natural need to automate routine work led to the development of innovative AI tools that save colleagues hundreds of hours. In this text we reveal the behind-the-scenes story of their development, the key role of the multimodal Gemini model, and why the right prompt design matters most when working with artificial intelligence.


Innovation and Development, AI


What was the original problem or need that led to the creation of Vismo Scraper and Value Proposition Analyzer?

The need emerged very naturally from colleagues in other teams. On a daily basis, they were doing highly repetitive and time-consuming work such as manually browsing websites, reading large amounts of text, comparing competitors, validating product information, and repeatedly searching for answers to the same questions.

They started asking whether at least part of this work could be done faster or automated.

That was the moment when we thought: what if we let AI read websites for us?

This was followed by weeks of research into what was already available on the market: which models existed, which libraries and tools could be used, what made sense to build from scratch, and what could simply be reused. Since the AI ecosystem evolves extremely fast, a big part of the work was making sure we weren't reinventing something that could already be reasonably reused.

The Value Proposition Analyzer came later as a natural extension of Vismo Scraper. At that point, we already had a solid technical foundation, know-how, and user feedback, which gave us a clear starting point.

Why did you choose Gemini?

During development, we tested several models, including GPT-4, Claude 3, open-source Mistral, and others. Each had its strengths, but Gemini turned out to be the best option in terms of speed, cost per prompt, and output quality.

An important factor was also multimodality: the ability to work not only with text but also with images. We found that on many websites, key information is hidden in graphics, diagrams, or hero sections. Gemini was able to understand these elements in the context of the entire page, not just "read" text.

At the same time, it became clear that Gemini has been evolving very rapidly over the past year, and its integration via Vertex AI was the cleanest and most reliable choice for our environment. Looking back, we're confident it was the right decision.

What did the first prototype look like? What worked immediately and what didn’t work as expected?

The first prototype focused exclusively on Vismo Scraper. It was a simple web application where users could:

  • choose an AI model
  • enter URL addresses
  • ask their own questions

The core functionality, HTML scraping and answering questions, worked relatively quickly. However, we soon ran into model limitations, especially token limits. Sending the entire website content without filtering simply wasn't feasible.

This forced us to introduce logic to select only relevant parts of the HTML based on user questions. In the early phase, we experimented with more manual approaches, but later we introduced text reranking using Jina Reranker in the Scraper.
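That selection logic can be sketched roughly as follows. This is a minimal stand-in: the keyword-overlap scoring below is a deliberately simplified substitute for what Jina Reranker does, and the function names, token budget, and scoring are our illustrative assumptions, not the project's actual code.

```python
import re
from html.parser import HTMLParser

def _words(text):
    """Lowercased word set with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))

class _TextExtractor(HTMLParser):
    """Collects visible text fragments from an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []
    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

def select_relevant_chunks(html, question, max_tokens=200):
    """Keep only the chunks most relevant to the question, within a rough token budget."""
    parser = _TextExtractor()
    parser.feed(html)
    q_words = _words(question)
    # Score each chunk by word overlap with the question
    # (a crude stand-in for a real reranker model).
    scored = sorted(parser.chunks,
                    key=lambda c: len(q_words & _words(c)),
                    reverse=True)
    selected, used = [], 0
    for chunk in scored:
        cost = len(chunk.split())  # crude token estimate: whitespace-separated words
        if used + cost > max_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected
```

The key idea is the same as in the real tool: only the highest-scoring fragments are sent to the model, keeping the request under the token limit.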

For the Value Proposition Analyzer, we took a different approach. Instead of "blindly" scraping entire websites, we used Google Custom Search API to target only relevant subpages (e.g. About, Solutions, Why Us).
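Targeting subpages via the Custom Search JSON API can be sketched like this. The helper name and the keyword list are illustrative; a real call needs your own API key and Programmable Search Engine ID, and the endpoint returns JSON you would then parse for result links.

```python
def build_subpage_query(domain, keywords, api_key, cx, num=5):
    """Build the Custom Search JSON API endpoint and parameters that target
    specific subpages (e.g. About, Solutions, Why Us) of a single domain."""
    terms = " OR ".join(f'"{k}"' for k in keywords)
    return (
        "https://www.googleapis.com/customsearch/v1",
        {
            "key": api_key,  # API key (placeholder)
            "cx": cx,        # Programmable Search Engine ID (placeholder)
            "q": f"site:{domain} ({terms})",  # restrict results to the target domain
            "num": num,      # results per request (API maximum is 10)
        },
    )
```

You would then issue the request with something like `requests.get(url, params=params)` and collect the `link` field of each returned item as the list of subpages to scrape.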

Based on user feedback, we also added a spreadsheet version, which users explicitly requested.

How long did it take to move from proof of concept to real usage within teams?

Surprisingly fast. The first usable version was available within a few weeks.

A large portion of the time was not spent on coding itself, but on research and understanding what already existed, what made sense to reuse, and what to avoid.

Users started actively using Vismo Scraper after roughly one month, both in the web and spreadsheet versions. The rollout of the Value Proposition Analyzer followed a similar timeline, as it was built on the solid foundation created by Vismo Scraper.

Do you have a concrete example where the AI tool significantly saved time or changed colleagues’ workflows?

Yes, this is actually the core value of both projects.

If a user processes 250 URLs per month and asks 15 questions per URL, doing this manually would take around 130 hours per month, including reading websites, searching for answers, and comparing information.
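The ~130-hour figure can be roughly reproduced with a back-of-the-envelope calculation, assuming about two minutes of manual work per question; the per-question time is our assumption, not a measured value.

```python
# Back-of-the-envelope estimate of the manual effort described above.
urls_per_month = 250
questions_per_url = 15
minutes_per_question = 2  # assumption: reading, searching, and comparing per question

total_questions = urls_per_month * questions_per_url       # 3750 questions per month
manual_hours = total_questions * minutes_per_question / 60  # ~125 hours per month
print(total_questions, manual_hours)
```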

With Vismo Scraper and the Analyzer:

  • dozens of URLs are processed at once
  • questions are asked directly to AI
  • users receive structured answers, relevance scores, and visualizations

Colleagues no longer need to manually read websites. AI does that for them, and they can focus purely on interpreting results and making decisions.

What was the biggest technical challenge during development and how did you solve it?

The biggest challenge was controlling the model input, specifically the amount of text, images, and tokens.

After introducing image scraping, we had to:

  • filter images by type and size
  • remove icons, duplicates, and irrelevant visuals
  • limit crawling to the first level, since we discovered that the most important content is usually there
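The first two filtering steps might be sketched like this; the allowed file types, size thresholds, and data layout are illustrative assumptions, not the project's exact rules.

```python
import hashlib

ALLOWED_TYPES = {"jpg", "jpeg", "png", "webp"}  # skip SVG/ICO, which are usually icons
MIN_WIDTH, MIN_HEIGHT = 100, 100                # assumption: anything smaller is likely an icon

def filter_images(images):
    """images: list of dicts with 'url', 'width', 'height', and raw 'data' bytes.
    Keeps only sufficiently large, non-duplicate images of allowed types."""
    seen_hashes = set()
    kept = []
    for img in images:
        ext = img["url"].rsplit(".", 1)[-1].lower()
        if ext not in ALLOWED_TYPES:
            continue  # filter by type
        if img["width"] < MIN_WIDTH or img["height"] < MIN_HEIGHT:
            continue  # filter by size
        digest = hashlib.sha256(img["data"]).hexdigest()
        if digest in seen_hashes:
            continue  # drop exact duplicates (same bytes under a different URL)
        seen_hashes.add(digest)
        kept.append(img)
    return kept
```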

The key solution for Vismo Scraper was Jina Reranker, which allowed us to send only truly relevant content to Gemini.

Was there a moment when the model produced incorrect or potentially dangerous results? How did you handle it?

Not dangerous per se, but incorrect or inconsistent results did occur, especially early on.

From the start, we treated AI as not 100% reliable and never as a single source of truth.

The most important part was prompt design. With AI, you don't get what you mean: you get only what you prompt.

Based on incorrect outputs, we refined the prompt systematically over time.

You can think of prompt design like sending someone else grocery shopping: if you just say "buy something for dinner", you'll probably be disappointed. The clearer the instructions, the better the result.

How did prompt tuning or logic refinement look: more experimentation or a systematic process?

It was a combination of both.

First, logically: we defined a clear response format (Answer / Comment), context, and expectations.
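A prompt enforcing such an Answer / Comment format could look roughly like this; the wording is illustrative, not the production prompt.

```python
# Illustrative prompt template enforcing a fixed Answer / Comment response format.
PROMPT_TEMPLATE = """You are analyzing the content of a website.

Context:
{context}

Question: {question}

Rules:
- Use ONLY the context above; do not guess.
- If the context does not contain the answer, write "Answer: N/A".

Respond in exactly this format:
Answer: <one short, factual sentence>
Comment: <brief justification citing the context>"""

def build_prompt(context, question):
    """Fill the template with the scraped context and the user's question."""
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

Pinning the output shape this way makes responses easy to parse and makes inconsistencies visible, which is what enabled the later systematic refinement.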

Then systematically: based on real outputs, errors, and feedback, we iteratively refined the prompt.

Prompt tuning was one of the most demanding but also one of the most critical parts of the entire project.

How did early users react? Did you have to change the tool based on feedback?

Yes, significantly.

The first major feedback was a request for the simplest possible interface, which led to the creation of the spreadsheet version.

Later, we refined the visual outputs, adding charts, percentages, and traffic-light indicators.

In agreement with users, we also added image scraping, since many important answers turned out to be hidden in visuals.

What trade-offs did you have to make between output quality, speed, and cost?

Output quality was always the top priority. Without it, the tool wouldn't make sense.

Speed came second, and cost was considered last.

Along the way, we also learned that the newest Gemini model is not automatically the best choice. What really matters is selecting the right model for the specific use case - balancing context size, multimodality, response consistency, and latency, rather than chasing the latest release.

If you were starting from scratch today, what would you do differently architecturally or process-wise?

Many things 😊

A lot of what we had to build a year and a half ago now exists off-the-shelf.

A good example is the Value Proposition Analyzer, where we already relied more on existing tools such as Google Custom Search and improved reranking approaches instead of custom-built selection logic.

Today, I would also look more closely at low-code or orchestration tools (e.g. n8n). They offer fast time-to-value, but also have limitations in terms of control and custom logic.

For complex and scalable solutions, custom architecture still makes sense, but combining it with existing tools would be a much stronger approach today.
