T20 Cricket Score Prediction using a full RAG pipeline

 

I spent the last few weeks building something I've always wanted: a GenAI app that actually understands cricket. Not just who's winning, but why, and what a team should do about it. The result is a T20 cricket score predictor built on a full retrieval-augmented generation (RAG) pipeline.


The stack:

  • Google Gemini 2.5 Flash as the LLM brain
  • Pinecone as the vector database for semantic search over match stats
  • sentence-transformers for local embeddings (free, no API cost)
  • CricAPI for live T20 data
  • Streamlit for the web UI + Docker for deployment
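
Roughly, the pieces fit together as "embed, store, retrieve, then prompt". The sketch below is a simplified illustration of that flow, not the code in the repo. To keep it runnable without API keys, a toy word-count embedding and an in-memory list stand in for sentence-transformers and Pinecone, and the final Gemini call is left out. The sample notes are made up, and `embed`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import math
import re
from collections import Counter

# Made-up match notes for illustration only (not real statistics).
DOCS = [
    "Team A scored 182 for 5 in 20 overs batting first at Venue X",
    "Team B were bowled out for 140 in 18.2 overs chasing at Venue X",
    "Team A powerplay average is 52 runs with 1 wicket lost",
    "Rain reduced the Team C match to 12 overs per side",
]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9.]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# "Index" step: with a real vector database this would be an upsert.
INDEX = [(doc, embed(doc)) for doc in DOCS]


def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Return the top_k notes most similar to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


def build_prompt(query: str) -> str:
    """Ground the question in retrieved context before it goes to the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


print(build_prompt("What is Team A powerplay average?"))
```

The string returned by `build_prompt` is what would be sent to the LLM, so the model answers from the retrieved notes instead of from memory.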


The interesting part? I gave it this prompt:

"Analyze the 2026 T20 World Cup Super 8 standings. Calculate the exact win/loss margins India needs to qualify over West Indies. What happens if Sunday's match gets washed out? And what does Pakistan need against Sri Lanka?"
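
Qualification questions like this mostly come down to net run rate (NRR): runs scored per over faced, minus runs conceded per over bowled, taken over the whole stage. Here is a minimal sketch of that arithmetic with made-up totals. The function names are my own for illustration. As I understand the usual playing conditions, a side that is bowled out is charged its full quota of overs, so the caller should pass the full quota in that case. Rain-adjusted matches need extra handling that is not shown.

```python
def overs_to_balls(overs: str) -> int:
    """Convert cricket overs notation ('19.3' = 19 overs and 3 balls) to balls."""
    whole, _, part = overs.partition(".")
    balls = int(part) if part else 0
    if not 0 <= balls <= 5:
        raise ValueError(f"invalid overs value: {overs!r}")
    return int(whole) * 6 + balls


def net_run_rate(runs_for: int, overs_faced: str,
                 runs_against: int, overs_bowled: str) -> float:
    """NRR from aggregate totals across all matches in the stage."""
    rate_for = runs_for / (overs_to_balls(overs_faced) / 6)
    rate_against = runs_against / (overs_to_balls(overs_bowled) / 6)
    return rate_for - rate_against


# Made-up totals: 180 scored in 20 overs, 150 conceded in 20 overs.
print(net_run_rate(180, "20", 150, "20"))  # 1.5
```

Overs are passed as strings because "19.3" means 19 overs and 3 balls (19.5 overs), not the decimal number 19.3.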

The app came back with net run rate (NRR) calculations, rain-rule implications, and powerplay strategies for the trailing team, all backed by retrieved data, not hallucination.

That's RAG working as intended. The model doesn't guess. It retrieves, then reasons.

A few things I learned the hard way:

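  • Free-tier API quotas are aggressively small. I hit the ceiling fast
  • Pinecone renamed their Python package, from pinecone-client to pinecone (classic)
  • The gap between a "Claude.ai subscription" and "Anthropic API credits" is real and expensive: the subscription doesn't cover API usage, which is billed separately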
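
On the quota point, one common mitigation is retrying with exponential backoff. This is a generic sketch, not what the repo does. `QuotaExceeded` and `fetch` are hypothetical placeholders, since real client libraries raise their own rate-limit errors. It helps with per-minute limits. A daily cap still needs caching or fewer calls.

```python
import time


class QuotaExceeded(Exception):
    """Hypothetical error; real API clients raise their own rate-limit exceptions."""


def call_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), waiting 1x, 2x, 4x... base_delay between failed attempts."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except QuotaExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the wait can be swapped out in tests.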





Full code is open source 👇

🔗 github.com/sudeepkrishnan87/cricket-prediction-genai

If you're exploring GenAI beyond chatbots (RAG, vector DBs, MCP), I'm happy to share what I learned. Drop a comment or DM.
