Hi, I'm Andrea Alberti

Building intelligent systems with AI

Machine Learning

Multi-Agent Systems

LLMs

Deep Learning

RAG Systems

About Me

Who I Am

Andrea Alberti

GenAI Engineer & Data Scientist

Having graduated with a double degree in Management and Computer Science–Data Science, I have gained substantial experience in multidisciplinary projects. I specialize in applying machine learning, deep learning, and, most recently, Generative AI techniques to develop innovative, automated solutions.

My academic journey, from Management Engineering (110/110 cum laude) to a Master's in Data Science (110/110 cum laude), has given me a blend of technical expertise and strategic thinking. This allows me to approach complex problems from both a business and a technological perspective.

Professionally, I focus on building intelligent systems, from multi-agent architectures that automate complex business processes to advanced conversational AI, to drive efficiency and reduce operational costs. My goal is to keep improving and to use AI to create tangible value.

2022 - 2024, 110/110 cum laude

MSc. Data Science - Computer Engineering

University of Pavia

2018 - 2021, 110/110 cum laude

BSc. Management Engineering

University of Brescia

Certification

Google Cloud Professional ML Engineer

Google Cloud

Technical Skills

Core Skills

RAG

Multi-Agent Systems

Prompt Engineering

Few-Shot Learning

Problem Solving

System Design

Data Analysis

Research

Public Speaking

Presentation Design

Workshop Creation

AI Tools

Claude Code

Codex

GitHub Copilot

Gemini

ChatGPT

Claude

NotebookLM

Gemini CLI

Cloud & Infrastructure

Google Cloud Platform

Vertex AI

Cloud Run

Cloud Functions

BigQuery

WebSocket

Agent Engine

Programming Languages

Python

SQL

HTML/CSS

Frameworks & Libraries

Google ADK

Dialogflow CX

Gemini SDK

Vertex AI Search

TensorFlow

PyTorch

Scikit-learn

Keras

XGBoost

FastAPI

Flask

PySpark

Hadoop

MongoDB

NetworkX

Pandas

NumPy

OpenCV

Matplotlib

Professional Work

Professional Projects

A selection of professional projects where I applied Generative AI and Machine Learning to solve real-world problems.

Live, Featured

AI-Powered Tyre Selection Assistant

Refined and tested a Dialogflow CX conversational agent for the UK market, guiding users in tyre selection through a multi-agent system with RAG and external APIs. Successfully migrated the solution to Google Agent Development Kit (ADK), expanding to international markets and new vehicle categories with cross-cloud AWS-GCP integration.

Dialogflow CX, Google ADK, Multi-Agent, RAG

Markets: Multi

Architecture: Cross-Cloud

Expanded

Automated Document Validation System

Full-stack implementation of a multi-agent system that automates the verification and validation of documents for public funding requests. Developed the backend in Python and the frontend in HTML, CSS, and JavaScript. Designed a modular, flexible architecture that lets agent behavior be adapted without modifying code.

Multi-Agent, Python, Process Automation, GCP

Time Reduction: 90%

Architecture: Modular

Stack: Full
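
One way to picture "agent behavior adapted without modifying code" is to keep each agent's rules in a configuration document and build the agents from it at start-up. The sketch below is only an illustration under that assumption: `AGENT_CONFIG`, `ValidationAgent`, the agent names and the field names are all hypothetical and are not taken from the actual project.

```python
import json
from dataclasses import dataclass

# Hypothetical configuration. In practice this could live in a JSON file
# that is edited without touching the Python code. All names are made up.
AGENT_CONFIG = """
[
  {"name": "identity_check", "required_fields": ["applicant_name", "tax_code"]},
  {"name": "budget_check", "required_fields": ["requested_amount", "project_budget"]}
]
"""


@dataclass
class ValidationAgent:
    name: str
    required_fields: list

    def validate(self, document: dict) -> list:
        """Return the required fields that are missing or empty in the document."""
        return [f for f in self.required_fields if not document.get(f)]


def load_agents(config_text: str) -> list:
    """Build agents from configuration instead of hard-coded classes."""
    return [ValidationAgent(**entry) for entry in json.loads(config_text)]


def run_validation(agents: list, document: dict) -> dict:
    """Run every agent and collect the missing fields per agent name."""
    return {agent.name: agent.validate(document) for agent in agents}


if __name__ == "__main__":
    agents = load_agents(AGENT_CONFIG)
    print(run_validation(agents, {"applicant_name": "A", "requested_amount": 1000}))
```

Changing which fields an agent checks, or adding a new agent, then only requires editing the configuration text.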

Completed

Advanced Knowledge Base Chatbot

Implemented an advanced chatbot agent architecture that answers user questions from a knowledge base of web pages and PDF documents. The system uses a main orchestrator agent that routes requests to five specialized sub-agents. Developed a data management pipeline including PDF parsing, a chunking strategy, and a security layer that filters inappropriate questions.

Dialogflow CX, GCP, Multi-Agent, RAG

Sub-Agents: 5

Domains: Multi

Documents: PDF+Web
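
The chunking strategy used in the project is not described here. As a generic illustration of the idea, the sketch below splits parsed text into fixed-size character windows that overlap, so that a sentence cut at a boundary still appears whole in a neighbouring chunk. The function name and the default sizes are assumptions chosen for the example.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list:
    """Split text into character chunks of at most chunk_size that overlap.

    Consecutive chunks share `overlap` characters. The sizes are illustrative
    defaults, not values taken from the project.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(chunk)
```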

Completed

Real-Time Multimodal Agent

Developed and implemented a real-time multimodal conversational agent based on Google Agent Development Kit (ADK). The system processes audio and video inputs simultaneously, sustains fluid conversations, and autonomously performs browser operations through reasoning and tools. It uses an asynchronous dual-server architecture with WebSocket communication for low-latency bidirectional streaming.

Google ADK, Gemini Live API, Multimodal AI, Browser Automation

Latency: <500 ms

Modalities: Audio + Video

Architecture: Async

Completed

Luxury Yacht Virtual Assistant

Implemented a virtual assistant that answers user questions based on a corporate knowledge base. Configured as a RAG (Retrieval-Augmented Generation) application, it uses Google ADK and Vertex AI Search to retrieve relevant information from a dedicated datastore. The architecture uses Gemini models to process requests and generate accurate, contextualized responses in Italian.

RAG, Vertex AI Search, Google ADK, GCP

LLM: Gemini

Accuracy: 95%

Technology: RAG

Completed

Multi-Agent Ticketing System

Implemented a multi-agent ticketing system to automate responses to users. Developed as a Proof of Concept (POC), the project involved creating specialized agents, each able to query external databases via APIs to provide accurate and comprehensive replies. The multi-agent architecture routes each user request to the most competent agent, reducing staff workload and management costs.

Multi-Agent, Google ADK, API Integration, Automation

Workload: -70%

Agents: Multi

Type: POC

Completed

LLM-Based Email Classification Pipeline

Within a broader project aimed at automating a manual process and reducing operational costs, my contribution focused on an LLM-based pipeline for classifying and dispatching certified emails (PEC). Key activities included prompt engineering and few-shot learning to refine model outputs, along with developing metrics and analytical tools to evaluate system performance.

LLM, Prompt Engineering, Few-Shot Learning, Process Automation

Cost Reduction: High

Technique: Few-Shot

Automation: 85%
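
To make the few-shot idea concrete, here is a minimal sketch of how a classification prompt can be assembled from a handful of labelled examples and how the model's answer can be mapped back to a known category. The categories, example emails and function names are hypothetical, and the call to the LLM itself is deliberately left out because the model and SDK used in the project are not stated.

```python
# Hypothetical labels and examples, for illustration only.
LABELS = ("billing", "contracts", "other")
FEW_SHOT_EXAMPLES = (
    ("Invoice n. 123 attached for payment.", "billing"),
    ("We request an extension of the contract deadline.", "contracts"),
)


def build_prompt(email_text: str, examples=FEW_SHOT_EXAMPLES, labels=LABELS) -> str:
    """Assemble an instruction, the labelled examples, and the email to classify."""
    lines = [f"Classify the email into one of: {', '.join(labels)}.", ""]
    for text, label in examples:
        lines += [f"Email: {text}", f"Category: {label}", ""]
    # The prompt ends with an open "Category:" slot for the model to complete.
    lines += [f"Email: {email_text}", "Category:"]
    return "\n".join(lines)


def parse_label(model_output: str, labels=LABELS, fallback: str = "other") -> str:
    """Normalise the model's answer and fall back when it is not a known label."""
    answer = model_output.strip().lower()
    return answer if answer in labels else fallback


if __name__ == "__main__":
    print(build_prompt("Please find the signed agreement attached."))
```

Keeping the parsing step strict (known label or fallback) makes it easier to measure how often the model answers outside the expected set.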

Completed

Insurance Liquidation Engine - Demo Project

Developed a comprehensive multi-agent collaborative architecture for an internal Sharing Days demonstration. The system automatically analyzes, processes, and issues liquidation judgments on insurance claims. The architecture combines parallel agents for document analysis with sequential agents for final evaluation, using custom tools, built-in Google Cloud services, and MCP integration.

Google ADK, Multi-Agent, Cloud Run, MCP

Agent Types

Architecture: Hybrid

Purpose: Demo
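
The combination of a parallel stage followed by a sequential stage can be sketched with plain `asyncio`, independently of any agent framework. The stand-in functions below are hypothetical placeholders for the real agents, and the decision rule is invented purely so the example runs. It does not reproduce the demo's logic or use Google ADK.

```python
import asyncio


async def analyze_document(name: str) -> dict:
    """Stand-in for a document-analysis agent (hypothetical)."""
    await asyncio.sleep(0)  # yield control, as a real I/O-bound call would
    return {"document": name, "complete": bool(name)}


async def evaluate_claim(analyses: list) -> str:
    """Stand-in for the final evaluation step (made-up rule)."""
    return "approve" if all(a["complete"] for a in analyses) else "review"


async def process_claim(documents: list) -> str:
    # Parallel stage: one analysis task per document, run concurrently.
    analyses = await asyncio.gather(*(analyze_document(d) for d in documents))
    # Sequential stage: evaluation starts only after every analysis is done.
    return await evaluate_claim(list(analyses))


if __name__ == "__main__":
    print(asyncio.run(process_claim(["police_report.pdf", "invoice.pdf"])))
```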

Academic Research

Research & Academic Work

8 papers

80+ pages

6 cum laude

Machine Learning

Heart Disease Detection from Audio Signals

Advanced Biomedical Machine Learning

Designed prevention and clinical support ensembles for early cardiac screening on the Dangerous Heartbeat Dataset (CHSC2011). Heart sounds were resampled to 4 kHz, segmented into 1-second windows, described with MFCC, chroma, spectral and temporal descriptors, and reduced from 338 to 41 features via Spearman-based filters. The prevention ensemble keeps false normals under control (ROC-AUC 0.96, TPR 43.4% at 1% FPR), while the five-class support ensemble delivers a macro F1 of 81.6% with per-class risk analysis and SHAP explanations.

Key Results

F1-Score: 0.82

ROC-AUC: 0.96

TPR @1% FPR: 43.4%

TPR @5% FPR: 74.3%

TPR @10% FPR: 86.6%

Jul 2024

17 pages

Scikit-learn, Torchaudio, Librosa
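
The "TPR at a fixed FPR" figures reported above describe an operating point: the best true positive rate available while the false positive rate stays within a budget. The sketch below shows one way to compute such a value from classifier scores in pure Python. The function name and the toy data are illustrative, and which class the study treats as positive is not assumed here.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Best TPR over all thresholds whose FPR is at most max_fpr.

    labels: 1 for the positive class, 0 for the negative class.
    A sample is predicted positive when its score >= threshold.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    best_tpr = 0.0
    for threshold in sorted(set(scores)):
        fpr = sum(s >= threshold for s in negatives) / len(negatives)
        if fpr <= max_fpr:
            tpr = sum(s >= threshold for s in positives) / len(positives)
            best_tpr = max(best_tpr, tpr)
    return best_tpr


if __name__ == "__main__":
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
    for budget in (0.0, 0.25, 1.0):
        print(budget, tpr_at_fpr(toy_scores, toy_labels, budget))
```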

Graph ML

Disease Prediction with Graph Machine Learning

Financial Data Science

Mapped 773 diseases and 377 symptoms into a bipartite network to engineer graph-aware features for diagnosis. The Method of Reflections, Disease/Symptom Influence indices, community detection and betweenness centrality yield new descriptors that complement one-hot symptoms. Logistic Regression, Random Forest and MLP models were benchmarked; the best logistic model nearly matches the symptom-only baseline (1.5% accuracy drop) while using 28% fewer inputs, and exposes class-level accuracy insights.

Key Results

Best Model: Logistic Regression

Feature Reduction: 28%

Accuracy Drop: 1.5%

Dec 2023

23 pages

Python, NetworkX, Scikit-learn
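
As an illustration of the Method of Reflections on a disease-symptom bipartite network, the sketch below uses the standard iterative definition: step 0 is plain degree (symptoms per disease, diseases per symptom), and each further step replaces a node's score with the average previous-step score of its neighbours. The tiny network is made up, and the exact variant and number of iterations used in the study are not stated here.

```python
# Toy bipartite network (hypothetical data): disease -> set of symptoms.
DISEASE_SYMPTOMS = {
    "flu": {"fever", "cough", "fatigue"},
    "cold": {"cough", "sneezing"},
    "anemia": {"fatigue"},
}


def method_of_reflections(disease_symptoms, iterations=1):
    """Return (disease_scores, symptom_scores) after `iterations` reflections.

    Assumes every disease has at least one symptom, so no average is empty.
    """
    # Invert the mapping: symptom -> set of diseases that show it.
    symptom_diseases = {}
    for disease, symptoms in disease_symptoms.items():
        for symptom in symptoms:
            symptom_diseases.setdefault(symptom, set()).add(disease)

    # Step 0: degree of each node on its side of the bipartite graph.
    d_score = {d: float(len(s)) for d, s in disease_symptoms.items()}
    s_score = {s: float(len(d)) for s, d in symptom_diseases.items()}

    for _ in range(iterations):
        # Both sides are updated from the previous step's scores.
        new_d = {d: sum(s_score[s] for s in syms) / len(syms)
                 for d, syms in disease_symptoms.items()}
        new_s = {s: sum(d_score[d] for d in ds) / len(ds)
                 for s, ds in symptom_diseases.items()}
        d_score, s_score = new_d, new_s
    return d_score, s_score


if __name__ == "__main__":
    print(method_of_reflections(DISEASE_SYMPTOMS, iterations=1))
```

After one reflection, a disease's score is the average ubiquity of its symptoms, so diseases with rarer symptoms get lower values.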

Big Data

Review Helpfulness Prediction with Big Data

Data Science & Big Data Analytics

Analyzed ~3M Amazon book reviews end-to-end with a big data stack (HDFS, Spark, MongoDB) to explain and predict perceived helpfulness. Hypothesis testing quantified the role of review length, sentiment and star rating, while Word2Vec embeddings fed Random Forest, SVR and MLP regressors for score prediction. The best Random Forest model achieved MSE 0.0259 (RMSE 0.1609, R² 0.253).

Key Results

Best Model: Random Forest

MSE: 0.026

R²: 0.25

Sep 2023

6 pages

Hadoop HDFS, Apache Spark, PySpark
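
For readers less familiar with the regression metrics quoted above, the following sketch computes MSE, RMSE and R² from scratch on toy values. The data is invented. The only link to the study is the final check that the reported RMSE (0.1609) is the square root of the reported MSE (0.0259).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and R^2 for two equal-length sequences.

    Assumes y_true is not constant, otherwise R^2 is undefined.
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": 1 - ss_res / ss_tot}


if __name__ == "__main__":
    print(regression_metrics([0.0, 0.5, 1.0], [0.0, 0.5, 0.5]))
    # Consistency check on the reported figures: sqrt(MSE) vs. RMSE.
    print(round(math.sqrt(0.0259), 4))
```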

NLP

Clickbait Detection in News Headlines

Machine Learning

Benchmarked Multinomial Naive Bayes and Logistic Regression on 32k balanced news headlines to detect clickbait. Two deployment targets were explored: maximum accuracy (97.12% test accuracy with stopwords and 8k vocabulary) and zero false positives (0% FPR, 84% accuracy, TPR 68%). Detailed error analysis highlights impactful tokens and the trade-offs introduced by bias calibration.

Key Results

Accuracy: 97.12%

Best FPR: 0.0%

Accuracy at 0% FPR: 84.00%

Vocabulary: 8000 words

Feb 2024

6 pages

Python, Scikit-learn, NumPy
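
The zero-false-positive operating point can be illustrated with a simple threshold shift, which for a linear classifier is equivalent to adjusting the bias. The sketch below picks the highest score among non-clickbait headlines as the threshold and then measures the resulting rates. The function names and toy scores are assumptions, and the calibration procedure actually used in the study is not described here. On a balanced set, accuracy equals (TPR + TNR) / 2, which is why 68% TPR at 0% FPR corresponds to 84% accuracy.

```python
def zero_fp_threshold(scores, labels):
    """Highest score among negatives (label 0).

    Flagging only scores strictly above this value yields no false positives
    on the data used to choose it.
    """
    return max(s for s, y in zip(scores, labels) if y == 0)


def rates(scores, labels, threshold):
    """Return (TPR, FPR, accuracy) when predicting positive for score > threshold."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    accuracy = (tp + (neg - fp)) / len(labels)
    return tp / pos, fp / neg, accuracy


if __name__ == "__main__":
    toy_scores = [0.95, 0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    t = zero_fp_threshold(toy_scores, toy_labels)
    print(t, rates(toy_scores, toy_labels, t))
```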

Security

DDoS Attack Detection and Mitigation

Enterprise Digital Infrastructure

Recreated DNS reflection and amplification attacks in a controlled LAN to measure amplification factors, target-side latency and server resource usage. Custom Scapy scripts spoofed victims while varying query types (A, MX, NS, ANY) at 10k–50k packets/s. The study documents latency spikes above 100 ms for ANY requests and CPU saturation during amplified attacks, and evaluates mitigation strategies.

Key Results

Amplification Factor (A): 1.46

Amplification Factor (MX): 4.14

Amplification Factor (NS): 4.46

Jan 2024

11 pages

Python, Scapy, Wireshark
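
An amplification factor is commonly defined as the size of the response divided by the size of the query that triggered it. The sketch below applies that definition and estimates the reflected traffic at a given query rate. The packet sizes are hypothetical, and the study may have measured sizes at a different protocol layer, so treat the numbers as arithmetic illustration only.

```python
def amplification_factor(query_bytes: int, response_bytes: int) -> float:
    """Ratio of response size to query size."""
    if query_bytes <= 0:
        raise ValueError("query_bytes must be positive")
    return response_bytes / query_bytes


def reflected_bandwidth_bps(packets_per_second: float, query_bytes: int,
                            factor: float) -> float:
    """Approximate reflected traffic in bits per second at the target."""
    return packets_per_second * query_bytes * factor * 8


if __name__ == "__main__":
    # Hypothetical sizes chosen so the ratio equals the reported MX factor.
    af = amplification_factor(query_bytes=50, response_bytes=207)
    print(af)
    # Hypothetical 64-byte queries at the lowest rate mentioned (10k packets/s).
    print(reflected_bandwidth_bps(10_000, 64, af))
```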

Machine Learning

Cake Classification Features Analysis

Machine Learning

Compared handcrafted descriptors and CNN-derived features for classifying 15 cake categories (1,800 images). Low-level statistics (color histogram, edge direction, co-occurrence) fed an MLP but plateaued at 31% accuracy, while PVMLNet feature maps (layer −5) coupled with an MLP achieved 90% test accuracy. Transfer learning by fine-tuning PVMLNet reached 80%, highlighting the importance of deep representations.

Key Results

Accuracy: 90%

Neural Features: 90%

Low-Level Features: 31%

Transfer Learning: 80%

Feb 2024

5 pages

Python, pvml library, PVMLNet

Computer Vision

Vanishing Points Detection in Images

Computer Vision

Delivered two computer-vision utilities: (1) a histogram-driven binarisation tool with auto/manual tuning and GUI, and (2) a vanishing point detector that chains Canny, probabilistic Hough and RANSAC (500 iterations, 5 px tolerance). The pipeline adapts thresholds from image statistics, overlays the 15 most significant lines, and documents SSIM comparisons against Otsu.

Key Results

RANSAC Iterations: 500

Threshold: 5 pixels

Hough Lines: 10 longest per run

Mar 2024

7 pages

Python 3.9+, OpenCV, NumPy
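
The RANSAC stage of a vanishing point detector can be sketched without OpenCV once line segments are available. The example below repeatedly intersects two random lines and counts how many lines pass within a pixel tolerance of that intersection, using the 500 iterations and 5 px tolerance mentioned above. The inlier rule (point-to-line distance) and all function names are assumptions. The project's actual scoring is not described here, and the Canny and Hough steps are omitted.

```python
import math
import random


def line_through(p, q):
    """Homogeneous line (a, b, c) with a*x + b*y + c = 0 through points p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def intersect(l1, l2):
    """Intersection point of two lines, or None if they are (nearly) parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - a2 * b1
    if abs(w) < 1e-9:
        return None
    return ((b1 * c2 - b2 * c1) / w, (c1 * a2 - c2 * a1) / w)


def distance(point, line):
    """Perpendicular distance from a point to a homogeneous line."""
    a, b, c = line
    return abs(a * point[0] + b * point[1] + c) / math.hypot(a, b)


def ransac_vanishing_point(lines, iterations=500, tolerance=5.0, seed=0):
    """Return (best_point, inlier_count) over random pairs of lines."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_point, best_inliers = None, -1
    for _ in range(iterations):
        l1, l2 = rng.sample(lines, 2)
        candidate = intersect(l1, l2)
        if candidate is None:
            continue
        inliers = sum(distance(candidate, l) <= tolerance for l in lines)
        if inliers > best_inliers:
            best_point, best_inliers = candidate, inliers
    return best_point, best_inliers


if __name__ == "__main__":
    vp = (100, 50)
    lines = [line_through(p, vp) for p in [(0, 0), (0, 100), (0, 200), (50, 300), (200, 300)]]
    lines += [line_through((0, 10), (300, 10)), line_through((0, 250), (300, 260))]
    print(ransac_vanishing_point(lines))
```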

NLP

Sentiment Analysis on Social Media

Machine Learning

Implemented sentiment classifiers on the IMDb dataset (50k reviews) comparing Multinomial Naive Bayes and Logistic Regression. Vocabulary size, stopword removal and stemming were studied to balance accuracy and overfitting. Naive Bayes with stopwords (vocab 1k) achieved 82.6% test accuracy, while Logistic Regression reached 85.4% with minimal tuning.

Key Results

Accuracy: 85.4%

MNB Accuracy: 82.6%

LR Accuracy: 85.4%

Best Vocabulary: 1000 words

Feb 2024

5 pages

Python, Scikit-learn, NLTK