Amine Bousmah

Data & AI Engineer

I turn data into decisions. I shape messy data into simple answers, work closely with teams, and ship things that truly help people. 🤝🚀

Python, SQL, Airflow, Spark, Cloud (GCP / AWS / Azure), Databricks, dbt, Snowflake, BigQuery, TensorFlow, PyTorch, scikit-learn, MLflow, Docker, Git, CI/CD, API, Jira, Power BI, Tableau, Kafka, C#, Dataiku DSS

Technical Skills

🛠️

Data Engineering & Analytics

Foundations for reliable data systems.

ETL/ELT with Python & SQL; transformation and orchestration basics (dbt/Airflow).

Star/snowflake modeling and essential warehousing patterns.

Data validation and tests with clear SLAs/expectations.

Query optimization fundamentals and cost-aware thinking.

Small streaming prototypes (Kafka) + batch/stream joins.

🤖

Machine Learning & Modeling

Pragmatic ML with strong evaluation discipline.

Solid baselines (linear, tree-based) before complex models.

Time-series forecasting (ARIMA/Prophet, boosting) when useful.

Feature pipelines with leakage-safe cross-validation.

Explainability (SHAP/feature importances) and readable model cards.

Experiment tracking (MLflow) and packaging models for APIs.

📊

BI & Data Visualization

Make results clear, trusted, and actionable.

Defined KPIs and a simple semantic/metric layer.

Interactive dashboards with filters and drill-through.

Data storytelling: annotations, small multiples, clear legends.

Row-level security basics and governance-ready layouts.

Scheduled refreshes, exports, and lightweight QA checks.

🧩

Application Design & API

Product-minded developer focused on clean, secure services.

DDD-lite: clear module boundaries and dependency rules.

REST APIs (FastAPI/Flask) with typed schemas and OpenAPI.

Auth (JWT/OAuth2), input validation, and robust error handling.

Background jobs (Celery/RQ), file ingestion, async I/O.

Frontend integration with React/Next and reusable UI patterns.

☁️

Cloud & DevOps

Ship small, observe, and iterate.

Containerized dev with Docker; reproducible environments.

CI/CD (GitHub Actions): tests, linting, type checks.

Deploy on Vercel/Cloud Run; env & secrets management.

Basic monitoring (logs/metrics/traces) and alerting.

Cost awareness and usage-based scaling (serverless first).

💹

Data Finance & Revenue Analytics

Applied analytics for markets, risk, and growth.

Credit risk scorecards & PD estimation; calibration & backtesting.

Market analytics: returns/volatility, simple VaR/ES & stress tests.

Fraud/AML: anomaly detection and KYC/KYB entity matching.

Portfolio analytics: mean-variance, factor tilts, Black-Litterman basics.

Documentation & governance aligned with IFRS/Model Risk standards.
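
To make the "simple VaR/ES" item concrete, here is a minimal sketch of a historical-simulation estimate. The function name `historical_var_es`, the tail convention (VaR as the smallest loss among the worst `1 - confidence` share of observations) and the sample returns are assumptions chosen for illustration, not figures from a real portfolio.

```python
def historical_var_es(returns, confidence=0.95):
    """Estimate VaR and ES from a list of periodic returns (historical simulation).

    Assumed convention: take the worst (1 - confidence) share of observations;
    VaR is the smallest loss in that tail, ES is the average loss in that tail.
    Both are reported as positive loss fractions.
    """
    if not returns:
        raise ValueError("returns must not be empty")
    # Convert returns to losses and sort with the largest loss first.
    losses = sorted((-r for r in returns), reverse=True)
    # Keep at least one observation in the tail.
    tail_size = max(1, int(round(len(losses) * (1 - confidence))))
    tail = losses[:tail_size]
    value_at_risk = tail[-1]
    expected_shortfall = sum(tail) / len(tail)
    return value_at_risk, expected_shortfall


# Illustrative (made-up) daily returns.
sample_returns = [0.01, -0.02, 0.005, -0.05, 0.015, -0.01, 0.02, -0.03, 0.0, 0.01]
var_80, es_80 = historical_var_es(sample_returns, confidence=0.80)
```

With ten observations and 80% confidence, the tail holds the two worst days, so ES can never be smaller than VaR under this convention. Other conventions (interpolated quantiles, parametric VaR) give slightly different numbers.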

Selected Projects

Vinted Extension — Smart auto-repost to boost visibility

Browser extension that automatically republishes listings to leverage Vinted’s algorithmic boost. Features safe scheduling, anti-duplicate logic and local anti-tracking to maximize views and click-throughs without manual effort.

Key results

Automation: 95%
Time saved: 85%
View uplift: 30%

Technical implementation

▹Chrome Extension (content + background service worker)
▹Task scheduler, de-duplication & cooldown management
▹Local headers/cookies handling; optional Express helper
▹Image helpers (crop/compress) when needed
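
The de-duplication and cooldown logic listed above can be sketched roughly as follows. This is an illustration in Python rather than the extension's actual code; `RepostScheduler`, its methods and the one-hour cooldown are hypothetical names and values.

```python
from dataclasses import dataclass, field


@dataclass
class RepostScheduler:
    """Hypothetical scheduler: at most one repost per listing per cooldown window."""

    cooldown_seconds: float
    _last_repost: dict = field(default_factory=dict)  # listing_id -> last repost time

    def due(self, listing_ids, now):
        """Return the listings that may be reposted at time `now`, without duplicates."""
        seen = set()
        ready = []
        for listing_id in listing_ids:
            if listing_id in seen:
                continue  # de-duplication within a single batch
            seen.add(listing_id)
            last = self._last_repost.get(listing_id)
            if last is None or now - last >= self.cooldown_seconds:
                ready.append(listing_id)
        return ready

    def mark_reposted(self, listing_id, now):
        """Record a repost so the cooldown starts counting from `now`."""
        self._last_repost[listing_id] = now


scheduler = RepostScheduler(cooldown_seconds=3600)
first = scheduler.due(["a", "b", "a"], now=0)
for listing_id in first:
    scheduler.mark_reposted(listing_id, now=0)
too_soon = scheduler.due(["a", "b"], now=1800)
later = scheduler.due(["a", "b"], now=3600)
```

Passing `now` explicitly keeps the logic deterministic and easy to test; a real scheduler would read the clock and persist `_last_repost` between sessions.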

Tribara — Talent Matching Optimization

AI-powered recruitment optimization to automate candidate screening and ranking, integrated with ATS for seamless workflows. Delivered faster shortlists and more relevant matches for recruiters.

Key results

Screening time reduction: 50%
CV volume: 500
Relevance gain: 30%

Technical implementation

▹Python ETL & ML pipeline (parsing + scoring)
▹NLP-based candidate ranking with continuous fine-tuning
▹ATS integration (webhooks/API) & scoring feedback loop
▹Dashboard & export for recruiter decision support

Face Recognition — Find all photos of a person

Application that lets you upload a few photos of yourself to automatically detect all occurrences within an event album (ideal for team building/seminars).

Key results

Detection AP: 91.4%
LFW accuracy: 99.83%
Index size: 10000

Technical implementation

▹Face detection: RetinaFace/SCRFD (InsightFace)
▹Face embeddings: ArcFace (InsightFace, 512-d vectors)
▹Similarity search & scaling: FAISS (IVF/PQ or HNSW)
▹De-duplication & robustness: thresholding + DBSCAN; multi-reference averaging
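
The thresholding and multi-reference averaging steps above can be illustrated with a small, dependency-free sketch. It assumes embeddings are already computed (ArcFace vectors are 512-d; toy 3-d vectors are used here), and the helper names, photo ids and the 0.8 threshold are hypothetical. A real index would delegate the search to FAISS rather than loop in Python.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def mean_reference(embeddings):
    """Average several normalized reference embeddings into one unit vector."""
    normalized = [l2_normalize(e) for e in embeddings]
    dims = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dims)]
    return l2_normalize(mean)


def find_matches(reference, album, threshold):
    """Return (photo_id, score) pairs whose cosine similarity meets the threshold."""
    hits = []
    for photo_id, embedding in album.items():
        candidate = l2_normalize(embedding)
        score = sum(a * b for a, b in zip(reference, candidate))
        if score >= threshold:
            hits.append((photo_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data: two reference photos of one person and a three-photo album.
references = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]
album = {
    "img_001": [0.95, 0.05, 0.05],
    "img_002": [0.0, 1.0, 0.0],
    "img_003": [2.0, 0.2, 0.1],
}
reference = mean_reference(references)
matches = find_matches(reference, album, threshold=0.8)
matched_ids = [photo_id for photo_id, _ in matches]
```

Averaging several references smooths out pose and lighting differences in any single upload; the threshold then trades missed photos against false matches and would need tuning on real data.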

11Field — Football analytics & scouting suite

End-to-end scouting toolkit: xG/xGA, role-based radars, league comparators, match reports and player similarity. Adds ML models for clustering and explainability to support recruitment decisions.

Key results

Leagues covered: 12%
Modeled features: 40%
Time to insight: 60%

Technical implementation

▹Data ingestion from public football APIs (FBref/ESPN/ClubElo, etc.)
▹Interactive dashboards (Streamlit + Plotly)
▹PCA + KMeans for playing-style clusters
▹RandomForest + SHAP for explainable player ranking

Modern Data Capabilities

🔌

Ingestion & Connectivity

REST/GraphQL, webhooks, SaaS & DB connectors
Batch files (CSV/Parquet) + CDC/event streams
Secrets, retries, backoff, idempotency
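
As an illustration of the retries and backoff item, here is a minimal sketch of exponential backoff with full jitter. `TransientError`, `call_with_retries` and the default delays are hypothetical, and the pattern assumes the wrapped operation is idempotent, so repeating it after a failure is safe.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retries(operation, max_attempts=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Call `operation`, retrying transient failures with exponential backoff.

    The wait before retry n is drawn uniformly from [0, min(max_delay,
    base_delay * 2**(n - 1))], which spreads out clients that fail together.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))


# Usage: a fake source that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("temporary failure")
    return {"rows": 42}


delays = []
result = call_with_retries(flaky_fetch, sleep=delays.append)
```

Injecting `sleep` lets the example record the delays instead of actually waiting, which also makes the retry policy straightforward to unit test.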

🧭

Workflow Orchestration

Reproducible DAGs with clear SLAs
Idempotent tasks, alerts, backfills
Data-aware scheduling & dependency management

🗄️

Lakehouse Storage & Formats

Object storage + warehouse interoperability
Parquet/Delta/Iceberg, partitioning & compaction
Schema evolution, time travel & ACID tables

🏗️

Modeling & ELT

Layered models (staging → core → marts)
Data contracts & tests (quality as code)
SCD patterns, surrogate keys, audit columns

🧪

Data Quality & Observability

Freshness, completeness, accuracy monitors
Column-level lineage & impact analysis
Anomaly detection with playbooks/runbooks

📊

BI & Semantic Layer

Governed metrics/semantic layer for consistency
Row-level security & policy-based access
Drill-through dashboards, alerts & subscriptions

✨

Data Apps & UX

API-first apps (Next.js/React) with great UX
Accessible, fast, mobile-friendly interfaces
Shareable exports & decision-ready views

⚡

Realtime & Streaming

CDC & event-driven pipelines (micro-batch/stream)
Live dashboards via WebSockets/SSE
Materialized views & low-latency caches

🤖

ML & MLOps

Feature pipelines with reproducible training
Experiment tracking, registry & versioning
Drift/fairness monitoring & A/B evaluations

🧠

LLM & RAG

Embeddings & chunking with prompt versioning
Hybrid retrieval + guardrails & citations
Privacy-aware grounding on enterprise data

🧭

Vector Search

ANN indexes (HNSW, IVF-PQ) at scale
Hybrid keyword + vector retrieval
Deduplication & clustering for discovery

🔒

Governance, Privacy & Security

RBAC/ABAC, masking/tokenization of PII
Catalog, lineage, ownership & audit logs
Compliance by design (GDPR, ISO 27001)

💸

FinOps & Performance

Cost tags/budgets & storage lifecycle
Pruning, partition pushdown, caching
Autoscaling, SLAs/SLOs with clear error budgets

🚀

Data CI/CD & DevEx

Git-based reviews, tests & linters for data code
Reproducible builds & artifact versioning
Ephemeral preview envs & safe rollbacks

🔁

Interop & Internal APIs

OpenAPI/JSON Schema contracts & governance
Reverse ETL to operational tools
Pagination, rate limits & idempotent writes

Let's work together 🚀

Looking for a Data & AI engineer who combines technical expertise with a human touch: someone who turns data into meaningful stories, builds solutions that matter, and works hand in hand with teams to create impact?

Paris, France
