Abubakar Aliyu

Data Scientist & ML Engineer

I am driven by curiosity and the challenge of turning bold ideas into working AI systems.

📍 Paris, France

Experience

Building AI solutions that make a difference

AI Fellow

Centre for Journalism Innovation and Development

Jun 2024 - Nov 2024

Co-developed ChatJourno, a retrieval-augmented fact-checking assistant for West African journalists. Built scraping pipelines using BeautifulSoup and curated datasets from 50+ trusted news sources. Successfully deployed the system to production at chatjourno.com, enabling real-time fact-checking support for newsroom operations.

Data Science Project Manager

Ibsonka Energy Resources

Jan 2024 - Jul 2024

Led rollout of Digital Health Space (DHS) EMR across two hospitals. Managed cross-functional teams using Agile methodologies to standardize healthcare data migration, and trained over 100 staff. Reduced support issues by 70% through comprehensive change management and technical documentation.

Featured Projects

From research to production-ready systems

AI Research & Development

LATEST Retrieval-Augmented Reasoning with Self-Correction for Biomedical QA (2025)

Type: Master's Thesis

Goal: Eliminate AI hallucinations in biomedical question answering by architecting retrieval-augmented reasoning with self-correction mechanisms

Role & Team: Project Lead (4 members) - orchestrated comprehensive literature review, engineered LLM fine-tuning pipeline with quantization, architected Redis/FAISS integration, deployed production-ready HIPAA-compliant system

Tools & Methods: Llama-3.1-8B, SciBERT, Flan-T5, FAISS, Redis, Hugging Face Inference Endpoints, PyTorch, synthetic data generation, LLM-as-a-Judge evaluation

Outcome & Impact: Deployed production-ready system achieving 87% error detection accuracy and 0.82 semantic preservation F1-score, demonstrating an effective approach to reducing hallucinations in biomedical QA

87% error detection accuracy | 0.82 F1 score | HIPAA compliant

GitHub Repository | Watch Demo (Full Screen) | Dataset on HF

Domain Name Generation Model with Systematic Quality Improvement (2025)

Type: AI Research Project

Goal: Fine-tune Llama-3.1-8B to generate creative, brandable domain names while maintaining safety through systematic data curation and evaluation

Role & Team: Individual researcher - created hybrid dataset generation pipeline, implemented systematic quality assessment framework, conducted iterative model improvements through controlled experiments

Tools & Methods: Llama-3.1-8B-Instruct, LoRA fine-tuning, PyTorch, Hugging Face Transformers, Weights & Biases, Claude API, custom model versioning with SHA256 hashing

Outcome & Impact: Achieved 99.1% success rate and 100% safety compliance on test set (334 examples). Improved model quality through systematic dataset curation (reduced generic patterns from 38% to 28%). Tracked experiments in W&B comparing baseline vs augmented models. Implemented model versioning with SHA256 dataset hashing for reproducibility. Deployed live demo with creative domain generation capabilities.

99.1% success rate | 100% safety compliance | 38% → 28% generic reduction | 82% extended safety coverage

GitHub Repository | Live Demo | Model on HF | Dataset on HF
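
The SHA256 dataset hashing used for model versioning in this project can be illustrated with a minimal sketch. It assumes the dataset is a list of JSON-serializable records. The function names `dataset_fingerprint` and `version_tag`, the record fields, and the sample values are hypothetical and are not taken from the project repository.

```python
import hashlib
import json


def dataset_fingerprint(records):
    """Return a SHA-256 hex digest that does not depend on record order."""
    # Serialize each record canonically (sorted keys, compact separators) so that
    # logically equal records always produce identical bytes.
    lines = sorted(
        json.dumps(r, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
        for r in records
    )
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")  # delimiter so adjacent records cannot run together
    return digest.hexdigest()


def version_tag(model_name, records, length=12):
    """Combine a model name with a truncated dataset hash (hypothetical scheme)."""
    return f"{model_name}-{dataset_fingerprint(records)[:length]}"


if __name__ == "__main__":
    # Illustrative records only; not real project data.
    data = [
        {"prompt": "organic coffee shop", "domain": "beanroots.com"},
        {"prompt": "ai tutoring startup", "domain": "tutorloop.ai"},
    ]
    print(version_tag("domain-gen", data))
```

Sorting the serialized records makes the fingerprint independent of row order, so a reshuffled dataset keeps the same version tag while any added, removed, or edited record changes it. Whether order should matter is a design choice; the sketch assumes it should not.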

Production ML Systems

ChatJourno (2024)

Type: Fellowship Project (Centre for Journalism Innovation and Development)

Problem & Goal: In West African newsrooms, journalists face a 15-minute window to update breaking news with credible backstories. Under this pressure, research often comes from unreliable sources. The project aimed to empower journalists with an AI-assisted fact-checking system.

Role & Team: AI Fellow – collaborated with journalists and technologists to design and prototype ChatJourno. Built data collection pipelines, engineered the RAG system architecture, and prototyped the chatbot interface.

Tools & Methods: LangChain, BeautifulSoup, Python, Streamlit, RAG pipeline design, dataset curation from trusted sources

Outcome & Impact: Delivered a functional chatbot that reduces newsroom research time and improves reporting accuracy. Presented ChatJourno at the 2024 Media Innovation West Africa Conference, showcasing responsible AI integration in journalism.

Visit ChatJourno

JB Link Customer Churn Prediction System (2025)

Type: Academic Team Project

Goal: Build a production-ready ML system for JB Link telecom to predict customer churn and reduce the 43% quarterly churn rate

Role & Team: Team Lead (5 members) - architected microservices system, implemented FastAPI model serving, designed automated data pipelines with Airflow

Tools & Methods: Docker, FastAPI, Streamlit, PostgreSQL, Apache Airflow, Grafana, scikit-learn, microservices architecture

Outcome & Impact: Deployed enterprise-grade system with batch prediction optimization, automated workflows, and real-time monitoring dashboards

GitHub Repository

Data Engineering & Cloud

Serverless AWS Data Pipeline - NYC Taxi Data (2025)

Type: Academic/Personal Project

Goal: Build a fully event-driven, scalable pipeline for processing and analyzing large volumes of transportation data

Role & Team: Individual contributor - architected event-driven ETL pipeline, implemented PySpark transformations, configured fine-grained IAM permissions

Tools & Methods: AWS Lambda, AWS EMR, AWS Glue, AWS Athena, AWS CloudFormation, PySpark

Outcome & Impact: Processed 1.4M+ records with secure IAM policies, encryption, and automated orchestration

1.4M+ records processed | Serverless architecture

GitHub Repository

Scalable AWS Analytics System - Student Performance Analytics (2025)

Type: Academic/Personal Project

Goal: Build a fault-tolerant, scalable platform for analyzing student performance data with interactive visualizations

Role & Team: Individual contributor - designed AWS architecture with custom VPC, implemented Streamlit dashboard, configured Auto Scaling Groups

Tools & Methods: AWS (VPC, EC2, ALB, Auto Scaling, S3, IAM), Streamlit, Python data science stack

Outcome & Impact: Deployed production-ready system with high availability and comprehensive analytics dashboard

2-4 auto-scaling EC2 instances | 70% CPU threshold scaling

GitHub Repository

Computer Vision & Generative AI

Stable Diffusion Parameter Optimization Study (2025)

Type: Academic Project

Goal: Conduct systematic analysis of Stable Diffusion parameters to understand trade-offs between generation quality, speed, and prompt adherence

Role & Team: Individual researcher - designed controlled experiments across multiple parameter dimensions

Tools & Methods: Stable Diffusion, Google Colab, multiple schedulers (DPM, EulerA, DDIM, LMS)

Outcome & Impact: Identified optimal parameter combinations (CFG 15.0, 30-40 steps, EulerA scheduler), established best practices for architectural image generation

GitHub Repository

MNIST GAN - Generative Adversarial Network (2025)

Type: Academic Project

Goal: Implement a Generative Adversarial Network to generate realistic handwritten digits from random noise vectors

Role & Team: Individual contributor - implemented generator/discriminator architectures, conducted adversarial training

Tools & Methods: TensorFlow/Keras, GANs, Conv2D/Conv2DTranspose layers, BatchNormalization

Outcome & Impact: Successfully generated high-quality MNIST digits, achieved balanced discriminator/generator loss

GitHub Repository

Object Detection and Instance Segmentation (2025)

Type: Academic Project

Goal: Develop object detection and segmentation systems for trash classification and plant disease diagnosis

Role & Team: Individual contributor - built end-to-end pipelines, conducted systematic model tuning

Tools & Methods: Ultralytics YOLOv8, PyTorch, evaluation frameworks

85.9% mAP on trash detection | 36.6% mask mAP on plant segmentation

GitHub Repository

Technical Skills

Core expertise and technical capabilities

Core Expertise

Applied AI Research & System Evaluation

Production ML Deployment

Data Engineering & Cloud Architecture (AWS)

Technical Stack

Languages

Python, R, SQL

ML/DL Frameworks

PyTorch, Transformers, Hugging Face, TensorFlow

LLM Experimentation & Evaluation

LLM Fine-tuning (LoRA/QLoRA), Experiment Tracking (Weights & Biases), Model Evaluation (Metrics, Comparative Analysis, LLM-as-a-Judge)

MLOps & Tools

Docker, FastAPI, Airflow, CI/CD, Grafana, AWS

Data & Databases

PySpark, PostgreSQL, Redis, FAISS, Synthetic Data Generation

Education

Academic foundation in AI and engineering

Master of Science in Computer Science (Data Science and Analytics)

EPITA School of Engineering and Computer Science – Paris, France

Sep 2024 - Apr 2026 (Expected)

Research Focus: AI Safety and Hallucination Mitigation in Healthcare Applications

Thesis: Retrieval-Augmented Reasoning with Self-Correction for Biomedical Question Answering. Focus on LLM fine-tuning, retrieval systems with FAISS and Redis, synthetic data generation, and deployment on Hugging Face Inference Endpoints.

Bachelor of Engineering in Computer Engineering

Ahmadu Bello University – Kaduna, Nigeria

February 2022

Grade: Second Class Upper (14/20)

Research Focus: Network Systems and Routing Protocols in Intermittent Connectivity Environments

Thesis: Enhanced PRoPHET Routing Protocol with Buffer Management Techniques in DTN. Focus on implementing and analyzing four buffer management techniques (MOFO, DLA, DL, FIFO) with the PRoPHET routing protocol in Delay Tolerant Networks using the ONE Simulator.

Certifications

Continuous learning and professional development

Google Certified Data Analyst

2023

Completed

Comprehensive certification covering data collection, transformation, analysis, visualization, and data-driven decision making.

Verify on Coursera

Google Certified Project Manager

2023

Completed

Professional certification in project management fundamentals, including planning, execution, and stakeholder management.

Verify on Coursera

AWS Certified Machine Learning Engineer - Associate

2025

In Progress

Preparing for certification covering ML data preparation, model training and deployment, MLOps workflows, and infrastructure management on AWS.

Awards & Recognition

Achievements and competitive selections

AI-in-Journalism Fellow

Centre for Journalism Innovation and Development

Jun 2024 - Nov 2024

Selected from over 500 applications across Nigeria and Ghana for the first AI-in-Journalism Fellowship in Africa. Pitched and prototyped ChatJourno, a fact-checking assistant using LangChain-based RAG pipelines and web scraping from credible West African news sources. Presented ChatJourno at the 2024 Media Innovation West Africa Conference, showcasing responsible AI integration in journalism.

Petroleum Technology Development Fund (PTDF) Overseas Scholarship Award

EPITA: École d'Ingénieurs en Informatique

Sep 2024

Awarded highly competitive postgraduate scholarship by the Nigerian government to pursue advanced studies abroad. Selection based on academic excellence and commitment to contributing to Nigeria's energy sector.

Summer Institute in Computational Social Science (SICSS-Calabar)

Duke University, Social Science Research Council, Mathematica

September 18 - 29, 2023

Selected for participation in SICSS-Calabar, one of only two SICSS locations in Africa. Covered advanced modules in data modeling with R, text analysis, web scraping, and social science predictive analytics.

TotalEnergies/NNPC National Merit Scholarship Award

Ahmadu Bello University

Mar 2018

Awarded by NNPC, TotalEnergies, and partners to recognize academic excellence. Granted to high-performing students through a highly selective annual process aimed at fostering talent development.

Languages

International communication capabilities

English

Bilingual Proficiency

French

Elementary (A2)

💼 Currently seeking: End-of-study internship or Data Scientist / ML Engineer positions (CDD/CDI)

We're a Match

🎯

Let's talk! I'm passionate about building scalable, impactful solutions at the crossroads of AI/ML and system design. I thrive on tackling challenging problems, collaborating with diverse teams, and turning technical ideas into products that make a difference. My philosophy is simple: build quickly, learn continuously, and always keep user impact at the center.

Download CV | Email | WhatsApp | Instagram