Small Vision Language Model

Tags: Project, DFKI, Augmented Vision

Topic

This project aims to distill knowledge from an existing large human-centric foundation model into a smaller foundation model that handles multiple vision and language tasks involving humans.

Tasks

  1. Run inference on multiple models to create a training dataset
  2. Implement a distillation architecture
  3. Train and evaluate the model on one benchmark dataset
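
The posting does not say which distillation objective the project will use. As a rough illustration of the idea behind task 2, the sketch below shows one common choice: a soft-label loss that compares temperature-softened teacher and student output distributions with a KL divergence. The function names, the temperature value, and the use of plain Python lists instead of PyTorch tensors are all assumptions made for this example; the actual project may distill features or embeddings instead.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log of the temperature-scaled softmax, computed stably via log-sum-exp."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper for illustration; not the project's actual objective.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    print(distillation_loss([1.0, 2.0, 3.0], teacher))  # student matches teacher
    print(distillation_loss([3.0, 2.0, 1.0], teacher))  # student disagrees
```

The loss is zero (up to floating-point error) when the student reproduces the teacher's distribution and grows as the two diverge. In a training loop it would typically be combined with a task loss on ground-truth labels, if such labels are available.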

Expected Skills

  1. PyTorch (required)
  2. Strong programming skills (required)
  3. Foundation Models, Knowledge Distillation, Contrastive Learning, Dataset Curation (preferred)