Student Researcher @ Google DeepMind · PhD Candidate @ UvA

Michael Dorkenwald

Teaching AI models to understand and learn from videos.

I'm a Student Researcher at Google DeepMind and a PhD candidate at the University of Amsterdam, supervised by Yuki Asano and Cees Snoek, as part of the ELLIS PhD program. Video shows how the world changes over time. I work on models that can learn this without supervision, connect it with language, and do so efficiently.

Before that, I got my MSc in Physics from Heidelberg University, working on generative video modeling in Björn Ommer's CompVis group. I also spent time as a research intern at AWS Rekognition and had the pleasure of visiting Kosta Derpanis's lab in Toronto.

Email · Google Scholar · GitHub · CV · LinkedIn

News

Apr 2026 Joined Google DeepMind as a Student Researcher position
Sep 2025 Paper accepted to NeurIPS'25 on elastic pruning of ViTs paper
Jul 2025 Paper accepted to BMVC'25 on a new video-language benchmark paper
Jun 2025 Paper accepted to TMLR on LLMs as implicit optimizers for VLMs paper
Nov 2024 Gave a talk at the SURF Research Bootcamp on large-scale video learning talk
Sep 2024 Workshop organizer: "Self-Supervised Learning: What is Next?" at ECCV'24 organizer
Jul 2024 Paper accepted to ECCV'24 on SIGMA: Sinkhorn-Guided Masked Video Modeling paper
Jun 2024 Gave talks at TNO in The Hague and at the National Institute of Informatics in Tokyo talk
Apr 2024 Teaching Assistant for the Foundation Models (FoMo) course teaching

Research

🔬

Evaluation

Proposing a new video-language benchmark

TVBench

💬

Vision-Language

Connecting vision with LLMs

PIN · GLOV

🧠

Representation

Learning from videos without supervision

SIGMA · SCVRL

🎞️

Synthesis

Video generation based on a single frame

cINN · iPOKE

⚡

Efficiency

Making vision models elastic and deployable

Elastic ViTs

Selected Publications

Showing first-author and equal-contribution papers. Full list on Google Scholar. * denotes equal contribution.

Elastic ViTs from Pretrained Models without Retraining

W. Simoncini*, M. Dorkenwald*, T. Blankevoort, C. Snoek, Y. Asano

NeurIPS 2025

ArXiv Project Code

TVBench: Redesigning Video-Language Evaluation

D. Cores*, M. Dorkenwald*, M. Mucientes, C. Snoek, Y. Asano

BMVC 2025

ArXiv Code HuggingFace

SIGMA: Sinkhorn-Guided Masked Video Modeling

M. Salehi*, M. Dorkenwald*, F. Thoker*, E. Gavves, C. Snoek, Y. Asano

ECCV 2024

ArXiv Project Code

PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs

M. Dorkenwald, N. Barazani, C. Snoek*, Y. Asano*

CVPR 2024

ArXiv Project Code

SCVRL: Shuffled Contrastive Video Representation Learning

M. Dorkenwald, F. Xiao, B. Brattoli, J. Tighe, D. Modolo

CVPR 2022 Workshop

ArXiv Project

Stochastic Image-to-Video Synthesis using cINNs

M. Dorkenwald, T. Milbich, A. Blattmann, R. Rombach, K. Derpanis, B. Ommer

CVPR 2021

ArXiv Project Code

Unsupervised Magnification of Posture Deviations Across Subjects

M. Dorkenwald*, U. Büchler*, B. Ommer

CVPR 2020

Paper Project

Experience

2026 — present

Google DeepMind

Student Researcher

Research on efficient video learning.

2022 — present

University of Amsterdam

PhD Candidate — QUVA Lab / ELLIS

Self-supervised video learning, vision-language models, and efficient foundation models. Supervised by Yuki Asano and Cees Snoek.

2022

AWS Rekognition

Research Intern

Self-supervised video representation learning.

2019 — 2022

Heidelberg University

MSc Computational Physics

Research on generative video modeling with Björn Ommer's CompVis group. Research visit at Kosta Derpanis's lab in Toronto.