Exploration Log of Contrastive Language-Image Pre-training
Introduction
Last month OpenAI released CLIP, a neural network that learns to map text and images into the same embedding space using a contrastive objective and a multi-class N-pair loss.
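To make the contrastive objective concrete, here is a minimal NumPy sketch of a symmetric, multi-class N-pair style loss over a batch of N matching image/text pairs. It assumes the image and text encoders have already produced embeddings of the same dimension `d`. The function name `clip_style_loss` is hypothetical, and the temperature is fixed at 0.07 for simplicity, whereas CLIP learns it during training. This is an illustration of the idea, not OpenAI's implementation.

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Hypothetical symmetric contrastive loss.

    image_emb, text_emb: (N, d) arrays where row i of each forms a matching pair.
    temperature: fixed here (assumption); CLIP treats it as a learnable parameter.
    """
    # L2-normalise so dot products become cosine similarities
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    # (N, N) similarity matrix; the correct pairs sit on the diagonal
    logits = img @ txt.T / temperature
    idx = np.arange(logits.shape[0])
    # Image -> text: each row is an N-way classification over the texts
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text -> image: each column is an N-way classification over the images
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    # Average the two directions to make the objective symmetric
    return (loss_i2t + loss_t2i) / 2

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
print(clip_style_loss(emb, emb))        # matched pairs: low loss
print(clip_style_loss(emb, emb[::-1]))  # shuffled pairs: high loss
```

Each of the N images is pushed towards its own caption and away from the other N - 1 captions in the batch (and vice versa), which is why larger batches supply more negatives for free.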
TL;DR
The key ingredients of the architecture are: