Now that we have two augmented versions of the input batch, T1(B) and T2(B), we'll look at the remaining components of the SimCLR training pipeline.
Network architecture
As shown in the figure below, the two augmented versions of an image Xi, namely T1(Xi) and T2(Xi), are passed through the neural network f(.) to obtain the penultimate-layer feature representations hi1 and hi2, respectively. These representations are then passed through a multilayer perceptron (MLP) projection head g(.) to obtain the feature embeddings zi1 and zi2 ...
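The data flow above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the tutorial's implementation. The encoder f(.) is a hypothetical single linear layer with ReLU standing in for a real CNN (the original SimCLR paper uses a ResNet-50, giving a 2048-d h and a 128-d z). The projection head g(.) is an MLP with one hidden ReLU layer. The "augmentations" are placeholder additive noise rather than real T1 and T2. All sizes are hypothetical. The key point is that both views pass through the *same* f and g, which share their weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a tiny flattened 3x8x8 "image", a 64-d representation h,
# and a 16-d embedding z (the SimCLR paper uses ResNet-50: 2048-d h, 128-d z).
IN_DIM, REP_DIM, HIDDEN_DIM, PROJ_DIM = 3 * 8 * 8, 64, 64, 16

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical stand-in for the base encoder f(.): one linear layer + ReLU.
W_f = rng.normal(scale=0.02, size=(IN_DIM, REP_DIM))

def f(x):
    """Map flattened images (N, IN_DIM) to representations h (N, REP_DIM)."""
    return relu(x @ W_f)

# Projection head g(.): an MLP with one hidden layer, z = W2 @ ReLU(W1 @ h).
W1 = rng.normal(scale=0.1, size=(REP_DIM, HIDDEN_DIM))
W2 = rng.normal(scale=0.1, size=(HIDDEN_DIM, PROJ_DIM))

def g(h):
    """Map representations h (N, REP_DIM) to embeddings z (N, PROJ_DIM)."""
    return relu(h @ W1) @ W2

# Two "augmented" views of the same batch; the additive noise is only a
# placeholder for the real augmentation pipelines T1 and T2.
batch = rng.normal(size=(4, IN_DIM))
view1 = batch + 0.1 * rng.normal(size=batch.shape)
view2 = batch + 0.1 * rng.normal(size=batch.shape)

# The same f and g (shared weights) are applied to both views.
h1, h2 = f(view1), f(view2)
z1, z2 = g(h1), g(h2)

print(h1.shape, z1.shape)  # (4, 64) (4, 16)
```

The embeddings z1 and z2 are what a contrastive loss would compare. Keeping h and z as separate outputs reflects the split between the encoder and the projection head: the projection head is only needed for the training objective.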