**Describe the bug**

The definition of `pooled_output` is inconsistent between `ElectraBackbone` and `BertBackbone` on one side and `AlbertBackbone` and FNet on the other.

- In `ElectraBackbone` and `BertBackbone`, `pooled_output` is the pooled CLS token taken before the dense layer.
- In `AlbertBackbone` and FNet, `pooled_output` is the output of the dense layer, which takes the CLS token from the sequence output as its input.

**Expected behavior**

`pooled_output` should have a single definition across models, or follow the original implementations.

**Additional context**

The original implementations of [BERT](https://github.com/google-research/bert/blob/master/modeling.py#L224-L232), [FNet](https://github.com/google-research/google-research/blob/master/f_net/models.py#L119), and [ALBERT](https://github.com/google-research/albert/blob/master/modeling.py#L247-L255).

**Would you like to help us fix it?**

I would like to work on this issue.
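
For reference, here is a minimal NumPy sketch of the two `pooled_output` definitions described in this issue. It is not the library code: the function names, shapes, and random weights are hypothetical, and the `tanh` activation is an assumption taken from the pooler in the original BERT implementation linked above.

```python
import numpy as np

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
batch, seq_len, hidden = 2, 4, 8
sequence_output = rng.normal(size=(batch, seq_len, hidden))
kernel = rng.normal(size=(hidden, hidden))
bias = np.zeros(hidden)


def pooled_before_dense(sequence_output, cls_index=0):
    """Definition A: the CLS token vector taken directly from the sequence output."""
    return sequence_output[:, cls_index, :]


def pooled_after_dense(sequence_output, kernel, bias, cls_index=0):
    """Definition B: the CLS token vector passed through a dense layer.

    The tanh activation is an assumption based on the original BERT pooler.
    """
    cls_token = sequence_output[:, cls_index, :]
    return np.tanh(cls_token @ kernel + bias)


raw_pooled = pooled_before_dense(sequence_output)
dense_pooled = pooled_after_dense(sequence_output, kernel, bias)

# Both have shape (batch, hidden), so downstream code accepts either one,
# but the values differ, which is why the definition needs to be unified.
print(raw_pooled.shape, dense_pooled.shape)
print(np.allclose(raw_pooled, dense_pooled))
```

Because both tensors have the same shape, a task head would run on either one without raising an error, so the mismatch would only show up as a difference in the computed values.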