We conducted empirical experiments to assess the transferability of a light curve transformer to datasets with different cadences and flux distributions using various positional encodings (PEs). We proposed a new approach to incorporate the temporal information directly to the output of the last attention layer. Our results indicated that using trainable PEs lead to significant improvements in the transformer performances and training times. Our proposed PE on attention can be trained faster than the traditional non-trainable PE transformer while achieving competitive results when transfered to other datasets.