One-hot encoding and charges of wild-type and mutated amino acids: We constructed two distinct vectors that represent the wild-type and mutated amino acids. Each amino acid was encoded by a binary vector of length 20, with a value of 1 at the corresponding position and 0s elsewhere (data not provided here). We constructed an additional vector that encodes the charge on the wild-type and mutated amino acids (data not provided here).
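As an illustration of this encoding, the following minimal sketch builds the two one-hot vectors and the charge vector. The alphabetical residue ordering and the charge assignment (+1 for Lys/Arg, -1 for Asp/Glu, 0 otherwise, with His treated as neutral) are assumptions made for the example and are not specified in the text; the function names are hypothetical.

```python
# Assumed residue ordering (alphabetical one-letter codes); not given in the text.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed formal charges at physiological pH; His is treated as neutral here.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def one_hot(aa):
    """Return a binary vector of length 20 with a single 1 at the residue's index."""
    vec = [0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(aa)] = 1
    return vec


def charge_vector(wild_type, mutant):
    """Return [charge of wild-type residue, charge of mutated residue]."""
    return [CHARGE.get(wild_type, 0), CHARGE.get(mutant, 0)]


# Example: a hypothetical Lys-to-Glu substitution.
wt_vec = one_hot("K")
mut_vec = one_hot("E")
charges = charge_vector("K", "E")
```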
Phosphomimetic or acetylation-mimicking: A variant was considered phosphomimetic if the amino acid changed from Ser (S) or Thr (T) to Asp (D) or Glu (E), and acetylation-mimicking if the amino acid changed from Lys (K) to Gln (Q) (data not provided here).
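The two rules can be written as simple predicates on the wild-type and mutated residues. This is a minimal sketch; the function names are hypothetical and one-letter residue codes are assumed as input.

```python
def is_phosphomimetic(wild_type, mutant):
    """True if Ser/Thr is replaced by Asp/Glu."""
    return wild_type in {"S", "T"} and mutant in {"D", "E"}


def is_acetylation_mimicking(wild_type, mutant):
    """True if Lys is replaced by Gln."""
    return wild_type == "K" and mutant == "Q"
```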
ATP binding pocket: We calculated the number of known ATP binding sites at the position equivalent to the variant in the alignment. We obtained the list of known ATP binding sites in human kinases from UniProt (version 2023_02).
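A minimal sketch of this count is given below. It assumes that the UniProt annotations have already been mapped onto alignment coordinates and stored as a dictionary from kinase identifier to a set of alignment positions; this data structure, the kinase names, and the positions in the example are hypothetical.

```python
def count_atp_binding_sites(atp_sites_by_kinase, alignment_position):
    """Count kinases with a known ATP-binding site at the given alignment position.

    atp_sites_by_kinase: dict mapping a kinase identifier to the set of
    alignment positions annotated as ATP-binding (assumed input format).
    """
    return sum(
        alignment_position in positions
        for positions in atp_sites_by_kinase.values()
    )


# Toy, made-up annotations for three hypothetical kinases.
example_sites = {
    "KINASE_A": {30, 52},
    "KINASE_B": {52},
    "KINASE_C": {75},
}
n_sites = count_atp_binding_sites(example_sites, 52)
```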
Post-translational modification information: We incorporated known post-translational modification (PTM) information at the variant position and its adjacent positions (window size = 5) as a feature vector, with a length equal to the number of possible PTM types (phosphorylation, acetylation, methylation, etc.). The presence of a specific PTM type was encoded as 1 and its absence as 0. We repeated the procedure to incorporate known PTM information at the alignment position equivalent to the variant position and its adjacent residues (window size = 5). In this case, each element in the vector encoded the number of kinases harbouring the corresponding PTM type at the given position in the alignment.
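The two PTM encodings can be sketched as follows. Several details here are assumptions for illustration only: the window of size 5 is taken to mean the variant position plus two residues on either side, one vector is returned per window position, the PTM list is an illustrative subset, and the input dictionaries and function names are hypothetical.

```python
# Illustrative subset of PTM types; the full list is not given in the text.
PTM_TYPES = ["phosphorylation", "acetylation", "methylation"]


def window_positions(position, size=5):
    """Positions in a window centred on `position` (assumed: size 5 = +/- 2)."""
    half = size // 2
    return range(position - half, position + half + 1)


def binary_ptm_features(ptms_by_position, position, size=5):
    """Per-position 0/1 vectors for one kinase.

    ptms_by_position: dict mapping a sequence position to a set of PTM types.
    """
    return [
        [int(ptm in ptms_by_position.get(p, set())) for ptm in PTM_TYPES]
        for p in window_positions(position, size)
    ]


def alignment_ptm_counts(ptms_by_kinase, alignment_position, size=5):
    """Per-position counts of kinases carrying each PTM type.

    ptms_by_kinase: dict mapping a kinase identifier to a dict of
    alignment position -> set of PTM types.
    """
    return [
        [
            sum(ptm in sites.get(p, set()) for sites in ptms_by_kinase.values())
            for ptm in PTM_TYPES
        ]
        for p in window_positions(alignment_position, size)
    ]
```

Each function returns five vectors (one per window position); whether these are concatenated or aggregated before being passed to the model is not stated in the text.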
Loss/gain of amino acids in known mutations: We also incorporated the number of times an amino acid was observed as the wild-type residue (loss) or the mutated residue (gain) for each mutation type (i.e. activating, deactivating, and resistance) at the alignment position equivalent to the variant and its adjacent residues (window size = 5). For each mutation type, we initialised the count to zero for all amino acids at all alignment positions. For each loss of an amino acid at an alignment position, we decreased the corresponding count by 1; for each gain, we increased it by 1.
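The counting scheme can be sketched as a single signed counter keyed by mutation type, alignment position, and amino acid. The input format (tuples of mutation type, alignment position, wild-type residue, mutated residue), the function name, and the example mutations are hypothetical.

```python
from collections import defaultdict


def build_loss_gain_counts(mutations):
    """Accumulate signed loss/gain counts from known mutations.

    mutations: iterable of (mutation_type, alignment_position, wild_type, mutant).
    Returns a dict keyed by (mutation_type, alignment_position, amino_acid);
    missing keys default to 0, matching the zero initialisation.
    """
    counts = defaultdict(int)
    for mutation_type, position, wild_type, mutant in mutations:
        counts[(mutation_type, position, wild_type)] -= 1  # loss of wild-type
        counts[(mutation_type, position, mutant)] += 1     # gain of mutant
    return counts


# Made-up mutations at a single alignment position.
known = [
    ("activating", 100, "V", "E"),
    ("activating", 100, "V", "K"),
    ("resistance", 100, "T", "V"),
]
counts = build_loss_gain_counts(known)
```

Features for a variant would then be read from this table at the equivalent alignment position and its neighbouring positions.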