Assessing ORF quality

***NEW: Please see our new tool CloneMap here (short user guide coming soon)***

– Commercially available ORF collections typically contain a significant number of clones that do not represent the canonical ORF for a given gene. A large fraction of these clones consists of natural variants, such as minor, infrequent isoforms or ORFs harboring potentially deleterious SNPs. In addition, some clones harbor mutations or truncations that arose during the cloning procedure. Because commercially available ORF collections mostly derive from cDNA collections generated before the completion of the Human Genome Project, many such mutations and truncations were not identified at the time and were propagated to the currently available collections.

– To help users determine whether a given clone represents a biologically relevant isoform and is free of potentially deleterious natural variations or cloning errors, we provide comments and ratings for the ORFs in our collections. To keep the analysis manageable, we assess ORFs only at the amino acid level. When these comments and ratings are not yet available in our database, we will add them upon user request.

– Our comments typically indicate whether an isoform is the one chosen as the canonical isoform by UniProt. If not, we provide more details on which UniProt isoform it represents and how distantly it is related to the canonical isoform in that database. We also indicate whether the ORF matches one of the isoforms reported in Ensembl for that gene, and in particular whether this isoform is part of the CCDS subset, which gives extra confidence that the isoform is biologically relevant. We further indicate when the isoform was identified as bona fide by manual annotation from the Havana team (isoforms indicated by a golden rectangle in Ensembl). Finally, we check whether the ORF contains potentially deleterious SNPs or other mutations, and indicate when this is the case.

– To allow easy sorting and filtering of clones according to their “biological correctness”, in particular when a large number of clones is needed, we have set up the following rating system and applied it to our collections.

  • 1 star (*): should be excluded from your searches unless you are looking for mutants or truncations.
    • Aberrant/truncated ORF.
    • Isoforms predicted to undergo NMD.
    • Known deleterious variations.
  • 2 stars (**): doubtful ORF, requires user review.
    • Remote or infrequent isoform (not consistently present amongst UniProt and Ensembl isoforms).
    • Any true isoform bearing unreported non-conservative variations.
  • 3 stars (***): ORF probably OK, but some user review advisable.
    • In short, anything between ** and ****.
    • Typically a true UniProt and Ensembl isoform that is not the canonical isoform.
    • Cases where UniProt and Ensembl disagree, and the isoform is canonical in only one of them.
    • Canonical isoform bearing unreported conservative variations.
  • 4 stars (****): Perfect ORF, no user review required unless a very specific isoform is needed.
    • Canonical UniProt isoform that is also part of the Ensembl CCDS subset of isoforms.
    • Variations (conservative or not) are tolerated only if they are reported or reliably predicted to be non-detrimental.
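
For users who want to reason about the rating system programmatically, the rules above can be approximated as a small decision function. This is only an illustrative sketch: the `OrfAnnotation` class, its field names, and the `star_rating` function are hypothetical and do not correspond to any field in our database. The actual ratings also involve manual review, which a handful of boolean flags cannot capture.

```python
from dataclasses import dataclass


@dataclass
class OrfAnnotation:
    """Hypothetical summary of one ORF's annotations (illustration only)."""
    aberrant_or_truncated: bool = False
    predicted_nmd: bool = False               # isoform predicted to undergo NMD
    known_deleterious_variant: bool = False
    in_uniprot: bool = True                   # matches a UniProt isoform
    in_ensembl: bool = True                   # matches an Ensembl isoform
    canonical_uniprot: bool = False           # is the UniProt canonical isoform
    in_ccds: bool = False                     # part of the Ensembl CCDS subset
    unreported_nonconservative: bool = False  # unreported non-conservative variation
    unreported_conservative: bool = False     # unreported conservative variation


def star_rating(a: OrfAnnotation) -> int:
    """Return 1-4 stars, following a simplified reading of the list above."""
    # 1 star: aberrant/truncated ORF, NMD candidate, or known deleterious variation.
    if a.aberrant_or_truncated or a.predicted_nmd or a.known_deleterious_variant:
        return 1
    # 2 stars: isoform not consistently present in UniProt and Ensembl,
    # or any isoform bearing unreported non-conservative variations.
    if not (a.in_uniprot and a.in_ensembl) or a.unreported_nonconservative:
        return 2
    # 4 stars: canonical UniProt isoform in CCDS, with no unreported variations.
    if a.canonical_uniprot and a.in_ccds and not a.unreported_conservative:
        return 4
    # 3 stars: anything in between.
    return 3
```

Under these assumptions, a canonical UniProt isoform that is in CCDS but carries an unreported conservative variation falls through to 3 stars, matching the last 3-star case in the list.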

– In many instances, our collections include several ORFs for a given gene, typically different isoforms or natural variants. Be sure to check whether the gene is represented by several clones or in different collections, to improve your chances of finding the exact ORF of your dreams.
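
When working from an exported list of clones, one way to compare several clones for the same gene is to group them by gene and order them by rating. The sketch below assumes each record is a `(gene, clone_id, stars)` tuple. That layout, the function name `best_clones_per_gene`, and the `min_stars` parameter are all assumptions made for illustration, not a description of our export format.

```python
from collections import defaultdict


def best_clones_per_gene(clones, min_stars=3):
    """Group clones by gene, keep those rated >= min_stars, best rating first.

    `clones` is an iterable of (gene, clone_id, stars) tuples (hypothetical layout).
    Genes with no clone reaching `min_stars` are omitted from the result.
    """
    by_gene = defaultdict(list)
    for gene, clone_id, stars in clones:
        if stars >= min_stars:
            by_gene[gene].append((stars, clone_id))
    # sorted() is stable, so clones with equal ratings keep their input order.
    return {
        gene: [clone_id for _, clone_id in sorted(entries, key=lambda e: -e[0])]
        for gene, entries in by_gene.items()
    }


# Placeholder records, invented purely for this example.
example = [
    ("GENE_A", "clone-1", 2),
    ("GENE_A", "clone-2", 4),
    ("GENE_A", "clone-3", 3),
    ("GENE_B", "clone-4", 1),
]
shortlist = best_clones_per_gene(example)
```

Lowering `min_stars` to 1 keeps every clone, which is useful when you are deliberately looking for mutants or truncations.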

– Do not hesitate to contact us in case of doubts or problems evaluating whether a given ORF/clone is suitable for your experiments.