Preprocessing method


For microarray data, we recommend the default MAS5.0 normalization steps with the Entrez BrainArray Custom CDF. We also recommend log-transforming the final values.


For RNA-seq data, we recommend aligning reads with Bowtie and TopHat against NCBI's transcript reference.

Quantile transformation

We use quantile transformation to compute hgu133plus2-like expression values. The hgu133plus2 reference was constructed from 1000 random samples. This step is performed automatically after submission.
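The idea behind this step can be illustrated with a minimal sketch. It is not the URSA(HD) implementation: the function name `quantile_transform`, the rank-based mapping, and the linear interpolation are all assumptions chosen to show how a sample's values can be mapped onto a reference distribution by rank.

```python
import numpy as np

def quantile_transform(sample, reference):
    """Map each value in `sample` onto the reference distribution by rank.

    Hypothetical helper for illustration; the exact procedure used by the
    service is not described in this document.
    """
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n = sample.size
    # Rank of each value within the sample (0 = smallest); ties broken by position.
    order = np.argsort(sample, kind="stable")
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(n)
    # Convert ranks to quantile positions in [0, 1].
    positions = ranks / (n - 1) if n > 1 else np.full(n, 0.5)
    # Read the reference distribution at those positions (linear interpolation).
    return np.quantile(reference, positions)
```

With this sketch, the smallest value in the sample receives the smallest reference value and the largest receives the largest, so the output follows the reference distribution while preserving the sample's gene ordering.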

Submission file format

URSA(HD) expects a two-column text file in which the first column contains Entrez IDs or HGNC gene names and the second column contains the corresponding quantified expression values. Refer to the example files:

HG-U133 plus 2.0 example: GSM100888
HG-U133A example: GSM74404
Illumina HiSeq 2000 example: ERX011182
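Before submitting, it can help to check that a file matches the two-column layout described above. The sketch below is a hypothetical checker, not part of URSA(HD); it assumes columns are separated by whitespace (a tab or spaces), which this document does not specify, and the name `parse_expression_file` is made up for illustration.

```python
def parse_expression_file(text):
    """Parse two-column expression text into a {gene: value} dict.

    Assumes whitespace-separated columns (an assumption, not a documented
    requirement). Raises ValueError on malformed lines.
    """
    values = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 columns, found {len(fields)}"
            )
        gene, value = fields
        # float() raises ValueError if the second column is not numeric.
        values[gene] = float(value)
    return values
```

For example, `parse_expression_file("7157\t8.25\nBRCA1\t5.5\n")` accepts one Entrez ID row and one gene-name row; the identifiers and values here are illustrative only.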

Result Interpretation

Theoretically, URSA(HD) should make “no calls” for a unique disease that is not included in the training set. The SVM margins from each URSA(HD) disease model would be very small and thus uninformative for the Bayesian network, leading to posterior probabilities close to the prior. That said, we believe that most diseases are related to some extent. In practice, the wide disease coverage of the URSA(HD) training set could therefore lead to the detection of related-disease signals in such a “novel” disease sample.
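Why uninformative evidence leaves the posterior near the prior can be seen with a simple application of Bayes' rule. The sketch below is a deliberately simplified, single-hypothesis illustration and not the URSA(HD) Bayesian network; the function `posterior` and the use of a likelihood ratio to stand in for the evidence from an SVM output are assumptions.

```python
def posterior(prior, likelihood_ratio):
    """Posterior probability of a disease given a prior and a likelihood ratio.

    likelihood_ratio = P(evidence | disease) / P(evidence | not disease).
    A ratio of 1 means the evidence does not favor either hypothesis.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)
```

When the likelihood ratio is 1, the posterior equals the prior, which corresponds to the “no call” case; a ratio well above 1 moves the posterior away from the prior.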

Manual Curation Annotation

To utilize tissue relationships, gene expression experiments were annotated to one or more terms in the BRENDA Tissue Ontology. After an initial substring-based text mining of sample descriptions in GEO, term-to-experiment pairs were manually verified against their sample descriptions and associated publication(s) to exclude incorrect or ambiguous pairs. The associated publication (original paper) was examined only when the sample descriptions were ambiguous. Sample annotations were then propagated based on the tissue ontology. Note that experiments were not necessarily annotated to their most specific term in the ontology, although such attempts were made.
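The propagation step can be sketched as follows, assuming that an annotation to a term also implies annotation to all of its ancestors in the ontology (the direction of propagation is an assumption here). The function `propagate` and the term names in the usage note are hypothetical and are not taken from the BRENDA Tissue Ontology.

```python
def propagate(direct, parents):
    """Extend each experiment's annotations to all ancestor terms.

    direct:  {experiment: set of directly annotated terms}
    parents: {term: iterable of parent terms} (toy ontology structure)
    """
    propagated = {}
    for experiment, terms in direct.items():
        seen = set()
        stack = list(terms)
        while stack:
            term = stack.pop()
            if term in seen:
                continue  # already visited via another path
            seen.add(term)
            stack.extend(parents.get(term, ()))
        propagated[experiment] = seen
    return propagated
```

With a toy ontology `{"liver_cell": ["liver"], "liver": ["organ"]}`, an experiment annotated directly to `liver_cell` would also be annotated to `liver` and `organ` after propagation.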

Manual tissue annotations are available here: manual_annotations_ursa.csv