VITIS Downloads

The following downloads are related to open data.

Transcriptomic dataset

The transcriptomic dataset used to run OneGenE and compute the expansion lists. The grapevine transcriptomic dataset was obtained by pre-processing the publicly available VESPUCCI compendium (29,090 genes and 2,017 contrasts; Moretto et al., 2016). The pre-processing procedure, described in Malacarne et al. (2018), comprised three steps:

  1. removal of contrasts with more than 55% missing values;
  2. removal of genes with more than 55% missing values;
  3. for each gene, replacement of the remaining missing values with the median of that gene's values across contrasts.

Size: 108.72 MiB
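The three pre-processing steps can be sketched in Python as follows. This is a minimal illustration, not the code used by Malacarne et al. (2018). It assumes the data are a genes × contrasts matrix stored as a list of rows, with missing values encoded as None or NaN, and that the gene filter (step 2) is evaluated only on the contrasts kept by step 1. The names `preprocess` and `MAX_MISSING_FRACTION` are hypothetical.

```python
import math
from statistics import median

# Hypothetical constant name; the 55% value comes from the steps above.
MAX_MISSING_FRACTION = 0.55


def is_missing(value):
    """Treat None and NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def preprocess(matrix, max_missing=MAX_MISSING_FRACTION):
    """Filter and impute a genes x contrasts matrix given as a list of rows.

    Returns (processed matrix, kept gene indices, kept contrast indices).
    """
    n_genes = len(matrix)
    n_contrasts = len(matrix[0]) if matrix else 0

    # Step 1: keep contrasts (columns) with at most `max_missing` missing values.
    kept_cols = [
        j for j in range(n_contrasts)
        if sum(is_missing(row[j]) for row in matrix) / n_genes <= max_missing
    ]

    # Step 2: on the kept contrasts, keep genes (rows) with at most
    # `max_missing` missing values (assumed ordering of the two filters).
    kept_rows = [
        i for i in range(n_genes)
        if kept_cols
        and sum(is_missing(matrix[i][j]) for j in kept_cols) / len(kept_cols) <= max_missing
    ]

    # Step 3: replace each remaining missing value with the median of the
    # gene's observed values (at least one exists after step 2).
    result = []
    for i in kept_rows:
        row = [matrix[i][j] for j in kept_cols]
        fill = median(v for v in row if not is_missing(v))
        result.append([fill if is_missing(v) else v for v in row])
    return result, kept_rows, kept_cols
```

Keeping rows and columns with "at most 55%" missing values is simply the complement of the "more than 55%" removal rule.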

Genes List

The list of 28,013 genes used in our expansions.

Size: 671.94 KiB

Onegene Data

The expansion of each gene of the Vitis vinifera genome, obtained by running the algorithms of the OneGenE system. Each expansion procedure consisted of 2000 iterations of our C++ implementation (https://bitbucket.org/francesco-asnicar/pc-boinc/, last accessed on 16 June 2020) of the PC-algorithm skeleton procedure (α = 0.05), applied to 29 sets of 1000 variables (genes); each set included the gene to be expanded and a random subset of 999 genes sampled without replacement. The block scheme of the OneGenE architecture is shown in Blanzieri et al. 2020, Figure 1. The input data were extracted from the 28,013 × 1131 normalized expression data matrix originally obtained from the VESPUCCI repository, then filtered and preprocessed.

Each expansion list is ordered by relative frequency:

F_rel = (number of times the gene is present in the output of the PC algorithm) / (number of times the gene is present in the input of the PC algorithm)

Overall, computing the expansion lists of all the genes required 28,013 × 29 × 2000 = 1,624,754,000 runs of the PC algorithm, each taking 5.63 s on average on our reference machine (Intel i7-4770K, Ubuntu), i.e. roughly 9.1 × 10^9 s, or about 290 years, of sequential computation. The computation was therefore carried out within the gene@home project on the volunteer distributed computing platform TN-Grid (20 TeraFLOPS on average), powered by the BOINC software (S. Pilati et al. https://www.mdpi.com/2218-273X/11/12/1744).

The output presented here is a list of transcript pairs, each reported with the absolute and relative frequency with which the second transcript was detected while expanding the gene regulatory network of the first.

Size: 213.90 MiB
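The relative frequency F_rel can be illustrated with a small Python sketch. It does not run the PC algorithm: each run is represented, by assumption, as a pair (input genes, output genes), where the output is whatever the skeleton procedure reports for the expanded gene. The absolute frequency is taken here to be the number of runs in which a gene appears in the output. This interpretation, the toy stand-in for the PC skeleton, and the names `sample_input_set` and `expansion_list` are hypothetical.

```python
import random
from collections import Counter


def sample_input_set(target, genes, k, rng):
    """Return the target gene plus k - 1 other genes drawn without replacement."""
    others = [g for g in genes if g != target]
    return [target] + rng.sample(others, k - 1)


def expansion_list(target, runs):
    """Rank genes by relative frequency for one expanded gene.

    `runs` is an iterable of (input_genes, output_genes) pairs, one per
    skeleton run. Returns (target, gene, absolute, relative) tuples sorted
    by decreasing relative frequency, ties broken by gene name.
    """
    seen_in_input = Counter()
    seen_in_output = Counter()
    for input_genes, output_genes in runs:
        seen_in_input.update(g for g in input_genes if g != target)
        seen_in_output.update(g for g in output_genes if g != target)
    rows = [
        (target, gene, seen_in_output[gene], seen_in_output[gene] / n_in)
        for gene, n_in in seen_in_input.items()
    ]
    rows.sort(key=lambda r: (-r[3], r[1]))
    return rows


if __name__ == "__main__":
    rng = random.Random(0)
    genes = [f"g{i}" for i in range(10)]
    runs = []
    for _ in range(5):
        inputs = sample_input_set("g0", genes, 4, rng)
        # Toy stand-in for the PC skeleton: pretend g1 and g2 are always detected.
        runs.append((inputs, [g for g in inputs if g in ("g1", "g2")]))
    for row in expansion_list("g0", runs):
        print(row)
```

Dividing by the number of times a gene was sampled, rather than by the total number of runs, keeps F_rel comparable across genes that happened to be drawn a different number of times.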

Pearson Data

The Pearson correlation values for gene pairs, computed on the transcriptomic dataset above.

Size: 2.50 GiB
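As an illustration, the Pearson correlation between two genes can be computed from their expression profiles across contrasts. The sketch below assumes each gene is a list of values over the same contrasts; on Python 3.10+ `statistics.correlation` gives the same coefficient for non-constant profiles. It is not necessarily how the downloadable file was produced, and the names `pearson` and `all_pairs` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant profile
    return cov / math.sqrt(var_x * var_y)


def all_pairs(profiles):
    """Yield (gene_a, gene_b, r) for every unordered pair of genes.

    `profiles` maps a gene identifier to its list of values across contrasts.
    """
    genes = sorted(profiles)
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            yield a, b, pearson(profiles[a], profiles[b])
```

Enumerating every unordered pair grows quadratically with the number of genes, which is consistent with this file being much larger than the OneGenE data.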

Pearson Data Minimal

The same as Pearson Data, but limited to the pairs that have a non-zero relative frequency in the OneGenE data.

Size: 213.90 MiB
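The relation between the OneGenE data and this minimal file can be sketched as a join: keep each OneGenE pair with a non-zero relative frequency and attach its Pearson correlation. The row layouts and the frozenset-keyed lookup below are assumptions for illustration, not the format of the downloadable files; `minimal_pearson` is a hypothetical name.

```python
def minimal_pearson(onegene_rows, correlation):
    """Attach Pearson r to every OneGenE pair with a non-zero relative frequency.

    onegene_rows: iterable of (expanded_gene, detected_gene, absolute, relative)
    correlation:  dict mapping frozenset({gene_a, gene_b}) -> r (assumed layout)
    """
    for expanded, detected, _absolute, relative in onegene_rows:
        if relative > 0:
            # Pearson r is symmetric, so an unordered key serves both directions.
            yield expanded, detected, correlation[frozenset((expanded, detected))]
```

Keying on a frozenset makes the lookup order-independent, since Pearson correlation is symmetric while OneGenE pairs are ordered (expanded gene, detected gene).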

Tools

The tools developed to analyze, aggregate and visualize the OneGenE expansion lists as gene networks.
These tools are partially integrated into this website, but the original programs are available for local installation and use.

GitHub repo