SIMPLE R TOOLS FOR GENETIC MARKERS RESEARCH

R is a freely licensed programming language of considerable interest as a tool for bioinformatics data analysis. It is widely used in research involving the analysis of molecular-biological data and the identification of molecular markers. In this article we describe two simple techniques for using FASTA-format sequences and genomic data in genetic marker research. To apply the functions described below, R must be installed together with the seqRFLP and maftools packages and, optionally, the RStudio integrated development environment.

seqRFLP and markers on genetic maps
Restriction Fragment Length Polymorphism (RFLP) patterns resulting from various DNA sequences can be simulated and visualized with the seqRFLP R package [1,2].
It includes functions for handling DNA sequences, in particular for simulating RFLP patterns with selected restriction enzymes and for creating so-called in silico RFLP genetic maps. The input data are FASTA-format files, and the virtual map of the simulated DNA digestion can be drawn with the function plotenz().
The enzdata dataset gives access to 777 restriction enzymes, from which we can select those of interest (selected.enzymes) for the study of molecular markers (especially SNP markers) associated with the sequences under investigation (dna.seq).
Fig. 1 presents the RFLP map and the workflow for studying polymorphic markers of four DNA sequences digested in silico with the six restriction enzymes listed in the selected.enzymes vector.
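The workflow described above might look roughly like the following sketch. It assumes that seqRFLP is installed and that its read.fasta() and plotenz() functions take the arguments shown, which should be checked against the package manual. The file name sequences.fasta is hypothetical, and the two enzyme names are placeholders, not the six enzymes used in Fig. 1.

```r
# Sketch only: assumes the seqRFLP package is installed.
library(seqRFLP)

# Load the restriction enzyme table shipped with the package.
data(enzdata)

# Placeholder selection of enzymes (not the six used in Fig. 1).
selected.enzymes <- c("EcoRI", "Acc65I")

# Hypothetical input file: replace with the path to your own FASTA file.
fasta.file <- "sequences.fasta"

if (file.exists(fasta.file)) {
  # Read the sequences to be digested in silico.
  dna.seq <- read.fasta(fasta.file)

  # Draw the virtual RFLP map for the selected enzymes.
  plotenz(sequences = dna.seq, enznames = selected.enzymes,
          enzdata = enzdata, side = TRUE, type = "RFLP")
}
```

Sequences that share a band pattern for one enzyme but differ for another point to candidate polymorphic sites, so in practice the selected.enzymes vector is usually adjusted over several runs.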

Maftools
Maftools [3] is an R package that facilitates, in particular, the analysis of oncogenomic data and includes very useful functions for molecular marker research. The required input data are MAF files or even simple TXT files containing the following 9 columns: Hugo_Symbol, Chromosome, Start_Position, End_Position, Reference_Allele, Tumor_Seq_Allele2, Variant_Classification, Variant_Type, Tumor_Sample_Barcode. In addition, detecting molecular markers associated with patient survival requires clinical data containing at least 3 columns: Tumor_Sample_Barcode, Overall_Survival_Status and Days_to_last_followup.
The Tumor_Sample_Barcode values in the clinical data must correspond to the values of the same column in the genomic data. The data are read with the function read.maf().
The genomic.data and clinical.data objects are R data frames and must contain the columns mentioned above. Subsequently, the package provides the survGroup() function for detecting potential sets of mutated genes associated with decreased patient survival.
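The reading step can be sketched as follows. This example assumes that maftools is installed and uses the TCGA LAML example files bundled with the package in place of the user's own data. The object names genomic.data and clinical.data follow the text, while maf.path, clin.path and maf.object are names chosen for this illustration. Note that in the bundled clinical table the follow-up column is spelled days_to_last_followup.

```r
# Sketch only: assumes the maftools package is installed.
library(maftools)

# Example files bundled with maftools stand in for the user's own data.
maf.path  <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
clin.path <- system.file("extdata", "tcga_laml_annot.tsv", package = "maftools")

# Read both tables into ordinary data frames.
genomic.data  <- read.delim(maf.path, comment.char = "#", quote = "",
                            stringsAsFactors = FALSE)
clinical.data <- read.delim(clin.path, stringsAsFactors = FALSE)

# Count the samples present in both tables; only these can be used
# for survival analysis.
shared.samples <- intersect(genomic.data$Tumor_Sample_Barcode,
                            clinical.data$Tumor_Sample_Barcode)
length(shared.samples)

# Build the MAF object from the two data frames.
maf.object <- read.maf(maf = genomic.data, clinicalData = clinical.data)
```

read.maf() also accepts file paths directly, so the intermediate data frames are only needed when the tables have to be filtered or edited first.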
This function is of particular interest for detecting prognostic and risk-stratification biomarkers in various pathologies and conditions [1,3,4].
The arguments of the function are as follows [3]: maf, an MAF object generated by the function read.maf(); top, the number of top mutated genes to consider; geneSetSize, the desired gene set size; time, the name of the time column in the clinical.data object; Status, the name of the column containing the status of patients in the clinical.data object, which must be logical or numeric (1 = deceased, 0 = living). The mafSurvGroup() function can additionally plot the survival curves of a chosen set of genes against the wild-type group.
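A minimal sketch of both calls is given below, assuming that maftools is installed and again using the TCGA LAML example data shipped with the package. The column names passed to time and Status are those of the bundled clinical file, and the gene pair DNMT3A and FLT3 is chosen only for illustration. The names maf.path, clin.path, laml and gene.sets are introduced here and do not come from the original workflow.

```r
# Sketch only: assumes the maftools package is installed.
library(maftools)

# Example data bundled with maftools.
maf.path  <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
clin.path <- system.file("extdata", "tcga_laml_annot.tsv", package = "maftools")
laml <- read.maf(maf = maf.path, clinicalData = clin.path, verbose = FALSE)

# Screen pairs drawn from the 20 most frequently mutated genes for an
# association with survival.
gene.sets <- survGroup(maf = laml, top = 20, geneSetSize = 2,
                       time = "days_to_last_followup",
                       Status = "Overall_Survival_Status",
                       verbose = FALSE)
head(gene.sets)

# Plot survival curves for one gene set against the wild-type group.
mafSurvGroup(maf = laml, geneSet = c("DNMT3A", "FLT3"),
             time = "days_to_last_followup",
             Status = "Overall_Survival_Status")
```

Because survGroup() tests many gene combinations, the resulting p-values are best treated as a screening aid and confirmed on independent data.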
In addition to the above, the R language offers a variety of functions useful for the research and identification of genetic markers in various fields of biology and medicine. Its advantage over other software platforms is the simplicity of data preparation, analysis and visualization, which can sometimes be performed with a single line of code.