We created an R package to analyze high-dimensional time-to-event data. Machine learning has given us a range of variable selection and dimension reduction techniques, and both Python and R make them easy to implement. Most of the time the outcome variable is either dichotomous or continuous. For continuous outcomes, LASSO is a widely used variable selection technique, ridge regression shrinks coefficients without dropping variables, and principal component analysis and factor analysis reduce dimension by combining variables. For dichotomous outcomes, techniques such as random forest, decision trees, and logistic regression are widely used.
In clinical trials, especially oncology trials, time-to-event outcomes are of great concern. Outcomes such as death, recurrence of disease, and progression of disease are studied. They are called time-to-event outcomes because what matters is not only whether the event occurs but also when it occurs. In this situation, researchers want to estimate not just the probability of the outcome but also the time until it occurs and the probability of surviving beyond that time.
The Kaplan-Meier method is the most widely used method for estimating survival curves. Cox proportional hazards regression is used to model the effect of study variables on survival and to obtain the hazard ratio (a relative risk measure for a particular study variable).
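To make the Kaplan-Meier idea concrete, here is a minimal sketch in plain Python. The function name `kaplan_meier` and the toy data are made up for illustration and are not taken from any package. At each observed event time the running survival estimate is multiplied by (1 - events / number at risk), and censored subjects simply leave the risk set.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event was observed, 0 if the subject was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # every subject leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the conditional survival at time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # drop events and censored subjects alike
    return curve


# Toy data, invented for this example
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]

for t, s in kaplan_meier(times, events):
    print(f"t={t:>2}  S(t)={s:.3f}")
```

In practice you would use an established implementation (for example the `survival` package in R) rather than hand-written code; the sketch only shows what the estimator computes.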
Prospective clinical trial data are usually not high-dimensional. Retrospective studies, which cover specific time periods, include a large number of samples and many study variables: demographic, clinical, pathological, pre-treatment parameters, post-treatment parameters, etc. Even so, inferential statistics is what is mostly used to address the primary and secondary objectives of clinical trials. Over the last two decades, Bayesian statistics has also been widely used to design clinical trials and analyze their data.
Truly high-dimensional datasets are genomic and proteomic data, commonly called omics datasets, many of which are freely available from public gene banks.
Thus, gene selection and dimension reduction techniques can be applied when working with omics datasets. These data often include information on survival and other time-to-event outcomes. Hence, we need a reduction technique that works when the outcome (dependent variable) is a time-to-event variable.
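One simple way to picture gene selection with a time-to-event outcome is univariate screening: split the samples at the median expression of each gene, compare the two groups with a log-rank test, and keep the genes with the largest statistics. The sketch below is a generic illustration in plain Python, not the method implemented in SurvHiDim. The function names, the gene names, and the toy data are all hypothetical.

```python
from statistics import median


def logrank_chi2(times, events, group):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    group[i] is 0 or 1. Returns 0.0 if the variance term is zero.
    """
    observed_1 = expected_1 = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        n = sum(1 for x in times if x >= t)  # at risk, both groups
        n1 = sum(1 for x, g in zip(times, group) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d1 = sum(1 for x, e, g in zip(times, events, group)
                 if x == t and e == 1 and g == 1)
        observed_1 += d1
        expected_1 += d * n1 / n
        if n > 1:
            # Hypergeometric variance of the event count in group 1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return (observed_1 - expected_1) ** 2 / variance


def screen_features(times, events, features, top_k=2):
    """Rank features by the log-rank statistic of a median split."""
    scores = {}
    for name, values in features.items():
        cut = median(values)
        group = [1 if v > cut else 0 for v in values]
        scores[name] = logrank_chi2(times, events, group)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data, invented for this example: 8 samples, 2 hypothetical genes
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 0, 1, 1]
features = {
    "gene_a": [9, 8, 7, 6, 4, 3, 2, 1],  # high expression goes with early events
    "gene_b": [5, 1, 4, 2, 5, 1, 4, 2],  # no clear relation to survival time
}

print(screen_features(times, events, features, top_k=1))
```

With thousands of genes, a screen like this is usually followed by a multiple-testing correction or a penalized Cox model, since testing each gene separately inflates false positives.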
Our R package SurvHiDim provides various functions to analyze high-dimensional data with time-to-event outcomes.
Thank you for reading. Subscribe to my blog. In upcoming posts, I will explore the functions of the ‘SurvHiDim’ package and the other packages I have developed, along with various applications of statistics in medical research, especially in oncology clinical trials: statistics in designing clinical studies, study designs, statistical analysis in clinical trials, meta-analysis, and matched-pair analysis.