Title: | A Comprehensive Collection of Cancer Types and Cancer-related DataSets |
---|---|
Description: | Offers a rich collection of data focused on cancer research, covering survival rates, genetic studies, biomarkers, and epidemiological insights. Designed for researchers, analysts, and bioinformatics practitioners, the package includes datasets on various cancer types such as melanoma, leukemia, breast, ovarian, and lung cancer, among others. It aims to facilitate advanced research, analysis, and understanding of cancer epidemiology, genetics, and treatment outcomes. |
Authors: | Renzo Caceres Rossi [aut, cre] |
Maintainer: | Renzo Caceres Rossi <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2024-12-11 05:30:42 UTC |
Source: | https://github.com/lightbluetitan/oncodatasets |
This dataset, AflatoxinLiverCancer_df, is a data frame containing data from a study where varying doses of Aflatoxin B1 were administered to lab animals. The dataset records the total number of animals exposed to each dose and the number of animals that developed liver cancer.
data(AflatoxinLiverCancer_df)
A data frame with 6 observations and 3 variables:
Dose of Aflatoxin B1 administered (integer).
Total number of animals exposed to the dose (integer).
Number of animals that developed liver cancer (integer).
The dataset name has been kept as 'AflatoxinLiverCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the faraway package. Gaylor DW (1987). *Linear nonparametric upper limits for low dose extrapolation*. ASA Proceedings of the Biopharmaceutical Section.
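A dose-response table of this shape (dose, animals exposed, animals with tumors) is commonly summarized as a tumor proportion per dose and then modeled with a binomial GLM. The sketch below uses a small made-up data frame. The column names `dose`, `total`, and `tumor` and all values are assumptions for illustration, not the contents of AflatoxinLiverCancer_df.

```r
# Toy stand-in for a dose-response table. Column names and values are
# illustrative assumptions, not the data shipped in the package.
toy <- data.frame(
  dose  = c(0, 10, 20, 40),
  total = c(20, 20, 20, 20),
  tumor = c(1, 4, 9, 16)
)

# Observed proportion of animals with tumors at each dose.
toy$prop <- toy$tumor / toy$total

# Binomial GLM with a two-column (successes, failures) response.
fit <- glm(cbind(tumor, total - tumor) ~ dose, family = binomial, data = toy)

# Estimated change in the log-odds of a tumor per unit of dose.
slope <- unname(coef(fit)["dose"])
```

With the real data, the same calls would apply after substituting the actual column names reported by `str(AflatoxinLiverCancer_df)`.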
This dataset, AIPulmonaryNodules_df, is a data frame containing data from a study on the performance of an artificial intelligence (AI) risk stratification tool for assessing Indeterminate Pulmonary Nodules (IPNs) in chest CT scans. The dataset includes information on whether cancer was diagnosed and the AI tool's rating of the probability of cancer (from 0 to 100).
data(AIPulmonaryNodules_df)
A data frame with 200 observations and 2 variables:
Cancer diagnosis – whether the nodule is cancerous (1 = cancer, 0 = no cancer) (integer).
AI rating of the probability of cancer, ranging from 0 to 100 (integer).
The dataset name has been kept as 'AIPulmonaryNodules_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the R4HCR package.
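One simple way to evaluate a 0 to 100 risk rating against a binary diagnosis is to pick a cut-off and compute sensitivity and specificity. The following is a minimal sketch on made-up values. The column names `cancer` and `rating` and the cut-off of 50 are assumptions chosen for illustration.

```r
# Toy data: column names, values, and the threshold are illustrative
# assumptions, not taken from AIPulmonaryNodules_df.
toy <- data.frame(
  cancer = c(1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L),
  rating = c(90L, 75L, 40L, 10L, 55L, 20L, 5L, 60L)
)

threshold <- 50
pred <- as.integer(toy$rating >= threshold)  # 1 = flagged as cancer

# Sensitivity: share of true cancers that are flagged.
sens <- sum(pred == 1 & toy$cancer == 1) / sum(toy$cancer == 1)
# Specificity: share of non-cancers that are not flagged.
spec <- sum(pred == 0 & toy$cancer == 0) / sum(toy$cancer == 0)
```

Repeating the calculation over a grid of thresholds would trace out an ROC curve.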
This dataset, AlcoholIntakeCancer_df, is a data frame containing data on alcohol intake and its association with colorectal cancer risk. The data includes alcohol intake levels (dose), the number of cancer cases, person-years of observation, and the log relative risk (logrr) along with its standard error (se).
data(AlcoholIntakeCancer_df)
A data frame with 48 observations and 7 variables:
Identifier for the study (factor).
Type of study (factor).
Level of alcohol intake (numeric).
Number of colorectal cancer cases (integer).
Person-years of observation (numeric).
Logarithm of the relative risk (numeric).
Standard error of the logarithm of the relative risk (numeric).
The dataset name has been kept as 'AlcoholIntakeCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the mixmeta package. Available at: https://www.cengage.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&product_isbn_issn=9781111531041
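Because the relative risk is stored on the log scale with its standard error, a relative risk and an approximate 95% confidence interval can be recovered by exponentiating. The sketch below uses made-up `logrr` and `se` values and assumes a normal approximation on the log scale.

```r
# Hypothetical log relative risks and standard errors (not package data).
logrr <- c(0.1, 0.2, 0.5)
se    <- c(0.05, 0.1, 0.2)

z <- qnorm(0.975)  # normal quantile for a two-sided 95% interval

# Back-transform the estimate and interval limits to the RR scale.
rr <- data.frame(
  rr    = exp(logrr),
  lower = exp(logrr - z * se),
  upper = exp(logrr + z * se)
)
```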
This dataset, BladderCancer_df, is a data frame containing data on recurrences of bladder cancer. It is commonly used to demonstrate methodology for recurrent event modelling. The dataset includes information from 340 observations and 7 variables related to bladder cancer recurrences.
data(BladderCancer_df)
A data frame with 340 observations and 7 variables:
Patient identifier (integer).
Treatment received: 1 = placebo, 2 = thiotepa (numeric).
Initial number of tumors (integer).
Size of the largest initial tumor (integer).
Time at which the event or censoring occurred (integer).
Event status: 1 = recurrence, 0 = no recurrence or death (numeric).
Event enumeration (integer).
The dataset name has been kept as 'BladderCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the survival package.
This dataset, BloodStorageProstate_df, is a data frame containing data on 316 men who underwent radical prostatectomy and received a transfusion during or within 30 days of the surgery. The dataset includes demographic, baseline, and prognostic factors, as well as the time to biochemical recurrence of prostate cancer, as indicated by prostate-specific antigen (PSA) levels. The main exposure of interest was the red blood cell (RBC) storage duration group, and the outcome of interest was time to biochemical (PSA) recurrence.
data(BloodStorageProstate_df)
A data frame with 316 observations and 20 variables:
Age group of red blood cells (numeric).
Median age of red blood cells (numeric).
Patient's age (numeric).
African American status (numeric).
Family history of prostate cancer (numeric).
Prostate volume (numeric).
Tumor volume (numeric).
Tumor stage (numeric).
Biopsy grade score (numeric).
Bone metastasis status (numeric).
Organ confinement status (numeric).
Preoperative prostate-specific antigen (PSA) level (numeric).
Preoperative therapy received (numeric).
Number of blood transfusion units (numeric).
Surgical Gleason score (numeric).
Any adjuvant therapy received (numeric).
Adjuvant radiation therapy received (numeric).
Cancer recurrence status (numeric).
Censoring status (numeric).
Time to biochemical recurrence in months (numeric).
The dataset name has been kept as 'BloodStorageProstate_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the medicaldata package. Cata et al. (2011). *Blood Storage Duration and Biochemical Recurrence of Cancer after Radical Prostatectomy*. Mayo Clinic Proceedings, 86(2), 120–127.
This dataset, BrainCancerCases_df, is a data frame containing data on brain cancer cases in New Mexico. It includes information about the county, number of cases, year of diagnosis, age group, and sex of the patients. The dataset consists of 1175 observations with 5 variables.
data(BrainCancerCases_df)
A data frame with 1175 observations and 5 variables:
County of diagnosis (Factor with 31 levels).
Number of cases (integer).
Year of diagnosis (integer).
Age group of patients (integer).
Sex of the patient (integer).
The dataset name has been kept as 'BrainCancerCases_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the rsatscan package, distributed with SaTScan software: https://www.satscan.org
This dataset, BrainCancerGeo_df, is a data frame containing geographic information related to brain cancer cases in New Mexico. It includes data on the county, latitude, and longitude of the regions where brain cancer cases have been reported. The dataset consists of 32 observations with 3 variables.
data(BrainCancerGeo_df)
A data frame with 32 observations and 3 variables:
County where the cases were recorded (Factor with 32 levels).
Latitude of the county (integer).
Longitude of the county (integer).
The dataset name has been kept as 'BrainCancerGeo_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the rsatscan package, distributed with SaTScan software: https://www.satscan.org
This dataset, BRCA1BreastCancer_df, is a data frame containing data on the cumulative risk of breast cancer in women with the BRCA1 mutation as a function of their age. The dataset includes 11 observations, with each entry representing the cumulative risk at a specific age (in years).
data(BRCA1BreastCancer_df)
A data frame with 11 observations and 2 variables:
Age of the individual in years (numeric).
Cumulative risk of breast cancer at that age (numeric).
The dataset name has been kept as 'BRCA1BreastCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the riskyr package.
This dataset, BRCA1OvarianCancer_df, is a data frame containing data on the cumulative risk of ovarian cancer in women with the BRCA1 mutation as a function of their age. The dataset includes 63 observations, with each entry representing the cumulative risk at a specific age (in years).
data(BRCA1OvarianCancer_df)
A data frame with 63 observations and 2 variables:
Age of the individual in years (numeric).
Cumulative risk of ovarian cancer at that age (numeric).
The dataset name has been kept as 'BRCA1OvarianCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the riskyr package. Based on Figure 2 (p. 2408) of Kuchenbaecker, K. B., Hopper, J. L., Barnes, D. R., Phillips, K. A., Mooij, T. M., Roos-Blom, M. J., ... & BRCA1 and BRCA2 Cohort Consortium (2017). Risks of breast, ovarian, and contralateral breast cancer for BRCA1 and BRCA2 mutation carriers. JAMA, 317 (23), 2402-2416. doi: 10.1001/jama.2017.7112
This dataset, BRCA2BreastCancer_df, is a data frame containing data on the cumulative risk of breast cancer in women with the BRCA2 mutation as a function of their age. The dataset includes 11 observations, with each entry representing the cumulative risk at a specific age (in years).
data(BRCA2BreastCancer_df)
A data frame with 11 observations and 2 variables:
Age of the individual in years (numeric).
Cumulative risk of breast cancer at that age (numeric).
The dataset name has been kept as 'BRCA2BreastCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the riskyr package.
This dataset, BRCA2OvarianCancer_df, is a data frame containing data on the cumulative risk of ovarian cancer in women with the BRCA2 mutation as a function of their age. The dataset includes 63 observations, with each entry representing the cumulative risk at a specific age (in years).
data(BRCA2OvarianCancer_df)
A data frame with 63 observations and 2 variables:
Age of the individual in years (numeric).
Cumulative risk of ovarian cancer at that age (numeric).
The dataset name has been kept as 'BRCA2OvarianCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the riskyr package. Based on Figure 2 (p. 2408) of Kuchenbaecker, K. B., Hopper, J. L., Barnes, D. R., Phillips, K. A., Mooij, T. M., Roos-Blom, M. J., ... & BRCA1 and BRCA2 Cohort Consortium (2017). Risks of breast, ovarian, and contralateral breast cancer for BRCA1 and BRCA2 mutation carriers. JAMA, 317 (23), 2402–2416. doi: 10.1001/jama.2017.7112
This dataset, BreastCancerWI_df, is a data frame containing diagnostic information for 569 patients with breast cancer. The data includes features computed from digitized images of fine needle aspirates (FNA) of breast masses, as well as a diagnosis label indicating whether the mass is malignant or benign.
data(BreastCancerWI_df)
A data frame with 569 observations and 31 variables:
Diagnosis of the breast mass: malignant or benign (factor with 2 levels).
Mean radius of the mass (numeric).
Mean texture of the mass (numeric).
Mean perimeter of the mass (numeric).
Mean area of the mass (numeric).
Mean smoothness of the mass (numeric).
Mean compactness of the mass (numeric).
Mean concavity of the mass (numeric).
Mean number of concave points on the mass contour (numeric).
Mean symmetry of the mass (numeric).
Mean fractal dimension of the mass (numeric).
Standard deviation of the radius (numeric).
Standard deviation of the texture (numeric).
Standard deviation of the perimeter (numeric).
Standard deviation of the area (numeric).
Standard deviation of the smoothness (numeric).
Standard deviation of the compactness (numeric).
Standard deviation of the concavity (numeric).
Standard deviation of the number of concave points (numeric).
Standard deviation of the symmetry (numeric).
Standard deviation of the fractal dimension (numeric).
Worst (peak) value of the radius (numeric).
Worst (peak) value of the texture (numeric).
Worst (peak) value of the perimeter (numeric).
Worst (peak) value of the area (numeric).
Worst (peak) value of the smoothness (numeric).
Worst (peak) value of the compactness (numeric).
Worst (peak) value of the concavity (numeric).
Worst (peak) number of concave points (numeric).
Worst (peak) value of the symmetry (numeric).
Worst (peak) value of the fractal dimension (numeric).
The dataset name has been kept as 'BreastCancerWI_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The original content has not been modified in any way.
Data taken from the cases package. Original documentation available at: https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(diagnostic).
This dataset, CA19PancreaticCancer_df, is a data frame containing data from a diagnostic accuracy review on the CA19-9 biomarker used for diagnosing pancreatic cancer. The dataset includes the number of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) from various studies.
data(CA19PancreaticCancer_df)
A data frame with 22 observations and 5 variables:
Name or identifier of the study (character).
True positives – the number of correctly identified positive cases (integer).
False positives – the number of cases incorrectly identified as positive (integer).
False negatives – the number of cases incorrectly identified as negative (integer).
True negatives – the number of correctly identified negative cases (integer).
The dataset name has been kept as 'CA19PancreaticCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the R4HCR package.
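Per-study sensitivity and specificity follow directly from the four counts: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). A minimal sketch on two made-up studies follows. The column names `study`, `TP`, `FP`, `FN`, and `TN` are assumptions, so check `names(CA19PancreaticCancer_df)` before reusing the code.

```r
# Toy 2x2 counts for two hypothetical studies (not package data).
toy <- data.frame(
  study = c("Study A", "Study B"),
  TP = c(45L, 30L),
  FP = c(10L, 5L),
  FN = c(5L, 10L),
  TN = c(90L, 45L),
  stringsAsFactors = FALSE
)

# Sensitivity: diseased patients correctly identified.
toy$sensitivity <- toy$TP / (toy$TP + toy$FN)
# Specificity: non-diseased patients correctly identified.
toy$specificity <- toy$TN / (toy$TN + toy$FP)
```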
This dataset, cancer_in_dogs_tbl_df, is a tibble containing information from a study conducted in 1994. The study aimed to determine whether there is an increased risk of cancer in dogs exposed to the herbicide 2,4-Dichlorophenoxyacetic acid (2,4-D). It includes data from 491 dogs diagnosed with cancer (case group) and 945 dogs without cancer (control group).
data(cancer_in_dogs_tbl_df)
A tibble with 1,436 observations and 2 variables:
Indicates whether the dog belongs to the "case" group (with cancer) or the "control" group (without cancer) (factor with 2 levels).
Indicates the dog's exposure to the herbicide 2,4-D, with levels such as "exposed" or "not exposed" (factor with 2 levels).
The dataset name has been kept as 'cancer_in_dogs_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_tbl_df' indicates that the dataset is a tibble. The original content has not been modified in any way.
Data taken from the openintro package. Original study: Hayes HM, Tarone RE, Cantor KP, Jessen CR, McCurnin DM, and Richardson RC. 1991. Case-Control Study of Canine Malignant Lymphoma: Positive Association With Dog Owner's Use of 2,4-Dichlorophenoxyacetic Acid Herbicides. *Journal of the National Cancer Institute*, 83(17):1226-1231.
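In a case-control layout with one row per subject, the usual summary is the odds ratio of exposure among cases versus controls. The sketch below builds a small made-up data frame of the same shape. The column names `group` and `exposure`, the level labels, and the counts are assumptions, not the study's data.

```r
# Toy case-control data, one row per dog (counts are made up).
toy <- data.frame(
  group = factor(rep(c("case", "control"), each = 100)),
  exposure = factor(c(
    rep(c("exposed", "not exposed"), times = c(40, 60)),  # cases
    rep(c("exposed", "not exposed"), times = c(20, 80))   # controls
  ))
)

# Cross-tabulate group by exposure.
tab <- table(toy$group, toy$exposure)

# Odds of exposure among cases divided by odds among controls.
odds_ratio <- (tab["case", "exposed"] * tab["control", "not exposed"]) /
  (tab["case", "not exposed"] * tab["control", "exposed"])
```

For inference on the real table, `fisher.test(tab)` or `chisq.test(tab)` from base R would be natural next steps.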
This dataset, CancerSmokeCity_array, is an array containing data on lung cancer rates by smoking status and city. The data includes 32 observations organized by whether the individual smokes, their lung cancer status, and the city. The dimensions of the array are: 2 smoking statuses (smokes, does not smoke), 2 lung cancer statuses (cancer, no cancer), and 8 cities.
data(CancerSmokeCity_array)
An array with 32 elements, with dimensions:
Smoking status (character): 2 categories (smokes, does not smoke).
Lung cancer status (character): 2 categories (cancer, no cancer).
City (character): 8 cities.
The dataset name has been kept as 'CancerSmokeCity_array' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_array' indicates that the dataset is an array. The original content has not been modified in any way.
Data taken from the flatr package. Based on data in Z. Liu, Int. J. Epidemiol., 21: 197–201, 1992.
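A three-way array of this kind is a stack of 2 x 2 tables, one per city, and base R can slice or collapse it with `apply()`. The sketch below uses a made-up 2 x 2 x 2 array. The dimension names and counts are assumptions, and the real array has 8 cities rather than 2.

```r
# Toy stratified 2x2 tables. Dimension names and counts are assumptions.
toy <- array(
  c(10, 5, 20, 40,   # city A, filled column by column
    8, 4, 16, 32),   # city B
  dim = c(2, 2, 2),
  dimnames = list(
    Smoking = c("yes", "no"),
    Cancer  = c("yes", "no"),
    City    = c("A", "B")
  )
)

# Collapse over cities to get the marginal smoking-by-cancer table.
marginal <- apply(toy, c(1, 2), sum)

# Odds ratio within each city (each slice m is a 2x2 matrix).
city_or <- apply(toy, 3, function(m) (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1]))
```

When the stratum-specific odds ratios look similar, `mantelhaen.test()` from the stats package gives a pooled estimate across strata.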
This dataset, Carcinoma_p53_df, is a data frame containing data related to the presence of the mutant p53 tumor suppressor gene and its potential role as a prognostic factor in patients with squamous cell carcinoma arising from the oropharynx cavity. The dataset includes unadjusted estimates of log hazard ratios for mutant p53 compared to normal p53 for disease-free and overall survival, along with their associated variances, collected from 6 observational studies. The dataset consists of 6 observations with 5 variables.
data(Carcinoma_p53_df)
A data frame with 6 observations and 5 variables:
Study identifier (integer).
Unadjusted log hazard ratio for disease-free survival (numeric).
Unadjusted log hazard ratio for overall survival (numeric).
Variance of the log hazard ratio for disease-free survival (numeric).
Variance of the log hazard ratio for overall survival (numeric).
The dataset name has been kept as 'Carcinoma_p53_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the mixmeta package. References:
Jackson D, Riley R, White IR (2011). Multivariate meta-analysis: Potential and promise. Statistics in Medicine. 30(20):2481–2498.
Tandon S, Tudur-Smith C, Riley RD, et al. (2010). A systematic review of p53 as a prognostic factor of survival in squamous cell carcinoma of the four main anatomical subsites of the head and neck. Cancer Epidemiology, Biomarkers and Prevention. 19(2):574–587.
Sera F, Armstrong B, Blangiardo M, Gasparrini A (2019). An extended mixed-effects framework for meta-analysis. Statistics in Medicine. 38(29):5429–5444.
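Log hazard ratios with their variances can be combined by inverse-variance weighting. The sketch below is a simplified univariate fixed-effect pooling on made-up values. It is not the multivariate approach discussed in the references above, and the numbers are not taken from Carcinoma_p53_df.

```r
# Hypothetical log hazard ratios and their variances for three studies.
y <- c(-0.5, 0.1, 0.4)
v <- c(0.04, 0.01, 0.04)

# Inverse-variance weights: more precise studies count more.
w <- 1 / v

# Fixed-effect pooled log hazard ratio and its standard error.
pooled    <- sum(w * y) / sum(w)
pooled_se <- sqrt(1 / sum(w))

# Back-transform to the hazard ratio scale.
pooled_hr <- exp(pooled)
```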
This dataset, CASP8BreastCancer_df, is a data frame containing results from 4 case-control studies examining the association between the CASP8 -652 6N del promoter polymorphism and breast cancer risk. The dataset includes information on the presence or absence of the polymorphism in both cases (breast cancer patients) and controls, with different genotypic combinations analyzed.
data(CASP8BreastCancer_df)
A data frame with 4 observations and 7 variables:
Study identifier (character).
Number of breast cancer cases with the ins/ins genotype (integer).
Number of breast cancer cases with the ins/del genotype (integer).
Number of breast cancer cases with the del/del genotype (integer).
Number of control cases with the ins/ins genotype (integer).
Number of control cases with the ins/del genotype (integer).
Number of control cases with the del/del genotype (integer).
The dataset name has been kept as 'CASP8BreastCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The original content has not been modified in any way.
Data taken from the metadat package. Frank, B., Rigas, S. H., Bermejo, J. L., Wiestler, M., Wagner, K., Hemminki, K., Reed, M. W., Sutter, C., Wappenschmidt, B., Balasubramanian, S. P., Meindl, A., Kiechle, M., Bugert, P., Schmutzler, R. K., Bartram, C. R., Justenhoven, C., Ko, Y.-D., Brüning, T., Brauch, H., Hamann, U., Pharoah, P. P. D., Dunning, A. M., Pooley, K. A., Easton, D. F., Cox, A. & Burwinkel, B. (2008). The CASP8 -652 6N del promoter polymorphism and breast cancer risk: A multicenter study. Breast Cancer Research and Treatment, 111(1), 139-144. https://doi.org/10.1007/s10549-007-9752-z
This dataset, CervicalCancer_df, is a data frame containing data from a study evaluating the diagnostic accuracy of CIN2+ detection using a combined approach with naked-eye and digital VIA (visual inspection with acetic acid) on a Samsung Galaxy J5 smartphone, compared to traditional naked-eye inspection alone.
data(CervicalCancer_df)
A data frame with 181 observations and 10 variables:
Presence of HPV16 (Factor with 2 levels).
Presence of HPV18/45 (Factor with 2 levels).
Presence of other HPV strains (Factor with 2 levels).
Naked-eye VIA result (Factor with 2 levels).
Digital VIA result with smartphone (Factor with 2 levels).
Treatment received (Factor with 2 levels).
Combined naked-eye and digital VIA (Factor with 2 levels).
Histological diagnosis (Factor with 5 levels).
Cytological diagnosis (Factor with 7 levels).
CIN2+ status (Factor with 2 levels).
The dataset name has been kept as 'CervicalCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the R4HCR package. Data directly available from https://yareta.unige.ch/archives/ffbeb6d7-b390-4755-987e-8faf85f97c67
This dataset, ChildCancer_df, is a data frame containing information on 406 children diagnosed with cancer between January 1, 1999, and December 31, 2003, in the region of North Portugal. The dataset includes complete records on the age at diagnosis, demographic details, and survival information. Due to the interval sampling, the age at diagnosis is doubly truncated by the time from birth to the beginning and end of the study.
data(ChildCancer_df)
A data frame with 406 observations and 8 variables:
Unspecified numerical variable (numeric).
Unspecified numerical variable (numeric).
Unspecified numerical variable (numeric).
Cancer group classification (numeric).
Survival status of the child: 1 = alive, 2 = deceased (numeric).
Survival time in days (numeric).
Residence type of the child: 1 = urban, 2 = rural (numeric).
Sex of the child: 1 = male, 2 = female (numeric).
The dataset name has been kept as 'ChildCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the DTDA package. The childhood cancer data were gathered from the IPO (Registo Oncológico do Norte) service in North Portugal, kindly provided by Doctor Maria José Bento.
This dataset, ColonCancerChemo_df, is a data frame containing data from one of the first successful trials of adjuvant chemotherapy for stage B/C colon cancer. The dataset includes information from 1858 observations and 16 variables. Each patient has two records: one for recurrence and one for death.
data(ColonCancerChemo_df)
A data frame with 1858 observations and 16 variables:
Patient identifier (numeric).
Study identifier (numeric).
Treatment received: 1 = observation, 2 = levamisole, 3 = levamisole+5-FU (factor).
Sex of the patient: 1 = male, 2 = female (numeric).
Age of the patient (numeric).
Obstruction of the colon: 1 = yes, 0 = no (numeric).
Perforation of the colon: 1 = yes, 0 = no (numeric).
Adherence to nearby organs: 1 = yes, 0 = no (numeric).
Number of positive lymph nodes detected (numeric).
Censoring status: 1 = event observed, 0 = censored (numeric).
Tumor differentiation: 1 = well, 2 = moderate, 3 = poor (numeric).
Tumor extent: 1 = submucosa, 2 = muscle, 3 = serosa, 4 = contiguous structures (numeric).
Time from surgery to registration: 0 = short, 1 = long (numeric).
Presence of 4+ positive lymph nodes: 1 = yes, 0 = no (numeric).
Follow-up time in days (numeric).
Event type: 1 = recurrence, 2 = death (numeric).
The dataset name has been kept as 'ColonCancerChemo_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the survival package.
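Because each patient contributes two rows (one per event type), most analyses start by keeping a single event type so that patients are not counted twice. A minimal sketch on a made-up data frame follows. The column names `id`, `time`, and `etype` and the values are assumptions for illustration.

```r
# Toy data with two records per patient (values are made up).
# etype: 1 = recurrence record, 2 = death record.
toy <- data.frame(
  id    = c(1, 1, 2, 2),
  time  = c(300, 500, 900, 900),
  etype = c(1, 2, 1, 2)
)

# One row per patient for an overall-survival analysis.
deaths <- toy[toy$etype == 2, ]

# One row per patient for a time-to-recurrence analysis.
recurrences <- toy[toy$etype == 1, ]
```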
This dataset, ColorectalMiRNAs_tbl_df, is a tibble containing information from PubMed abstracts related to microRNAs (miRNAs) in colorectal cancer. The data provides key details such as publication metadata, article abstracts, and associated miRNAs. The dataset consists of 508 observations with 8 variables.
data(ColorectalMiRNAs_tbl_df)
A tibble with 508 observations and 8 variables:
PubMed Identifier (numeric).
Publication year of the article (numeric).
Title of the PubMed article (character).
Abstract of the article (character).
Language of the article (character).
Type of publication, e.g., review, study (character).
Research topic related to colorectal cancer and miRNAs (character).
Specific microRNAs mentioned in the publication (character).
The dataset name has been kept as 'ColorectalMiRNAs_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_tbl_df' indicates that the dataset is a tibble, which is an enhanced version of a data frame in R. The original content has not been modified in any way.
Data taken from the miRetrieve package. More information is available at: https://pubmed.ncbi.nlm.nih.gov/
This dataset, EndometrialCancer_df, is a data frame containing information on histology grades and associated risk factors for 79 cases of endometrial cancer. The dataset provides variables related to histological grades, pathological indices, and other clinical measures. The dataset consists of 79 observations with 4 variables.
data(EndometrialCancer_df)
A data frame with 79 observations and 4 variables:
Nuclear volume (integer).
Pathological index (integer).
Endometrial hyperplasia (numeric).
Histology grade (integer).
The dataset name has been kept as 'EndometrialCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the enrichwith package. The dataset was first analyzed in Heinze and Schemper (2002) and originally provided by Dr. E. Asseryanis from the Medical University of Vienna. The data was downloaded in .dat format from https://users.stat.ufl.edu/~aa/glm/data/, which provides datasets used in Agresti (2015).
This dataset, HeadNeckCarcinoma_df, is a data frame containing results from 65 trials examining mortality risk in patients with nonmetastatic head and neck squamous-cell carcinoma receiving either locoregional treatment plus chemotherapy or locoregional treatment alone. The dataset provides the observed minus expected number of deaths and corresponding variances in the locoregional treatment plus chemotherapy group.
data(HeadNeckCarcinoma_df)
A data frame with 65 observations and 5 variables:
Trial identifier (numeric).
Name of the trial (character).
Observed minus expected number of deaths (numeric).
Variance of the observed minus expected deaths (numeric).
Treatment group (integer).
The dataset name has been kept as 'HeadNeckCarcinoma_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the metadat package. Pignon, J. P., Bourhis, J., Domenge, C., & Designe, L. (2000). Chemotherapy added to locoregional treatment for head and neck squamous-cell carcinoma: Three meta-analyses of updated individual data. Lancet, 355(9208), 949-955. https://doi.org/10.1016/S0140-6736(00)90011-4
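Observed-minus-expected deaths (O - E) and their variances (V) are the inputs to Peto's one-step method, in which each trial's log hazard ratio is approximated by (O - E) / V with variance 1 / V. The sketch below applies that approximation to made-up values. The vectors `ome` and `v` are hypothetical stand-ins, not the dataset's columns.

```r
# Hypothetical trial-level summaries (not package data).
ome <- c(-5, -2, 1)   # observed minus expected deaths, treatment arm
v   <- c(10, 8, 4)    # variance of observed minus expected

# Peto one-step estimates per trial.
log_hr <- ome / v
se     <- 1 / sqrt(v)

# Pooled estimate: sum of (O - E) over sum of V.
pooled_log_hr <- sum(ome) / sum(v)
pooled_hr     <- exp(pooled_log_hr)
```

A pooled hazard ratio below 1 would indicate fewer deaths than expected in the chemotherapy arm.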
This dataset, ICGCLiver_df, is a data frame containing liver cancer data from Japan, released by the ICGC database. The dataset includes survival time, event status, and expression levels for four genes (ANLN, CENPA, GPR182, and BCO2).
data(ICGCLiver_df)
A data frame with 232 observations and 6 variables:
Survival time (numeric).
Event status (1 = event occurred, 0 = censored) (integer).
Expression level of the ANLN gene (numeric).
Expression level of the CENPA gene (numeric).
Expression level of the GPR182 gene (numeric).
Expression level of the BCO2 gene (numeric).
The dataset name has been kept as 'ICGCLiver_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the ggrisk package. ICGC (International Cancer Genome Consortium) database. Liver cancer data from Japan.
This dataset, LeukemiaLymphomaCases_df, is a data frame containing information on the number of leukemia and lymphoma cases reported in different locations within North Humberside. The dataset includes the location ID and the number of cases for each location.
data(LeukemiaLymphomaCases_df)
A data frame with 191 observations and 2 variables:
Location ID (integer).
Number of leukemia and lymphoma cases (integer).
The dataset name has been kept as 'LeukemiaLymphomaCases_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the rsatscan package, distributed with SaTScan software: https://www.satscan.org
This dataset, LeukemiaLymphomaControl_df, is a data frame containing information on the number of control cases for leukemia and lymphoma reported in different locations within North Humberside. The dataset includes the location ID and the number of control cases for each location.
data(LeukemiaLymphomaControl_df)
A data frame with 191 observations and 2 variables:
Location ID (integer).
Number of control cases (integer).
The dataset name has been kept as 'LeukemiaLymphomaControl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the rsatscan package, distributed with SaTScan software: https://www.satscan.org
This dataset, LeukemiaLymphomaGeo_df, is a data frame containing the geographical coordinates (x and y) for locations in North Humberside related to leukemia and lymphoma cases. It includes the location ID and the coordinates for each of the 191 locations.
data(LeukemiaLymphomaGeo_df)
A data frame with 191 observations and 3 variables:
Location ID (integer).
X-coordinate (integer).
Y-coordinate (integer).
The dataset name has been kept as 'LeukemiaLymphomaGeo_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the rsatscan package, distributed with SaTScan software: https://www.satscan.org
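The three North Humberside tables (cases, controls, coordinates) share a location ID, so they can be joined into one table per location. The sketch below shows the join on small made-up tables; the column names (`id`, `cases`, `controls`, `x`, `y`) and the values are assumptions for illustration only.

```r
# Toy stand-ins for the three tables; names and values are hypothetical.
cases    <- data.frame(id = c(1, 2, 3), cases = c(0, 2, 1))
controls <- data.frame(id = c(1, 2, 3), controls = c(5, 3, 4))
geo      <- data.frame(id = c(1, 2, 3), x = c(100, 200, 300), y = c(50, 60, 70))

# Join on the shared location ID: cases + controls first, then coordinates.
combined <- merge(merge(cases, controls, by = "id"), geo, by = "id")

# Fraction of cases among all sampled individuals at each location.
combined$case_fraction <- combined$cases / (combined$cases + combined$controls)
```

With the real data, the same pattern would apply after checking which column holds the location ID in each table, for example with `names()`.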
This dataset, LeukemiaRemission_df, is a data frame containing data on the duration of remission for acute leukemia patients who were randomly assigned to maintenance therapy with 6-mercaptopurine (6-MP), an active antileukemic compound, or a placebo. The dataset includes the sex, white blood cell (WBC) count, time to relapse, event status, and treatment group for the patients.
data(LeukemiaRemission_df)
A data frame with 42 observations and 5 variables:
Sex of the patient (integer).
White blood cell (WBC) count (numeric).
Time to relapse (integer).
Event status (Factor with 2 levels: 1 = relapse, 0 = no relapse).
Treatment group (Factor with 2 levels: 1 = 6-MP, 0 = placebo).
The dataset name has been kept as 'LeukemiaRemission_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the R4HCR package. Kleinbaum, D.G. and Klein, M., 1996. Survival Analysis: A Self-Learning Text. Springer.
This dataset, LeukemiaSurvival_df, is a data frame containing remission survival times of 42 leukemia patients enrolled in a placebo-controlled randomized controlled trial (RCT). The dataset includes information on the time to remission, patient status, sex, white blood cell count (log-transformed), and treatment regimen.
data(LeukemiaSurvival_df)
A data frame with 42 observations and 5 variables:
Time to remission in days (integer).
Patient status (1 for event, 0 for censored) (integer).
Gender of the patient (numeric, 1 for male, 2 for female).
Log-transformed white blood cell count (numeric).
Treatment regimen (numeric, coded treatment type).
The dataset name has been kept as 'LeukemiaSurvival_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the autoReg package.
This dataset, LungCancerETS_df, is a data frame containing results from 37 studies on the risk of lung cancer in women exposed to environmental tobacco smoke (ETS) from their smoking spouse. The dataset includes data from both cohort and case-control studies, focusing on women who are lifelong nonsmokers but have been exposed to ETS.
data(LungCancerETS_df)
A data frame with 37 observations and 11 variables:
Study identifier (integer).
Author(s) of the study (character).
Year of publication (integer).
Country where the study was conducted (character).
Design of the study (e.g., cohort or case-control) (character).
Number of cases in the study (integer).
Odds ratio estimate (numeric).
Lower bound of the odds ratio confidence interval (numeric).
Upper bound of the odds ratio confidence interval (numeric).
Effect size estimate (numeric).
Variance of the effect size estimate (numeric).
The dataset name has been kept as 'LungCancerETS_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the metadat package. Hackshaw, A. K., Law, M. R., & Wald, N. J. (1997). The accumulated evidence on lung cancer and environmental tobacco smoke. British Medical Journal, 315(7114), 980-988. https://doi.org/10.1136/bmj.315.7114.980 Hackshaw, A. K. (1998). Lung cancer and passive smoking. Statistical Methods in Medical Research, 7(2), 119-136. https://doi.org/10.1177/096228029800700203
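An odds ratio with a 95% confidence interval can be converted into a log odds ratio and an approximate sampling variance, which is the form most meta-analysis routines expect. The sketch below shows one common conversion on made-up numbers; the column names and values are assumptions, and the effect size and variance columns shipped with the dataset may have been derived differently.

```r
# Toy studies; column names and values are hypothetical.
studies <- data.frame(
  or    = c(1.50, 0.90, 1.20),
  lower = c(1.00, 0.60, 0.80),
  upper = c(2.25, 1.35, 1.80)
)

z <- qnorm(0.975)  # about 1.96 for a 95% interval

# Effect size on the log scale, and variance recovered from the CI width.
studies$yi <- log(studies$or)
studies$vi <- ((log(studies$upper) - log(studies$lower)) / (2 * z))^2

# Inverse-variance (fixed-effect) pooled odds ratio.
w         <- 1 / studies$vi
pooled_or <- exp(sum(w * studies$yi) / sum(w))
```

In this toy example every interval has the same width on the log scale, so the weights are equal and the pooled odds ratio reduces to the geometric mean of the three odds ratios.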
This dataset, LungNodulesDetected_df, is a data frame containing data on incidental or screen-detected lung nodules. The data includes information such as patient demographics, smoking status, nodule characteristics, and whether the nodule is malignant. The dataset was collected from patients aged 18 years or older with pulmonary nodules of up to 15 mm detected on routine CT chest scans at 3 academic centers in the UK.
data(LungNodulesDetected_df)
A data frame with 999 observations and 8 variables:
Gender of the patient, represented as a factor with 2 levels (Male, Female).
Age of the patient (numeric).
Number of annotated nodules (numeric).
Location of the nodule, represented as a factor with 6 levels.
Whether the nodule is spiculated, represented as a factor with 2 levels (Yes, No).
Smoking status of the patient, represented as a factor with 5 levels.
Diameter of the nodule (numeric).
Malignancy status of the nodule (numeric).
The dataset name has been kept as 'LungNodulesDetected_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the R4HCR package. The dataset was collected from patients with pulmonary nodules detected on CT chest scans, aged 18 years or older, from 3 academic centers in the UK.
This dataset, MaleMiceCancer_df, is a data frame containing data on the occurrence of cancer in male mice. The dataset records the number of days until the occurrence of cancer under different treatment conditions. It includes 181 observations and 4 variables.
data(MaleMiceCancer_df)
A data frame with 181 observations and 4 variables:
Treatment group: 1 = treatment, 2 = control (factor).
Number of days until the occurrence of cancer (numeric).
Cancer outcome: levels include 'none', 'localized', 'metastatic', and 'other' (factor).
Mouse identifier (integer).
The dataset name has been kept as 'MaleMiceCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the survival package.
This dataset, Melanoma_df, is a data frame containing information about 205 patients with malignant melanoma (a type of skin cancer) who underwent a radical operation at Odense University Hospital, Denmark, between 1962 and 1977. Patients were followed up until the end of 1977. By that time, 134 patients were still alive, and 71 had died (57 due to cancer and 14 from other causes). This dataset provides detailed clinical and demographic information for studying malignant melanoma outcomes.
data(Melanoma_df)
A data frame with 205 observations and 7 variables:
Follow-up time in days (integer).
Patient's status at the end of the study: 1 = dead from melanoma, 2 = alive, 3 = dead from other causes (integer).
Sex of the patient: 1 = male, 0 = female (integer).
Age of the patient at the time of surgery (integer).
Year of surgery (integer).
Tumor thickness in millimeters (numeric).
Presence of ulceration: 1 = present, 0 = absent (integer).
The dataset name has been kept as 'Melanoma_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the MASS package. Original study conducted at Odense University Hospital, Denmark.
This dataset, MiceDeathRadiation_df, is a data frame containing data on deaths of RFM male mice exposed to 300 rads of x-radiation at 5–6 weeks of age. The dataset records the causes of death, which include thymic lymphoma, reticulum cell sarcoma, and other causes. Additionally, it distinguishes between mice kept in a conventional environment and those in a germ-free environment.
data(MiceDeathRadiation_df)
A data frame with 177 observations and 4 variables:
Type of environment (factor with 2 levels: conventional or germ-free).
Cause of death (factor with 3 levels: thymic lymphoma, reticulum cell sarcoma, or other).
Survival status (numeric).
Time to death in days (numeric).
The dataset name has been kept as 'MiceDeathRadiation_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the SMPracticals package.
This dataset, NCCTGLungCancer_df, is a data frame containing data on survival in patients with advanced lung cancer from the North Central Cancer Treatment Group (NCCTG). The data includes 228 observations and 10 variables related to clinical and performance score data for lung cancer patients.
data(NCCTGLungCancer_df)
A data frame with 228 observations and 10 variables:
Institution code (numeric).
Survival time in days (numeric).
Survival status: 1 = censored, 2 = dead (numeric).
Age of the patient (numeric).
Sex of the patient: 1 = male, 2 = female (numeric).
ECOG performance score (numeric).
Karnofsky performance score as rated by the physician (numeric).
Karnofsky performance score as rated by the patient (numeric).
Daily calorie intake (numeric).
Weight loss in the last six months, in pounds (numeric).
The dataset name has been kept as 'NCCTGLungCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the nftbart package. Based on survival data from patients with advanced lung cancer from the North Central Cancer Treatment Group (NCCTG). Performance scores rate how well the patient can perform usual daily activities.
This dataset, NodalProstate_df, is a data frame containing data on 53 patients diagnosed with prostate cancer. The dataset records several clinical and diagnostic factors to assess nodal involvement without surgery. Nodal involvement is a critical factor in determining the treatment strategy for prostate cancer patients.
data(NodalProstate_df)
A data frame with 53 observations and 7 variables:
Number of patients per row, always 1 (numeric).
Indicator of nodal involvement (numeric).
Age group of the patient (factor with 2 levels).
Cancer stage (factor with 2 levels).
Tumor grade (factor with 2 levels).
X-ray result (factor with 2 levels).
Acid phosphatase test result (factor with 2 levels).
The dataset name has been kept as 'NodalProstate_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the SMPracticals package.
This package provides a wide variety of datasets related to cancer types such as melanoma, leukemia, breast, ovarian, and lung cancer, among others.
OncoDataSets: A Comprehensive Collection of Cancer Types and Cancer-related DataSets
Maintainer: Renzo Caceres Rossi [email protected]
This dataset, OvarianCancer_df, is a data frame containing survival data from a randomized trial comparing two treatments for ovarian cancer. It includes 26 observations and 6 variables related to patient demographics, treatment, and survival outcomes.
data(OvarianCancer_df)
A data frame with 26 observations and 6 variables:
Follow-up time in days (numeric).
Survival status: 1 = deceased, 0 = alive (numeric).
Age of the patient in years (numeric).
Residual disease present: 1 = no, 2 = yes (numeric).
Treatment group: 1 = standard treatment, 2 = experimental treatment (numeric).
ECOG performance status score: 0 = fully active, 1 = restricted activity, 2 = unable to carry out work activities (numeric).
The dataset name has been kept as 'OvarianCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the survival package.
This dataset, PancreaticMiRNAs_tbl_df, is a tibble containing information from PubMed abstracts related to microRNAs (miRNAs) in pancreatic cancer. The data provides key details such as publication metadata, article abstracts, and associated miRNAs. The dataset consists of 381 observations with 8 variables.
data(PancreaticMiRNAs_tbl_df)
A tibble with 381 observations and 8 variables:
PubMed Identifier (numeric).
Publication year of the article (numeric).
Title of the PubMed article (character).
Abstract of the article (character).
Language of the article (character).
Type of publication, e.g., review, study (character).
Research topic related to pancreatic cancer and miRNAs (character).
Specific microRNAs mentioned in the publication (character).
The dataset name has been kept as 'PancreaticMiRNAs_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_tbl_df' indicates that the dataset is a tibble, which is an enhanced version of a data frame in R. The original content has not been modified in any way.
Data taken from the miRetrieve package. More information is available at: https://pubmed.ncbi.nlm.nih.gov/
This dataset, ProstateMethylation_df, is a data frame containing pre-processed beta methylation values collected from two sample types (benign and tumor tissue) of 4 patients diagnosed with prostate cancer. The dataset can be used for analyses of methylation patterns in benign versus tumor tissues in prostate cancer cases.
data(ProstateMethylation_df)
A data frame with 5067 observations and 9 variables:
Unique identifier for the methylation probe (character).
Beta methylation value for benign tissue, patient 1 (numeric).
Beta methylation value for benign tissue, patient 2 (numeric).
Beta methylation value for benign tissue, patient 3 (numeric).
Beta methylation value for benign tissue, patient 4 (numeric).
Beta methylation value for tumor tissue, patient 1 (numeric).
Beta methylation value for tumor tissue, patient 2 (numeric).
Beta methylation value for tumor tissue, patient 3 (numeric).
Beta methylation value for tumor tissue, patient 4 (numeric).
The dataset name has been kept as 'ProstateMethylation_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the betaclust package.
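A simple first look at benign versus tumor methylation is the per-probe difference in mean beta value between the two tissue types. The sketch below computes it on a made-up table with two patients; the column names (`probe`, `benign_1`, `tumor_1`, and so on) and the values are assumptions for illustration, not the actual column names of ProstateMethylation_df.

```r
# Toy beta values for three probes and two patients; names are hypothetical.
meth <- data.frame(
  probe    = c("cg01", "cg02", "cg03"),
  benign_1 = c(0.10, 0.80, 0.50),
  benign_2 = c(0.20, 0.70, 0.50),
  tumor_1  = c(0.60, 0.75, 0.40),
  tumor_2  = c(0.70, 0.85, 0.60)
)

# Select the columns for each tissue type by name prefix.
benign_cols <- grep("^benign_", names(meth), value = TRUE)
tumor_cols  <- grep("^tumor_",  names(meth), value = TRUE)

# Mean tumor beta minus mean benign beta, per probe.
meth$delta_beta <- rowMeans(meth[tumor_cols]) - rowMeans(meth[benign_cols])

# Probes ordered by the size of the difference, largest first.
ranked <- meth[order(-abs(meth$delta_beta)), c("probe", "delta_beta")]
```

With only 4 patients per tissue type in the real data, such differences are descriptive; a paired analysis would be needed for any formal comparison.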
This dataset, ProstateSurgery_df, is a data frame containing data from a study on 97 men with prostate cancer who were scheduled to undergo radical prostatectomy. The dataset includes clinical and pathological variables associated with prostate cancer.
data(ProstateSurgery_df)
A data frame with 97 observations and 9 variables:
Logarithm of cancer volume (numeric).
Logarithm of prostate weight (numeric).
Patient's age in years (integer).
Logarithm of the amount of benign prostatic hyperplasia (numeric).
Seminal vesicle invasion (binary: 0 = No, 1 = Yes; integer).
Logarithm of capsular penetration (numeric).
Gleason score (integer).
Percentage of Gleason scores 4 or 5 (integer).
Logarithm of prostate-specific antigen (PSA) level (numeric).
The dataset name has been kept as 'ProstateSurgery_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the faraway package.
This dataset, ProstateSurvival_df, is a data frame containing survival times for two competing causes: time from prostate cancer diagnosis to death from prostate cancer, and time from prostate cancer diagnosis to death from other causes. The data set also contains information on several risk factors. The data in this data set are simulated from detailed competing risk survival curves and counts of numbers of patients per group presented in Lu-Yao et al. (2009).
data(ProstateSurvival_df)
A data frame with 14,294 observations and 5 variables:
Cancer grade categorized into 2 levels (factor).
Cancer stage categorized into 3 levels (factor).
Age group categorized into 4 levels (factor).
Survival time in months from prostate cancer diagnosis (integer).
Event status: 1 for death from prostate cancer, 2 for death from other causes, 0 for censored (integer).
The dataset name has been kept as 'ProstateSurvival_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the asaur package. Simulated data based on competing risk survival curves and patient counts presented in Lu-Yao et al. (2009): *Outcomes of localized prostate cancer following conservative management*. Journal of the American Medical Association, 302, 1202–1209.
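With a three-valued event status (0 = censored, 1 = death from prostate cancer, 2 = death from other causes), a usual first step is to label the codes and build one indicator per cause. The sketch below does this on made-up rows; the column names `months` and `status` and the values are assumptions for illustration.

```r
# Toy rows; column names and values are hypothetical.
surv <- data.frame(
  months = c(12, 30, 45, 60, 60, 18),
  status = c(1, 2, 0, 0, 1, 2)
)

# Readable labels for the status codes described above.
surv$event <- factor(
  surv$status,
  levels = c(0, 1, 2),
  labels = c("censored", "prostate cancer", "other causes")
)
event_counts <- table(surv$event)

# Cause-specific indicators: for each cause, deaths from the other
# cause are treated as censored observations.
surv$died_prostate <- as.integer(surv$status == 1)
surv$died_other    <- as.integer(surv$status == 2)
```

These indicators are the form expected by cause-specific hazard models; cumulative incidence estimation needs a method that accounts for both causes at once.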
This dataset, PSAProstateCancer_df, is a data frame containing data from a study by Stamey et al. (1989) to examine the association between prostate specific antigen (PSA) and several clinical measures in men about to receive a radical prostatectomy. The dataset includes 97 observations and 9 variables, each representing a factor potentially associated with PSA.
data(PSAProstateCancer_df)
A data frame with 97 observations and 9 variables:
Logarithm of cancer volume (numeric).
Logarithm of prostate weight (numeric).
Age of the patient in years (integer).
Logarithm of benign prostatic hyperplasia (numeric).
Seminal vesicle invasion (integer).
Logarithm of capsular penetration (numeric).
Gleason score (integer).
Percentage of cancerous tissue with Gleason score 4 or 5 (integer).
Logarithm of prostate specific antigen (PSA) (numeric).
The dataset name has been kept as 'PSAProstateCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the ncvreg package. Based on data from Stamey et al. (1989), which examined the association between prostate specific antigen (PSA) and several clinical measures potentially associated with PSA in men about to receive a radical prostatectomy.
This dataset, RadiationEffects_df, is a data frame containing data from an experiment conducted to examine the effects of gamma radiation on the number of chromosomal abnormalities observed. The data explores the relationships between radiation dose, dose rate, and chromosomal changes.
data(RadiationEffects_df)
A data frame with 27 observations and 4 variables:
Number of cells observed (integer).
Number of chromosomal abnormalities (integer).
Amount of gamma radiation dose (numeric).
Rate of gamma radiation dose (numeric).
The dataset name has been kept as 'RadiationEffects_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the faraway package. Based on the study by Purott R. and Reeder E. (1976): *The effect of changes in dose rate on the yield of chromosome aberrations in human lymphocytes exposed to gamma radiation*. Mutation Research, 35, 437–444.
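Because the number of cells observed differs between runs, counts of abnormalities are usually modelled as a rate per cell, using a Poisson regression with the log of the cell count as an offset. The sketch below fits such a model to made-up data; the column names (`cells`, `ca`, `doseamt`, `doserate`), the values, and the choice of model terms are assumptions for illustration only.

```r
# Toy experiment; column names and values are hypothetical.
rad <- data.frame(
  cells    = c(500, 500, 400, 400, 300, 300),
  ca       = c(10, 12, 30, 35, 60, 66),
  doseamt  = c(1, 1, 2.5, 2.5, 5, 5),
  doserate = c(0.1, 1.0, 0.1, 1.0, 0.1, 1.0)
)

# Raw abnormality rate per cell, for a quick descriptive check.
rad$rate_per_cell <- rad$ca / rad$cells

# Poisson rate model: offset(log(cells)) turns the count model
# into a model for abnormalities per cell.
fit <- glm(
  ca ~ doseamt + log(doserate) + offset(log(cells)),
  family = poisson,
  data   = rad
)

# Multiplicative change in the rate per unit increase in dose amount.
rate_ratio_dose <- exp(coef(fit)[["doseamt"]])
```

The same offset pattern applies to any count with a varying denominator, such as deaths per person-year of exposure.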
This dataset, RotterdamBreastCancer_df, is a data frame containing data on 2982 patients with primary breast cancer. The data was collected as part of the Rotterdam tumor bank and was used in Royston and Altman (2013) for survival analysis and prognostic model evaluation.
data(RotterdamBreastCancer_df)
A data frame with 2982 observations and 15 variables:
Patient ID (integer).
Year of diagnosis (integer).
Age at diagnosis in years (integer).
Menopausal status: 0 = premenopausal, 1 = postmenopausal (integer).
Tumor size categorized into three levels (factor).
Tumor grade: 1 = low, 2 = intermediate, 3 = high (integer).
Number of lymph nodes involved (integer).
Progesterone receptor level in fmol/l (integer).
Estrogen receptor level in fmol/l (integer).
Hormonal therapy: 1 = yes, 0 = no (integer).
Chemotherapy: 1 = yes, 0 = no (integer).
Time to recurrence in days (numeric).
Recurrence status: 1 = recurrence, 0 = no recurrence (integer).
Time to death in days (numeric).
Death status: 1 = deceased, 0 = alive (integer).
The dataset name has been kept as 'RotterdamBreastCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the survival package. Based on records from the Rotterdam tumor bank and used in Royston and Altman (2013) for survival analysis.
This dataset, SkinCancerChemo_df, is a data frame containing simulated data mimicking the Skin Cancer Chemoprevention Trial as used in Chiou et al. (2017). It records tumor recurrence in patients who were part of the trial, which includes information on patient demographics, prior tumors, and the treatment they received. The dataset consists of 894 observations with 7 variables.
data(SkinCancerChemo_df)
A data frame with 894 observations and 7 variables:
Patient ID (numeric).
Time to event or censoring (numeric).
Number of tumor recurrences (numeric).
Age of the patient at the start of the trial (numeric).
Gender of the patient (1 = male, 0 = female) (numeric).
Indicates whether the patient received DFMO treatment (1 = yes, 0 = no) (numeric).
Number of prior tumors before the trial (numeric).
The dataset name has been kept as 'SkinCancerChemo_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the spef package. This simulated dataset is based on the study by Chiou et al. (2017): *Marginal and conditional cumulative incidence functions in the presence of dependent censoring*. Biometrics, 73(2), 385–394.
This dataset, SmallCellLung_tbl_df, is a tibble containing information on the entry age and survival time of 121 patients diagnosed with small cell lung cancer (SCLC) under two different treatment regimens. The dataset provides key insights for survival analysis and treatment comparisons in patients with SCLC.
data(SmallCellLung_tbl_df)
A tibble with 121 observations and 3 variables:
Treatment group of the patient (factor with 2 levels).
Entry age of the patient at the start of treatment (integer).
Survival time of the patient in days (integer).
The dataset name has been kept as 'SmallCellLung_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_tbl_df' indicates that the dataset is a tibble, which is an enhanced version of a data frame in R. The original content has not been modified in any way.
Data taken from the BSDA package. Originally published in: Ying, Z., Jung, S., Wei, L. 1995. Survival Analysis with Median Regression Models.
This dataset, SmokingLungCancer_df, is a data frame containing data on man-years of risk and observed number of lung cancer deaths among men. The data includes information about the years of smoking, pack-years, number of cigarettes smoked per day, and the number of deaths due to lung cancer.
data(SmokingLungCancer_df)
A data frame with 63 observations and 4 variables:
Years of smoking, represented as a factor with 9 levels.
Pack-years of smoking (numeric).
Number of cigarettes smoked per day, represented as a factor with 7 levels.
Number of deaths due to lung cancer (numeric).
The dataset name has been kept as 'SmokingLungCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the R4HCR package. Data originally from Table 24-4, page 702 of Kleinbaum et al. (1988).
This dataset, SuspectedCancer_df, is a data frame containing blood test results from individuals presenting with non-specific symptoms of cancer. The data was collected as part of the Suspected CANcer (SCAN) pathway, which evaluates a new standard of care for patients in primary care settings.
data(SuspectedCancer_df)
A data frame with 750 observations and 8 variables:
Age of the individual (numeric).
Comorbidity index (numeric).
Haemoglobin level (numeric).
Albumin level (numeric).
Alanine aminotransferase level (numeric).
White blood cell count (numeric).
Bilirubin level (numeric).
Calcium level (numeric).
The dataset name has been kept as 'SuspectedCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the R4HCR package. Nicholson BD, Oke JL, Friedemann Smith C, et al. The Suspected CANcer (SCAN) pathway: protocol for evaluating a new standard of care for patients with non-specific symptoms of cancer. BMJ Open 2018;8:e018168.
This dataset, UKLungCancerDeaths_df, is a data frame containing the number of deaths due to lung cancer among British male physicians. The data is categorized by years of smoking and cigarette consumption and was originally used in Frome (1983) to analyze rates using Poisson regression models.
data(UKLungCancerDeaths_df)
A data frame with 63 observations and 4 variables:
Years of smoking categorized into 9 levels (factor).
Cigarette consumption categorized into 7 levels (factor).
Exposure time in person-years (numeric).
Number of lung cancer deaths (numeric).
The dataset name has been kept as 'UKLungCancerDeaths_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the SMPracticals package. Based on the study by Frome, E. L. (1983): *The analysis of rates using Poisson regression models*. Biometrics, 39, 665–674.
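The rate analysis mentioned above can be sketched as a Poisson regression with log person-years as an offset. This is a minimal sketch, not the analysis from Frome (1983): it assumes the OncoDataSets package is installed and that the four columns appear in the order documented above. The helper name fit_rate_model and the short column labels it assigns are hypothetical and used only for illustration.

```r
# Hypothetical helper: fits a main-effects Poisson rate model.
# Assumes columns in the documented order:
#   1 = years of smoking, 2 = cigarette consumption,
#   3 = person-years of exposure, 4 = number of deaths.
fit_rate_model <- function(df) {
  d <- as.data.frame(df)[, 1:4]
  names(d) <- c("years", "cigs", "person_years", "deaths")  # illustrative labels
  d <- d[d$person_years > 0, ]  # log(0) cannot be used as an offset
  glm(deaths ~ years + cigs + offset(log(person_years)),
      family = poisson(link = "log"), data = d)
}

# Run on the packaged data only when OncoDataSets is available.
if (requireNamespace("OncoDataSets", quietly = TRUE)) {
  data("UKLungCancerDeaths_df", package = "OncoDataSets")
  fit <- fit_rate_model(UKLungCancerDeaths_df)
  print(summary(fit))
}
```

The offset makes the model describe deaths per person-year rather than raw counts, so cells with very different exposure become comparable.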
This dataset, USCancerStats_df, is a data frame containing cancer statistics for 20 solid tumor types, including incidence, mortality, and survival data. The dataset reports the absolute difference in 5-year survival between 1989-1995 and 1950-1954, as well as the percentage change in mortality and incidence from 1950 to 1996.
data(USCancerStats_df)
A data frame with 20 observations and 4 variables:
Tumor site (character).
Absolute difference in 5-year survival (numeric).
Percentage change in mortality (numeric).
Percentage change in incidence (numeric).
The dataset name has been kept as 'USCancerStats_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the R4HCR package.
This dataset, USMortalityCancer_df, is a data frame containing nationwide mortality rates across all ages in the USA by cause of death, sex, and rural/urban status, recorded from 2011 to 2013. It holds the national aggregate rates; the region-wise rates for each administrative region under the Department of Health and Human Services (HHS) are in USRegionalMortality_df.
data(USMortalityCancer_df)
A data frame with 40 observations and 5 variables:
Rural or urban status (factor with 2 levels).
Sex (factor with 2 levels).
Cause of death (factor with 10 levels).
Mortality rate (numeric).
Standard error of the mortality rate (numeric).
The dataset name has been kept as 'USMortalityCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the lattice package. This dataset is based on the study by the Rural Health Reform Policy Research Center: *Exploring Rural and Urban Mortality Differences*, August 2015, Bethesda, MD. Available at https://ruralhealth.und.edu/projects/health-reform-policy-research-center/rural-urban-mortality.
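Because each rate is reported with its standard error, an approximate interval can be attached to every row. The sketch below uses a normal approximation (rate plus or minus z times the standard error), which is an assumption made here and not something stated by the source. The helper name rate_ci is hypothetical, and the rate and standard error are taken by position (fourth and fifth columns, as documented above) rather than by name.

```r
# Hypothetical helper: normal-approximation interval for a rate.
rate_ci <- function(rate, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  data.frame(rate = rate,
             lower = rate - z * se,
             upper = rate + z * se)
}

# Run on the packaged data only when OncoDataSets is available.
if (requireNamespace("OncoDataSets", quietly = TRUE)) {
  data("USMortalityCancer_df", package = "OncoDataSets")
  ci <- rate_ci(USMortalityCancer_df[[4]], USMortalityCancer_df[[5]])
  # Show the three classifying factors next to the interval.
  print(head(cbind(USMortalityCancer_df[1:3], ci)))
}
```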
This dataset, USRegionalMortality_df, is a data frame containing mortality rates across all ages in the USA, recorded by cause of death, sex, and rural/urban status for the years 2011–2013. The rates are reported separately for each administrative region under the Department of Health and Human Services (HHS).
data(USRegionalMortality_df)
A data frame with 400 observations and 6 variables:
Administrative region under the Department of Health and Human Services (HHS) (factor with 10 levels).
Rural or urban status (factor with 2 levels).
Sex (factor with 2 levels).
Cause of death (factor with 10 levels).
Mortality rate (numeric).
Standard error of the mortality rate (numeric).
The dataset name has been kept as 'USRegionalMortality_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the lattice package. This dataset is based on the study by the Rural Health Reform Policy Research Center: *Exploring Rural and Urban Mortality Differences*, August 2015, Bethesda, MD. Available at https://ruralhealth.und.edu/projects/health-reform-policy-research-center/rural-urban-mortality.
This dataset, VALungCancer_list, is a list containing two components: 'X' and 'y'. The data comes from a randomized trial of two treatment regimens for lung cancer. The 'X' component contains the covariates, and the 'y' component contains the survival time data. This dataset is typically used in survival analysis.
data(VALungCancer_list)
A list with 2 components:
A numeric matrix with 137 rows and 9 columns, representing the covariates.
A numeric matrix with 137 rows and 2 columns, representing the survival data. The columns include 'time' for the survival time and the accompanying event status indicator.
The dataset name has been kept as 'VALungCancer_list' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_list' indicates that the dataset is a list. The original content has not been modified in any way.
Data taken from the ncvreg package. Based on data from a randomized trial of two treatment regimens for lung cancer, as presented in the classic textbook by Kalbfleisch and Prentice.
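Since this object is a list and not a data frame, it can help to check the class and dimensions of each component before modelling. The following is a minimal sketch, assuming the OncoDataSets package is installed; describe_components is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: reports the class and dimensions of each
# element of a list (dim is NULL for elements without dimensions).
describe_components <- function(x) {
  lapply(x, function(el) list(class = class(el), dim = dim(el)))
}

# Run on the packaged data only when OncoDataSets is available.
if (requireNamespace("OncoDataSets", quietly = TRUE)) {
  data("VALungCancer_list", package = "OncoDataSets")
  str(describe_components(VALungCancer_list))
}
```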
This dataset, VinylideneLiverCancer_df, is a data frame containing data from an experiment to investigate whether vinylidene fluoride induces liver damage. The dataset records the levels of three serum enzymes (SDH, SGOT, SGPT) under four different dosages of vinylidene fluoride. Increased serum enzyme levels are indicative of liver damage. The data are real and are available on page 10 of Silvapulle and Sen (2005) and in a report prepared by Litton Bionetics Inc in 1984.
data(VinylideneLiverCancer_df)
A data frame with 40 observations and 4 variables:
Serum enzyme SDH levels (integer).
Serum enzyme SGOT levels (integer).
Serum enzyme SGPT levels (integer).
Dose of vinylidene fluoride administered (factor with 4 levels).
The dataset name has been kept as 'VinylideneLiverCancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Data taken from the goric package. Silvapulle MJ and Sen PK (2005). *Constrained Statistical Inference: Order, Inequality, and Shape Restrictions*. Wiley. Litton Bionetics Inc (1984). Report on the effects of vinylidene fluoride on liver enzymes in Fischer-344 rats.
This dataset, WBreastCancer_tbl_df, is a tibble containing data from a study among women with breast cancer. The dataset includes follow-up, clinical, and demographic variables for 1207 patients.
data(WBreastCancer_tbl_df)
A tibble with 1207 observations and 9 variables:
Unique identifier for each patient (numeric).
Time to the event or censoring (numeric).
Event status: 1 if the event occurred, 0 if censored (numeric).
Estrogen receptor status (numeric).
Age of the patient at the time of diagnosis (numeric).
Histological grade of the tumor (numeric).
Lymph node status: 1 if positive, 0 if negative (numeric).
Pathological stage of the disease (numeric).
Progesterone receptor status (numeric).
The dataset name has been kept as 'WBreastCancer_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the OncoDataSets package and assists users in identifying its specific characteristics. The suffix '_tbl_df' indicates that the dataset is a tibble. The original content has not been modified in any way.
Data taken from the psfmi package.
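The time and event status variables make this tibble suitable for a simple overall survival curve. The sketch below is illustrative only: it assumes the OncoDataSets and survival packages are installed, and it takes the time and status variables by position (second and third columns, as documented above) rather than by name. The helper name km_overall is hypothetical.

```r
# Hypothetical helper: overall Kaplan-Meier estimate.
# Assumes column 2 = time to event or censoring and
# column 3 = event status (1 = event, 0 = censored).
km_overall <- function(df) {
  if (!requireNamespace("survival", quietly = TRUE)) {
    stop("the 'survival' package is required for this sketch")
  }
  time <- df[[2]]
  status <- df[[3]]
  survival::survfit(survival::Surv(time, status) ~ 1)
}

# Run on the packaged data only when OncoDataSets is available.
if (requireNamespace("OncoDataSets", quietly = TRUE)) {
  data("WBreastCancer_tbl_df", package = "OncoDataSets")
  print(km_overall(WBreastCancer_tbl_df))
}
```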