Mathematics, Statistics & Physics
http://hdl.handle.net/10576/3082
2024-08-15T03:37:43Z
http://hdl.handle.net/10576/56270
COMPETING RISK MODELS IN PRESENCE OF PROGRESSIVELY TYPE-II CENSORED DATA FOR DAGUM DISTRIBUTION
BADWAN, RAGHD Y. H.
In survival time analysis, there can be more than one cause of failure for an individual or item. Researchers are usually interested in survival times under a particular cause of failure, grouping the remaining causes as "other". Moreover, censoring is unavoidable in survival analysis: time and money limitations prevent researchers from obtaining complete information on every unit in the experiment. This thesis considers progressive Type-II censoring for the problem of competing risks under the Dagum, or Burr Type-III, distribution. Maximum likelihood estimation is applied to estimate the unknown shape parameters in the general case as well as in the special case of a common shape parameter. Moreover, the observed Fisher information matrix is derived to obtain approximate confidence intervals (CIs) for the unknown parameters. Bootstrap CIs are also studied using the resampling method. Furthermore, the adequacy of the proposed methods is assessed using Monte Carlo simulation, followed by an analysis of a real dataset.
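As a toy illustration of the estimation step, the sketch below (not the thesis code) fits the two Dagum shape parameters by maximum likelihood on a complete, uncensored sample with unit scale. Under progressive Type-II censoring and competing risks, the log-likelihood would gain additional survival-function terms for the progressively removed units; the sample size and true parameter values here are arbitrary assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def dagum_sample(a, p, n, rng):
    # Inverse-CDF sampling from F(x) = (1 + x**-a)**-p (unit scale b = 1)
    u = rng.uniform(size=n)
    return (u ** (-1.0 / p) - 1.0) ** (-1.0 / a)

def neg_log_lik(theta, x):
    a, p = theta
    if a <= 0 or p <= 0:
        return np.inf
    # log f(x) = log(a*p) + (a*p - 1)*log(x) - (p + 1)*log(1 + x**a)
    return -np.sum(np.log(a * p) + (a * p - 1) * np.log(x)
                   - (p + 1) * np.log1p(x ** a))

x = dagum_sample(2.0, 1.5, 5000, rng)
fit = minimize(neg_log_lik, x0=[1.0, 1.0], args=(x,), method="Nelder-Mead")
a_hat, p_hat = fit.x  # should land near the true (2.0, 1.5)
```

The observed Fisher information (for the approximate CIs mentioned above) would then be the Hessian of `neg_log_lik` evaluated at `(a_hat, p_hat)`.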
2024-06-01T00:00:00Z
http://hdl.handle.net/10576/56261
ON THE GAUSSIAN PROCESS FOR STATIONARY AND NON-STATIONARY TIME SERIES PREDICTION FOR THE QATAR STOCK MARKET
AL FAKIH, BATOUL MOHAMAD KAZEM
This research adopts a Gaussian process prediction model for non-stationary time series. We then discuss four transformation techniques: the Generalized Optimal Wavelet Decomposition Algorithm (GOWDA), the Hilbert-Huang transform (HHT), Detrending based on Echo State Networks (DESN), and the Kolmogorov-Zurbenko (KZ) filter. GOWDA runs the continuous wavelet transform (CWT) several times using different mother wavelet functions, maximal decomposition levels, and thresholding techniques, and chooses the combination with minimal error. HHT applies empirical mode decomposition (EMD), which decomposes the time series into intrinsic mode functions (IMFs); Hilbert spectral analysis is then applied to the IMFs before reconstructing the denoised signal. DESN is a neural network algorithm with minimal assumptions. The KZ filter is a moving average algorithm that is easy to understand and implement. When comparing the performance of these methods within the Gaussian process prediction model, it was found that HHT reconstruction before prediction gave the best results.
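Since the abstract highlights the KZ filter's simplicity, here is a minimal sketch of it: an iterated, centered moving average. The window length `m` and iteration count `k` are illustrative choices, not the thesis settings.

```python
import numpy as np

def kz_filter(x, m, k):
    """Kolmogorov-Zurbenko filter: a centered moving average of odd
    window length m, iterated k times. Iterating sharpens the filter's
    suppression of high-frequency noise."""
    y = np.asarray(x, dtype=float)
    w = np.ones(m) / m
    for _ in range(k):
        # mode="same" keeps the series length; zero-padding distorts
        # roughly the first and last k*(m - 1)//2 points
        y = np.convolve(y, w, mode="same")
    return y

t = np.linspace(0.0, 1.0, 500)
noisy = np.sin(2 * np.pi * t) + 0.3 * np.random.default_rng(1).normal(size=500)
smooth = kz_filter(noisy, m=9, k=3)  # noise suppressed, trend preserved
```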
2024-06-01T00:00:00Z
http://hdl.handle.net/10576/48144
Assessment and Prediction of Body Fat Composition Using A Variety of Machine Learning Algorithms
Shajahan, Tahsin Raahila
Body composition is critical for health outcomes and has been researched in various populations and conditions such as obesity and diabetes. Qatar Biobank collected anthropometric and biomedical data from individuals across all age groups. Body fat and lean mass are important measures of body composition that help identify several health risks, including those related to cardiovascular health and nutrition. Machine learning (ML) algorithms in Python were used to predict Total Fat Percentage (TFP) and Total Lean Mass (TLM).
All the variables in the dataset were used to test different ML algorithms on the TFP variable. Based on performance metrics such as R2, Mean Absolute Error (MAE), and Root Mean Square Error, linear regression, support vector regression (SVR), and extreme gradient boosting (XGBoost) performed well. Subsequently, further analysis of these models was performed using feature selection methods (forward, backward, stepwise, and information gain) at multiple cross-validation (CV) levels. We found that backward selection with 10-fold CV on the SVR model predicted TFP best, with an R2 of 86.7% (train) and 80.2% (test) and an MAE of 0.025 (train) and 0.030 (test). Some of the best variables selected by this model are testosterone, urea, gender, body mass index (BMI), and bone mineral density (BMD).
Next, TLM was analyzed using the three models selected earlier for TFP. It was found that the linear regression and SVR models predicted TLM well, while XGBoost performed poorly. Since backward selection with 10-fold CV produced good results for TFP, the same approach was applied to these models for feature selection. Based on the results obtained, we conclude that the linear regression model after feature selection predicts TLM best, with an R2 of 83.7% (train) and 82.9% (test) and an MAE of 0.313 (both train and test). Some of the best variables explaining TLM are gender, age, BMI, cholesterol, and BMD.
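The Qatar Biobank data are not public, so the sketch below reproduces only the shape of the pipeline (backward sequential feature selection scored by 10-fold CV around an SVR) on synthetic scikit-learn data; the feature counts, kernel, and `C` value are illustrative assumptions, not the thesis configuration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the (non-public) Qatar Biobank predictors
X, y = make_regression(n_samples=300, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)
y = (y - y.mean()) / y.std()  # SVR's epsilon-tube needs a sensible y scale

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# SVR is scale-sensitive, so standardize features inside the pipeline
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))

# Backward elimination: drop one feature at a time, scored by 10-fold CV
selector = SequentialFeatureSelector(svr, n_features_to_select=4,
                                     direction="backward", cv=10)
selector.fit(X_tr, y_tr)

svr.fit(selector.transform(X_tr), y_tr)
r2_test = svr.score(selector.transform(X_te), y_te)  # held-out R2
```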
2023-06-01T00:00:00Z
http://hdl.handle.net/10576/47659
VOLATILITY ESTIMATION IN MISSING AT RANDOM HIGH-FREQUENCY FINANCIAL TIME SERIES
ACHAIBOU, FERIEL
Over the past 15 years, capital markets have seen significant development, with the introduction of high-frequency trading and a shift toward algorithmic trading. High-frequency and automated trading have long been believed to be a source of price shocks and rising volatility. Therefore, more interest has recently been given to modeling volatility with high-frequency financial data. However, financial data can still be missing despite modern technology that allows data collection on a very fine time scale. Thus, this thesis focuses on the estimation of regression and volatility functions from missing data using a nonparametric heteroscedastic regression model. A Nadaraya-Watson type estimator is used when the response variable is a real-valued random variable subject to a missing-at-random mechanism, while the predictor is a completely observed infinite-dimensional (functional) random variable. Based on the observed data, we first introduce simplified and inverse-probability-weighted estimators. Second, these initial estimators are used to impute missing values and to define estimators of the regression and volatility operators based on the imputed data. Third, the performance of the proposed estimators is assessed using simulated data. Finally, an application to the estimation and forecasting of the daily volatility of Brent Oil Price returns, conditional on 1-minute frequency daily Natural Gas returns curves, is also investigated.
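As a toy illustration of the "simplified" estimator (not the thesis code), the sketch below applies Nadaraya-Watson weights to only the observed responses; the thesis uses a functional predictor and also builds inverse-probability-weighted and imputation-based versions, while this sketch assumes a scalar predictor and an arbitrary regression function for brevity.

```python
import numpy as np

def nw_simplified(x0, X, Y, observed, h):
    """Nadaraya-Watson estimate of E[Y | X = x0] using only the
    responses observed under the missing-at-random mechanism."""
    K = np.exp(-0.5 * ((X - x0) / h) ** 2)   # Gaussian kernel weights
    w = K * observed                         # zero out missing responses
    return np.sum(w * np.where(observed, Y, 0.0)) / np.sum(w)

rng = np.random.default_rng(2)
n = 2000
X = rng.uniform(0.0, 1.0, n)
Y = np.sin(2 * np.pi * X) + 0.2 * rng.normal(size=n)
# Missing at random: observation probability depends only on the predictor
observed = rng.uniform(size=n) < 0.6 + 0.3 * X

m_hat = nw_simplified(0.25, X, Y, observed, h=0.05)   # true m(0.25) = 1
# Volatility at the same point: kernel-smooth the squared residuals
s2_hat = nw_simplified(0.25, X, (Y - m_hat) ** 2, observed, h=0.05)
```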
2023-06-01T00:00:00Z