Dr. Saurabh Bhardwaj
Professor
Specialization
Statistical and machine learning based solutions for problems involving recognition, classification, clustering, modeling, estimation and information retrieval
saurabh.bhardwaj@thapar.edu
https://sites.google.com/site/saurabhshomeonweb/
Biography
I have been teaching and conducting research for around 21 years. Currently, I am an Associate Professor in the Electrical and Instrumentation Engineering Department at Thapar Institute of Engineering and Technology, India. In addition, I hold the title of Research Scientist at Virginia Tech's Department of Electrical and Computer Engineering. My research interests include the development and improvement of statistical and machine learning-based solutions for problems such as information retrieval, modeling, clustering, and recognition.
My current research is in bioinformatics. In this area, I have worked on unsupervised deconvolution of diverse datasets using Convex Analysis of Mixtures (CAM), the use of the cosine-based one-sample test (COT) to identify marker genes among distinct subtypes, and missing data imputation in proteomics data.
In earlier work, I investigated a range of machine learning and deep learning models for applications including speaker identification, brain fingerprinting, clustering, and prediction of solar radiation.
I completed postdoctoral research at Virginia Tech University in the United States, and I hold M.E. and Ph.D. degrees in instrumentation engineering from Panjab University in Chandigarh and the Netaji Subhas Institute of Technology, Delhi University in New Delhi, respectively.
Phone/Mobile Number: +917009691106, +917528981415
Email ID: saurabh.bhardwaj@thapar.edu
Membership of Professional Institutions, Associations, Societies:
Awards and Honours:
Description of Research Interests
My research interests focus on the development and improvement of statistical and machine learning based solutions for problems involving recognition, classification, clustering, modeling, estimation and information retrieval.
Publications and other Research Outputs
International Journals:
International Conferences
EDUCATION AND RESEARCH
Apr 2020 - May 2020 (Offline, USA)
May 2020 - Nov 2021 (Online, India)
Nov 2021 - Apr 2022 (Offline, USA)
Virginia Tech University, Virginia, USA
The Bradley Department of Electrical & Computer Engineering
Visiting Researcher (Computational Bioinformatics and Bio-Imaging Laboratory)
Research Domain: Bioinformatics
Projects:
1. Unsupervised deconvolution of different datasets using improved Convex Analysis of Mixtures (CAM)
Aims: To detect latent sources, their respective source signature genes/metabolites, and their relevant concentration patterns between healthy controls and patients for the following datasets: CAD GWAS metabolomic datasets (‘Mesa’ and ‘Rotterdam’), GPAA proteomics, and GPAA RNAseq.
Methods:
- Two-step data normalization using Trimmed Mean of M-values (TMM) and Tag Count Comparison (TCC) based normalization
- Metabolomic data deconvolution by CAM 3.0
- Data visualization through heatmaps and simplex plots
Result: We successfully identified several sources within and between different groups of data, along with their corresponding distinct and common source signature genes/metabolites in the different datasets.
2. Missing data imputation of different datasets
Aims: To impute the missing values in the Fresh frozen LAD45, AA paired Trizol data and in the LAD Trizol data, and to carry out a comparative analysis of different missing data imputation methods.
Methods: The methods compared are mean imputation, non-linear estimation by iterative partial least squares (NIPALS), singular value thresholding (SVT), and Fused Regularization Matrix Factorization (FRMF).
3. Application of Cosine-based One-sample Test (COT) to detect marker genes among many subtypes
Aims: To extend COT to detect downregulated signature genes (DSGs) in addition to upregulated signature genes (SGs), and to apply COT-based detection of both condition-specific SGs and DSGs to the following datasets: GPAA RNAseq, GPAA proteomics, and Edinburgh.
Methods:
- Two-step data normalization using Trimmed Mean of M-values (TMM) and Tag Count Comparison (TCC) based normalization
- COT enhancement for downregulated expressed genes
- Data visualization through updated heatmaps for multiple groups and simplex plots
Feb 2020 – Apr 2020 (Offline, USA)
Virginia Tech University, Virginia, USA
The Bradley Department of Electrical & Computer Engineering
Visiting Researcher (Deep Learning Research Laboratory)
Projects:
1. Integration of information set theory with deep learning principles to improve noisy pattern classification
Aims: To address the lack of robust deep learning models, we propose Information-Set Deep Learning (ISDL) architectures in four variants that integrate information set theory with deep learning principles.
Methods:
- Infor-Set based Convolutional Neural Network
- Infor-Set based Deep Feedforward Network
- Infor-Set based neural network layer
- Infor-Set based Pooling Layer
Results: The experimental results show that the ISDL models can efficiently handle noise-dominated uncertainty and outperform peer architectures.
July 2013
Delhi University, New Delhi, India
Netaji Subhas Institute Of Technology (NSIT)
PhD (Instrumentation and Control Engineering)
Thesis: System Identification & Control using Hybrid Combination of Statistical & Soft-Computing Techniques
Projects:
1. Text Independent Speaker Identification
Aims: To develop robust speaker recognition models. For this, I developed three approaches.
Methods:
In the first approach, a hidden Markov model (HMM) is used to extract a pattern similarity-based batch feature vector, which is fitted with the generalized fuzzy model (GFM) to identify the speaker. In the second approach, the equivalence, under certain conditions, between the defuzzified output of the GFM and the conditional mean of a Gaussian mixture model (GMM) is used to identify speakers; here the parameters of the GFM are calculated with the help of the GMM. The third approach is inspired by the way humans draw on mutual acquaintances when identifying a speaker: several sets of HMMs are created with different initial parameters, such as the number of states and the number of Gaussian mixtures.
2. Pattern Similarity Based Clustering
Aims: Two pattern similarity-based techniques have been developed for the clustering of data. It is shown that distance functions are not always adequate for clustering data, and that strong correlations may still exist among data vectors even if they are far apart from each other.
Methods:
- Shape Based Batching
- Shape Based Clustering
Results: The method was tested on real data (Iris data and signature data) as well as on a synthetic dataset. The simulation results are very encouraging: the method gives 100% accuracy on the Iris and signature datasets, and about 99% accuracy on the synthetic data.
3. Time Series Prediction
Aims: To predict chaotic time series data.
Methods: The underlying hidden properties of the time series are captured with the help of an HMM. The prediction method combines the pattern identification ability of the HMM, used for cluster selection, with the generalization and nonlinear modeling capabilities of soft-computing methods to predict the output of the system.
Result: Accurately predicted the following time series data:
- Mackey-Glass time series
- Sunspot data time series
- Laser data time series
- Lorenz data time series
4. Solar Radiation Estimation
Aims: To use machine learning techniques for solar radiation estimation from meteorological data.
Methods: Two GFM-based methods are used for the estimation of solar radiation. In the first method, a pattern similarity-based clustering algorithm extracts shape-based clusters from the input meteorological parameters, which are then processed by the GFM to estimate the solar radiation. The second method makes use of both probability theory and fuzzy set theory for the estimation of radiation.
Results: We were able to estimate short-term solar radiation from data acquired at the comprehensive weather monitoring station of the Solar Energy Centre (now the National Institute of Solar Energy), Gurgaon, India.
March 2008
Panjab University, Chandigarh
University Centre Of Instrumentation & Microelectronics
M.Tech (Instrumentation)
Thesis: Intelligent Parking Aid System
The aim of the project was to provide a parking aid that guides a motorist while parking, so that the vehicle can be brought safely and reliably to a stop at a predetermined distance. Hardware was developed that calculates the distance of an obstacle from the front and back of the vehicle and displays it in front of the driver.
Key Achievements:
- Secured 81% marks (2nd position)
- Achieved 1st position in oral presentation of a research paper entitled “Monitoring of Environment Using Fuzzy Logic,” in Symposium on Electronics Technology – 2008
- Achieved 1st position in oral presentation of a research paper entitled “Fuzzy Logic-based Virtual Instrument for Environmental Monitoring,” in 2nd Chandigarh Science Congress-2008
- Achieved 2nd position in oral presentation of a research paper entitled “Ultrasonic Parking aid device,” in 1st phase of Student Research Convention, ANVESHAN-2008
June 2001
V.B.S. Purvanchal University, Jaunpur
Meerut Institute Of Engineering & Technology
B. Tech (Electronics and Instrumentation)
Thesis Title: Micro-Controller Based Moving Message Display