Feature Engineering for Climate Temperature Prediction in Indian Geographical Zones Using Machine Learning Regression Models
Abstract
Climate change prediction is a significant field of research because of global warming across heterogeneous geographic zones. The proposed study aims to design a smarter way to predict climate temperature in Indian metro cities. First, the daily records of the cities for more than two decades, organized by month, day, year, and actual temperature (sourced from Kaggle), are loaded. As preprocessing, the method builds new features such as lagged temperatures and moving averages, which help the model capture both short-term changes and the slower, larger swings in temperature. The method then removes statistical outliers, splits the data into training and testing sets, and normalizes the features to help the models learn better. The prediction problem is set up as a supervised bagged ensemble regression model that predicts the next temperature from these features. In essence, it trains many decision trees on different random subsets of the data and then averages their outputs, which keeps predictions stable and less sensitive to the quirks of any single tree. To evaluate the approach, the study compares the bagged trees with classic linear regression (LR), kernel support vector regression (SVR), and a single decision tree, using standard metrics: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R²). The ensemble clearly outperforms the other models in Kolkata (RMSE 1.6899, MAE 1.2798, R² 0.9514) and Delhi (RMSE 2.0834, MAE 1.5503, R² 0.9749), ahead of SVR, the decision tree, and especially linear regression. For Chennai, the temperatures were first converted from Fahrenheit to Celsius. The model's predictions closely match the observed highs (32–35°C), lows (23–25°C), and averages (27–30°C). However, the reported results show a negative R² (−0.0108) for both linear regression and the ensemble, while SVR achieves an R² of 0.9344. For Chennai, at least, SVR therefore performs best.
Thus, the proposed bagged ensembles improve predictive accuracy and generalization for temperature forecasting relevant to climate change (CC), most clearly in Kolkata and Delhi.
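
The pipeline described in the abstract (lagged and moving-average features, a train/test split, trees trained on random resamples and then averaged, and RMSE/MAE/R² evaluation) can be illustrated with a minimal Python sketch. This is not the study's implementation: the data are a synthetic seasonal series rather than the Kaggle records, the function names (`make_features`, `fit_tree`, `fit_bagged`, and so on) are hypothetical, and the lag count, window length, tree depth, and number of trees are assumed values chosen only for illustration. Outlier removal and normalization are omitted for brevity.

```python
import math
import random
from statistics import mean

def make_features(temps, n_lags=3, window=7):
    """Rows of [lag_1..lag_n, trailing moving average]; target is the current value."""
    X, y = [], []
    for t in range(max(n_lags, window), len(temps)):
        lags = [temps[t - k] for k in range(1, n_lags + 1)]
        X.append(lags + [mean(temps[t - window:t])])  # past values only
        y.append(temps[t])
    return X, y

def fit_tree(X, y, depth=0, max_depth=4, min_leaf=5):
    """Small regression tree: split where the summed squared error is lowest."""
    n = len(y)
    if depth >= max_depth or n < 2 * min_leaf:
        return mean(y)  # leaf: predict the mean target
    total, total_sq = sum(y), sum(v * v for v in y)
    best = None  # (sse, feature index, threshold)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = left_sq = 0.0
        for pos in range(1, n):  # first `pos` sorted rows go left
            v = y[order[pos - 1]]
            left_sum += v
            left_sq += v * v
            lo, hi = X[order[pos - 1]][j], X[order[pos]][j]
            if pos < min_leaf or n - pos < min_leaf or lo == hi:
                continue
            sse = (left_sq - left_sum ** 2 / pos) + (
                (total_sq - left_sq) - (total - left_sum) ** 2 / (n - pos))
            if best is None or sse < best[0]:
                best = (sse, j, lo)
    if best is None:
        return mean(y)
    _, j, thr = best
    left = [i for i in range(n) if X[i][j] <= thr]
    right = [i for i in range(n) if X[i][j] > thr]
    return (j, thr,
            fit_tree([X[i] for i in left], [y[i] for i in left],
                     depth + 1, max_depth, min_leaf),
            fit_tree([X[i] for i in right], [y[i] for i in right],
                     depth + 1, max_depth, min_leaf))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal nodes are tuples, leaves are floats
        j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node

def fit_bagged(X, y, n_trees=25, seed=0):
    """Bagging: fit each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_tree([X[i] for i in idx], [y[i] for i in idx]))
    return trees

def predict_bagged(trees, x):
    return mean(predict_tree(t, x) for t in trees)  # average over all trees

def rmse(y_true, y_pred):
    return math.sqrt(mean((a - b) ** 2 for a, b in zip(y_true, y_pred)))

def mae(y_true, y_pred):
    return mean(abs(a - b) for a, b in zip(y_true, y_pred))

def r2(y_true, y_pred):
    m = mean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - m) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot

# Synthetic two-year daily series in °C (assumed stand-in for the real records).
noise = random.Random(42)
temps = [27 + 8 * math.sin(2 * math.pi * d / 365) + noise.gauss(0, 1)
         for d in range(730)]

X, y = make_features(temps)
split = int(0.8 * len(y))  # chronological split: train on the past, test on the future
trees = fit_bagged(X[:split], y[:split])
y_test = y[split:]
pred = [predict_bagged(trees, x) for x in X[split:]]

print("RMSE:", rmse(y_test, pred))
print("MAE: ", mae(y_test, pred))
print("R2:  ", r2(y_test, pred))
```

The sketch uses a chronological split so that no future values leak into training, which is an assumption about the evaluation setup rather than something the abstract states. Because R² compares the model with a constant mean predictor on the test set, it turns negative whenever the model does worse than that baseline, which is how a value such as the −0.0108 reported for Chennai can arise.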