Feature Engineering for Climate Temperature Prediction in Indian Geographical Zones Using Machine Learning Regression Models

Saima Khan, Rakesh Singh Rajput

Published: Apr 23, 2026

Abstract

Climate change prediction is a significant field of research given the effects of global warming across heterogeneous geographic zones. The proposed study aims to design a more accurate way to predict temperature in Indian metro cities. First, daily records for each city spanning more than two decades, organized by month, day, year, and actual temperature (Kaggle), are loaded. As a preprocessing step, new features are built from lagged temperatures and moving averages, which helps the model capture both short-term changes and the slower, larger swings in temperature. The method then removes statistical outliers, splits the data into training and testing sets, and normalizes the data to help the models learn. The prediction problem is framed as supervised regression with a bagged ensemble that predicts the next temperature from these features: many decision trees are trained on different random subsets of the data and their outputs are averaged, which keeps predictions stable and less sensitive to the quirks of any single tree. To evaluate the approach, the study compares the bagged trees against classic linear regression (LR), kernel support vector regression (SVR), and a single decision tree, using the standard metrics Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and R². The ensemble clearly outperforms the other models in Kolkata (RMSE 1.6899, MAE 1.2798, R² 0.9514) and Delhi (RMSE 2.0834, MAE 1.5503, R² 0.9749), ahead of SVR, the decision tree, and especially linear regression. For Chennai, temperatures were first converted from Fahrenheit to Celsius. The model's predictions closely match observed highs (32–35°C), lows (23–25°C), and averages (27–30°C). However, the reported results for Chennai show a negative R² (−0.0108) for both linear regression and the ensemble, whereas SVR achieves an R² of 0.9344, so SVR performs best for that city.
Thus, the proposed bagged ensemble improves predictive accuracy and generalization for climate-change-relevant temperature forecasting in Kolkata and Delhi, while SVR remains the stronger model for Chennai.
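
The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the lag count, the window length, and the sample readings are all assumptions made for illustration, and the persistence baseline stands in for a trained model only so that the metrics have something to score. The sketch shows the Fahrenheit-to-Celsius conversion, features built from lagged temperatures and a trailing moving average, and the three metrics. It also shows why R² can be negative: the score falls below zero whenever a model's squared error exceeds that of simply predicting the mean of the test targets.

```python
import math


def fahrenheit_to_celsius(temp_f):
    """Convert a Fahrenheit reading to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def build_features(temps, n_lags=3, window=7):
    """Return (X, y) with lagged temperatures and a trailing moving average.

    Every feature uses only days strictly before the target day, so the
    target value never leaks into its own feature row.
    """
    start = max(n_lags, window)
    X, y = [], []
    for t in range(start, len(temps)):
        lags = [temps[t - k] for k in range(1, n_lags + 1)]
        moving_avg = sum(temps[t - window:t]) / window
        X.append(lags + [moving_avg])
        y.append(temps[t])
    return X, y


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination; negative if worse than the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic Fahrenheit readings, invented purely for illustration.
temps_f = [86.0, 87.8, 89.6, 91.4, 89.6, 87.8, 86.0, 84.2, 86.0, 87.8]
temps_c = [fahrenheit_to_celsius(t) for t in temps_f]

# Hypothetical settings: two lags and a three-day moving average.
X, y = build_features(temps_c, n_lags=2, window=3)

# Naive persistence baseline (not the paper's model): predict the
# previous day's temperature, which is the first lag in each row.
baseline = [row[0] for row in X]
print(rmse(y, baseline), mae(y, baseline), r2(y, baseline))
```

In a fuller pipeline, the rows of `X` would be passed to a regression model such as a bagged ensemble of decision trees, and the same three metrics would be computed on a held-out test split.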

Issue

Vol. 16 No. 2 (2026)

Section

Articles

Information for Authors

Dear Readers, Researchers, and Subscribers,

We would like to inform you that the only authentic and official website of the Journal of Chemical Health Risks (JCHR) is www.jchr.org. We have recently noticed that there are several websites claiming to represent the Journal of Chemical Health Risks, but please be aware that these websites are unauthorized and potentially fraudulent. For any inquiries, subscriptions, or submissions to the Journal of Chemical Health Risks, please always refer to www.jchr.org. If you have any doubts or come across any suspicious websites or platforms claiming to be affiliated with us, please do not hesitate to contact our official support team through the website for verification.