Google Summer of Code 2026

GSoC 2026

Timeline

Date/Period Event
January 19 - 18:00 UTC Mentoring organizations can begin submitting applications to Google
February 3 - 18:00 UTC Mentoring organization application deadline
February 3 - 18 Google program administrators review organization applications
February 19 - 18:00 UTC List of accepted mentoring organizations published
February 19 - March 15 Potential GSoC contributors discuss application ideas with mentoring organizations
March 16 - 18:00 UTC GSoC contributor application period begins
March 31 - 18:00 UTC GSoC contributor application deadline
April 21 - 18:00 UTC GSoC contributor proposal rankings due from Org Admins
April 30 - 18:00 UTC Accepted GSoC contributor projects announced
May 1 - 24 Community Bonding Period: GSoC contributors get to know mentors, read documentation, get up to speed to begin working on their projects
May 25 Coding officially begins!
July 6 - 18:00 UTC Mentors and GSoC contributors can begin submitting midterm evaluations (for standard 12 week coding projects)
July 10 - 18:00 UTC Midterm evaluation deadline (standard coding period)
July 6 - August 16 Work Period: GSoC contributors work on their project with guidance from Mentors
August 17 - 24 - 18:00 UTC Final week: GSoC contributors submit their final work product and their final mentor evaluation (standard coding period)
August 24 - 31 - 18:00 UTC Mentors submit final GSoC contributor evaluations (standard coding period)
August 24 - November 2 GSoC contributors with extended timelines continue coding
November 2 - 18:00 UTC Final date for all GSoC contributors to submit their final work product and final evaluation
November 9 - 18:00 UTC Final date for mentors to submit evaluations for GSoC contributor projects with extended deadlines*

Number of Allowed Applications

A maximum of 3 applications (or proposals) is allowed per contributor. You may submit multiple applications to the same organization, but no more than 3 in total.

Stipend

For both Egypt and Saudi Arabia, the stipend is:

Project Size Stipend (USD)
Small $750
Medium $1500
Large $3000

Organizations

Full List

What follows is a list of the organizations and projects that I am interested in and that are relevant to my previous experience and interests.

Machine Learning for Science (ML4SCI)

What follows are the projects that I am interested in; under each is the list of proposals of interest.

New Updates !!!

Any application to any one of the projects under ML4SCI requires the submission of:

  • CV
  • Proposal
  • Test Task Solutions - GitHub Repository

The surprise is the proposal. This is the same proposal that you will eventually submit to the GSoC platform when applying.

Applications shall be made through this form first.

EXXA

Link

This is one of the project areas relevant to my previous experience (NeurIPS Ariel Data Challenge 2025). It contains 5 projects:

  1. EXXA1: Equivariant Vision Networks for Predicting Planetary Systems’ Architectures
  2. EXXA2: Denoising Astronomical Observations of Protoplanetary Disks
  3. EXXA3: Exoplanet Atmosphere Characterization
  4. EXXA4: Foundation Models for Exoplanet Characterization
  5. EXXA5: Quantum Machine Learning for Exoplanet Characterization

All of these projects are of medium (175 hours) or large (350 hours) size.

I possess all necessary requirements for all of these projects: Python, PyTorch, C/Fortran. Background in astronomy is a bonus but not a requirement.

Description

The purpose of EXXA is to use simulations and publicly available data from observations intended to identify exoplanets and physical processes in planet-forming environments.

Test

To apply for any of the projects here, you must provide your solutions to the required tests.

Test Description

There are 3 tests in total. Depending on the project, you will submit 2 or 3 of them.

Project General Test Image-Based Test Sequential Test
EXXA1 Yes Yes No
EXXA2 Yes Yes No
EXXA3 Yes No Yes
EXXA4 Yes Yes Yes
EXXA5 Yes No Yes
The General Test

This test is required for all the projects. It is a general test that covers all the topics related to exoplanet characterization.

Given synthetic ALMA observations, create an unsupervised ML model capable of clustering protoplanetary disks (the sites of planet formation). You should create a full pipeline that includes data loading, preprocessing, model loading, clustering, and visualization.

There are many ways in which the data can be clustered, but the number of planets/presence of any planets is of particular interest. Beware of simply clustering the disks by viewing angle.

The deliverable is a Jupyter notebook (Google Colab) that contains the full pipeline.

Models will be judged on the clarity of clusters produced and the properties that the clusters find. The judges have all data pertaining to the disks and will run analyses of which properties are the most important in determining the clusters. Ideally, the clusters will correspond to properties pertaining to planets.

The quality of the cluster presentation and labeling will be taken into account. A clear presentation and data labeling that facilitates easy study will be judged highly.
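As a rough starting point, the pipeline described above can be sketched with plain NumPy: flatten the images, reduce dimensionality with PCA, and cluster with k-means. The random array below is only a stand-in for the synthetic ALMA cubes; a real submission would load the provided data and likely use a learned representation instead of raw pixels.

```python
import numpy as np

def pca_features(X, k=8):
    """Project flattened images onto their top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes of the centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def kmeans(Z, n_clusters=4, n_iter=50, seed=0):
    """Plain k-means: assign points to the nearest centroid, then recompute."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), n_clusters, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if (labels == c).any():
                centroids[c] = Z[labels == c].mean(axis=0)
    return labels

# Stand-in for the synthetic ALMA data: 100 flattened 32x32 "disk images"
X = np.random.default_rng(1).normal(size=(100, 32 * 32))
labels = kmeans(pca_features(X), n_clusters=4)
```

Clustering raw PCA features on pixel space is exactly the kind of model that risks picking up viewing angle, which is why the test warns against it; a learned, orientation-insensitive representation is the harder part.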

The Image-Based Test

Required for EXXA1, EXXA2, and EXXA4.

Using the same data as the general test, train an autoencoder to output images that resemble the inputs. You’re free to use any architecture, but there must be an accessible latent space.

The deliverables are the same as for the general test.

The quantitative evaluation will be the MSE and the multiscale SSIM between the input and output images.

The qualitative evaluation will be the same as the general test.

The model will be tested on withheld data.
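A minimal stand-in for the autoencoder requirement is a PCA-based linear autoencoder, which trivially exposes its latent space and can be scored with MSE. This is only a baseline sketch on random data; the actual test calls for a trained network (e.g. a convolutional autoencoder in PyTorch) and multiscale SSIM in addition to MSE.

```python
import numpy as np

class LinearAutoencoder:
    """PCA-based linear autoencoder; encoder and decoder share the principal axes."""
    def fit(self, X, latent_dim=32):
        self.mean = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.W = Vt[:latent_dim]              # (latent_dim, n_pixels)
        return self

    def encode(self, X):
        return (X - self.mean) @ self.W.T     # the accessible latent space

    def decode(self, Z):
        return Z @ self.W + self.mean

def mse(a, b):
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64 * 64))           # stand-in for flattened disk images
ae = LinearAutoencoder().fit(X, latent_dim=32)
recon = ae.decode(ae.encode(X))
err = mse(X, recon)
```

The same encode/decode/score interface carries over directly to a neural autoencoder, which matters for EXXA4 since the latent space is what downstream models would consume.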

The Sequential Test

Used for EXXA3, EXXA4, and EXXA5.

Create a simulated dataset of transit curves. Include as many physical and system parameters as you think are necessary. Feel free to supplement the synthetic data with observational data. Use this data to train a classifier that determines whether or not a given transit curve shows the presence of a planet.

The deliverables are the same as for the general test.

The quantitative evaluation will be the ROC curve and the calculated AUC for the model on a withheld test set. The testing data will include real observations, so bear in mind that noisy data will be used for judging.
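A toy version of this test, assuming box-shaped transits and white noise, can be built entirely in NumPy: simulate labeled light curves, score each with a moving-average dip statistic, and compute AUC via the rank-sum formulation. A real submission would use a physically motivated transit model (e.g. limb-darkened curves) and a trained classifier rather than a hand-crafted statistic.

```python
import numpy as np

rng = np.random.default_rng(0)

def light_curve(has_planet, n=200, depth=0.01, noise=0.003):
    """Toy transit curve: flat flux plus white noise, with an optional box dip."""
    flux = 1.0 + rng.normal(0.0, noise, n)
    if has_planet:
        t0 = rng.integers(20, n - 40)
        flux[t0:t0 + 20] -= depth             # 20-sample box-shaped transit
    return flux

def dip_score(flux, window=20):
    """Detection statistic: deepest moving-average dip below the median flux."""
    smooth = np.convolve(flux, np.ones(window) / window, mode="valid")
    return float(np.median(flux) - smooth.min())

def auc(y_true, y_score):
    """AUC via the rank-sum (Mann-Whitney U) formulation."""
    ranks = np.empty(len(y_score))
    ranks[np.argsort(y_score)] = np.arange(1, len(y_score) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return float((ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

y = rng.integers(0, 2, 300)
scores = np.array([dip_score(light_curve(bool(label))) for label in y])
roc_auc = auc(y, scores)
```

Since the judges will score on real, noisier observations, it is worth validating against light curves with correlated (red) noise, not just the white noise assumed here.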

EXXA1: Equivariant Vision Networks for Predicting Planetary Systems’ Architectures

Predicting the number of planets in observed systems from astronomical data using equivariant computer vision networks.

Description

The architecture of planetary systems, including the number of planets and their orbital configurations, provides crucial insights into their formation and evolution. This project aims to leverage the capabilities of equivariant computer vision networks to predict the number of planets in observed systems from astronomical data. Equivariant networks, due to their ability to handle rotational and reflectional symmetries inherent in astronomical images, offer a promising approach for analyzing spatial data without loss of predictive accuracy due to orientation changes. By regressing on the number of planets, this project seeks to develop a robust model that can adapt to the complexities of observational data, including direct images, transit data, and radial velocity measurements.

Task Ideas
  • Review and implement state-of-the-art equivariant neural network architectures suitable for astronomical data analysis.
  • Curate a dataset from existing astronomical surveys, including labeled systems with known numbers of planets, for training and testing the model.
  • Train the equivariant network on the curated dataset, optimizing for accurate regression on the number of planets in a system.
  • Evaluate the model’s performance using a separate test set, focusing on its ability to generalize across different types of planetary systems and observational techniques.
  • Explore the integration of additional data modalities (e.g., spectroscopic data) to improve the model’s predictive capabilities.
Expected Results
  • A highly accurate equivariant computer vision model capable of regressing on the number of planets in observed systems, accounting for the complexities and variabilities in astronomical data.
  • A comprehensive evaluation of the model’s performance, highlighting its strengths and potential areas for improvement.
  • Documentation and guidelines for applying the model to new datasets, facilitating further research and potential real-world applications in exoplanet discovery and characterization.
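One way to see why the symmetry handling matters: averaging any feature extractor over the C4 rotation group yields rotation-invariant outputs (group averaging). The snippet below is a crude stand-in for true equivariant architectures such as steerable CNNs, which keep orientation information in intermediate layers instead of discarding it.

```python
import numpy as np

def features(img):
    """Toy feature extractor; on its own it is NOT rotation-invariant."""
    return np.array([img[:16].sum(), img[:, :16].sum(), (img ** 2).sum()])

def c4_average(fn, img):
    """Average a feature map over the C4 group (0/90/180/270 degree rotations)."""
    return np.mean([fn(np.rot90(img, k)) for k in range(4)], axis=0)

rng = np.random.default_rng(0)
img = rng.normal(size=(32, 32))
f_orig = c4_average(features, img)
f_rot = c4_average(features, np.rot90(img))   # same image, rotated 90 degrees
# f_orig and f_rot agree: the averaged features are invariant under C4
```

For the project itself, an equivariant library (e.g. the escnn family of steerable CNNs) would replace this averaging trick, extending invariance from C4 to continuous rotations and reflections.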

EXXA2: Denoising Astronomical Observations of Protoplanetary Disks

Basically, experiment with diffusion models for denoising astronomical observations of protoplanetary disks.

Description

Recent advancements in observational astronomy have given the field the ability to resolve protoplanetary disks, the sites of planet formation, in unprecedented detail. Array telescopes, such as ALMA and VLT, produce data that have revolutionized the study of these environments, spurring a rapid increase in the number of observations, significant advancements in theoretical understandings of planet formation processes, and the need for more efficient and accurate data processing. Traditional data processing algorithms, while advanced and powerful, are often time-consuming, computationally expensive, and can still produce noisy results. State-of-the-art machine learning algorithms, such as diffusion networks, are well-suited to this task and are a prime candidate for implementation in the field of protoplanetary disk astronomy. The purpose of this project is to develop machine learning algorithms to create a pipeline that denoises observational data more quickly and to a greater extent than current methods.

Task Ideas
  • Use synthetic observations of protoplanetary disks created using hydrodynamic simulations and radiative transfer to train machine learning models capable of denoising observational data.
  • Investigate and select suitable machine learning denoising models that can handle the complexity and heterogeneity of the data.
  • Develop a training pipeline that includes data augmentation techniques to enrich the training dataset and improve model robustness.
  • Implement the model and train it on the prepared dataset, optimizing for the ability to reproduce the raw synthetic observations.
  • Generalize the model to other types of observations, including line emission data and observations from other telescopes.
  • Validate the model’s performance on real observational data from ALMA and VLT, comparing the performance to traditional methods.
Expected Results
  • A machine learning denoising model tailored for removing noise from astronomical observations, leveraging the unique characteristics of observational data.
  • A detailed analysis of the model’s performance in removing noise from the data, including comparisons to traditional data processing methods and real observational data.
  • A publicly available dataset curated for training and testing the model, accompanied by a comprehensive data preprocessing and augmentation pipeline.
  • Documentation outlining the model architecture, training process, and guidelines for application to new datasets, ensuring reproducibility and facilitating future research in the field.
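For reference, the forward (noising) process that diffusion denoisers learn to invert can be written in a few lines; the snippet below samples x_t from q(x_t | x_0) under a linear beta schedule, with a random array standing in for a clean disk image. Training the reverse (denoising) network is the substantial part and would be done in PyTorch.

```python
import numpy as np

# Linear beta schedule for the DDPM forward (noising) process
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)          # cumulative signal-retention factor

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x0, (1 - a_bar_t) * I)."""
    noise = rng.normal(size=x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise
    return xt, noise

rng = np.random.default_rng(0)
x0 = rng.normal(size=(64, 64))                # stand-in for a clean disk image
xt, eps = q_sample(x0, t=50, rng=rng)
```

One design question specific to this project is whether the Gaussian noising assumption matches interferometric noise from ALMA/VLT; conditioning the forward process on realistic instrument noise (from the synthetic observation pipeline) may matter more than the network architecture.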

EXXA3: Exoplanet Atmosphere Characterization

Develop machine learning models to analyze spectral data from exoplanets, identifying chemical abundances, cloud/haze structure and different atmospheric processes.

Seems to be the closest to the Ariel Data Challenge. One mentor has previous experience in the Ariel Data Challenge and his students won it in 2022.

Description

The characterization of exoplanet atmospheres is crucial for understanding their compositions, weather patterns, and potential habitability. This project aims to develop machine learning models to analyze spectral data from exoplanets, identifying chemical abundances, cloud/haze structure, and different atmospheric processes. The project will leverage data from telescopes and space missions, along with simulations of exoplanetary atmospheres under various conditions, to train and validate the models.

Task Ideas
  • Perform simulations of exoplanetary atmospheres with diverse atmospheric conditions: non-isothermal atmospheres; chemical equilibrium/disequilibrium; dawn/dusk asymmetry; distinct weather patterns; cloud/haze coverage etc.
  • Train machine learning models on simulated spectral data to recognize different atmospheric conditions and physical processes using transmission and/or emission spectroscopy.
  • Develop an ML strategy for searching for potential biosignatures in spectroscopic observations.
  • Apply the trained models to real observational data from missions like Hubble, JWST, and future telescopes to characterize exoplanet atmospheres.
  • Explore the use of deep learning techniques for enhancing the models’ ability to identify subtle spectral signatures associated with different atmospheric processes.
Expected Results
  • A set of machine learning models capable of accurately characterizing exoplanet atmospheres.
  • Analysis of the models’ performance on observational data, demonstrating their applicability to current and future exoplanet studies.
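As a toy illustration of spectral characterization, the snippet below simulates a transmission spectrum with a single Gaussian molecular feature and recovers its amplitude by matched filtering. The wavelength grid, feature location, and noise level are all hypothetical; real work would use radiative-transfer forward models and retrieval or ML inference.

```python
import numpy as np

def transmission_spectrum(wl, depth, center, width, base=0.01, noise=5e-5, rng=None):
    """Toy transit-depth spectrum: flat continuum plus one Gaussian feature."""
    if rng is None:
        rng = np.random.default_rng(0)
    spec = base + depth * np.exp(-0.5 * ((wl - center) / width) ** 2)
    return spec + rng.normal(0.0, noise, wl.shape)

def feature_depth(wl, spec, center, width):
    """Estimate the feature amplitude by matched filtering with the template."""
    template = np.exp(-0.5 * ((wl - center) / width) ** 2)
    template -= template.mean()               # remove sensitivity to the continuum
    return float(np.dot(spec, template) / np.dot(template, template))

wl = np.linspace(1.0, 2.0, 500)               # hypothetical wavelength grid (microns)
spec = transmission_spectrum(wl, depth=3e-4, center=1.4, width=0.05)
est = feature_depth(wl, spec, center=1.4, width=0.05)   # est recovers ~3e-4
```

An ML model for this test would, in effect, learn many such templates jointly across overlapping molecular bands, which is where it can outperform feature-by-feature fitting.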
Mentors

They include Konstantin Matchev (University of Alabama), who has previous experience in the Ariel Data Challenge and his students won it in 2022.

EXXA4: Foundation Models for Exoplanet Characterization

This is by far the most complex project here. It requires the submission of all three tests.

It is basically about creating foundation models for exoplanet characterization. These models will use both image data of disks and spectral data from exoplanets to identify forming exoplanets, processes and substructures that are important in protoplanetary disk evolution, chemical abundances in exoplanet atmospheres, cloud/haze structure, and different atmospheric processes.

I plan to avoid it.

It’s mentioned here because I did consider it.

Description

Advancing the understanding of exoplanets and planet formation requires a wide variety of observational methods and data modalities. Planet formation is a complex process that involves the assembly of a planet from a protoplanetary disk, an environment that instruments have only recently been able to resolve. These observations rely mostly on image data, including line emission and continuum data. The analysis of this data is a complex process, but, when done successfully, it opens new avenues for understanding planet formation, the resulting systems of exoplanets, and the potential of these systems for habitability. A complementary route is to use data from the atmospheres of exoplanets. The characterization of exoplanet atmospheres is crucial for understanding their compositions, weather patterns, and potential habitability. This project aims to develop foundation machine learning models that will analyze data of different environments from different instruments to further our understanding of planet formation, exoplanet systems, exoplanet properties, and, ultimately, the potential of these systems for habitability. The models will use image data of disks and spectral data from exoplanets, identifying forming exoplanets, processes and substructures that are important in protoplanetary disk evolution, chemical abundances in exoplanet atmospheres, cloud/haze structure, and different atmospheric processes. The project will leverage data from telescopes and space missions, along with simulations of protoplanetary disks and exoplanetary atmospheres under various conditions, to train and validate the models.

Task Ideas
  • Assemble a consolidated database using existing protoplanetary disk and exoplanet transit observations from different instruments, spectral resolutions, and spectral ranges from publicly available archives.
  • Develop an ML approach to overcome the specific instrumental differences for the different observations. Training can be done on existing synthetic databases simulating the instrument performance (Hubble Space Telescope, JWST, ALMA, Ariel etc.)
  • Apply the trained models to real observational data from Hubble, JWST, ALMA, and future telescopes to characterize protoplanetary disks and exoplanet atmospheres.
  • Explore the use of different ML architectures for enhancing the models’ ability to identify subtle signatures in the different data modalities associated with important physical properties and processes that may influence the formation and identification of habitable systems.
Expected Results
  • A set of machine learning models capable of accurately characterizing protoplanetary disks and exoplanet atmospheres using inputs from different observations.
  • Analysis of the models’ performance on observational data, demonstrating their applicability to current and future exoplanet studies.
Mentors

They include Konstantin Matchev (University of Alabama), who has previous experience in the Ariel Data Challenge and his students won it in 2022.

Other Projects

Other interesting projects are DeepFalcon and GENIE.

Weirdly, they don’t have a published test task, yet you still have to submit a test task solution to apply. You need to email the mentors before applying to find out what the test task is.

Their proposals rely on either diffusion models or GNNs. I am planning to dive deeper into both topics, and I shall defer the decision on applying to either of them until I have a better understanding of both.