How it Works

1. Initial Data and Purpose

The model is based on light curves of stars observed by missions such as Kepler and TESS. A light curve shows how a star’s brightness changes over time. When a planet passes in front of its star from our perspective (a transit), it blocks a small fraction of the starlight and creates a periodic dip in the light curve. These dips are used to identify potential exoplanets.
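To make the idea of a dip concrete, here is a minimal sketch that generates a synthetic light curve with a box-shaped transit. The function name, the box shape, and the planet parameters are illustrative assumptions, not part of the project's pipeline; real transits have rounded edges.

```python
def box_transit_flux(times, period, epoch, duration, depth):
    """Return relative flux for a simplified, box-shaped transit.

    Flux is 1.0 out of transit and 1.0 - depth while the planet is in
    front of the star. All time arguments share the same unit (e.g. days).
    """
    flux = []
    for t in times:
        # Time since the nearest transit midpoint, in [-period/2, period/2).
        dt = (t - epoch + 0.5 * period) % period - 0.5 * period
        flux.append(1.0 - depth if abs(dt) <= 0.5 * duration else 1.0)
    return flux


# Hypothetical planet: 3-day period, 0.2-day transits, 1% deep dips.
times = [i * 0.05 for i in range(200)]  # 10 days sampled every 0.05 days
flux = box_transit_flux(times, period=3.0, epoch=1.0, duration=0.2, depth=0.01)
```

The dip repeats once per orbital period, which is what allows the later alignment step to stack several transits on top of each other.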

2. Metadata and Data Preprocessing

To train a machine learning model, it’s necessary to have structured and processed data. In this case, the following essential data is gathered for each star to be analyzed:

Source identification: Includes the star identifier, such as KIC for Kepler or TIC for TESS.
Light curve: Relative brightness (flux), measurement uncertainties (flux_err), and other metadata associated with each light curve.
Astrophysical parameters: Such as orbital period, transit epoch, planet-to-star radius ratio, and orbital inclination.

The preprocessing of this data involves several stages, such as:

Downloading the light curves: Associated with the star’s KIC/TIC identifier.
Cleaning the signal: Removing noise that is unrelated to the transit, such as slow variations (long-term trends) and short transients (isolated outliers).
Temporal alignment: Adjusting the light curves according to the orbital period and transit epoch so that all signals related to the transit align and are clearly visible.
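The cleaning and alignment stages can be sketched as follows. This is a minimal illustration using only the standard library, assuming a running-median detrend and a fold centered on the transit; the function names, window size, and example data are hypothetical, and the actual pipeline may use a different detrending method.

```python
from statistics import median


def detrend(flux, window):
    """Divide each point by the running median of its neighbours.

    The window must be much longer than a transit, otherwise the
    transit itself is flattened along with the slow variations.
    """
    half = window // 2
    out = []
    for i in range(len(flux)):
        lo, hi = max(0, i - half), min(len(flux), i + half + 1)
        out.append(flux[i] / median(flux[lo:hi]))
    return out


def phase_fold(times, flux, period, epoch):
    """Return (phase, flux) pairs sorted by phase, with the transit at phase 0."""
    folded = []
    for t, f in zip(times, flux):
        phase = (t - epoch + 0.5 * period) % period - 0.5 * period
        folded.append((phase, f))
    folded.sort()
    return folded


# Made-up example: a slow upward drift plus 1% dips every 3 days.
times = [i * 0.1 for i in range(100)]
raw = [
    (1.0 + 0.001 * t) * (0.99 if abs((t - 1.0 + 1.5) % 3.0 - 1.5) < 0.05 else 1.0)
    for t in times
]
clean = detrend(raw, window=15)
folded = phase_fold(times, clean, period=3.0, epoch=1.0)
```

After folding, every transit in the example lands at phase 0, so the individual dips reinforce each other instead of being scattered along the time axis.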

Generation of two views:

Global view (panorama): A summary of the phase-folded light curve over the entire orbital period.
Local view (zoom): A window focused only on the transit event.
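One way to build the two views is to median-bin the folded light curve into fixed-length vectors. The sketch below assumes that approach; the bin counts (201 and 61), the local window of four transit durations, and the function names are illustrative choices, not values taken from the project.

```python
from statistics import median


def bin_view(folded, phase_min, phase_max, num_bins):
    """Median-bin (phase, flux) pairs in [phase_min, phase_max).

    Empty bins are filled with 1.0, the out-of-transit baseline.
    """
    width = (phase_max - phase_min) / num_bins
    buckets = [[] for _ in range(num_bins)]
    for phase, f in folded:
        if phase_min <= phase < phase_max:
            idx = min(int((phase - phase_min) / width), num_bins - 1)
            buckets[idx].append(f)
    return [median(b) if b else 1.0 for b in buckets]


def make_views(folded, period, duration,
               global_bins=201, local_bins=61, local_width=4.0):
    """Return (global_view, local_view) as fixed-length lists of flux values.

    local_width is the size of the local window in transit durations.
    """
    global_view = bin_view(folded, -0.5 * period, 0.5 * period, global_bins)
    half = 0.5 * local_width * duration
    local_view = bin_view(folded, -half, half, local_bins)
    return global_view, local_view


# Made-up folded curve: 3-day period with a 0.2-day, 1% deep transit at phase 0.
phases = [-1.5 + i * 0.001 for i in range(3000)]
folded = [(p, 0.99 if abs(p) <= 0.1 else 1.0) for p in phases]
global_view, local_view = make_views(folded, period=3.0, duration=0.2)
```

In the global view the transit occupies only a few bins near the center, while in the local view it fills roughly a quarter of the window, which is why the model receives both.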

3. Machine Learning Model

With the data preprocessed, a convolutional neural network (CNN) is trained. CNNs are effective at finding local patterns in images and time series. The model takes two "numerical images," the global and local views of the light curve, and looks for patterns in them that indicate a planetary transit.

The architecture and training steps are:

Inputs: Two inputs are defined for the model: one for the global view and another for the local view.
Convolutional branch: Each input passes through a convolutional branch that looks for patterns within the temporal data. Several convolutional layers (Conv1D) are used to extract features, followed by MaxPool1D layers to reduce dimensionality.
Merging the branches: The outputs of both branches (global and local) are combined to form a richer representation of the light curve.
Transit prediction: A Dense layer with a sigmoid activation function generates a probability indicating the presence of a planetary transit.
Training and Optimization: The model is trained using the Adam optimization algorithm and the binary_crossentropy loss function.
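The data flow described above can be traced with a small, dependency-free forward pass. This is only a sketch under several assumptions: it uses one convolutional layer per branch instead of several, random untrained weights, and plain Python functions that mimic what Conv1D, MaxPool1D, and Dense layers compute rather than the framework layers themselves. Training with Adam is not shown; the loss function is included so the quantity being minimized is visible. All names and sizes are hypothetical.

```python
import math
import random


def conv1d_relu(x, kernels, biases):
    """1-D convolution (no padding, stride 1) followed by ReLU.

    x is [in_channels][length]; kernels is [out_channels][in_channels][k].
    """
    k = len(kernels[0][0])
    out_len = len(x[0]) - k + 1
    out = []
    for kernel, bias in zip(kernels, biases):
        row = []
        for i in range(out_len):
            total = bias
            for c, weights in enumerate(kernel):
                total += sum(w * x[c][i + j] for j, w in enumerate(weights))
            row.append(max(0.0, total))
        out.append(row)
    return out


def max_pool1d(x, size):
    """Keep the maximum of each non-overlapping window of `size` samples."""
    return [[max(ch[i:i + size]) for i in range(0, len(ch) - size + 1, size)]
            for ch in x]


def branch(view, kernels, biases, pool):
    """One convolutional branch: convolution -> max pooling -> flatten."""
    pooled = max_pool1d(conv1d_relu([view], kernels, biases), pool)
    return [v for channel in pooled for v in channel]


def predict(global_view, local_view, params):
    """Return the transit probability for one pair of views."""
    glob = branch(global_view, params["g_kernels"], params["g_biases"], params["pool"])
    loc = branch(local_view, params["l_kernels"], params["l_biases"], params["pool"])
    merged = glob + loc  # merge the two branches by concatenation
    z = params["dense_bias"] + sum(
        w * v for w, v in zip(params["dense_weights"], merged))
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid


def binary_crossentropy(label, probability, eps=1e-7):
    """Loss for one example; label is 0 or 1."""
    p = min(max(probability, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def init_params(global_len, local_len, channels=4, k=5, pool=2, seed=0):
    """Random, untrained weights with shapes that match the two views."""
    rng = random.Random(seed)

    def kernels():
        return [[[rng.uniform(-0.1, 0.1) for _ in range(k)]]
                for _ in range(channels)]

    def flat_len(n):
        return channels * ((n - k + 1) // pool)

    n_dense = flat_len(global_len) + flat_len(local_len)
    return {
        "g_kernels": kernels(), "g_biases": [0.0] * channels,
        "l_kernels": kernels(), "l_biases": [0.0] * channels,
        "pool": pool, "dense_bias": 0.0,
        "dense_weights": [rng.uniform(-0.1, 0.1) for _ in range(n_dense)],
    }


params = init_params(global_len=201, local_len=61)
probability = predict([1.0] * 201, [1.0] * 61, params)
loss = binary_crossentropy(0, probability)
```

Because the weights are random, the probability here carries no meaning; the point is the shape of the computation: two independent feature extractors whose outputs are concatenated before a single sigmoid output.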

4. Model Evaluation

During training, metrics such as AUC (Area Under the ROC Curve), precision, and recall are used to evaluate the model's performance. Precision is the fraction of positive predictions that are actually correct, while recall is the fraction of true events (transits) that the model successfully detects.
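These metrics can be computed directly from labels and predicted probabilities. The sketch below is a plain-Python illustration with made-up data; the function names are hypothetical, and a real project would more likely rely on the metrics built into its training framework.

```python
def precision_recall(labels, probabilities, threshold=0.5):
    """Precision and recall when scores >= threshold count as transits."""
    tp = fp = fn = 0
    for y, p in zip(labels, probabilities):
        if p >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(labels, probabilities):
    """AUC as the chance that a random transit outscores a random non-transit.

    Assumes both classes are present in labels.
    """
    pos = [p for y, p in zip(labels, probabilities) if y == 1]
    neg = [p for y, p in zip(labels, probabilities) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = transit) and model outputs for six light curves.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1]
precision, recall = precision_recall(labels, scores)  # 2/3 and 2/3
auc = roc_auc(labels, scores)                         # 7/9
```

Unlike precision and recall, AUC does not depend on a threshold: it only measures how well the scores rank transits above non-transits.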

Additionally, the decision threshold is adjusted to optimize the balance between precision and recall. Depending on the project's objectives, recall can be prioritized to ensure fewer exoplanets are missed, although this might increase the number of false positives.
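As an illustration of threshold adjustment, the sketch below picks the highest threshold that still reaches a target recall. The selection rule, the function names, and the data are assumptions made for the example; other criteria, such as maximizing F1, are equally plausible.

```python
def precision_recall(labels, probabilities, threshold):
    """Precision and recall when scores >= threshold count as transits."""
    tp = fp = fn = 0
    for y, p in zip(labels, probabilities):
        if p >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(labels, probabilities, min_recall):
    """Return (threshold, precision, recall) for the highest threshold
    whose recall reaches min_recall, or None if no threshold does."""
    # Recall can only grow as the threshold drops, so the first hit
    # in descending order is the highest acceptable threshold.
    for threshold in sorted(set(probabilities), reverse=True):
        precision, recall = precision_recall(labels, probabilities, threshold)
        if recall >= min_recall:
            return threshold, precision, recall
    return None


# Made-up labels (1 = transit) and model outputs for six light curves.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1]
balanced = pick_threshold(labels, scores, min_recall=0.6)     # threshold 0.6
high_recall = pick_threshold(labels, scores, min_recall=1.0)  # threshold 0.3
```

In this example, demanding full recall lowers the threshold from 0.6 to 0.3 and drops precision from 1.0 to 0.75, because one non-transit now crosses the threshold.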

5. Model Application

Once trained and validated, the model can be used to predict the probability that a new light curve contains a planetary transit. It can process large amounts of data from missions like Kepler and TESS, providing an automated and efficient analysis.

6. Considerations for Identifying Exoplanets

The model doesn’t just rely on the presence of a dip in brightness to identify a transit, but also on additional characteristics that help differentiate between real transits and false positives, such as:

Shape of the curve: A planetary transit is typically U-shaped with a flat bottom, while other phenomena like eclipsing binary stars often produce deeper, V-shaped dips.
Duration of the transit: Exoplanet transits typically last a few hours to a day.
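Symmetry and center shift: A planetary transit is roughly symmetric about its midpoint. A shift in the apparent position (centroid) of the star during the dip suggests that the signal comes from a nearby or background source rather than from the target star.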
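The network learns such characteristics implicitly from the views, but they can be made concrete with a few hand-written measurements on a local view. The sketch below is purely illustrative: the feature definitions, the 50% and 90% depth cutoffs, and the function name are assumptions, and it does not cover centroid shifts, which require pixel-level data.

```python
def transit_shape_features(local_view, baseline=1.0):
    """Measure depth, width, bottom flatness, and asymmetry of a dip."""
    n = len(local_view)
    depth = baseline - min(local_view)
    # Bins deeper than half the maximum depth count as "in transit".
    in_transit = [i for i, f in enumerate(local_view)
                  if baseline - f > 0.5 * depth]
    # Bins within 10% of the maximum depth count as the flat bottom.
    bottom = [i for i in in_transit
              if baseline - local_view[i] > 0.9 * depth]
    flatness = len(bottom) / len(in_transit) if in_transit else 0.0
    # Mean absolute difference between the curve and its mirror image.
    asymmetry = sum(abs(a - b)
                    for a, b in zip(local_view, reversed(local_view))) / n
    return {
        "depth": depth,
        "width_bins": len(in_transit),
        "flatness": flatness,    # near 1.0 for a U shape, small for a V shape
        "asymmetry": asymmetry,  # 0.0 for a perfectly symmetric dip
    }


# Made-up local views: a flat-bottomed U shape and a pointed V shape.
u_shape = [1.0] * 10 + [0.99] * 10 + [1.0] * 10
v_shape = [1.0 - 0.02 * max(0.0, 1.0 - abs(i - 15) / 10) for i in range(31)]
u_features = transit_shape_features(u_shape)
v_features = transit_shape_features(v_shape)
```

Here the U-shaped dip has a flatness of 1.0, while the V-shaped dip is both deeper and much less flat, which is the kind of contrast that separates planet candidates from many eclipsing binaries.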