VTXL EDA 

Initial EDA

An effective exploratory data analysis (EDA) workflow involves several steps that together build a thorough understanding of the dataset. The process begins with data collection and cleaning, in which missing or erroneous values are identified and addressed. This stage also includes checking for inconsistencies, duplicates, and outliers that could skew the analysis. Once the data is clean and well structured, the next step is to summarize and visualize it, using descriptive statistics such as means, medians, and standard deviations, together with graphical representations such as histograms, box plots, and scatter plots. These techniques reveal underlying patterns, trends, and relationships in the data, allowing analysts to formulate hypotheses and identify areas for further investigation. EDA also typically involves examining correlations between variables to detect possible dependencies or multicollinearity. Throughout the process, documentation is essential for replicability and for keeping a clear record of the steps taken and the observations made. The insights gained during EDA lay the foundation for more advanced statistical modeling, hypothesis testing, and informed decision-making.
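
The cleaning and summary steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline used for this ride: it assumes the ride data has been loaded into a 2-D float array with one column per variable (the column names are hypothetical), that missing readings are stored as NaN, and that outliers are flagged with the common 1.5 × IQR rule.

```python
import numpy as np

def eda_summary(data, names):
    """Basic cleaning checks and summary statistics for a 2-D float array
    (rows = samples, columns = variables such as power, cadence, heart rate)."""
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data).sum(axis=0)            # missing values per column
    complete = data[~np.isnan(data).any(axis=1)]    # drop rows with any NaN
    # Exact duplicate rows among the complete rows
    duplicates = complete.shape[0] - np.unique(complete, axis=0).shape[0]
    # Flag outliers per column with the 1.5 * IQR rule
    q1, q3 = np.percentile(complete, [25, 75], axis=0)
    iqr = q3 - q1
    outliers = ((complete < q1 - 1.5 * iqr) | (complete > q3 + 1.5 * iqr)).sum(axis=0)
    stats = {
        name: {"mean": float(col.mean()),
               "median": float(np.median(col)),
               "std": float(col.std(ddof=1))}
        for name, col in zip(names, complete.T)
    }
    return {
        "missing": dict(zip(names, missing.tolist())),
        "duplicates": int(duplicates),
        "outliers": dict(zip(names, outliers.tolist())),
        "stats": stats,
        "corr": np.corrcoef(complete, rowvar=False),  # pairwise Pearson correlations
    }

# Example with hypothetical columns: power (W), cadence (RPM), heart rate (bpm)
ride = np.array([[200, 80, 140],
                 [210, 82, 142],
                 [np.nan, 85, 150],
                 [200, 80, 140],
                 [190, 78, 138]])
summary = eda_summary(ride, ["power", "cadence", "heart_rate"])
```

The correlation matrix returned here is the starting point for the multicollinearity check mentioned above; a histogram or box plot of each column would cover the visual side.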

Power and Cadence

In this analysis, we examine the differences in hue (color) between two graphs to explore the relationship between speed and heart rate in cycling. From a domain standpoint, there should not be an inverse relationship between these variables. We conduct a multiple regression analysis to investigate the relationship further.

Note that heart rate can be treated as either a dependent or an independent variable in cycling. As a dependent variable, it is a physiological response influenced by factors such as age, sex, fitness level, and temperature; these factors can change heart rate even when the level of exertion or power output remains constant. Conversely, heart rate can serve as an independent variable when it is used to gauge and set exercise intensity, for example to keep a cyclist training within a desired intensity range for aerobic or anaerobic work.

We performed a multiple regression analysis to examine the relationship between speed and heart rate. We fit two models, one with speed and one with heart rate as the dependent variable, each using power, cadence, and their interaction term (power:cadence) as independent variables. In the speed model, the R-squared value is 0.279, indicating that approximately 27.9% of the variability in speed is explained by the independent variables. All independent variables have statistically significant coefficients (p < 0.05). Power has a negative coefficient (-0.1075), while cadence has a positive coefficient (0.1334). The interaction term (power:cadence) has a positive coefficient (0.0003), suggesting that the effect of each variable on speed increases as the other increases.

In the heart rate model, the R-squared value is 0.319, meaning that approximately 31.9% of the variability in heart rate is explained by the independent variables. All independent variables again have statistically significant coefficients (p < 0.05). Power has a positive coefficient (0.0611), while cadence has a negative coefficient (-0.0539). The interaction term (power:cadence) has a positive coefficient (0.0003), indicating that the effect of each variable on heart rate increases as the other increases.
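
A model of this form can be sketched with ordinary least squares in NumPy. The `power:cadence` notation suggests a formula interface such as statsmodels' `statsmodels.formula.api.ols("speed ~ power * cadence", data=df)`, which also reports the p-values quoted above; the sketch below only reproduces the coefficient and R-squared calculation, and the function name is hypothetical.

```python
import numpy as np

def fit_interaction_model(y, power, cadence):
    """Ordinary least squares for y ~ power + cadence + power:cadence.
    Returns the coefficients by name and the R-squared of the fit."""
    y, power, cadence = (np.asarray(a, dtype=float) for a in (y, power, cadence))
    # Design matrix: intercept, main effects, and the interaction column
    X = np.column_stack([np.ones_like(power), power, cadence, power * cadence])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r_squared = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    names = ["Intercept", "power", "cadence", "power:cadence"]
    return dict(zip(names, coef)), float(r_squared)

# Synthetic check: noise-free data built from known coefficients
rng = np.random.default_rng(0)
power = rng.uniform(100, 300, 200)
cadence = rng.uniform(60, 100, 200)
speed = 5.0 - 0.1 * power + 0.13 * cadence + 0.0003 * power * cadence
coefs, r2 = fit_interaction_model(speed, power, cadence)
```

Fitting the same function with heart rate as `y` gives the second model. On real ride data the R-squared will be far below 1, as in the values reported above.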

These results reveal that power and cadence have distinct effects on speed and heart rate. Additionally, their interaction term positively impacts both dependent variables. However, we do not observe an inverse relationship between the speed and heart rate models based on power and cadence.
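
Because both models include an interaction term, the power and cadence coefficients on their own describe each variable's effect when the other is zero (the power coefficient is the effect of power at a cadence of 0 RPM), assuming the predictors were not centered before fitting. The effect of power at a realistic cadence combines the main and interaction coefficients. Writing the speed model with coefficients $\beta$ and the heart rate model with coefficients $\gamma$, and using the rounded values reported above at an illustrative cadence of 70 RPM:

```latex
\widehat{\text{speed}} = \beta_0 + \beta_1\,\text{power} + \beta_2\,\text{cadence} + \beta_3\,(\text{power} \times \text{cadence})

\frac{\partial\,\widehat{\text{speed}}}{\partial\,\text{power}} = \beta_1 + \beta_3\,\text{cadence} \approx -0.1075 + 0.0003 \times 70 = -0.0865

\frac{\partial\,\widehat{\text{HR}}}{\partial\,\text{power}} = \gamma_1 + \gamma_3\,\text{cadence} \approx 0.0611 + 0.0003 \times 70 = 0.0821
```

Since the coefficients are rounded to four decimal places, these marginal effects are approximate.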

Heart Rate Over Time

A common model for heart rate over a ride is a linear regression of heart rate on distance or time. Unfortunately, this model assumes a linear relationship between heart rate and distance or time, which heart rate during endurance activity rarely follows. The coefficients of the linear model can be estimated using least squares regression, and the goodness of fit can be assessed with metrics such as R-squared and root mean square error (RMSE).
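
A minimal sketch of that linear baseline, assuming heart rate and elapsed time (or distance) are available as equal-length numeric arrays; the function name and the synthetic heart rate curve are hypothetical.

```python
import numpy as np

def linear_fit_metrics(t, hr):
    """Least-squares fit of hr = intercept + slope * t, with R-squared and RMSE."""
    t = np.asarray(t, dtype=float)
    hr = np.asarray(hr, dtype=float)
    slope, intercept = np.polyfit(t, hr, deg=1)   # coefficients, highest degree first
    resid = hr - (intercept + slope * t)
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    r_squared = float(1.0 - np.sum(resid ** 2) / np.sum((hr - hr.mean()) ** 2))
    return float(slope), float(intercept), r_squared, rmse

# Synthetic heart rate that rises quickly and then plateaus (non-linear)
t = np.linspace(0, 60, 121)                        # minutes
hr = 110 + 40 * (1 - np.exp(-t / 10))
slope, intercept, r2, rmse = linear_fit_metrics(t, hr)
```

On a curve that rises and then plateaus, the straight line leaves R-squared below 1 and a non-zero RMSE, which is the mismatch a non-linear or time series model is meant to address.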

In the context of time series data, it is important to adjust the model to account for the temporal dependencies and potential autocorrelation present in the data. Time series data often exhibit patterns such as trends, seasonality, and noise, which can lead to inaccurate predictions if not properly accounted for.
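
One way to check for the autocorrelation described above is to compute the sample autocorrelation function (ACF) of the linear model's residuals. The sketch assumes the residuals are in a NumPy array. For white noise, roughly 95% of lags beyond zero should fall within ±1.96/√n, so values well outside that band suggest the residuals are not independent and a time series model is warranted.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag (the usual biased ACF estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[: len(x) - k] * x[k:]) / denom
                     for k in range(max_lag + 1)])

def significant_lags(resid, max_lag):
    """Lags whose autocorrelation falls outside the approximate 95% white-noise band."""
    acf = autocorrelation(resid, max_lag)
    band = 1.96 / np.sqrt(len(resid))
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > band]
```

Heart rate residuals from a ride typically change slowly from sample to sample, so a strongly positive lag-1 value is the pattern to look for.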

In summary, when analyzing heart rate data in relation to endurance activities, it is crucial to consider both the inherent non-linearities and the time-dependent nature of the data. By employing appropriate time series models and accounting for additional variables that may influence heart rate, more accurate predictions of an athlete's heart rate during endurance activities can be obtained.

Power Over Time

Taking a similar approach to the heart rate data above, let's look at power over the length of the ride. When we first plot power against distance, there appears to be a positive correlation between power and distance. However, this apparent correlation does not hold up. Cycling power output data is better assessed with seasonal decomposition, because it helps account for the variations in power output that arise from factors such as fatigue, terrain, wind conditions, and rider strategy. Seasonal decomposition separates the data into trend, seasonal, and irregular (residual) components. This isolates the factors that contribute to the highs and lows in power output during a race and enables a more accurate analysis of the underlying patterns. By decomposing the data, one can better understand how race conditions, rider strategies, and terrain types affect a cyclist's performance and tactics. Current data analysis applications for athletes do not account for correlations in the data like those we see here.
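
Classical additive decomposition can be sketched as follows. This is one common form of seasonal decomposition (the same approach as statsmodels' `seasonal_decompose(x, model="additive", period=...)`), not necessarily the exact tool used here. The `period` argument is an assumption: a ride indexed by distance has no natural season, so it must be chosen, for example as the number of samples in a repeating terrain or lap pattern.

```python
import numpy as np

def additive_decompose(x, period):
    """Classical additive decomposition: x = trend + seasonal + residual.
    Trend is a centered moving average spanning one period; seasonal is the mean
    detrended value at each position within the period, centered to sum to zero."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if period % 2 == 0:
        # A 2 x period moving average keeps the window centered on each sample
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)                       # edges have no full window
    trend[half:n - half] = np.convolve(x, weights, mode="valid")
    detrended = x - trend
    # Average the detrended values at each phase, ignoring the NaN edges
    phase_means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    phase_means -= phase_means.mean()
    seasonal = np.tile(phase_means, n // period + 1)[:n]
    resid = x - trend - seasonal
    return trend, seasonal, resid

# Synthetic power: linear trend plus a repeating 3-sample pattern
idx = np.arange(30)
power = 180 + 0.5 * idx + np.tile([3.0, -1.0, -2.0], 10)
trend, seasonal, resid = additive_decompose(power, period=3)
```

In this synthetic example the trend recovers the underlying linear rise and the seasonal component recovers the repeating pattern; on real ride data, the trend component is what should be compared against distance.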

Decoupling

To assess how heart rate and power output decouple over time, the data was first decomposed into its constituent components: trend, seasonality, and irregular fluctuations. Decomposition isolates the factors contributing to variation in heart rate and power output, making it easier to identify the underlying patterns and the relationship between the two variables. After decomposition, the data was scaled so that heart rate and power output, which are measured in different units (bpm and watts), could be compared on a common scale. This normalization accounts for differences in the magnitude and range of the two series and allows a more accurate assessment of their decoupling.
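
The analysis does not state which scaling was used; the sketch below assumes z-score standardization of the decomposed trend components (min-max scaling would serve the same purpose). Once both series are standardized, their difference gives a simple, unitless measure of decoupling. The function names are hypothetical.

```python
import numpy as np

def zscore(x):
    """Standardize to mean 0 and standard deviation 1, ignoring NaN edges from the trend."""
    x = np.asarray(x, dtype=float)
    return (x - np.nanmean(x)) / np.nanstd(x)

def decoupling_gap(hr_trend, power_trend):
    """Standardized heart-rate trend minus standardized power trend.
    Values near 0 mean the two move together; a growing positive gap means
    heart rate is rising relative to power."""
    return zscore(hr_trend) - zscore(power_trend)
```

If heart rate were a perfect linear function of power, the gap would be zero everywhere; departures from zero are what a decoupling plot shows.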

Heart rate decoupling from power just before the 30,000 mark on the distance index can indicate a change in the athlete's physiological state or in external factors affecting their performance. Decoupling here means that heart rate is increasing while power output remains stable or even decreases. Several factors could contribute to this, including fatigue, dehydration, heat stress, glycogen depletion, or changes in terrain or environmental conditions.
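
To locate where decoupling begins, one hypothetical rule is to find the first point at which the standardized heart-rate-minus-power gap stays above a threshold for a sustained stretch, so that brief spikes are ignored. The `threshold` and `min_len` defaults below are illustrative assumptions, not values used in this analysis.

```python
import numpy as np

def decoupling_onset(distance, gap, threshold=1.0, min_len=50):
    """Distance at which the gap first stays above `threshold` for at least
    `min_len` consecutive samples; returns None if that never happens."""
    above = np.asarray(gap, dtype=float) > threshold   # NaN compares as False
    run = 0
    for i, flag in enumerate(above):
        run = run + 1 if flag else 0
        if run == min_len:
            return distance[i - min_len + 1]            # start of the sustained run
    return None
```

Requiring a sustained run trades a little sensitivity for robustness: short surges on a climb will not be mistaken for the onset of cardiovascular drift.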

Such a decoupling can reveal valuable insights about the athlete's cardiovascular efficiency, pacing strategy, and adaptation to training. It may signal that the athlete is approaching their limit or experiencing challenges in maintaining their performance at the current level of exertion. This information can help coaches and athletes make informed decisions about adjusting pacing, nutrition, or hydration strategies during a race or training session, and identify areas for improvement in the athlete's overall fitness and training plan. Monitoring heart rate and power output decoupling can also help detect early signs of overtraining or insufficient recovery, enabling better management of the athlete's long-term health and performance.

Cadence Sweet Spot 

In cycling, cadence is the number of pedal revolutions per minute (RPM), and finding the "sweet spot" is key to riding with optimal power and efficiency. Although the commonly cited range is 90 to 100 RPM, the exact sweet spot varies among individuals. Notably, on the VTXL, cadences between 60 and 80 RPM appeared to be the most common. Straying far from one's sweet spot can result in an inconsistent rhythm or increased strain on the knees and legs. Within the sweet spot, cyclists can strike a good balance between power and efficiency, allowing them to maintain a steady pace and conserve energy. The sweet spot may also differ depending on the rider's physical condition, the terrain, and the gearing. For instance, the cadence that works best on a steep climb may differ from the one that is most efficient on a flat road.
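
The share of riding time spent in a cadence band, and the most common cadence range, can be computed directly from the cadence stream. This sketch assumes evenly spaced samples, that coasting is recorded as 0 RPM (and excluded), uses the 60 to 80 RPM band noted above, and uses 10 RPM histogram bins; the bin width and function name are illustrative.

```python
import numpy as np

def cadence_profile(cadence, low=60, high=80, bin_width=10, coasting_rpm=0):
    """Share of pedaling samples inside [low, high) RPM and the most common bin.
    Samples at or below `coasting_rpm` are treated as coasting and excluded."""
    c = np.asarray(cadence, dtype=float)
    c = c[c > coasting_rpm]                              # keep pedaling samples only
    in_band = float(np.mean((c >= low) & (c < high)))
    edges = np.arange(0, c.max() + bin_width, bin_width)
    counts, edges = np.histogram(c, bins=edges)
    k = int(np.argmax(counts))                           # most populated bin
    return in_band, (float(edges[k]), float(edges[k + 1]))

in_band, common_bin = cadence_profile([0, 0, 65, 70, 72, 75, 78, 90, 95, 55])
```

Excluding coasting matters on a long ride with descents: counting zeros would understate the share of pedaling time spent in any band.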

Given the length of this ride, a lower cadence was a strategic choice. In general, a lower cadence may be more sustainable for ultra-long rides because it allows the rider to conserve energy and reduce fatigue. When a rider maintains a lower cadence, they use more muscle fibers, which can reduce the risk of muscular fatigue and cramping. A lower cadence may also be more efficient on uphill sections, allowing the rider to climb at a steady pace without burning out too quickly. Finally, a lower cadence makes it easier for the cyclist to control power.

Bimodal Power Distribution