Chapter 3 Data transformation

The data used for this project has already been preprocessed by Bloomberg and Yahoo, so the usual cleaning steps are not needed. In particular:

  • There are no missing or null values in the data.
  • There are no typos or misspellings, since the data is numeric and has no missing entries.
  • Since the dates are specified at the time of data collection, no excess data enters the environment, so we will not need to clean excess data either.
  • We will, however, add columns to the dataset for calculating returns, Sharpe ratio, correlation and moving averages.
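
As a rough illustration of the derived columns mentioned in the last point, the following is a minimal sketch in plain Python. The function names, the zero default for the risk-free rate, and the use of a simple (unweighted) moving average are assumptions made for illustration; they are not taken from the project code.

```python
import math
import statistics


def moving_average(values, window):
    """Simple moving average; the first window-1 entries are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the sample standard deviation.

    risk_free is an assumed per-period risk-free rate (default 0).
    """
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


def correlation(xs, ys):
    """Pearson correlation between two equal-length series."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Each function takes an ordinary list of numbers, so the results can be stored as new columns alongside the original price data.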

3.1 Additional Columns for calculating stock performance

  • Returns: The daily return from a stock is defined as the (hypothetical) position you would have at the end of the day’s trading had you owned one share of the stock. Formula: Return = (Today’s Price - Yesterday’s Price)
  • Volatility: Volatility is a statistical measure of the dispersion of returns for a given security or market index. Annual volatility is the standard deviation of a stock’s yearly logarithmic returns.
  • Addition of Sectors: We have combined stock data from various companies into their respective sectors. This operation is done in Python, and the resulting data is stored in the data folder.
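
The three items above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the daily return is read as the day-over-day change in price, annual volatility is estimated by scaling the standard deviation of daily log returns by the square root of an assumed 252 trading days, and the sector step simply groups each stock's series under its sector. The tickers, sector names and prices are hypothetical.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed number of trading days per year


def daily_returns(prices):
    """Day-over-day change: today's price minus yesterday's price."""
    return [today - yesterday for yesterday, today in zip(prices, prices[1:])]


def log_returns(prices):
    """Logarithmic return between consecutive prices."""
    return [math.log(today / yesterday)
            for yesterday, today in zip(prices, prices[1:])]


def annual_volatility(prices, periods=TRADING_DAYS):
    """Std. dev. of daily log returns, scaled to a yearly figure."""
    return statistics.stdev(log_returns(prices)) * math.sqrt(periods)


def group_by_sector(prices_by_ticker, sector_of):
    """Collect each stock's price series under its sector name."""
    sectors = {}
    for ticker, prices in prices_by_ticker.items():
        sectors.setdefault(sector_of[ticker], {})[ticker] = prices
    return sectors


# Hypothetical tickers, sectors and prices, for illustration only.
prices_by_ticker = {
    "AAA": [100.0, 102.0, 101.0, 105.0],
    "BBB": [50.0, 49.0, 51.0, 52.0],
    "CCC": [20.0, 21.0, 21.5, 22.0],
}
sector_of = {"AAA": "Technology", "BBB": "Energy", "CCC": "Technology"}

by_sector = group_by_sector(prices_by_ticker, sector_of)
aaa_returns = daily_returns(prices_by_ticker["AAA"])
aaa_volatility = annual_volatility(prices_by_ticker["AAA"])
```

Grouping is the least committal reading of "combined"; if the project instead averages or sums the stocks within a sector, that aggregation would be applied to each group returned by `group_by_sector`.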