Bad Data for Optimization Project

Introduction and Motivation

This project seeks to create a framework in which disparate, time-correlated data sources are combined to create an optimizable, not-quite-random dataset.

In one of my other projects I found myself coding optimizers from scratch. The existing sources (Machine Learning Mastery), while educationally useful, tend to use mathematically simple functions as optimizer test data, such as:

This function, while usable, tends to make the process of finding the derivatives too easy.

As such, this project is how I personally made data that, while practically useless, presents a more interesting optimization challenge.

Process

I selected two of the datasets from the wonderful Alan Turing Institute’s TCPD collection. In my case I took home runs in the American Baseball League and the flow of the Nile river at Aswan. Both datasets were trimmed to the years 1901 to 1970 so that they cover the same range.
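The trimming and pairing step can be sketched roughly as below. This is a minimal sketch rather than the code from the repository: the function name `combine_by_year`, the `{year: value}` dictionary layout, and the sample numbers are all assumptions made for illustration, and the values are placeholders, not figures from the TCPD files.

```python
def combine_by_year(series_a, series_b, start=1901, end=1970):
    """Join two {year: value} mappings on the years they share within [start, end]."""
    # Keep only years present in both series and inside the chosen range.
    shared_years = sorted(
        year for year in series_a.keys() & series_b.keys() if start <= year <= end
    )
    # One row per year: (year, value from series_a, value from series_b).
    return [(year, series_a[year], series_b[year]) for year in shared_years]


# Placeholder numbers for illustration only, not values from the TCPD files.
nile_flow = {1900: 10.0, 1901: 11.0, 1902: 12.0, 1971: 13.0}
home_runs = {1901: 1.0, 1902: 2.0, 1903: 3.0}

rows = combine_by_year(nile_flow, home_runs)
print(rows)
```

Each returned row holds one year with a value from each source, which is the shape a 3D scatter or surface plot needs.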

Nile Data

The data for the Nile flow volume looked like:

Baseball Data

The data for the American Baseball League looked like:

Combined 3D Plot

The combined data looks like:

This data does exactly what I want it to: it is very messy but still forms an optimizable shape. There is a clear(ish) optimized point at 800 Flow and 1900 Year.

In the future I want to write code that can take in different kinds of data from the Turing database more easily. I may also add a third dataset to reduce the wave pattern that appears when date is used as a variable. But who knows, big things are coming in the data production sphere…
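As a baseline for checking an optimizer against data like this, a brute-force scan over the combined rows is enough to locate the best point. The sketch below assumes rows of the form (year, flow, home runs), treats the third column as the objective, and treats "optimized" as the smallest value; all three are assumptions, and the helper name `best_row` and the sample rows are made up for illustration.

```python
def best_row(rows, objective_index=2, minimize=True):
    """Return the row with the smallest (or largest) value in the objective column."""
    # Assumption: the objective lives in one column of each row.
    pick = min if minimize else max
    return pick(rows, key=lambda row: row[objective_index])


# Placeholder rows for illustration only: (year, flow, home_runs).
sample_rows = [(1901, 11.0, 3.0), (1902, 12.0, 1.0), (1903, 9.0, 2.0)]

lowest = best_row(sample_rows)
highest = best_row(sample_rows, minimize=False)
print(lowest, highest)
```

Passing `minimize=False` flips the search to the largest value, and `objective_index` selects a different column if the objective is stored elsewhere.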

Check out the GitHub!
