AI Titanic Learning

AI vs. machine learning vs. deep learning: deep learning is a subset of machine learning, which is itself a subset of AI (deep learning --> machine learning --> AI).

I learned that AI can take many forms, including simple machine learning or probability-based algorithms like those used in the Titanic Kaggle competition.

Philosophically, the Titanic presents a brutal revelation of society's values. Women and children were favored over men when the lifeboats were loaded, perhaps due to simple Darwinian logic (they held the most potential for the future) or the vestiges of knightly chivalry. The preference for the young over the old is especially interesting: in Western cultures, young people are typically prioritized over the elderly, while filial piety is deeply ingrained in Eastern cultures.

The Titanic Kaggle competition is both fascinating and frustrating! I definitely need to review basic probability: the methods and calculations, rather than the coding and syntax, are what I find challenging.

The basic logic behind my code was to calculate the probability of survival for each characteristic included in the data set, such as gender and class. Then, I planned to set up an algorithm that would multiply these probabilities together for each passenger and, if the product was at least 50%, predict that the passenger had survived. After writing some basic code that followed this logic, I started feeding real data from the training spreadsheet into it.
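The plan above can be sketched in Python roughly as follows. This is a minimal sketch, not the original code: the function name `predict_survival`, the dictionary layout, and the rates are all hypothetical placeholders, and only the column names `Sex` and `Pclass` come from the Kaggle training file.

```python
# Sketch of the planned predictor: multiply per-characteristic survival
# rates and predict "survived" when the product is at least 0.5.
# The function name and all rates below are hypothetical.

def predict_survival(passenger, survival_rates, threshold=0.5):
    """Return 1 (survived) or 0 (did not survive) for one passenger.

    passenger: dict mapping a column name to this passenger's value,
               e.g. {"Sex": "female", "Pclass": "1"}
    survival_rates: dict mapping (column, value) to the fraction of
                    passengers with that value who survived
    """
    score = 1.0
    for column, value in passenger.items():
        score *= survival_rates[(column, value)]
    return 1 if score >= threshold else 0

# Hypothetical rates for illustration only, not real Titanic figures.
rates = {
    ("Sex", "female"): 0.8,
    ("Sex", "male"): 0.2,
    ("Pclass", "1"): 0.7,
    ("Pclass", "3"): 0.3,
}

print(predict_survival({"Sex": "female", "Pclass": "1"}, rates))  # 0.8 * 0.7 ≈ 0.56, prints 1
print(predict_survival({"Sex": "female", "Pclass": "3"}, rates))  # 0.8 * 0.3 = 0.24, prints 0
```

Since every rate is below 1, each extra characteristic pulls the product further down, so the score depends partly on how many characteristics are included.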

Data input algorithm:

1. convert one column of the spreadsheet into a single string, along with the survival column of 1s and 0s

2. use the split method to turn each string into a list, with one item per passenger

3. set up a for-loop that simultaneously counts the passengers with a certain characteristic who survived and the total number of passengers with that characteristic (ex: 40 males survived out of 100 males total)

4. calculate the final percentages by dividing the number of survivors with each characteristic by the total number of passengers with that characteristic
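The four steps above might look like this in Python. It is a minimal sketch, assuming the `Sex` column and the `Survived` column were each pasted in as one space-separated string; the six sample values are made up.

```python
# Step 1: each column pasted in as a single string (hypothetical sample data).
sex_column = "male female female male male female"
survived_column = "0 1 1 0 1 0"

# Step 2: split each string into a list with one item per passenger.
sexes = sex_column.split()
survived = survived_column.split()

# Step 3: in one loop, count survivors and totals for each value.
survivor_counts = {}
total_counts = {}
for sex, outcome in zip(sexes, survived):
    total_counts[sex] = total_counts.get(sex, 0) + 1
    if outcome == "1":
        survivor_counts[sex] = survivor_counts.get(sex, 0) + 1

# Step 4: survival rate = survivors with the value / total with the value.
survival_rates = {
    sex: survivor_counts.get(sex, 0) / total_counts[sex]
    for sex in total_counts
}
print(survival_rates)
```

For the full training file, Python's built-in `csv` module (for example `csv.DictReader`) can read the `Sex` and `Survived` columns directly, which avoids pasting columns in as strings.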

Writing the code took some thought but was ultimately successful; it was the method of calculating probability that became a problem. At first, focused solely on getting something onscreen, I divided the number of survivors with a characteristic by the total number of passengers. Then, I multiplied these values together for each of a passenger's characteristics. However, I soon realized that this method did not work.

For instance, suppose there were five female 1st class survivors, four male 1st class survivors, two female 2nd class survivors, and one male 2nd class survivor out of 20 passengers in all. The probability that a randomly chosen passenger was a female 1st class survivor was

f-1st class survived / total = 5/20

but according to the first method I tried, it came out to be

(f survived / total) * (1st survived / total) = 7/20 * 9/20 = 63/400

which was lower than the actual value (63/400 ≈ 0.16, compared with 5/20 = 0.25).
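The arithmetic in this example can be checked with Python's `fractions` module, which keeps the values exact. The survivor counts are the ones from the example; the dictionary layout is just one convenient choice.

```python
from fractions import Fraction

# Survivor counts from the example, out of 20 passengers in all.
total = 20
survivors = {
    ("female", "1st"): 5,
    ("male", "1st"): 4,
    ("female", "2nd"): 2,
    ("male", "2nd"): 1,
}

female_survived = sum(n for (sex, _), n in survivors.items() if sex == "female")  # 7
first_survived = sum(n for (_, cls), n in survivors.items() if cls == "1st")      # 9

# Direct count: share of all passengers who were female 1st class survivors.
joint = Fraction(survivors[("female", "1st")], total)
# First method: (f survived / total) * (1st survived / total).
first_method = Fraction(female_survived, total) * Fraction(first_survived, total)

print(joint)         # 1/4
print(first_method)  # 63/400
```

In general, P(A and B) = P(A)·P(B) holds only when A and B are independent. Here "female survivor" (7/20) and "1st class survivor" (9/20) are not independent, since 7/20 × 9/20 = 63/400 ≠ 5/20, so multiplying them undercounts.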

I changed my code to calculate

(f survived / f total) * (1st survived / 1st total)

but this method did not produce the right results either. 
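One way to see why the second formula falls short is to compare it with the survival rate counted directly within the group. The example above gives only survivor counts, so the group totals in this sketch are hypothetical (chosen to add up to 20 passengers), and the helper `rate` is an illustrative name.

```python
from fractions import Fraction

# HYPOTHETICAL group totals: only the survivor counts come from the example.
groups = {
    # (sex, class): (survived, total)
    ("female", "1st"): (5, 6),
    ("male", "1st"): (4, 6),
    ("female", "2nd"): (2, 3),
    ("male", "2nd"): (1, 5),
}

def rate(match):
    """Survival rate among passengers whose (sex, class) key satisfies match."""
    survived = sum(s for key, (s, t) in groups.items() if match(key))
    total = sum(t for key, (s, t) in groups.items() if match(key))
    return Fraction(survived, total)

female_rate = rate(lambda key: key[0] == "female")  # 7/9
first_rate = rate(lambda key: key[1] == "1st")      # 9/12 = 3/4

# Second method: (f survived / f total) * (1st survived / 1st total).
second_method = female_rate * first_rate
# Direct count: survivors among female 1st class passengers only.
direct_rate = rate(lambda key: key == ("female", "1st"))

print(second_method)  # 7/12
print(direct_rate)    # 5/6
```

With these made-up numbers, the product gives 7/12, while 5 of the 6 female 1st class passengers survived. Multiplying two conditional rates does not in general give the rate for passengers who share both characteristics; counting those passengers directly does.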

I'm still working on the probability part of this code!
