Step 3 – Data Manipulation, Data Cleaning, and Data Preparation (20 points)
Dataset: world_development_indicators.csv

1. Assume that you will later use a modeling technique that does not tolerate missing values, so every missing value must be dealt with. You may (and should!) choose a different strategy for each variable, according to your understanding of the data and the business context. Use the criteria you have learned to decide what to do with missing values. Explain clearly and in detail what you are going to do about the missing values in your dataset.
2. Implement the plan/strategy you developed above.
3. Check for strong correlations among your variables. Did you detect any very strong correlations (absolute value above 0.9) in the previous step? What are you going to do about them? Which variable of each pair are you going to keep, and why?
4. Implement the plan/strategy you developed for dealing with the highly correlated variables you identified in the previous question, and save the results as a new CSV file named “world_dev_ind_clean.csv”.
a. You will need to submit this dataset along with your final project report
5. Using the provided country_metadata.csv in combination with the dataset you created in question 4 of Step 3 (world_dev_ind_clean.csv), create a new dataset that contains the real country names (short names), income group, and region variables instead of the country code variable.
a. Hint: joining/merging, selecting attributes, exporting data, and storing data. You only need the country name (short), income group, and region variables from country_metadata.csv.
6. Reorder the new dataset so that the country name, income group, and region are the first three columns. Save the dataset with this new variable order as “WorldIndicatorsComplete.csv”.
a. You will need to submit this dataset along with your final project report
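
The six questions above can be sketched in Python with pandas as shown below. This is only a minimal illustration under stated assumptions, not a required solution. The column names "Country Code", "Short Name", "Income Group", and "Region" are guesses at the file layout. The 50% drop threshold and median imputation are placeholder strategies that should be replaced by choices you justify variable by variable. The rule that drops the later variable of each correlated pair is purely mechanical, whereas question 3 asks you to give a reason for which variable you keep.

```python
import numpy as np
import pandas as pd


def handle_missing(df, drop_threshold=0.5):
    """Drop mostly-missing columns, then median-impute numeric columns.

    Both the threshold and the use of the median are illustrative
    assumptions; pick a strategy per variable in your own work.
    """
    # Keep only columns whose share of missing values is at most the threshold.
    keep = df.columns[df.isna().mean() <= drop_threshold]
    out = df[keep].copy()
    # Fill remaining numeric gaps with each column's median.
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())
    # Drop any rows that still have a missing value (e.g. in text columns).
    return out.dropna()


def drop_highly_correlated(df, threshold=0.9):
    """Drop the later variable of every pair with |r| above the threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep the strict upper triangle so each pair is inspected only once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


def attach_metadata(clean, meta, key="Country Code"):
    """Swap the country code for name, income group, and region, placed first."""
    front = ["Short Name", "Income Group", "Region"]  # assumed column names
    # A left join keeps every row of the cleaned indicator data.
    merged = clean.merge(meta[[key] + front], on=key, how="left")
    merged = merged.drop(columns=[key])
    rest = [c for c in merged.columns if c not in front]
    return merged[front + rest]


# Possible usage, assuming both CSV files sit in the working directory:
#   raw = pd.read_csv("world_development_indicators.csv")
#   meta = pd.read_csv("country_metadata.csv")
#   reduced, dropped = drop_highly_correlated(handle_missing(raw))
#   reduced.to_csv("world_dev_ind_clean.csv", index=False)
#   attach_metadata(reduced, meta).to_csv("WorldIndicatorsComplete.csv", index=False)
```

Passing index=False to to_csv keeps the pandas row index out of the saved files, so the three descriptive variables really are the first columns of the exported dataset.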
