ITEC 320
Assignment 2
Please submit your assignment via Blackboard, as a single Word file.
The first two questions use the Deposits Excel file, which you will need to import into RapidMiner. It contains every individual deposit made at a regional bank in a single day. The dataset has 3,510 deposits and four attributes (columns): the deposit amount; whether the customer was depositing cash, checks, or both; the branch number; and whether the transaction was handled by an ATM or a teller.
As the analyst working on the dataset, you have determined that Branch # is irrelevant. You have also noticed that there are several “-1” values for the Amount ($) variable, which indicate an error in processing the deposit. You plan to focus primarily on cash deposits.
1. Build a process in RapidMiner that does the following:
-Selects the Amount ($), Type, and Method attributes (but not Branch #)
-Removes all rows from the dataset with Amount ($) = -1
-Keeps only rows with Type = “Cash”
Show a screenshot of the Process panel. (You do not need to include the Parameters panel.)
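For reference, the logic of the three steps above can be sketched outside RapidMiner in plain Python. This is only an illustration of what the process should do, not a substitute for the RapidMiner process you must submit. The sample rows, the Type labels other than "Cash", the Method labels, and the function name prepare are all hypothetical; the real values come from the Deposits Excel file.

```python
# Hypothetical sample rows; the real data comes from the Deposits Excel file.
deposits = [
    {"Amount ($)": 250.0, "Type": "Cash", "Branch #": 3, "Method": "ATM"},
    {"Amount ($)": -1, "Type": "Cash", "Branch #": 1, "Method": "Teller"},
    {"Amount ($)": 1200.5, "Type": "Checks", "Branch #": 2, "Method": "Teller"},
    {"Amount ($)": 80.0, "Type": "Both", "Branch #": 3, "Method": "ATM"},
    {"Amount ($)": 40.0, "Type": "Cash", "Branch #": 2, "Method": "Teller"},
]

# Attributes to keep (Branch # is deliberately left out).
KEEP = ("Amount ($)", "Type", "Method")


def prepare(rows):
    """Apply the three steps in order: select, drop errors, keep cash."""
    # Step 1: keep only the selected attributes.
    selected = [{name: row[name] for name in KEEP} for row in rows]
    # Step 2: drop rows whose amount is the -1 error marker.
    valid = [row for row in selected if row["Amount ($)"] != -1]
    # Step 3: keep only cash deposits.
    return [row for row in valid if row["Type"] == "Cash"]


cash_deposits = prepare(deposits)
for row in cash_deposits:
    print(row)
```

Because both filters are simple row conditions, applying them in either order gives the same rows.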
2. Run your process from the previous question. Show a screenshot of the Statistics output in the Results view, with Amount ($) expanded (that is, with the histogram and deviation visible for the Amount ($) attribute).
The next three questions use the “Labor-Negotiations” dataset that comes with RapidMiner. It is located in the Repository panel, in Samples -> data.
3. Build a process that uses the Select Attributes and Filter Examples operators to obtain a smaller dataset. It should include only the duration, wage-inc-1st, and working-hours attributes, and only the examples where working-hours is at least 36 and duration is not missing. Show a screenshot of this smaller dataset in the Results view. Your screenshot does not need to show every row, but it must include at least the first 10.
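As a rough illustration of the two conditions, here is a minimal Python sketch. The rows are invented, None stands in for a missing value, and the attribute named other is a hypothetical placeholder for the columns that get dropped. The sketch assumes that an example with a missing working-hours value does not satisfy "at least 36"; check how your own process handles that case.

```python
# Hypothetical rows; None marks a missing value.
contracts = [
    {"duration": 1, "wage-inc-1st": 5.0, "working-hours": 40, "other": "a"},
    {"duration": None, "wage-inc-1st": 4.5, "working-hours": 38, "other": "b"},
    {"duration": 2, "wage-inc-1st": 3.0, "working-hours": 35, "other": "c"},
    {"duration": 3, "wage-inc-1st": None, "working-hours": 36, "other": "d"},
    {"duration": 2, "wage-inc-1st": 4.0, "working-hours": None, "other": "e"},
]

KEEP = ("duration", "wage-inc-1st", "working-hours")


def subset(rows):
    """Select three attributes, then keep rows meeting both conditions."""
    selected = [{name: row[name] for name in KEEP} for row in rows]
    return [
        row
        for row in selected
        if row["duration"] is not None           # duration is not missing
        and row["working-hours"] is not None     # assumption: missing fails
        and row["working-hours"] >= 36           # at least 36, so 36 counts
    ]


smaller = subset(contracts)
for row in smaller:
    print(row)
```

Note that a row with a missing wage-inc-1st value still passes, because neither condition tests that attribute.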
4. For the examples in this smaller dataset, what is the mean of wage-inc-1st?
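When an attribute has missing values, the mean depends on how they are treated. The sketch below uses invented numbers and assumes that missing values are excluded from the calculation rather than counted as zero; compare against the Missing count shown in the Statistics view to confirm what your result is based on.

```python
from statistics import mean

# Hypothetical values; None marks a missing entry.
wage_inc_1st = [5.0, None, 4.0, 3.0, None, 4.5]

# Assumption: missing values are skipped, not treated as 0.
present = [value for value in wage_inc_1st if value is not None]
average = mean(present)

print(f"{len(present)} of {len(wage_inc_1st)} values present, mean = {average}")
```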
5. Use the Correlation Matrix operator to create a correlation matrix of this smaller dataset, and show a screenshot of the matrix. Do any pairs of the three attributes appear to be correlated? If so, which one(s)?
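To help read the matrix, the sketch below computes pairwise Pearson correlation coefficients by hand for three invented columns. The numbers are hypothetical and chosen so that one pair is perfectly correlated. The sketch assumes there are no missing values, and the function name pearson is just an illustrative helper, not part of RapidMiner.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists with no missing values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical columns; here wage-inc-1st is exactly twice duration.
columns = {
    "duration": [1, 2, 3, 2, 3],
    "wage-inc-1st": [2.0, 4.0, 6.0, 4.0, 6.0],
    "working-hours": [40, 38, 36, 40, 37],
}

# Every attribute is paired with every other, so the matrix is symmetric
# with values close to 1 on the diagonal.
matrix = {
    a: {b: pearson(columns[a], columns[b]) for b in columns}
    for a in columns
}

for name, row in matrix.items():
    print(name, [round(value, 3) for value in row.values()])
```

Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate little or none. The sign shows whether the two attributes move in the same or opposite directions.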