After a totally mind blowing pilot of Miracles DO Happen, the DO Data Science division was at it again on the 9th of July to finish what they started. We hope you did not miss the first session. If you did, we’ll give you a little recap. Last time, we set a goal to predict which gift our office heart-break gal was going to get next from her secret admirer. We had some tips and tricks for doing off the cuff analysis and our tool of the week was Power BI. All of that can be found in our previous episode, The Cupid Discovery. Here’s what happened in our second session…
In The Cupid Discovery, we promised to bring in the psychics to predict the next gift our heart break gal would receive from her secret admirer. We sure did! Well…not exactly. The data scientist who revealed the predictions could have passed for a clairvoyant but of course we discussed the science behind the prediction and demystified her ‘psychic’ powers. A Simple K-means clustering algorithm in the Weka data mining tool was used to create clusters within our gift pool. The algorithm also identified general characteristics of gifts within the clusters formed. Using Power Bi, we applied the results of our data mining activity to the gift pool to make a prediction. Wouldn’t you want to know what our heart break gal is getting next? Some of the gifts were surprising, others were disappointing but there was one gift our heart break gal really wanted and we all hope she gets it!
Tips and Tricks 1
You have probably encountered a cell whose data you wanted to separate into different columns. Today, we are going to show you how to do this. Our approach would be much quicker than copying parts of the data in the cell and pasting them one after the other in separate columns. To do this, we will use Excel’s Text to Columns to split 2 types of data: delimited data and data with a fixed width. Next, we will walk through using Excel’s LEFT, MID and RIGHT substring functions to accomplish a similar task.
Field with a fixed width
Here’s our sample data in a cell:
To separate the field into columns, select the cell to highlight all the data as seen below:
Next, at the top of the Excel work area, go to the Ribbon’s Data tab and select Text to Columns.
This should open the dialog box below. Excel selects the Fixed width radio button by default as it recognizes that elements in the data are aligned and separated by spaces with a fixed width.
Click Next to set column breaks in the data as seen below. The lines with arrow-heads are column breaks (alternatively called break lines) and these can be moved. To create a break line, click at a desired position. Double-click on a break line to delete it or click and drag to move it.
Click Next again to move on to the final step. Click the cell selector next to the Destination field to choose the output cell as seen in Fig. 1. Hit Enter on your keyboard to return to the view.
Click Finish and voila your data is split across columns as seen in Fig 2.
Field with delimiters
The procedure above works for delimited data like the comma separated text below:
Select the cell and click Text to Columns in the Ribbon’s Data tab above the work area. The Delimited radio button is chosen by default as Excel recognizes that elements in the data are separated/delimited by commas.
Click Next and select the Comma check box as seen below to split the data everywhere a comma appears.
Select a destination cell as we did before and click Finish. In the end you should have each comma delimited item in a separate column as seen in Fig 3.
LEFT, MID and RIGHT Substring Functions
What if we wanted to copy just part of the data in a cell to a different cell? Here’s how we would do this for our sample text:
If we wanted to copy the first two characters in the cell above to a different cell, we would use the LEFT(location_of_data, number_of_characters_on_the left) function as seen below:
Similarly, if we wanted to copy the last character in the cell to a different location we would use the RIGHT(location_of_data, number_of_characters on the right) function as seen below:
If the text we want to copy is in the middle of the data we would use the MID(location_of_data, index_of_1st_character_in_sequence, total_number_of_characters_in_sequence) function as seen below:
Tips and Tricks 2
For our next tip, we will show you how to link multiple sheets in an Excel workbook to one master sheet. The great thing about linking our sheets is that changes made in the master sheet would automatically reflect in the other sheets linked to it.
To the left is the data that we will be using. We’ll call it the MASTER sheet since it is the central repository for all the data. Next, we’ll create subsets of this data by course in separate sheets named CS, BA and MIS respectively.
Filter the data in the MASTER sheet by course. To filter the data, highlight all the data and hit CTRL+SHIFT+L (*sigh* doesn’t this take you back to our first session?). When you are done doing this your data should look like the image on the left.
Now we can filter data for each course by clicking the filter drop-down arrow and selecting a course. CS was selected below:
Once you have your filtered data you can simply copy it. Go to your new CS sheet and right-click where you want to paste the data. In the Paste Special Options section, click on Paste Link(circled in red on the left).
You can do this for all the other courses and they will all be linked to the master sheet. There…we’re all done.To test, edit a cell in the MASTER sheet and watch that data change in your linked sheet!
Tool of the week
By now you must know what our tool of the week is. Just in case we got ahead of ourselves with that assumption…it…is…Weka and yes that is the tool we used for our gift prediction! Weka is a data mining tool. It has a host of machine learning algorithms that can be used for the following data mining activities: Data pre-processing, Regression, Classification, Clustering and Data Visualization. The Weka tool is powerful and easy to use and can be downloaded here. We used it for a gift prediction but for our data savvy entrepreneurs, Weka opens up a world of opportunity to mine data on customer behavior and expand your business. Want to know how? Why don’t you give it a whirl?