Python vs SAS: Employee demographics analysis & plots (Part 3)

This is the third part of my exploratory Python Pandas vs SAS data analysis series, where I present Python and SAS code performing the same tasks. I gave the justification for this work in Part I and computed fundamental summary statistics in Part II using the Group-Apply-Combine feature of Pandas.

In this Part III of the series, we shall perform an employee demographics analysis within the Sales department of Orion Sports Star. More importantly, we shall use another powerful Pandas function called pivot_table(). The links to the dataset and all the code are maintained in the introductory section.

We shall be focussing on the sales department and asking probing questions like:

  • “What is the gender distribution of employees in the Sales dept?”
  • “Which class of employee within the dept does the company spend more on in terms of salary?”
  • “Which gender occupies the leadership roles, and how do their salaries stack up?”

NB: To see the answers, just scroll down to the bottom of the page.

Review: SAS Part II

In Part II of this study, we created work.newsalesemps, a dataset table in the temporary work library, and categorized the dataset by employee Job_Title.

In this section, we shall create a hierarchical table (or dataframe), where the index of the dataset is built from two or more levels. We combine the Gender column with the Job_Title column to create a hierarchical index structure.
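
As a minimal sketch of what a two-level index looks like in Pandas: the rows below are hypothetical toy values made up for illustration (not the Orion data), and only the column names Job_Title, Gender and Salary are taken from this article.

```python
import pandas as pd

# Hypothetical toy rows for illustration only; NOT the Orion dataset.
emps = pd.DataFrame({
    "Job_Title": ["Sales Rep. I", "Sales Rep. I", "Sales Rep. I",
                  "Sales Rep. II", "Sales Rep. II", "Sales Manager"],
    "Gender": ["F", "M", "M", "F", "M", "M"],
    "Salary": [26000, 27000, 25000, 28000, 27500, 80000],
})

# Grouping by two columns yields a result whose index has two levels:
# Job_Title on the outside, Gender on the inside.
mean_salary = emps.groupby(["Job_Title", "Gender"])["Salary"].mean()

print(mean_salary)
print(mean_salary.index.nlevels)  # number of index levels

# A (Job_Title, Gender) tuple selects a single cell of the hierarchy.
print(mean_salary.loc[("Sales Rep. I", "M")])
```

Each row of the result is addressed by a (Job_Title, Gender) pair rather than by a single label, which is what "hierarchical" means here.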

Employee demographics analysis in SAS

The SAS code for this step transforms our newsalesemps table into a two-index structure: the columns contain summary statistics on salary, while the index is based on the Job_Title and Gender of the employee.

SAS Output

employee_job_title_gender_distribution

We can already see an interesting breakdown of each Job_Title by Gender. Let’s hold it here for a bit, switch to Python, and see how to perform the same operation.

Employee demographics analysis in PYTHON

In line 3, we first extract three columns from the table; in line 4, we group the resulting table into a hierarchical multi-index structure with Job_Title and Gender as the index. Line 7 unstacks the new table into horizontal form.
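
The three steps described here (select columns, group on two keys, unstack) could look roughly like the sketch below. It is not the original listing: the toy rows are hypothetical, and the line numbers mentioned above do not map onto it. Note also that in recent Pandas versions describe() on a grouped column already returns one column per statistic, so the unstack() here pivots Gender into the columns instead.

```python
import pandas as pd

# Hypothetical toy rows for illustration only; NOT the Orion dataset.
emps = pd.DataFrame({
    "Job_Title": ["Sales Rep. I", "Sales Rep. I", "Sales Rep. I",
                  "Sales Rep. II", "Sales Rep. II", "Sales Manager"],
    "Gender": ["F", "M", "M", "F", "M", "M"],
    "Salary": [26000, 27000, 25000, 28000, 27500, 80000],
})

# Step 1: keep only the three columns of interest.
subset = emps[["Job_Title", "Gender", "Salary"]]

# Step 2: group on both keys and summarize Salary; the result is
# indexed by (Job_Title, Gender) with count, mean, std, min,
# percentiles and max as columns.
stats = subset.groupby(["Job_Title", "Gender"])["Salary"].describe()

# Step 3: move the Gender level from the index into the columns,
# giving one row per Job_Title and a wide, horizontal layout.
wide = stats.unstack("Gender")

print(wide["mean"])
```

Job titles with no employees of a given gender show up as NaN in the wide table rather than being dropped.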

Python Output

employee_distribution_python

We see how the Pandas describe() function generated some nice percentile statistics for us. Notice also that the SAS and Python tables contain the same core data, apart from the extra statistics Python included. Now, let’s do a little Q&A and present some visualizations.

 

What is the gender distribution of employees in each job position within the Sales department?

We answer this question with a plot.

SAS Response Code

In lines 3 and 4, we pull the table from the temporary work library and set a title for it. Lines 7-11 set the type of chart we want, with Salary as the response variable on the Y-axis and Job_Title on the X-axis (notice that we flip the orientation in line 14). In line 9, we set Gender as the group variable, so we get two sets of bars: one for male, another for female.

Line 10 sets the statistic shown on the bar chart. We use a clustered bar chart instead of a stacked chart. Line 18 places the legend inside the chart and stacks the legend entries vertically instead of laying them out in a horizontal line. The rest of the lines print the graph.

Python Code:

demograchics_python
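
For readers who want to reproduce the table behind a chart like this, here is a minimal pivot_table() sketch. It assumes the same Job_Title, Gender and Salary columns and uses hypothetical toy rows, so the numbers are not Orion's; the variable names counts and means are my own.

```python
import pandas as pd

# Hypothetical toy rows for illustration only; NOT the Orion dataset.
emps = pd.DataFrame({
    "Job_Title": ["Sales Rep. I", "Sales Rep. I", "Sales Rep. I",
                  "Sales Rep. II", "Sales Rep. II", "Sales Manager"],
    "Gender": ["F", "M", "M", "F", "M", "M"],
    "Salary": [26000, 27000, 25000, 28000, 27500, 80000],
})

# Head count per job title, one column per gender.
# fill_value=0 shows empty combinations as 0 instead of NaN.
counts = emps.pivot_table(index="Job_Title", columns="Gender",
                          values="Salary", aggfunc="count", fill_value=0)

# Mean salary per job title and gender (the statistic the SAS chart uses).
means = emps.pivot_table(index="Job_Title", columns="Gender",
                         values="Salary", aggfunc="mean")

print(counts)
print(means)
```

If matplotlib is installed, calling means.plot.barh() on the result should draw one horizontal bar per gender for each job title, which is the clustered layout described for the SAS chart.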

 

Which class of employee within the dept does the company spend more on in terms of salary?

SAS Code:

The code here is similar to the previous one, except for line 10, where we compute the sum instead of the mean. We also changed the orientation of the graph, but most of the code remains the same.

company_spending_sas

Python Code:

company_demograph_by_job_title
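
Switching the statistic from mean to sum is a one-argument change in Pandas as well. The sketch below is an assumption about how the totals could be computed, again on hypothetical toy rows; on this made-up data the largest total happens to fall on a different group than in the real Orion data.

```python
import pandas as pd

# Hypothetical toy rows for illustration only; NOT the Orion dataset.
emps = pd.DataFrame({
    "Job_Title": ["Sales Rep. I", "Sales Rep. I", "Sales Rep. I",
                  "Sales Rep. II", "Sales Rep. II", "Sales Manager"],
    "Gender": ["F", "M", "M", "F", "M", "M"],
    "Salary": [26000, 27000, 25000, 28000, 27500, 80000],
})

# Total salary spend per job title, split by gender.
totals = emps.pivot_table(index="Job_Title", columns="Gender",
                          values="Salary", aggfunc="sum", fill_value=0)

# The same totals in long form, which makes it easy to ask
# which (Job_Title, Gender) group costs the most overall.
spend = emps.groupby(["Job_Title", "Gender"])["Salary"].sum()
top_group = spend.idxmax()

print(totals)
print(top_group)
```

The wide totals table is the natural input for a bar chart, while the long-form spend series is more convenient for ranking groups.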

 

Conclusion:

The bar graph visualizations above show something interesting that we had not observed before.

  1. There are more males than females within the Sales department.
  2. Orion simply does not have females in the top-level positions. Maybe Orion needs to promote more females into senior positions, or we may need to delve deeper into the Sales department to see whether women actually quit before reaching the top of the hierarchy. (tongue-in-cheek comment)
  3. The salary of the Chief Sales Officer is more than the sum of the average salaries of all the Sales Reps within the dept.
  4. The company spends more money on junior-level male Sales Rep. I staff than on any other job class within the department.
  5. From Sales Rep. II upwards, the number of women within the Sales department declines rapidly, to the point that we see no females in top management for either CSO or MMS. Could it be that women lose interest in Sales positions over time?

So, that’s it for Part 3 of this Python vs SAS exploratory data discussion. In the next article, we shall explore other data analytics operations.

In the meantime, let me know which of these code samples you find more interesting, simpler, clearer and more self-explanatory. Which one do you use? Which one do you prefer? Share your experience with either of these tools in the comment box below.

Cheers!

By @RichardAfolabi

I'm a thinker, teacher, writer, Python enthusiast, Wireless Engineer, Web geek and a solid Chelsea FC Fan. I'm interested in data science, analytics, visualization and data intelligence. Feel free to get in touch.