•IBM is an American MNC operating in around 170 countries with major business vertical as computing, software, and hardware.
•Attrition is a major risk to service-providing organizations where trained and experienced people are the assets of the company.
•The organization would like to identify the factors which influence the attrition of employees.
• Age: Age of employee
• Attrition: Employee attrition status
• Department: Department of work
• DistanceFromHome
• Education: 1-Below College; 2- College; 3-Bachelor; 4-Master; 5-Doctor;
• EducationField
• EnvironmentSatisfaction: 1-Low; 2-Medium; 3-High; 4-Very High;
• JobSatisfaction: 1-Low; 2-Medium; 3-High; 4-Very High;
• MaritalStatus
• MonthlyIncome
• NumCompaniesWorked: Number of companies worked prior to IBM
• WorkLifeBalance: 1-Bad; 2-Good; 3-Better; 4-Best;
• YearsAtCompany: Current years of service in IBM
•Import attrition dataset and import libraries such as pandas, matplotlib.pyplot, numpy, and seaborn.
•Find the age distribution of employees in IBM
Age Distribution of Employees
•Exploring attrition by age
•Explore data for Left employees
Attrition Breakdown
•Find out the distribution of employees by the education field
Education Distribution
•Give a bar chart for the number of married and unmarried employees
Martial Status
•Build up a logistic regression model to predict which employees are likely to attrite.
•Logistic Regression model accuracy is 84% and ROC score is 65%.
Confusion Matrix