Skill shortages are a drain on society. They hamper economic opportunities for individuals, slow growth for firms, and impede labor productivity in aggregate. Therefore, the ability to understand and predict skill shortages in advance is critical for policy-makers and educators to help alleviate their adverse effects.
In my latest research with Marian-Andrei Rizoiu, Ben Johnston, and Mary-Anne Williams, we implement a high-performing machine learning approach to predict occupational skill shortages one year into the future. For this work, we compiled a unique dataset of both Labor Demand and Labor Supply occupational data in Australia from 2012 to 2018. This includes data from 7.7 million job advertisements (ads) from Burning Glass Technologies and 20 official labor force measures. We use these data as explanatory variables and leverage the XGBoost classifier to predict yearly skill shortage classifications for 132 standardized occupations.
Prediction Performance
The models that we construct achieve strong results, reaching an F1 Macro Average of up to 83% for predicting whether an occupation is in shortage or not. We also performed an ablation test, in which we separately tested the predictive performance of different feature classes, as seen below (LD = Labor Demand, i.e., job ads; LS = Labor Supply, i.e., employment statistics). Interestingly, we found that job ads data and employment statistics maintain solid performance levels for predicting occupational skill shortages. This is significant because labor demand and labor supply data sources are available across multiple labor markets, whereas longitudinal skill shortage data at the occupational level are rare in most labor markets.
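For readers unfamiliar with the metric, here is a minimal sketch of how an F1 Macro Average is computed for a two-class problem like ours. It is not the evaluation code from the paper: the function names and the labels are made up for illustration, and it uses plain Python rather than any particular library.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct positives, so F1 is taken to be 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Made-up labels for six occupations: 1 = In Shortage, 0 = Not In Shortage.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
print(macro_f1(y_true, y_pred))
```

In this toy example the per-class F1 scores are 0.75 (Not In Shortage) and 0.5 (In Shortage), so the macro average is 0.625. Because each class counts equally, the minority class is not drowned out by the majority class, which matters when one status is much more common than the other.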
Clearly, skill shortages have strong auto-regressive tendencies (i.e., the best indicator of an occupation being in shortage this year is whether it was in shortage last year). However, changes in shortage status (when an occupation moves between Not In Shortage and In Shortage) have policy and immigration implications, as governments set skilled immigration rules based on the needs of the labor market. So, how well can the models we build predict changes in shortage status? As seen below, performance deteriorates substantially because shortage status changes are rare events. That said, our results show that job ads data and employment statistics are the highest-performing feature sets for predicting year-to-year changes in occupational shortage status.
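To make the idea of a shortage status change concrete, the sketch below flags occupations whose status differs between two consecutive years and reports how rare such changes are. The data layout, the function names, and the occupations are hypothetical and chosen only for illustration; this is not the paper's data pipeline.

```python
def year_pairs(history):
    """Yield (occupation, year, previous_status, current_status) for consecutive years."""
    for occupation, by_year in history.items():
        years = sorted(by_year)
        for prev, curr in zip(years, years[1:]):
            if curr == prev + 1:  # skip gaps in the record
                yield occupation, curr, by_year[prev], by_year[curr]


def status_changes(history):
    """Keep only the year pairs where the shortage status flipped."""
    return [row for row in year_pairs(history) if row[2] != row[3]]


def change_rate(history):
    """Share of year-to-year transitions that are status changes."""
    pairs = list(year_pairs(history))
    return len(status_changes(history)) / len(pairs) if pairs else 0.0


# Made-up shortage history: 1 = In Shortage, 0 = Not In Shortage.
history = {
    "Occupation A": {2015: 0, 2016: 0, 2017: 1, 2018: 1},
    "Occupation B": {2015: 1, 2016: 1, 2017: 1, 2018: 1},
}
print(status_changes(history))
print(change_rate(history))
```

In this toy example only one of the six year-to-year transitions is a change, so a forecast that simply repeats last year's status would be right on the other five. That is the auto-regressive effect described above, and it is also why the change cases are the hard ones to predict.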
Again, this is significant because it further highlights the value of near real-time data sources (job ads data) and freely available data sources (employment statistics).
Feature Importance
We then conduct a feature importance analysis on the ‘Labor Demand + Labor Supply’ model to identify which features are most predictive of skill shortages.
We find that ‘Hours Worked’ is the most important indicator for occupations in shortage. Our interpretation of this result is that when a shortage exists for an occupation, the demands placed upon workers in that occupation are naturally high, which manifests in higher work intensity and longer work hours. This is reflected in the figure above, where ‘Hours Worked’ variables account for 6 of the top 20 most important features.
With regard to labor demand, years of ‘Education’, years of ‘Experience’, and median ‘Salary’ are all highly important features for predicting occupational skill shortages. This is consistent with prior work, which shows that when an occupation is in shortage, employers adjust job requirements to try to meet their hiring needs. For these features, this typically involves lowering education and experience requirements and increasing salary levels to attract more candidates.
Quantifying Skill Importance of Occupations In Shortage
Lastly, we put forward a method to analyze the underlying skills of occupations in shortage. This allows us to identify in granular detail which skills should be targeted to help alleviate occupational shortages. In a nutshell, we normalize high-occurring skills in job ads and calculate the mean importance score within the occupation. This returns a list of skills ordered by importance and captures emerging skills within an occupation. We use Data Scientists as an example occupation, which has been shown to be in shortage across many labor markets. Below is a visualization of the top 10 Data Science skills in Australia from 2015 to 2019 using this method.
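As a rough illustration of the idea, here is one way such a score could be computed. This post does not spell out the normalization, so the sketch assumes a revealed-comparative-advantage-style ratio: a skill's share within an ad divided by its share across all ads. That is one common choice and may differ from what the paper does. The function name, the data layout, and the job ads are all made up.

```python
from collections import Counter, defaultdict


def skill_importance(ads, occupation):
    """Rank skills for `occupation` by mean normalized importance.

    `ads` is a list of (occupation, [skills]) tuples. The normalization
    below is an assumption made for this sketch, not the paper's formula.
    """
    # How often each skill appears across all ads, counted once per ad.
    skill_totals = Counter()
    for _, skills in ads:
        skill_totals.update(set(skills))
    total_mentions = sum(skill_totals.values())

    occ_ads = [set(skills) for occ, skills in ads if occ == occupation and skills]
    scores = defaultdict(float)
    for skills in occ_ads:
        for skill in skills:
            share_in_ad = 1 / len(skills)
            share_overall = skill_totals[skill] / total_mentions
            # Skills that are common everywhere get scaled down.
            scores[skill] += share_in_ad / share_overall

    # Mean over the occupation's ads, highest score first.
    n_ads = len(occ_ads)
    ranked = [(skill, total / n_ads) for skill, total in scores.items()]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Made-up job ads, for illustration only.
ads = [
    ("Data Scientist", ["Python", "Machine Learning", "Communication"]),
    ("Data Scientist", ["Python", "Machine Learning", "SQL"]),
    ("Data Analyst", ["SQL", "Excel", "Communication", "Python"]),
    ("Accountant", ["Excel", "Communication"]),
]
for skill, score in skill_importance(ads, "Data Scientist"):
    print(skill, round(score, 3))
```

With these toy ads, 'Machine Learning' ranks above 'Python' for Data Scientists even though both appear in every Data Scientist ad, because 'Python' also appears in another occupation's ads and is scaled down accordingly. 'Communication' ranks last for the same reason.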
We hope that the methods and findings from this work can help policy-makers better measure and predict occupational skill shortages. Similarly, educators could apply this work to better identify market demands and adjust their curricula accordingly.
The full paper is available as a pre-print.