Job posting analysis

A project used to study the occurance of keywords in a job posting.
Project
R
Shiny App
Author

Mark Edney

Published

February 6, 2022

Recently, there was a post on medium about the use of Natural Language Processing (NLP) to study a job posting for keywords. I found that this article was very similar to R shiny App that I created a while ago. 1

Introduction

Technology has changed the job application process, making it easier and quicker to apply to jobs. As a result, the average job posting will receive around 250 resumes. 2 So how can hiring managers handle spending their time looking through that many resumes for one posting? That’s easy, they cheat.

Hiring Managers no longer look at individual resumes, but use automatic software called applicant tracking system (ATS). These programs filter resumes by a set of keywords, reducing the amount of resumes to a more manageable amount. So how can a job applicant make sure their resume is looked at? Well, they should cheat.

The medium article I mentioned uses Python and Natural Language Processing (NLP) to skim through the job posting to look for the most common words used. This is useful information, but not necessarily the keywords used by the ATS software. I propose the use of an R Shiny App to filter a job posting by a list of common keywords.

An R Shiny App is an interactive web based application that runs R code. The syntax for a Shiny App is a little different from R and requires some additional understanding. The product will be a basic, interactive program that can be hosted online. One free Shiny App hosting site that I recommend is shinyapps.io.

Inialization

The shiny App will require the following libraries.

library(shiny)
library(wordcloud2)
library(tidyverse)
library(XML)
library(rvest)
library(tidytext)

The Shiny App will use a csv files which contains a set of keywords that ATS will look for. This list was found online, but I have modified by adding additional keywords as I see fit. The file can be downloaded here from my GitHub site. Here is a sample of some keywords:

Keywords <- read_csv("Keywords.csv") 
Keywords$Keys %>% head()
[1] ".NET"                "account management"  "accounting"         
[4] "accounts payable"    "accounts receivable" "acquisition"        

App Structure

One issue I found when developing this application was the use of keywords that are a combination of multiple words. This creates some complications, as a simple search of keywords would use only the first word and lose the context.

This challenge was met by breaking the website down into ngrams. An over simplification of a ngram is a group of n number of words. Wikipedia has a very good page that better explains ngrams.3 The website can then be split into ngrams of different lengths and the keywords searched for.

As a example, the phrase:

The quick brown fox

for a ngram of length 1 would return:

(The) (quick) (brown) (fox)

for a ngram of length 2 would return:

(The quick) (quick brown) (brown fox)

and for a ngram of length 3 would return:

(The quick brown) (quick brown fox)

Application Coding

shinyApp(
#This is the standard format for a shiny app
        
#The UI function controls all the frontend for the app
        ui = fluidPage(
                titlePanel("Job Posting Word Cloud"),
                sidebarLayout(
                        sidebarPanel(
#The user is asked for a url
                                textInput("url", "input URL", value = "https://www.google.com/")
                                ),
                        mainPanel(
#The word cloud plot is displayed
                                h4("Key-Word Cloud"),
                                wordcloud2Output("plot")
                                )
                        )
                ),
        
#The server function controls the backend for the app
        server = function(input, output){
                
#The keywords are loaded and an index of how many words per keyword is created
                Keywords <- read_csv("Keywords.csv")
                Keywords$Keys <- str_to_lower(Keywords$Keys) 
                index <- Keywords$Keys %>% str_count(" ")
                
#The { brackets are used to create reactive functions which continuously run 
                data <- reactive({
#The input variable is how the server side receives data from the ui side
                url <- input$url
#The text is read from the url provide by the user
                data <- text %>%
                        data.frame(text = .)
#Since there are ngrams of length 1-3, there a three search's that are concatenated together
                rbind(data %>%
#the unnest_tolkens from the tidytext library is used to create the ngrams
                              unnest_tokens(word, text, token = "ngrams", n= 1) %>%
#A count is performed on each ngram in the website to find the most common ngrams 
                              count(word, name = 'freq', sort = TRUE) %>%
#The ngram count is then filtered by the keywords of the same ngram length
                              filter(word %in% Keywords$Keys[index == 0]),
#The steps are repeated for bigrams (ngrams of length 2) and trigrams(ngrams of length 3)
                      data %>%
                              unnest_tokens(word, text, token = "ngrams", n= 2) %>%
                              count(word, name = 'freq', sort = TRUE) %>%
                              filter(word %in% Keywords$Keys[index == 1]),
                      data %>%
                              unnest_tokens(word, text, token = "ngrams", n= 3) %>%
                              count(word, name = 'freq', sort = TRUE) %>%
                              filter(word %in% Keywords$Keys[index == 2]))
                        })
                
#The plot/wordcloud needs to be saved as an output value
#The output variable is how the server sends data back to the UI
                output$plot <- renderWordcloud2({
#One part of the strange syntax of a shiny app is that the since the data is reactive
#and changes with the user input, it is passed in a function so it needs to be called
#as data ()
                        wordcloud2(data())
                        })
        },

  options = list(height = 500)
)

Shiny App