MUT

Market Understanding Tool

Python 3.6 | Code style: black | License: MIT

About

This project provides a data analysis pipeline for data science job opportunities posted on Indeed. However, the pipeline can classify job postings from any sector, not only data science.

The pipeline generates an HTML file containing:

  1. Clusters 2D Graph
  2. Clusters Keywords Ranking
  3. TF-IDF Ranking
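A TF-IDF ranking weighs how often a term appears in a job posting against how many postings contain it, so words common to every posting sink to the bottom. The following is a minimal standard-library sketch of one common, unsmoothed TF-IDF variant, not the code in app.py: the names `tokenize` and `tfidf_ranking`, summing scores across postings, and the `exclude` parameter (standing in for the toxic-words list) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of Unicode letters (digits and _ are dropped)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def tfidf_ranking(documents, exclude=frozenset(), top_n=10):
    """Rank terms by their TF-IDF score summed over all documents (hypothetical helper).

    tf(t, d) = occurrences of t in d / number of tokens in d
    idf(t)   = ln(N / df(t)), where df(t) is the number of documents containing t
    """
    # Tokenize every posting and drop excluded ("toxic") words.
    tokenized = [[t for t in tokenize(doc) if t not in exclude] for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many postings each term appears at least once.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue  # skip postings left empty after filtering
        counts = Counter(tokens)
        total = len(tokens)
        for term, count in counts.items():
            scores[term] += (count / total) * math.log(n_docs / df[term])
    return scores.most_common(top_n)
```

For example, `tfidf_ranking(["python sql python", "python excel", "python sql"])` ranks "excel" first and "python" last: "python" appears in every posting, so its IDF, and therefore its score, is zero.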

Check out "Brazillian Data Science Jobs Market: A Deep Analysis" online!

Project Details

Folders

Folder    Description
db/       Folder where the Scrapy database is saved
output/   Folder where the graphs and results are saved

Files

Argument           Usage
[db-title]         Title of your Scrapy database (e.g., datascience_db)
[urls-file]        Filename of your list of Indeed URLs (see sample.urls)
[toxicwords-file]  Filename of the list of words to exclude from the analysis (see sample.toxicwords)
[num-clusters]     Number of clusters to identify, as a range (e.g., 2-8) or a single value (e.g., 8)
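The [num-clusters] and [toxicwords-file] arguments could be interpreted roughly as in the sketch below. Both helpers are hypothetical and not taken from the repository: `parse_num_clusters` assumes a range such as 2-8 is inclusive on both ends, and `load_toxic_words` assumes the toxic-words file holds one word per line. Check sample.toxicwords and app.py for the actual format.

```python
from pathlib import Path


def parse_num_clusters(value):
    """Turn a [num-clusters] string such as "8" or "2-8" into a list of cluster counts.

    Assumption: a range is inclusive on both ends, so "2-8" yields 2, 3, ..., 8.
    """
    if "-" in value:
        low_str, high_str = value.split("-", 1)
        low, high = int(low_str), int(high_str)
    else:
        low = high = int(value)
    if low < 1 or low > high:
        raise ValueError(f"invalid [num-clusters] value: {value!r}")
    return list(range(low, high + 1))


def load_toxic_words(path):
    """Read a [toxicwords-file], assuming one word per line; blank lines are skipped."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}
```

Returning a list for both forms lets the caller loop over cluster counts the same way whether a single value or a range was given.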

Requirements

Paraphrasing The Beatles: "All you need is Docker 🐳"

Install

1. Clone this repo 🍕
git clone https://github.com/HelioNeves/mut.git
cd mut
2. Build the image 🔧
docker build . -t mut

Running this awesome Docker image

1. Start an interactive Ubuntu shell in a container 🌈
docker run -ti --name MUT-env mut /bin/bash
2. Once inside the container, run the pipeline's Python scripts 🐍
Scrapy
python3 scraper.py [db-title] [urls-file]
Analytics app
python3 app.py [db-title] [toxicwords-file] [num-clusters]
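Both scripts take the same [db-title], and the scraper presumably writes the database that the analytics app then reads, so the two steps can be chained from one place. This is a hypothetical convenience wrapper, not part of the repository: the names `build_pipeline_commands` and `run_pipeline` are illustrative, and it assumes you run it from the repository root inside the container, where scraper.py and app.py live.

```python
import subprocess
import sys


def build_pipeline_commands(db_title, urls_file, toxicwords_file, num_clusters):
    """Return the argv lists for the two pipeline steps, in execution order."""
    python = sys.executable  # the interpreter running this helper (python3 in the container)
    return [
        [python, "scraper.py", db_title, urls_file],
        [python, "app.py", db_title, toxicwords_file, str(num_clusters)],
    ]


def run_pipeline(db_title, urls_file, toxicwords_file, num_clusters):
    """Run the scraper, then the analytics app; check=True stops at the first failing step."""
    for argv in build_pipeline_commands(db_title, urls_file, toxicwords_file, num_clusters):
        subprocess.run(argv, check=True)
```

For example, `run_pipeline("datascience_db", "sample.urls", "sample.toxicwords", "2-8")` runs the same two commands listed above, one after the other, and raises `subprocess.CalledProcessError` if the scraper fails, so the app never runs on a missing database.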
