JobSpy is a job scraping library that aggregates postings from popular job boards with a single tool.
Features
- Scrapes job postings from LinkedIn, Indeed, Glassdoor, Google, ZipRecruiter, & other job boards concurrently
- Aggregates the job postings in a dataframe
- Proxy support to bypass blocking
Installation
pip install -U python-jobspy
Python version >= 3.10 required
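A script that depends on JobSpy can check the interpreter version before importing the library. The following is a minimal sketch; the helper name meets_minimum is illustrative and not part of JobSpy.

```python
import sys

# Minimum interpreter version stated in the installation notes above.
MIN_VERSION = (3, 10)


def meets_minimum(version_info=sys.version_info) -> bool:
    """Return True when the (major, minor) pair is at least MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


if meets_minimum():
    print("Python version is new enough for python-jobspy")
else:
    print("python-jobspy needs Python 3.10 or newer")
```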
Usage
import csv
from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter", "google"],  # "glassdoor", "bayt", "naukri", "bdjobs"
    search_term="software engineer",
    google_search_term="software engineer jobs near San Francisco, CA since yesterday",
    location="San Francisco, CA",
    results_wanted=20,
    hours_old=72,
    country_indeed="USA",
    # linkedin_fetch_description=True,  # gets more info such as description, direct job url (slower)
    # proxies=["208.195.175.46:65095", "208.195.175.45:65095", "localhost"],
)
print(f"Found {len(jobs)} jobs")
print(jobs.head())
# csv is imported above for the quoting constant; use jobs.to_excel(...) to write a spreadsheet instead
jobs.to_csv("jobs.csv", quoting=csv.QUOTE_NONNUMERIC, escapechar="\\", index=False)
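The to_csv call above passes quoting=csv.QUOTE_NONNUMERIC and an escapechar. pandas hands these options to Python's csv writer, so their effect can be illustrated with the standard library alone. The sketch below uses a made-up row (not real scraper output) whose description contains commas, quotes, and a line break, and shows that it survives a round trip.

```python
import csv
import io

# Made-up row shaped like a few of the output columns.
rows = [
    ["site", "title", "min_amount", "description"],
    ["indeed", "Software Engineer", 150000, 'Work on "core" systems, remote OK\nApply today'],
]

# Write with the same options used in the to_csv call.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, escapechar="\\")
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read back with matching options: quoted fields stay strings,
# unquoted fields are converted to floats.
reader = csv.reader(io.StringIO(csv_text, newline=""), quoting=csv.QUOTE_NONNUMERIC, escapechar="\\")
round_tripped = list(reader)
print(round_tripped[1])
```

Every text field is wrapped in quotes, so commas and line breaks inside a job description do not split the row, while numeric fields stay unquoted.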
Output
SITE           TITLE                             COMPANY           CITY          STATE  JOB_TYPE  INTERVAL  MIN_AMOUNT  MAX_AMOUNT  JOB_URL                                            DESCRIPTION
indeed         Software Engineer                 AMERICAN SYSTEMS  Arlington     VA     None      yearly    200000      150000      https://www.indeed.com/viewjob?jk=5e409e577046...  THIS POSITION COMES WITH A 10K SIGNING BONUS!...
indeed         Senior Software Engineer          TherapyNotes.com  Philadelphia  PA     fulltime  yearly    135000      110000      https://www.indeed.com/viewjob?jk=da39574a40cb...  About Us TherapyNotes is the national leader i...
linkedin       Software Engineer - Early Career  Lockheed Martin   Sunnyvale     CA     fulltime  yearly    None        None        https://www.linkedin.com/jobs/view/3693012711      Description:By bringing together people that u...
linkedin       Full-Stack Software Engineer      Rain              New York      NY     fulltime  yearly    None        None        https://www.linkedin.com/jobs/view/3696158877      Rain's mission is to create the fastest and ea...
zip_recruiter  Software Engineer - New Grad      ZipRecruiter      Santa Monica  CA     fulltime  yearly    130000      150000      https://www.ziprecruiter.com/jobs/ziprecruiter...  We offer a hybrid work environment. Most US-ba...
zip_recruiter  Software Developer                TEKsystems        Phoenix       AZ     fulltime  hourly    65          75          https://www.ziprecruiter.com/jobs/teksystems-0...  Top Skills' Details* 6 years of Java developme...
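In the sample output, the two Indeed rows list a MIN_AMOUNT larger than the MAX_AMOUNT. If a reversed pair like that shows up in your own results, a small cleanup step can put it in order. The sketch below works on plain dictionaries; the lowercase keys and the helper names are assumptions made for illustration, not part of JobSpy.

```python
import math
from typing import Optional, Tuple

Amount = Optional[float]


def is_missing(value: Amount) -> bool:
    """Treat None and NaN as missing (pandas uses NaN for empty numeric cells)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def ordered_range(min_amount: Amount, max_amount: Amount) -> Tuple[Amount, Amount]:
    """Return (low, high); swap only when both values exist and are reversed."""
    if is_missing(min_amount) or is_missing(max_amount):
        return min_amount, max_amount
    if min_amount > max_amount:
        return max_amount, min_amount
    return min_amount, max_amount


# Rows shaped like the sample output. With a real result you could start
# from jobs.to_dict("records") instead (column names assumed here).
rows = [
    {"site": "indeed", "min_amount": 200000, "max_amount": 150000},
    {"site": "zip_recruiter", "min_amount": 65, "max_amount": 75},
    {"site": "linkedin", "min_amount": None, "max_amount": None},
]

for row in rows:
    row["min_amount"], row["max_amount"] = ordered_range(row["min_amount"], row["max_amount"])
    print(row)
```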
Parameters for scrape_jobs()
Optional
+-- site_name (list|str):
| linkedin, zip_recruiter, indeed, glassdoor, google, bayt, naukri, bdjobs
| (default is all)
|
+-- search_term (str)
|
+-- google_search_term (str):
| search term for Google jobs. This is the only parameter that filters Google results.
|
+-- location (str)
|
+-- distance (int):
| in miles, default 50
|
+-- job_type (str):
| fulltime, parttime, internship, contract
|
+-- proxies (list):
| in format ['user:pass@host:port', 'localhost']
| each job board scraper will round robin through the proxies
|
+-- is_remote (bool)
|
+-- results_wanted (int):
| number of job results to retrieve for each site specified in 'site_name'
|
+-- easy_apply (bool):
| filters for jobs that are hosted on the job board site (LinkedIn easy apply filter no longer works)
|
+-- user_agent (str):
| override the default user agent which may be outdated
|
+-- description_format (str):
| markdown, html (Format type of the job descriptions. Default is markdown.)
|
+-- offset (int):
| starts the search from an offset (e.g. 25 will start the search from the 25th result)
|
+-- hours_old (int):
| filters jobs by the number of hours since the job was posted
| (ZipRecruiter and Glassdoor round up to next day.)
|
+-- verbose (int) {0, 1, 2}:
| Controls the verbosity of the runtime printouts
| (0 prints only errors, 1 is errors+warnings, 2 is all logs. Default is 2.)
|
+-- linkedin_fetch_description (bool):
| fetches full description and direct job url for LinkedIn (adds roughly one request per job, i.e. O(n) more requests)
|
+-- linkedin_company_ids (list[int]):
| searches for linkedin jobs with specific company ids
|
+-- country_indeed (str):
| filters the country on Indeed & Glassdoor (see below for correct spelling)
|
+-- enforce_annual_salary (bool):
| converts wages to annual salary
|
+-- ca_cert (str):
| path to CA Certificate file for proxies
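
Two of the behaviours described above are easy to picture in code: round-robin rotation through proxies, and rounding hours_old up to whole days. The sketch below only illustrates the idea and is not JobSpy's internal implementation; the proxy addresses and the helper hours_to_whole_days are hypothetical.

```python
import math
from itertools import cycle

# Hypothetical proxy list in the 'user:pass@host:port' format.
proxies = [
    "user:pass@proxy-a.example:8080",
    "user:pass@proxy-b.example:8080",
    "localhost",
]

# Round robin: each request takes the next proxy, wrapping back to the first.
rotation = cycle(proxies)
assigned = [next(rotation) for _ in range(5)]
print(assigned)


def hours_to_whole_days(hours_old: int) -> int:
    """Round an hour window up to whole days, e.g. 30 hours becomes 2 days."""
    return math.ceil(hours_old / 24)


print(hours_to_whole_days(72))
```

With three proxies and five requests, the fourth request reuses the first proxy. A 30-hour window would be treated as 2 days by a board that only filters on whole days.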