Crawling Glassdoor job postings
import requests
from bs4 import BeautifulSoup

url = 'https://www.glassdoor.com/Job/index.htm'
# The User-Agent value was left blank in these notes; put your browser's string here.
headers = {'User-Agent': ''}
response = requests.get(url, headers=headers)

# If the response is OK (200)
if response.status_code == 200:
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')
    print(soup)
else:
    print(response.status_code)
The result was 403.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403
403 Forbidden - HTTP | MDN: The HTTP 403 Forbidden response status code indicates that the server understands the request but refuses to authorize it.
Added a User-Agent header.
It still did not work with Beautiful Soup, so I switched to Selenium.
Selenium
Before clicking: the XPath of a ::before pseudo-element cannot be copied!
How To Click On Pseudo Elements :: before Div In Selenium | Selenium Forum (www.qtpselenium.com)
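Grab the element from the parent level instead.
+ If a fine-grained XPath breaks on some pages,
try grabbing from one level higher again.
Comparing the title/location/salary XPaths across postings: they differ slightly from posting to posting, so for now I grabbed them from a higher level. Think about a more precise approach tomorrow.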
xbutton
/html/body/div[11]/div[2]/div[2]/div[1]/button
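The `xbutton` XPath above presumably points at the close button of a pop-up, which may not appear on every page load. A minimal sketch of clicking it only when it exists, assuming `driver` is a Selenium WebDriver created elsewhere; `close_popup_if_present` is a hypothetical helper name, and the string `"xpath"` is the value of Selenium's `By.XPATH` locator strategy:

```python
# Absolute XPath copied from the notes; it may differ between page loads.
X_BUTTON_XPATH = "/html/body/div[11]/div[2]/div[2]/div[1]/button"


def close_popup_if_present(driver, xpath=X_BUTTON_XPATH):
    """Click the pop-up's close button if it exists; return True if clicked."""
    # find_elements returns an empty list instead of raising when nothing matches.
    buttons = driver.find_elements("xpath", xpath)
    if not buttons:
        return False
    buttons[0].click()
    return True
```

Using `find_elements` instead of `find_element` avoids a try/except around the missing-element case.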
title
//*[@id="jd-job-title-1009207677601"]
//*[@id="jd-job-title-1009207736602"]
//*[@id="jd-job-title-1009207342558"]
job location
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/header/div[1]/div
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/header/div[1]/div
descrip
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/div[2]/div[1]/p[1]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/div[2]/div[1]/div/p[1]
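The two description XPaths differ only near the end, which is the "grab it from a higher level" situation from the notes. A minimal sketch of finding that higher level automatically; `common_ancestor` is a hypothetical helper, and it assumes no `/` appears inside a predicate such as `[@id="..."]`:

```python
def common_ancestor(xpaths):
    """Return the longest leading run of steps shared by all XPaths."""
    split_paths = [xp.split("/") for xp in xpaths]
    shared = []
    # Walk the paths step by step and stop at the first disagreement.
    for steps in zip(*split_paths):
        if len(set(steps)) != 1:
            break
        shared.append(steps[0])
    return "/".join(shared)


desc_a = '//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/div[2]/div[1]/p[1]'
desc_b = '//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/div[2]/div[1]/div/p[1]'
print(common_ancestor([desc_a, desc_b]))
# //*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/div[2]/div[1]
```

Reading the text of that shared ancestor returns more than the first paragraph, so some cleanup afterwards is likely still needed.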
pay
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]/div[1]/div[2]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]/div[1]/div[2]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section/div/div[1]/div[1]/div[2]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section/div/div[1]/div[1]/div[2]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]/div[1]/div[2]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section/div/div[1]/div[1]/div[2]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]/div[1]/div[2]
----
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]
---
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section/div/div[1]
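Several of the pay XPaths above differ only in `section` versus `section[1]`. One possible explanation, which is an assumption and not something these notes confirm, is that the browser's copy-XPath tool omits the index when a step has no same-named siblings. If so, writing the index explicitly makes the two variants identical. A sketch with a hypothetical helper `add_explicit_index`:

```python
def add_explicit_index(xpath):
    """Append [1] to every step that has no predicate."""
    steps = []
    for step in xpath.split("/"):
        # Empty strings come from the leading '//'; a '[' means a predicate is already there.
        if step and "[" not in step:
            step += "[1]"
        steps.append(step)
    return "/".join(steps)


pay_a = '//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]'
pay_b = '//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section/div/div[1]'
print(add_explicit_index(pay_a) == add_explicit_index(pay_b))  # True
```

This only helps when the pages really share the same structure. It does not cover cases where an extra wrapper element appears, as in the description XPaths.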
<div class="SalaryEstimate_salaryEstimateContainer__GkgnI"><div class="SalaryEstimate_rangeEstimate__NH604"><div class="ScreenReaderOnly_screenReaderOnly__4hBv1">The minimum salary is $50K and the max salary is $79K.</div><div class="SalaryEstimate_salaryRange__brHFy">$50K – $79K<span class="SalaryEstimate_payPeriod__RsvG_">/yr (Glassdoor est.)</span></div></div><div class="SalaryEstimate_salaryEstimateNumber__SC4__"><div class="SalaryEstimate_medianEstimate__fOYN1">$63K</div><div class="SalaryEstimate_payPeriod__RsvG_">/yr Median</div></div><div class="SalaryEstimate_location__Akels">Jackson, TN</div></div>
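The class names in this snippet describe the content (`salaryRange`, `medianEstimate`, `location`), so they may be a steadier hook than positional XPaths. A minimal sketch that reads the fields by class-name prefix using only the standard library. It works here because the snippet happens to be well-formed; a full live page would need an HTML parser. `text_by_class_prefix` is a hypothetical helper, and it is an assumption that only the suffix after the double underscore changes between builds:

```python
import xml.etree.ElementTree as ET

# The salary snippet from the notes, split across lines for readability.
SNIPPET = (
    '<div class="SalaryEstimate_salaryEstimateContainer__GkgnI">'
    '<div class="SalaryEstimate_rangeEstimate__NH604">'
    '<div class="ScreenReaderOnly_screenReaderOnly__4hBv1">'
    'The minimum salary is $50K and the max salary is $79K.</div>'
    '<div class="SalaryEstimate_salaryRange__brHFy">$50K – $79K'
    '<span class="SalaryEstimate_payPeriod__RsvG_">/yr (Glassdoor est.)</span></div></div>'
    '<div class="SalaryEstimate_salaryEstimateNumber__SC4__">'
    '<div class="SalaryEstimate_medianEstimate__fOYN1">$63K</div>'
    '<div class="SalaryEstimate_payPeriod__RsvG_">/yr Median</div></div>'
    '<div class="SalaryEstimate_location__Akels">Jackson, TN</div></div>'
)


def text_by_class_prefix(root, prefix):
    """Return the text of the first element whose class starts with prefix."""
    for el in root.iter():
        if any(name.startswith(prefix) for name in el.get("class", "").split()):
            return (el.text or "").strip()
    return None


root = ET.fromstring(SNIPPET)
salary = {
    "range": text_by_class_prefix(root, "SalaryEstimate_salaryRange__"),
    "median": text_by_class_prefix(root, "SalaryEstimate_medianEstimate__"),
    "location": text_by_class_prefix(root, "SalaryEstimate_location__"),
}
print(salary)
```

On the live page the same idea could probably be written as an XPath such as `//div[starts-with(@class, "SalaryEstimate_salaryRange__")]`, since `starts-with` is a standard XPath 1.0 function.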
sal
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]/div[2]/div[1]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section/div/div[1]/div[2]/div[1]
sal basis
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]/div[1]/div[2]/span
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]/div[2]/div[2]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section/div/div[1]/div[2]/div[2]
size
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[2]/div/div/div[1]/div
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[2]/div/div/div[1]/div
founded
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[2]/div/div/div[2]/div
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[2]/div/div/div[2]/div
type
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[2]/div/div/div[3]/div
industry
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[2]/div/div/div[4]/div
Writing functions that convert categorical variables to numbers
def job_title(x):
    # Split into at most five pieces and check the leading words
    output = x.split(sep=' ', maxsplit=4)
    if 'Science' in output or 'Scientist' in output or 'Scientist,' in output:
        return 'DS'
    else:
        return x
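def sal_estimate(x):
    # Assumes an unspaced, hyphenated range in thousands, e.g. '$50K-$79K (Glassdoor est.)'
    output = x.split(sep=' ', maxsplit=4)
    output = output[0]
    output = output.replace('$', '')
    output = output.replace('K', '')
    output = output.replace('-', ' ')
    output = output.replace('(', ' ')
    output = output.split(sep=' ', maxsplit=3)
    # Named high/low to avoid shadowing the built-ins max() and min()
    high = int(output[1])
    low = int(output[0])
    mean = low + (high - low) / 2
    return mean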
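The chain of `replace` calls in `sal_estimate` depends on the exact separator, so a range written with spaces and an en dash, as in the HTML snippet earlier in these notes, would fail. A minimal alternative sketch using a regular expression; `salary_midpoint` is a hypothetical name, and returning `None` for unparseable text is a design choice of this sketch, not the original behavior:

```python
import re


def salary_midpoint(text):
    """Return the midpoint of a '$50K-$79K' style range in thousands, or None."""
    # Pull out every '$<digits>K' amount, whatever separator sits between them.
    amounts = [int(n) for n in re.findall(r"\$(\d+)K", text)]
    if len(amounts) < 2:
        return None
    low, high = amounts[0], amounts[1]
    return low + (high - low) / 2


print(salary_midpoint("$50K-$79K (Glassdoor est.)"))       # 64.5
print(salary_midpoint("$50K – $79K/yr (Glassdoor est.)"))  # 64.5
```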
import random

def size_encoder(x):
    if x == '1 to 50 employees':
        return 1
    if x == '51 to 200 employees':
        return 2
    if x == '201 to 500 employees':
        return 3
    if x == '501 to 1000 employees':
        return 4
    if x == '1001 to 5000 employees':
        return 5
    if x == '5001 to 10000 employees':
        return 6
    else:
        # Any other label gets a random code from 1 to 7 (randint includes both ends)
        return random.randint(1, 7)
Code readability... how?
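One common way to make an if-chain like `size_encoder` easier to read is a dictionary lookup. A minimal sketch; `size_encoder_lookup` and `SIZE_CODES` are hypothetical names. It swaps the random fallback for an explicit `default`, which is a deliberate change from the original behavior:

```python
# Size labels and codes copied from the if-chain in size_encoder.
SIZE_CODES = {
    "1 to 50 employees": 1,
    "51 to 200 employees": 2,
    "201 to 500 employees": 3,
    "501 to 1000 employees": 4,
    "1001 to 5000 employees": 5,
    "5001 to 10000 employees": 6,
}


def size_encoder_lookup(x, default=None):
    """Map a company-size label to its ordinal code via a dictionary lookup."""
    return SIZE_CODES.get(x, default)


print(size_encoder_lookup("51 to 200 employees"))  # 2
print(size_encoder_lookup("Unknown"))              # None
```

With a pandas column this could be applied through `Series.map(SIZE_CODES)`, which leaves unmatched labels as missing values to be handled explicitly later.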