240328 Thu_TIL

TIL

by 30303 2024. 3. 28. 21:23


Crawling Glassdoor job postings

 

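import requests
from bs4 import BeautifulSoup

url = 'https://www.glassdoor.com/Job/index.htm'
# Placeholder: the value was left blank in the original note; put your browser's User-Agent string here
headers = {'User-Agent': 'YOUR_USER_AGENT_STRING'}
response = requests.get(url, headers=headers)

# If the response is OK (200), parse the HTML and print it
if response.status_code == 200:
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')
    print(soup)
else:
    # Otherwise print the status code to see what went wrong
    print(response.status_code)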

 

Got a 403..

https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403

 

403 Forbidden - HTTP | MDN

The HTTP 403 Forbidden response status code indicates that the server understands the request but refuses to authorize it.

developer.mozilla.org

Added a User-Agent header.
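A minimal sketch of what attaching the header looks like. It uses the standard library's `urllib.request` rather than `requests`, so the header can be inspected without sending anything. The User-Agent value and the helper names `build_request` and `describe_status` are placeholders made up for illustration, not what this post actually ran.

```python
from http import HTTPStatus
from urllib.request import Request

# Placeholder value (assumption): substitute the string your own browser sends.
EXAMPLE_USER_AGENT = "Mozilla/5.0 (example placeholder)"


def build_request(url, user_agent=EXAMPLE_USER_AGENT):
    """Return a Request carrying a User-Agent header; nothing is sent yet."""
    return Request(url, headers={"User-Agent": user_agent})


def describe_status(code):
    """Turn a numeric status code into a label such as '403 Forbidden'."""
    return f"{code} {HTTPStatus(code).phrase}"


req = build_request("https://www.glassdoor.com/Job/index.htm")
# urllib stores header names via str.capitalize(), hence 'User-agent'.
print(req.get_header("User-agent"))
print(describe_status(403))
```

A site can still refuse the request even with this header set, so it is worth printing the status label before trying to parse anything.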

 

It did not work with Beautiful Soup, so I switched to Selenium.


selenium

 

Before the click step: an XPath cannot be copied for a ::before pseudo-element!

https://www.qtpselenium.com/selenium-training/forum/6879/how-to-click-on-pseudo-elements-before-div-in-selenium

 

How To Click On Pseudo Elements :: before Div In Selenium | Selenium Forum

I am trying to click on one button ( input type) but I am getting errors  please find attached a screenshot  I want to click on Upload Button  I am giving xpath  driver.FindElement(By.Id("UploadFile")); and this is not working  error showing  "Messag

www.qtpselenium.com

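Grab the XPath from the level above (the parent element) instead. ::before is a CSS pseudo-element, not a DOM node, so XPath has nothing to point at.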

 

+ If a fine-grained XPath breaks on some pages,

try grabbing from one level higher again.
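The "go one level higher" idea can be sketched as a list of paths tried in order, most specific first. This uses the standard library's `xml.etree.ElementTree` on two made-up fragments instead of Selenium on the real page. The fragments, `PATHS`, and the helper `text_with_fallback` are all hypothetical and only illustrate the pattern.

```python
import xml.etree.ElementTree as ET

# Two made-up postings (assumption): the second wraps the paragraph in an extra <div>.
PAGE_A = "<section><div><p>Build models.</p></div></section>"
PAGE_B = "<section><div><div><p>Build models.</p></div></div></section>"


def text_with_fallback(root, paths):
    """Try each path in order and return the first non-empty text found."""
    for path in paths:
        node = root.find(path)
        if node is not None:
            # itertext() gathers the text of the node and all its descendants.
            text = "".join(node.itertext()).strip()
            if text:
                return text
    return None


# Most specific path first, then one level up.
PATHS = ["div/p", "div"]

for page in (PAGE_A, PAGE_B):
    print(text_with_fallback(ET.fromstring(page), PATHS))
```

Both fragments yield the same text: the first through the specific path, the second through the parent-level fallback.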

 

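Compared the title/location/salary XPaths across postings. They differ slightly, so for now I grabbed from a higher level,

but think about a way to make this more precise tomorrow..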

xbutton
/html/body/div[11]/div[2]/div[2]/div[1]/button


title
//*[@id="jd-job-title-1009207677601"]
//*[@id="jd-job-title-1009207736602"]
//*[@id="jd-job-title-1009207342558"]
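The three title ids share the prefix `jd-job-title-` followed by digits, so one pattern could cover every posting. A rough sketch follows. Treating the digits as a per-posting id is an assumption, and `TITLE_XPATH` and `posting_id` are names made up here. XPath 1.0 does provide `starts-with()`, so the expression should be usable with an XPath locator.

```python
import re

# The three ids recorded above: a fixed prefix followed by a run of digits.
TITLE_IDS = [
    "jd-job-title-1009207677601",
    "jd-job-title-1009207736602",
    "jd-job-title-1009207342558",
]

TITLE_ID_PATTERN = re.compile(r"^jd-job-title-(\d+)$")

# One XPath for every title element, instead of one XPath per posting.
TITLE_XPATH = '//*[starts-with(@id, "jd-job-title-")]'


def posting_id(element_id):
    """Return the numeric part of a title id, or None if the id has another shape."""
    match = TITLE_ID_PATTERN.match(element_id)
    return match.group(1) if match else None


print(TITLE_XPATH)
print([posting_id(i) for i in TITLE_IDS])
```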


job location 
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/header/div[1]/div
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/header/div[1]/div



descrip 
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/div[2]/div[1]/p[1]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/div[2]/div[1]/div/p[1]


pay 
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]/div[1]/div[2]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]/div[1]/div[2]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section/div/div[1]/div[1]/div[2]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section/div/div[1]/div[1]/div[2]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]/div[1]/div[2]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section/div/div[1]/div[1]/div[2]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]/div[1]/div[2]
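The pay XPaths above differ only in `section[1]` versus `section`. A plausible reading is that the copy tool drops the index when there is just one `section` sibling. If so, always writing `section[1]` would cover both cases. The sketch below checks that idea with `xml.etree.ElementTree` on two made-up layouts, which are assumptions and not the real page.

```python
import xml.etree.ElementTree as ET

# Made-up minimal layouts (assumption): one posting has a single <section>, the other has two.
ONE_SECTION = ET.fromstring("<root><section><div>pay</div></section></root>")
TWO_SECTIONS = ET.fromstring(
    "<root><section><div>pay</div></section>"
    "<section><div>company</div></section></root>"
)

for root in (ONE_SECTION, TWO_SECTIONS):
    # 'section[1]' means "the first <section> child", which exists in both layouts.
    indexed = [d.text for d in root.findall("section[1]/div")]
    # Plain 'section' means "every <section> child", so it can match more than one.
    unindexed = [d.text for d in root.findall("section/div")]
    print(indexed, unindexed)
```

The indexed path picks the pay block in both layouts, while the unindexed one also picks up the second section when it exists.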

----
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]

//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]
---



//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section/div/div[1]
<div class="SalaryEstimate_salaryEstimateContainer__GkgnI">
  <div class="SalaryEstimate_rangeEstimate__NH604">
    <div class="ScreenReaderOnly_screenReaderOnly__4hBv1">The minimum salary is $50K and the max salary is $79K.</div>
    <div class="SalaryEstimate_salaryRange__brHFy">$50K – $79K<span class="SalaryEstimate_payPeriod__RsvG_">/yr (Glassdoor est.)</span></div>
  </div>
  <div class="SalaryEstimate_salaryEstimateNumber__SC4__">
    <div class="SalaryEstimate_medianEstimate__fOYN1">$63K</div>
    <div class="SalaryEstimate_payPeriod__RsvG_">/yr Median</div>
  </div>
  <div class="SalaryEstimate_location__Akels">Jackson, TN</div>
</div>
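One way to make the salary lookup less position-dependent might be to match on the readable part of the class name instead of counting `div`s. The sketch below parses the container above with `xml.etree.ElementTree` and searches by class prefix. It assumes the suffixes such as `__brHFy` are build hashes that can change while the prefix stays stable. `find_by_class_prefix` is a made-up helper. In Selenium the equivalent would be an XPath like `//div[starts-with(@class, "SalaryEstimate_salaryRange")]`.

```python
import re
import xml.etree.ElementTree as ET

# The salary container from the post (well-formed, so ElementTree can parse it).
SNIPPET = (
    '<div class="SalaryEstimate_salaryEstimateContainer__GkgnI">'
    '<div class="SalaryEstimate_rangeEstimate__NH604">'
    '<div class="ScreenReaderOnly_screenReaderOnly__4hBv1">'
    'The minimum salary is $50K and the max salary is $79K.</div>'
    '<div class="SalaryEstimate_salaryRange__brHFy">$50K – $79K'
    '<span class="SalaryEstimate_payPeriod__RsvG_">/yr (Glassdoor est.)</span></div>'
    '</div>'
    '<div class="SalaryEstimate_salaryEstimateNumber__SC4__">'
    '<div class="SalaryEstimate_medianEstimate__fOYN1">$63K</div>'
    '<div class="SalaryEstimate_payPeriod__RsvG_">/yr Median</div>'
    '</div>'
    '<div class="SalaryEstimate_location__Akels">Jackson, TN</div>'
    '</div>'
)


def find_by_class_prefix(root, prefix):
    """Return the first element whose class attribute starts with prefix, else None."""
    for el in root.iter():
        if el.get("class", "").startswith(prefix):
            return el
    return None


root = ET.fromstring(SNIPPET)
salary_range = find_by_class_prefix(root, "SalaryEstimate_salaryRange")
median = find_by_class_prefix(root, "SalaryEstimate_medianEstimate")
location = find_by_class_prefix(root, "SalaryEstimate_location")

# .text is only the text before the first child, so the <span> suffix is excluded.
print(salary_range.text)
print(median.text)
print(location.text)

# Pull the two dollar figures out of the range text.
low, high = (int(n) for n in re.findall(r"\$(\d+)K", salary_range.text))
print(low, high)
```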

sal
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]/div[2]/div[1]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section/div/div[1]/div[2]/div[1]

sal basis
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]/div[1]/div[2]/span
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[1]/div/div[1]/div[2]/div[2]
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section/div/div[1]/div[2]/div[2]

size 
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[2]/div/div/div[1]/div
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[2]/div/div/div[1]/div

founded 
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[2]/div/div/div[2]/div
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[2]/div/div/div[2]/div

type 
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[2]/div/div/div[3]/div
industry 
//*[@id="app-navigation"]/div[3]/div[2]/div[2]/div/div[1]/section/section[2]/div/div/div[4]/div

Writing functions that convert categorical variables to numbers

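import random


def job_title(x):
    # Collapse titles whose first few words mention Science/Scientist into 'DS'
    output = x.split(sep=' ', maxsplit=4)
    if 'Science' in output or 'Scientist' in output or 'Scientist,' in output:
        return 'DS'
    else:
        return x


def sal_estimate(x):
    # Assumes input shaped like '$50K-$79K (Glassdoor est.)', with a plain hyphen
    output = x.split(sep=' ', maxsplit=4)
    output = output[0]
    output = output.replace('$', '')
    output = output.replace('K', '')
    output = output.replace('-', ' ')
    output = output.replace('(', ' ')
    output = output.split(sep=' ', maxsplit=3)

    # Named low/high so the built-in min() and max() are not shadowed
    low = int(output[0])
    high = int(output[1])
    # Midpoint of the salary range
    mean = low + (high - low) / 2

    return mean


def size_encoder(x):
    # Ordinal code per company-size bucket
    if x == '1 to 50 employees':
        return 1
    if x == '51 to 200 employees':
        return 2
    if x == '201 to 500 employees':
        return 3
    if x == '501 to 1000 employees':
        return 4
    if x == '1001 to 5000 employees':
        return 5
    if x == '5001 to 10000 employees':
        return 6
    else:
        # Any other label gets a random code from 1 to 7 (inclusive)
        return random.randint(1, 7)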

Code readability... how?
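One possible answer, as a sketch rather than the definitive fix: replace the if-chain in `size_encoder` with a lookup table, and pull the salary figures out with a regular expression instead of chained `replace` calls. The names `SIZE_CODES`, `encode_size`, and `salary_midpoint` are made up here. Note one deliberate difference: unknown size labels return a default (`None`) instead of a random code.

```python
import re

# Lookup table instead of an if-chain; the labels are the ones used in size_encoder.
SIZE_CODES = {
    "1 to 50 employees": 1,
    "51 to 200 employees": 2,
    "201 to 500 employees": 3,
    "501 to 1000 employees": 4,
    "1001 to 5000 employees": 5,
    "5001 to 10000 employees": 6,
}


def encode_size(label, default=None):
    """Map a company-size label to its ordinal code; unknown labels get `default`."""
    return SIZE_CODES.get(label, default)


def salary_midpoint(text):
    """Return the midpoint of the first two '$<n>K' figures found in text."""
    low, high = (int(n) for n in re.findall(r"\$(\d+)K", text)[:2])
    return (low + high) / 2


print(encode_size("51 to 200 employees"))
print(encode_size("Unknown"))
print(salary_midpoint("$50K-$79K (Glassdoor est.)"))
```

The regular expression does not care whether the two figures are separated by a hyphen or by a spaced dash. It raises `ValueError` when fewer than two figures are present, which makes malformed rows easy to spot.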
