crawl companies, industries, locations
Check out, review, and merge locally
Step 1. Fetch and check out the branch for this merge request
git fetch origin
git checkout -b crawler origin/crawler
Step 2. Review the changes locally
Step 3. Merge the branch and fix any conflicts that come up
git checkout master
git merge --no-ff crawler
Step 4. Push the result of the merge to GitLab
git push origin master
Note that pushing to GitLab requires write access to this repository.
lib/tasks/crawler.rake 0 → 100644
    task crawl_companies_jobs: :environment do
      require "open-uri"
Thanh Hung Pham @hungpt (Master) commented
Tô Ngọc Ánh @anhtn changed this line in [version 2 of the diff](https://gitlab.zigexn.vn/anhtn/VeNJob/merge_requests/2/diffs?diff_id=4847&start_sha=b45a5d2a3b42b81a7c3e3c163a69b17630176f07#b321b772986de9dfe9db0ed4138ae166e577f241_2_1)
lib/tasks/crawler.rake 0 → 100644
    task crawl_companies_jobs: :environment do
      require "open-uri"
      crawl_companies_and_jobs(3)
    end

    task crawl_industries_locations: :environment do
      require "open-uri"
      crawl_industries_and_locations
    end

    def crawl_companies_and_jobs(page)
      for i in 1..page
Thanh Hung Pham @hungpt (Master) commented
Tô Ngọc Ánh @anhtn changed this line in [version 2 of the diff](https://gitlab.zigexn.vn/anhtn/VeNJob/merge_requests/2/diffs?diff_id=4847&start_sha=b45a5d2a3b42b81a7c3e3c163a69b17630176f07#b321b772986de9dfe9db0ed4138ae166e577f241_12_11)
lib/tasks/crawler.rake 0 → 100644
        job_description = document.css('')
      rescue => exception

      end
    end

    def crawl_industries_and_locations
      document = Nokogiri::HTML(open('https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-vi.html'))
      industries_xml = document.css('#industry option')
      industries = industries_xml.map(&:text)
      locations_xml = document.css('#location option')
      locations = locations_xml.map(&:text)


      industries.each do |industry|
        exist = Industry.find_by(name: industry).present?
Thanh Hung Pham @hungpt (Master, edited) commented:
@anhtn There is a method called `find_or_create_by`. Try looking into it and using it here; the code will be shorter.
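To illustrate the suggestion, the check-then-create pair (`find_by(...).present?` followed by `create!`) can collapse into one call. The sketch below is not the code from the merge request: it uses a hypothetical in-memory `FakeIndustry` class that imitates only the relevant behaviour of ActiveRecord's `find_or_create_by`, so it runs without Rails. In the app the call would presumably be `Industry.find_or_create_by(name: industry)`.

```ruby
# Hypothetical stand-in for the Industry model (assumption: not part of the
# merge request). It imitates only the behaviour of ActiveRecord's
# find_or_create_by that matters here: return the existing record, or
# create one when none matches.
class FakeIndustry
  attr_reader :name

  @records = []

  class << self
    attr_reader :records

    def find_or_create_by(name:)
      existing = @records.find { |record| record.name == name }
      return existing if existing

      new(name).tap { |record| @records << record }
    end
  end

  def initialize(name)
    @name = name
  end
end

# Sample names are made up; the duplicate shows that no second record appears.
industry_names = ["IT - Software", "Banking", "IT - Software"]
industry_names.each { |name| FakeIndustry.find_or_create_by(name: name) }

stored_names = FakeIndustry.records.map(&:name)
puts stored_names.inspect
```

One side effect worth noting: a loop written this way keeps going after it meets an existing name, whereas a `break if exist` line stops the whole loop at the first match.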
Tô Ngọc Ánh @anhtn changed this line in [version 2 of the diff](https://gitlab.zigexn.vn/anhtn/VeNJob/merge_requests/2/diffs?diff_id=4847&start_sha=b45a5d2a3b42b81a7c3e3c163a69b17630176f07#b321b772986de9dfe9db0ed4138ae166e577f241_73_93)
lib/tasks/crawler.rake 0 → 100644
    def crawl_industries_and_locations
      document = Nokogiri::HTML(open('https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-vi.html'))
      industries_xml = document.css('#industry option')
      industries = industries_xml.map(&:text)
      locations_xml = document.css('#location option')
      locations = locations_xml.map(&:text)


      industries.each do |industry|
        exist = Industry.find_by(name: industry).present?
        break if exist
        puts industry
        Industry.create!(name: industry)
      end

      locations.take(70).each do |location|
Thanh Hung Pham @hungpt (Master) commented
Tô Ngọc Ánh @anhtn (Master, edited) commented:
The first 70 entries are the ones in Vietnam.
Thanh Hung Pham @hungpt (Master, edited) commented:
@anhtn That is acceptable for now. You should define a constant for values like this, so that a later change only has to be made in one place. It should live in the `Location` model.
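As a rough sketch of that advice, the literal `70` can move into a named constant on the model and be referenced from the crawler. The constant name `CITY_VIETNAM_NUMBER` is the one that appears in the later version of the diff; the plain `Location` class and the sample list below are stand-ins (assumptions) so the snippet runs without Rails.

```ruby
# Stand-in for the Location model so the sketch runs without Rails; in the
# app the class would inherit from ApplicationRecord.
class Location
  # Number of leading <option> entries that are cities in Vietnam
  # (the value 70 comes from the discussion above).
  CITY_VIETNAM_NUMBER = 70
end

# Made-up list standing in for the scraped location names.
locations = (1..75).map { |n| "Location #{n}" }

# The crawler refers to the constant instead of repeating the literal.
vietnam_locations = locations.take(Location::CITY_VIETNAM_NUMBER)
puts vietnam_locations.size
```

If the number of Vietnamese entries on the source page changes, only the constant needs updating.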
Tô Ngọc Ánh @anhtn changed this line in [version 2 of the diff](https://gitlab.zigexn.vn/anhtn/VeNJob/merge_requests/2/diffs?diff_id=4847&start_sha=b45a5d2a3b42b81a7c3e3c163a69b17630176f07#b321b772986de9dfe9db0ed4138ae166e577f241_79_97)
lib/tasks/crawler.rake 0 → 100644
      [company_links, job_links]
    end

    def crawl_companies(company_links)
      company_links.each do |link|
        crawl_company(link)
      end
    end

    def crawl_company(company_link)
      begin
        document = Nokogiri::HTML(open(company_link))
        company_name = document.css(".content .name").text
        exist = Company.find_by(name: company_name).present?
        return if exist || company_name.empty?
Thanh Hung Pham @hungpt (Master, edited) commented:
@anhtn Check `company_name.empty?` before calling `find_by`, so the database is not queried with an empty company_name.
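The point about ordering can be sketched as follows. `FakeCompany` and `skip_company?` are hypothetical names introduced only for this illustration: the fake class counts lookups in memory so the effect of the guard order is visible without a database. In the app, the lookup would be the existing `Company.find_by(name: company_name).present?` check.

```ruby
# Hypothetical stand-in for the Company model that counts lookups, so the
# effect of the guard order is visible without a database.
class FakeCompany
  @lookups = 0
  @names = ["Acme"]

  class << self
    attr_reader :lookups

    def exists_by_name?(name)
      @lookups += 1
      @names.include?(name)
    end
  end
end

# Returns true when the company should be skipped. The cheap in-memory check
# runs first; || short-circuits, so an empty name never reaches the lookup.
def skip_company?(company_name)
  company_name.empty? || FakeCompany.exists_by_name?(company_name)
end

results = ["", "Acme", "New Co"].map { |name| skip_company?(name) }
puts results.inspect
puts FakeCompany.lookups
```

Three names are checked but only two lookups happen, because the empty name is rejected before the lookup is reached.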
Tô Ngọc Ánh @anhtn changed this line in [version 2 of the diff](https://gitlab.zigexn.vn/anhtn/VeNJob/merge_requests/2/diffs?diff_id=4847&start_sha=b45a5d2a3b42b81a7c3e3c163a69b17630176f07#b321b772986de9dfe9db0ed4138ae166e577f241_39_26)
Tô Ngọc Ánh @anhtn resolved all discussions
Tô Ngọc Ánh @anhtn resolved all discussions
Tô Ngọc Ánh @anhtn added 2 commits
* 5907329d - improve code syntax (crawl company, industry, location)
* 6496e46e - crawl jobs
[Compare with previous version](https://gitlab.zigexn.vn/anhtn/VeNJob/merge_requests/2/diffs?diff_id=4847&start_sha=b45a5d2a3b42b81a7c3e3c163a69b17630176f07)
    class Location < ApplicationRecord
      CITY_VIETNAM_NUMBER = 70
Thanh Hung Pham @hungpt (Master) commented
Thanh Hung Pham @hungpt (Master) commented
lib/tasks/crawler.rake 0 → 100644
    require "open-uri"
    task crawl_jobs: :environment do
      job_links = get_job_links(1)
      crawl_jobs(job_links)
    end

    task crawl_industries_locations: :environment do
      crawl_industries_and_locations
    end

    def get_job_links(page)
      job_links = []
      page.times do |i|
        document = Nokogiri::HTML(open("https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-trang-#{i}-vi.html"))
        jobs_xml = document.xpath('//div/a[@class="job_link"]/@href')
        jobs_xml.each { |i| job_links << i.value}
Thanh Hung Pham @hungpt (Master, edited) commented:
@anhtn Doesn't this variable `i` clash with the `i` above?
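A small sketch of the naming issue, using made-up data in place of the scraped pages. In Ruby a block parameter is local to its block, so an inner `|i|` would hide the outer `i` rather than overwrite it; a distinct name avoids the confusion and keeps the outer value usable inside the inner block. The names `pages`, `hrefs`, `page_index`, and `href` are illustrative assumptions.

```ruby
# Made-up hrefs standing in for what the XPath query returns on each page.
pages = [["/job/1", "/job/2"], ["/job/3"]]

job_links = []
pages.each_with_index do |hrefs, page_index|
  # A distinct block parameter name (href) keeps page_index readable here;
  # reusing the outer name would shadow it inside the inner block.
  hrefs.each { |href| job_links << "page #{page_index}: #{href}" }
end

puts job_links.inspect
```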
lib/tasks/crawler.rake 0 → 100644
    require "open-uri"
Thanh Hung Pham @hungpt (Master) commented
Tô Ngọc Ánh @anhtn resolved all discussions
Tô Ngọc Ánh @anhtn mentioned in commit 9d821f37cfd9f6f543bff44849bc3f1a3a1e683e
Tô Ngọc Ánh @anhtn merged