Merged
Opened Jul 20, 2020 by Tô Ngọc Ánh @anhtn

crawl companies, industries, locations

Check out, review, and merge locally

Step 1. Fetch and check out the branch for this merge request

git fetch origin
git checkout -b crawler origin/crawler

Step 2. Review the changes locally

Step 3. Merge the branch and fix any conflicts that come up

git checkout master
git merge --no-ff crawler

Step 4. Push the result of the merge to GitLab

git push origin master

Note that pushing to GitLab requires write access to this repository.

  • Discussion 11
  • Commits 3
  • Pipelines 2
  • Changes 11
  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 20, 2020
    Resolved by Tô Ngọc Ánh Jul 21, 2020
    lib/tasks/crawler.rake 0 → 100644
    task crawl_companies_jobs: :environment do
      require "open-uri"
    • Thanh Hung Pham @hungpt commented Jul 20, 2020
      Master

      @anhtn Just require it once, at the top of the file.

      Edited Jul 21, 2020
    • Tô Ngọc Ánh @anhtn changed this line in [version 2 of the diff](https://gitlab.zigexn.vn/anhtn/VeNJob/merge_requests/2/diffs?diff_id=4847&start_sha=b45a5d2a3b42b81a7c3e3c163a69b17630176f07#b321b772986de9dfe9db0ed4138ae166e577f241_2_1), Jul 21, 2020
  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 20, 2020
    Resolved by Tô Ngọc Ánh Jul 21, 2020
    lib/tasks/crawler.rake 0 → 100644
    task crawl_companies_jobs: :environment do
      require "open-uri"
      crawl_companies_and_jobs(3)
    end

    task crawl_industries_locations: :environment do
      require "open-uri"
      crawl_industries_and_locations
    end

    def crawl_companies_and_jobs(page)
      for i in 1..page
    • Thanh Hung Pham @hungpt commented Jul 20, 2020
      Master

      @anhtn Try to avoid using `for`.

      Edited Jul 21, 2020
    • Tô Ngọc Ánh @anhtn changed this line in [version 2 of the diff](https://gitlab.zigexn.vn/anhtn/VeNJob/merge_requests/2/diffs?diff_id=4847&start_sha=b45a5d2a3b42b81a7c3e3c163a69b17630176f07#b321b772986de9dfe9db0ed4138ae166e577f241_12_11), Jul 21, 2020
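
One likely reason behind the advice on `for`, shown as a minimal plain-Ruby sketch with no Rails or network access: `for` does not open a new scope, so its loop variable stays defined after the loop, while a block parameter is local to the block. The helper names below are hypothetical and do not appear in the merge request.

```ruby
# Hypothetical helpers for illustration; none of these names are from the merge request.

# `for` does not create a new scope: `i` is still visible after the loop ends.
def page_numbers_with_for(page)
  pages = []
  for i in 1..page
    pages << i
  end
  [pages, defined?(i)] # defined?(i) is "local-variable" here
end

# A block parameter is local to the block, so nothing leaks out of it.
def page_numbers_with_upto(page)
  pages = []
  1.upto(page) { |page_number| pages << page_number }
  [pages, defined?(page_number)] # defined?(page_number) is nil here
end

# `times` is also block-based, but it counts from 0 rather than 1.
def page_numbers_with_times(page)
  page.times.to_a
end
```

`1.upto(page)` keeps the 1-based numbering of the original `1..page` range. `page.times` starts at 0, which may or may not match the page numbers the target site expects.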
  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 20, 2020
    Resolved by Tô Ngọc Ánh Jul 21, 2020
    lib/tasks/crawler.rake 0 → 100644
        job_description = document.css('')
      rescue => exception

      end
    end

    def crawl_industries_and_locations
      document = Nokogiri::HTML(open('https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-vi.html'))
      industries_xml = document.css('#industry option')
      industries = industries_xml.map(&:text)
      locations_xml = document.css('#location option')
      locations = locations_xml.map(&:text)


      industries.each do |industry|
        exist = Industry.find_by(name: industry).present?
    • Thanh Hung Pham @hungpt commented Jul 20, 2020
      Master

      @anhtn There is a method called `find_or_create_by`. Try looking into it; the code will be shorter.

      Edited Jul 21, 2020
    • Tô Ngọc Ánh @anhtn changed this line in [version 2 of the diff](https://gitlab.zigexn.vn/anhtn/VeNJob/merge_requests/2/diffs?diff_id=4847&start_sha=b45a5d2a3b42b81a7c3e3c163a69b17630176f07#b321b772986de9dfe9db0ed4138ae166e577f241_73_93), Jul 21, 2020
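
A minimal sketch of the `find_or_create_by` suggestion. `find_or_create_by` is an ActiveRecord method; to keep the example runnable without Rails, `FakeIndustry` is a hypothetical in-memory stand-in that imitates only that one call, and `sync_industries` is an illustrative name rather than code from the merge request.

```ruby
# Hypothetical in-memory stand-in for the Industry model, so the sketch runs
# without Rails. It imitates only the find_or_create_by(name:) call.
class FakeIndustry
  Record = Struct.new(:name)

  @records = []

  class << self
    attr_reader :records

    # Return the existing record with this name, or create and store a new one.
    def find_or_create_by(name:)
      @records.find { |record| record.name == name } ||
        Record.new(name).tap { |record| @records << record }
    end
  end
end

# Replaces the find_by / present? / create! sequence with a single call.
# `model` is expected to respond to find_or_create_by, as ActiveRecord models do.
def sync_industries(model, names)
  names.each { |name| model.find_or_create_by(name: name) }
end

sync_industries(FakeIndustry, ["IT", "Banking"])
sync_industries(FakeIndustry, ["Banking", "Retail"]) # "Banking" is not duplicated
```

Unlike the `break if exist` in the reviewed loop, which stops at the first industry that already exists, this version visits every name and only creates the missing ones.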
  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 20, 2020
    Resolved by Tô Ngọc Ánh Jul 21, 2020
    lib/tasks/crawler.rake 0 → 100644
    def crawl_industries_and_locations
      document = Nokogiri::HTML(open('https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-vi.html'))
      industries_xml = document.css('#industry option')
      industries = industries_xml.map(&:text)
      locations_xml = document.css('#location option')
      locations = locations_xml.map(&:text)


      industries.each do |industry|
        exist = Industry.find_by(name: industry).present?
        break if exist
        puts industry
        Industry.create!(name: industry)
      end

      locations.take(70).each do |location|
    • Thanh Hung Pham @hungpt commented Jul 20, 2020
      Master

      @anhtn Why `70` here?

      Edited Jul 21, 2020
    • Tô Ngọc Ánh @anhtn commented Jul 20, 2020
      Master

      The first 70 entries are the ones in Vietnam.

      Edited Jul 21, 2020
    • Thanh Hung Pham @hungpt commented Jul 20, 2020
      Master

      @anhtn That is acceptable for now. You should define a constant for values like this, so that a later change only has to be made in one place. It should live in the `Location` model.

      Edited Jul 21, 2020
    • Tô Ngọc Ánh @anhtn changed this line in [version 2 of the diff](https://gitlab.zigexn.vn/anhtn/VeNJob/merge_requests/2/diffs?diff_id=4847&start_sha=b45a5d2a3b42b81a7c3e3c163a69b17630176f07#b321b772986de9dfe9db0ed4138ae166e577f241_79_97), Jul 21, 2020
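
A minimal sketch of moving the magic number into the model. The constant name `CITY_VIETNAM_NUMBER` is the one that appears in `app/models/location.rb` in version 2 of the diff; the helper `vietnam_locations` is hypothetical, and the class is plain Ruby here so the example runs without Rails.

```ruby
# Plain-Ruby sketch; in the application Location inherits from ApplicationRecord.
class Location
  # Number of leading <option> entries that are Vietnamese cities,
  # according to the author's reply in this thread.
  CITY_VIETNAM_NUMBER = 70
end

# Hypothetical helper: keep only the Vietnamese entries from the scraped list.
def vietnam_locations(location_names)
  location_names.take(Location::CITY_VIETNAM_NUMBER)
end
```

If the site later changes how many Vietnamese cities it lists, only the constant needs to be updated.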
  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 20, 2020
    Resolved by Tô Ngọc Ánh Jul 21, 2020
    lib/tasks/crawler.rake 0 → 100644
      [company_links, job_links]
    end

    def crawl_companies(company_links)
      company_links.each do |link|
        crawl_company(link)
      end
    end

    def crawl_company(company_link)
      begin
        document = Nokogiri::HTML(open(company_link))
        company_name = document.css(".content .name").text
        exist = Company.find_by(name: company_name).present?
        return if exist || company_name.empty?
    • Thanh Hung Pham @hungpt commented Jul 20, 2020
      Master

      @anhtn Check `company_name.empty?` before calling `find_by`, so it does not hit the database with an empty company_name.

      Edited Jul 21, 2020
    • Tô Ngọc Ánh @anhtn changed this line in [version 2 of the diff](https://gitlab.zigexn.vn/anhtn/VeNJob/merge_requests/2/diffs?diff_id=4847&start_sha=b45a5d2a3b42b81a7c3e3c163a69b17630176f07#b321b772986de9dfe9db0ed4138ae166e577f241_39_26), Jul 21, 2020
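
A minimal sketch of the reordered guard. `FakeCompany` is a hypothetical stand-in that counts lookups so the effect is visible without a database, and `skip_company?` is an illustrative name. The sketch also swaps in `exists?` (an ActiveRecord query method) for `find_by(...).present?`, which is an assumption about style and not something the reviewer asked for.

```ruby
# Hypothetical stand-in for the Company model that counts lookups, so the
# sketch runs without a database.
class FakeCompany
  @lookups = 0
  @names = ["Acme"]

  class << self
    attr_reader :lookups

    def exists?(name:)
      @lookups += 1
      @names.include?(name)
    end
  end
end

# Returns true when the crawler should skip this company.
# The cheap in-memory check runs first; `||` short-circuits, so an empty
# name never reaches the lookup.
def skip_company?(model, company_name)
  company_name.empty? || model.exists?(name: company_name)
end
```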
  • Tô Ngọc Ánh @anhtn resolved all discussions, Jul 21, 2020
  • Tô Ngọc Ánh @anhtn added 2 commits, Jul 21, 2020

    • 5907329d - improve code syntax (crawl company, industry, location)
    • 6496e46e - crawl jobs

    [Compare with previous version](https://gitlab.zigexn.vn/anhtn/VeNJob/merge_requests/2/diffs?diff_id=4847&start_sha=b45a5d2a3b42b81a7c3e3c163a69b17630176f07)
  • Thanh Hung Pham
    @hungpt started a discussion on the diff Jul 21, 2020
    Resolved by Tô Ngọc Ánh Jul 22, 2020
    app/models/location.rb
    class Location < ApplicationRecord
      CITY_VIETNAM_NUMBER = 70
    • Thanh Hung Pham @hungpt commented Jul 21, 2020
      Master

      @anhtn You should use `.freeze` on constants!

      Edited Jul 22, 2020
    • Thanh Hung Pham @hungpt commented Jul 21, 2020
      Master

      @anhtn Add a blank line here!

      Edited Jul 22, 2020
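
A small sketch of where `.freeze` makes a difference. Integers are immutable in Ruby, so `CITY_VIETNAM_NUMBER` is already frozen and `.freeze` on it is harmless but has no effect; the habit pays off for mutable values such as strings and arrays. `DEFAULT_COUNTRY`, `REGIONS`, and `mutation_blocked?` are hypothetical names added only for illustration.

```ruby
# Plain-Ruby sketch; in the application Location inherits from ApplicationRecord.
class Location
  # Integers are immutable, so this constant is frozen even without .freeze.
  CITY_VIETNAM_NUMBER = 70

  # Hypothetical constants: mutable objects are where .freeze matters.
  DEFAULT_COUNTRY = "Vietnam".freeze
  REGIONS = %w[north central south].freeze
end

# Returns true if appending to the object raises FrozenError.
def mutation_blocked?(object)
  object << "x"
  false
rescue FrozenError
  true
end
```

Note that freezing an array does not freeze the strings inside it; only the container is protected.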
  • Thanh Hung Pham
    @hungpt started a discussion on the diff Jul 21, 2020
    Resolved by Tô Ngọc Ánh Jul 22, 2020
    lib/tasks/crawler.rake 0 → 100644
    require "open-uri"
    task crawl_jobs: :environment do
      job_links = get_job_links(1)
      crawl_jobs(job_links)
    end

    task crawl_industries_locations: :environment do
      crawl_industries_and_locations
    end

    def get_job_links(page)
      job_links = []
      page.times do |i|
        document = Nokogiri::HTML(open("https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-trang-#{i}-vi.html"))
        jobs_xml = document.xpath('//div/a[@class="job_link"]/@href')
        jobs_xml.each { |i| job_links << i.value}
    • Thanh Hung Pham @hungpt commented Jul 21, 2020
      Master

      @anhtn Doesn't this variable `i` clash with the `i` above?

      Edited Jul 22, 2020
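
A minimal sketch of the naming issue raised here. Ruby lets the inner block parameter shadow the outer one, so the reviewed code runs, but inside the inner block `i` no longer refers to the page index. `FAKE_PAGES` is a hypothetical stand-in for the parsed pages so the example needs no network access, and both method names are illustrative.

```ruby
# Hypothetical stand-in for the parsed pages: page number => hrefs found on it.
FAKE_PAGES = {
  1 => ["/job/1", "/job/2"],
  2 => ["/job/3"]
}.freeze

# Shadowing version: the inner |i| hides the outer page index inside the block.
def links_with_shadowing(page_count)
  links = []
  1.upto(page_count) do |i|
    FAKE_PAGES.fetch(i, []).each { |i| links << i } # inner i is an href, not a page
  end
  links
end

# Distinct names make it clear which value is which.
def links_with_clear_names(page_count)
  1.upto(page_count).flat_map do |page_number|
    FAKE_PAGES.fetch(page_number, [])
  end
end
```

Both versions return the same list; the second simply removes the ambiguity, and `flat_map` avoids the manually built accumulator.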
  • Thanh Hung Pham
    @hungpt started a discussion on the diff Jul 21, 2020
    Resolved by Tô Ngọc Ánh Jul 22, 2020
    lib/tasks/crawler.rake 0 → 100644
    require "open-uri"
    • Thanh Hung Pham @hungpt commented Jul 21, 2020
      Master

      @anhtn Add a new line here!

      Edited Jul 22, 2020
  • Tô Ngọc Ánh @anhtn resolved all discussions, Jul 22, 2020
  • Tô Ngọc Ánh @anhtn mentioned in commit 9d821f37cfd9f6f543bff44849bc3f1a3a1e683e, Jul 22, 2020
  • Tô Ngọc Ánh @anhtn merged, Jul 22, 2020
2 participants
Reference: anhtn/VeNJob!2