VenJob / Merge Requests / !4

Merged
Opened Jul 08, 2021 by Mai Hoang Thai Ha@hamht 

created sample rake task for crawler

From Task/6_create_crawler into master


  • Discussion 33
  • Commits 21
  • Changes 6
  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • 31274088 - Fixed code style

    Compare with previous version

    Jul 08, 2021

  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • e16f8904 - fix description, requirement from array to string, crawler first page on CareerBuilder

    Compare with previous version

    Jul 12, 2021

  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • 423678ed - add rubocop gem, created industry, city crawler task

    Compare with previous version

    Jul 12, 2021

  • phuctmZigexn
    @phuctm started a discussion on an old version of the diff Jul 13, 2021
    Last updated by Mai Hoang Thai Ha Jul 13, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    1 require 'open-uri'
    2 require 'csv'
    3 require 'zip'
    4
    5 namespace :job do
    • phuctmZigexn @phuctm commented Jul 13, 2021

      A namespace usually points to something specific, such as an object or something shared by the tasks it contains. See how Rails names the namespaces of its own rake tasks here:

      https://guides.rubyonrails.org/command_line.html

      For example:

      rails db:migrate
      rails db:migrate:down
      rails db:migrate:redo
      rails db:migrate:status
      rails db:migrate:up

      So please change this to `namespace :crawler`.
    • Mai Hoang Thai Ha @hamht

      changed this line in version 5 of the diff

      Jul 13, 2021

  • phuctmZigexn
    @phuctm started a discussion on an old version of the diff Jul 13, 2021
    Last updated by Mai Hoang Thai Ha Jul 13, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    1 require 'open-uri'
    2 require 'csv'
    3 require 'zip'
    4
    5 namespace :job do
    6 desc 'importjob'
    7
    8 task web_job_crawler: :environment do
    • phuctmZigexn @phuctm commented Jul 13, 2021

      As with the comment above, change this to `task jobs: :environment`, and do the same for industries and cities below. The commands then become:

      rails crawler:jobs
      rails crawler:industries
      rails crawler:cities
    • Mai Hoang Thai Ha @hamht

      changed this line in version 5 of the diff

      Jul 13, 2021

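The naming convention suggested in the two comments above could look like the following sketch. It uses plain Rake outside Rails, so the `environment` task here is an empty stand-in (an assumption; in a Rails app the framework provides it), and the task bodies only print a message instead of crawling.

```ruby
require 'rake'

# Stand-in for the :environment task that Rails normally provides
# (assumption: this sketch runs outside a Rails app).
task :environment

namespace :crawler do
  desc 'Crawl jobs from CareerBuilder'
  task jobs: :environment do
    puts 'crawling jobs'
  end

  desc 'Crawl industries from CareerBuilder'
  task industries: :environment do
    puts 'crawling industries'
  end

  desc 'Crawl cities from CareerBuilder'
  task cities: :environment do
    puts 'crawling cities'
  end
end

# The namespace becomes the prefix of each task name.
crawler_tasks = Rake::Task.tasks.map(&:name).select { |name| name.start_with?('crawler:') }
```

With this layout each task is addressed as `crawler:<name>`, which matches the `rails crawler:jobs` style commands listed by the reviewer.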
  • phuctmZigexn
    @phuctm started a discussion on an old version of the diff Jul 13, 2021
    Last updated by Mai Hoang Thai Ha Jul 13, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    1 require 'open-uri'
    2 require 'csv'
    3 require 'zip'
    4
    5 namespace :job do
    6 desc 'importjob'
    7
    8 task web_job_crawler: :environment do
    9 parsed_page ||= Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-vi.html').body)
    • phuctmZigexn @phuctm commented Jul 13, 2021

      There is no need for `||=` here. Please look up the definition of this operator again: why and when it is used.

      A plain `=` is enough here.
    • Mai Hoang Thai Ha @hamht

      changed this line in version 5 of the diff

      Jul 13, 2021

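For reference, `a ||= b` assigns only when the variable is currently `nil` or `false`. A minimal sketch in plain Ruby; the variable names and the `Report` class are made up for illustration and do not come from the merge request.

```ruby
# Plain assignment always stores the new value.
page = 'first'
page = 'second'

# ||= assigns only when the variable is nil or false.
cached = nil
cached ||= 'computed' # cached was nil, so it becomes 'computed'
cached ||= 'ignored'  # cached is already truthy, so nothing changes

# Typical use: memoizing an expensive call in an instance variable.
class Report
  attr_reader :builds

  def initialize
    @builds = 0
  end

  def data
    @data ||= build
  end

  private

  def build
    @builds += 1
    [1, 2, 3]
  end
end

report = Report.new
report.data
report.data # the second call reuses @data, so build ran only once
```

A local variable that is being declared for the first time is always `nil` beforehand, so `||=` on line 9 of the diff behaves exactly like `=` and only hides the intent.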
  • phuctmZigexn
    @phuctm started a discussion on an old version of the diff Jul 13, 2021
    Last updated by Mai Hoang Thai Ha Jul 13, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    1 require 'open-uri'
    2 require 'csv'
    3 require 'zip'
    4
    5 namespace :job do
    6 desc 'importjob'
    7
    8 task web_job_crawler: :environment do
    9 parsed_page ||= Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-vi.html').body)
    10 job_item = parsed_page.css('div.job-item')
    • phuctmZigexn @phuctm commented Jul 13, 2021

      `job_item` is an array with multiple elements, so give it a plural name => `jobs`
    • phuctmZigexn @phuctm commented Jul 13, 2021

      job_item = parsed_page.css('div.job-item')

      Here you want to get the jobs. Instead of selecting `div.job-item` and then extracting each link to visit the job detail page, select the `job_link` elements directly. Isn't that faster and more convenient?

      => parsed_page.css('.job-item .job_link')

      Edited Jul 13, 2021 by phuctmZigexn
    • Mai Hoang Thai Ha @hamht

      changed this line in version 5 of the diff

      Jul 13, 2021

  • phuctmZigexn
    @phuctm started a discussion on an old version of the diff Jul 13, 2021
    Last updated by Mai Hoang Thai Ha Jul 13, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    1 require 'open-uri'
    2 require 'csv'
    3 require 'zip'
    4
    5 namespace :job do
    6 desc 'importjob'
    7
    8 task web_job_crawler: :environment do
    9 parsed_page ||= Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-vi.html').body)
    10 job_item = parsed_page.css('div.job-item')
    11 (0..job_item.count - 1).each do |item|
    • phuctmZigexn @phuctm commented Jul 13, 2021

      In Ruby, use `length` here instead of `job_item.count`.

      ref: https://github.com/JuanitoFatas/fast-ruby#arraylength-vs-arraysize-vs-arraycount-code
    • Mai Hoang Thai Ha @hamht

      changed this line in version 5 of the diff

      Jul 13, 2021

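A small illustration of how the three methods differ on a plain Array; the sample data is made up. `length` and `size` return the number of elements, while `count` also accepts an argument or a block and then has to look at the elements.

```ruby
jobs = %w[developer designer tester developer]

total = jobs.length                             # number of elements: 4
same_total = jobs.size                          # size is an alias of length on Array
developers = jobs.count('developer')            # elements equal to the argument: 2
long_titles = jobs.count { |job| job.length > 6 } # elements matching the block: 3
```

When only the number of elements is needed, as in the loop from the diff, `length` states that intent directly.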
  • phuctmZigexn
    @phuctm started a discussion on an old version of the diff Jul 13, 2021
    Last updated by Mai Hoang Thai Ha Jul 13, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    1 require 'open-uri'
    2 require 'csv'
    3 require 'zip'
    4
    5 namespace :job do
    6 desc 'importjob'
    7
    8 task web_job_crawler: :environment do
    9 parsed_page ||= Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-vi.html').body)
    10 job_item = parsed_page.css('div.job-item')
    11 (0..job_item.count - 1).each do |item|
    12 job_link = job_item[item].css('div.title a').attribute('href').text
    13 unparsed_job_link = HTTParty.get(job_link)
    14 parsed_job_link ||= Nokogiri::HTML(unparsed_job_link.body)
    • phuctmZigexn @phuctm commented Jul 13, 2021
      unparsed_job_link = HTTParty.get(job_link)
      parsed_job_link ||= Nokogiri::HTML(unparsed_job_link.body)

      These two lines can be merged, the same way you did above. Name the variable `job_page`, or whatever name you find appropriate.
    • Mai Hoang Thai Ha @hamht

      changed this line in version 5 of the diff

      Jul 13, 2021

  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • ea11d8e6 - rename namespace, use length instead of count,...

    Compare with previous version

    Jul 13, 2021

  • phuctmZigexn
    @phuctm started a discussion on an old version of the diff Jul 13, 2021
    Last updated by Mai Hoang Thai Ha Jul 13, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    1 require 'open-uri'
    2 require 'csv'
    3 require 'zip'
    4
    5 namespace :crawler do
    6 desc 'importjob'
    7
    8 task jobs: :environment do
    9 parsed_page = Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-vi.html').body)
    10 jobs_item = parsed_page.css('div.job-item .job_link')
    • phuctmZigexn @phuctm commented Jul 13, 2021
      jobs_item = parsed_page.css('div.job-item .job_link')
      (0..jobs_item.length - 1).each do |item|

      Put one blank line between these two lines. Separate each logical part with a blank line so the code is easier to read.
    • Mai Hoang Thai Ha @hamht

      changed this line in version 6 of the diff

      Jul 13, 2021

  • phuctmZigexn
    @phuctm started a discussion on an old version of the diff Jul 13, 2021
    Last updated by Mai Hoang Thai Ha Jul 13, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    1 require 'open-uri'
    2 require 'csv'
    3 require 'zip'
    4
    5 namespace :crawler do
    6 desc 'importjob'
    7
    8 task jobs: :environment do
    9 parsed_page = Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-vi.html').body)
    10 jobs_item = parsed_page.css('div.job-item .job_link')
    11 (0..jobs_item.length - 1).each do |item|
    • phuctmZigexn @phuctm commented Jul 13, 2021

      (0..jobs_item.length - 1).each do |item|

      This can simply be `jobs_item.each do |item|`. Please review how to iterate over an array in Ruby.
    • Mai Hoang Thai Ha @hamht

      changed this line in version 6 of the diff

      Jul 13, 2021

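The difference between the two loop styles can be sketched on a plain Array. The `links` data is made up for illustration; the same `each` call is what the reviewer suggests for `jobs_item`.

```ruby
links = ['/job/1', '/job/2', '/job/3']

# Index-based loop, as in the diff: the block variable is an Integer index.
by_index = []
(0..links.length - 1).each do |index|
  by_index << links[index]
end

# Idiomatic loop: each yields the elements themselves.
by_each = []
links.each do |link|
  by_each << link
end

# When the position is needed as well, each_with_index yields both.
numbered = []
links.each_with_index do |link, index|
  numbered << "#{index}: #{link}"
end
```

Both loops visit the same elements, but `each` removes the range arithmetic and the extra indexing step.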
  • phuctmZigexn
    @phuctm started a discussion on an old version of the diff Jul 13, 2021
    Last updated by Mai Hoang Thai Ha Jul 13, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    1 require 'open-uri'
    2 require 'csv'
    3 require 'zip'
    4
    5 namespace :crawler do
    6 desc 'importjob'
    7
    8 task jobs: :environment do
    9 parsed_page = Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-vi.html').body)
    10 jobs_item = parsed_page.css('div.job-item .job_link')
    11 (0..jobs_item.length - 1).each do |item|
    12 job_link = jobs_item[item].attribute('href').text
    • phuctmZigexn @phuctm commented Jul 13, 2021
      job_link = jobs_item[item].attribute('href').text
      job_page = Nokogiri::HTML(HTTParty.get(job_link).body)

      Following the array iteration comment above, update the `job_link` part. For a variable that is used only once, such as `job_link`, inline it if the resulting line is not too long (over 100 characters). For example:

      =>

      job_page = Nokogiri::HTML(HTTParty.get(item.attribute('href').text).body)
    • Mai Hoang Thai Ha @hamht

      changed this line in version 6 of the diff

      Jul 13, 2021

  • phuctmZigexn
    @phuctm started a discussion on an old version of the diff Jul 13, 2021
    Last updated by Mai Hoang Thai Ha Jul 13, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    2 require 'csv'
    3 require 'zip'
    4
    5 namespace :crawler do
    6 desc 'importjob'
    7
    8 task jobs: :environment do
    9 parsed_page = Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-vi.html').body)
    10 jobs_item = parsed_page.css('div.job-item .job_link')
    11 (0..jobs_item.length - 1).each do |item|
    12 job_link = jobs_item[item].attribute('href').text
    13 job_page = Nokogiri::HTML(HTTParty.get(job_link).body)
    14 job_desc = job_page.css('div.job-desc')
    15 job_detail = job_page.css('section.job-detail-content')
    16 # title - company
    17 title = job_desc.css('h1.title').text
    • phuctmZigexn @phuctm commented Jul 13, 2021

      Instead of getting `job_desc` first and then the title, why not select the title directly? `job_desc` serves no other purpose. job_title = job_page.css('div.job-desc .title').text

      Take full advantage of CSS selectors and XML to fetch exactly the value you need. Nokogiri can do all of it.
    • Mai Hoang Thai Ha @hamht

      changed this line in version 6 of the diff

      Jul 13, 2021

  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • c657a9de - fix some bugs

    Compare with previous version

    Jul 13, 2021

  • Mai Hoang Thai Ha @hamht

    added 2 commits

    • 35560bfd - Create migration, add option to model and install Active Record
    • 6aa95db8 - Merge branch 'Task/3_create_database_migration' into 'Task/6_create_crawler'

    Compare with previous version

    Jul 14, 2021

  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • 2b3e5d23 - add argument to crawl 5 or all page ,merge branch migration, import data to City and Industry

    Compare with previous version

    Jul 14, 2021

  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • 8ab617be - created and import job, company, industry, city to DB

    Compare with previous version

    Jul 15, 2021

  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • 9a32f409 - fixed ARGV

    Compare with previous version

    Jul 19, 2021

  • Mai Hoang Thai Ha @hamht

    added 3 commits

    • 9a32f409...ca4035a9 - 2 commits from branch master
    • f5f71e9f - Merge branch 'master' into 'Task/6_create_crawler'

    Compare with previous version

    Jul 20, 2021

  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • cc6653d4 - fixed conflict

    Compare with previous version

    Jul 20, 2021

  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 21, 2021
    Last updated by Mai Hoang Thai Ha Jul 21, 2021
    Gemfile
    57 57 # Windows does not include zoneinfo files, so bundle the tzinfo-data gem
    58 58 gem 'tzinfo-data', platforms: [:mingw, :mswin, :x64_mingw, :jruby]
    59 59 gem 'slim-rails', '~> 3.2'
    60 gem 'nokogiri', '~> 1.11', '>= 1.11.7'
    61 gem 'httparty', '~> 0.18.1'
    62 gem 'rubocop-rails', '~> 2.11', '>= 2.11.3'
    • Thanh Hung Pham @hungpt commented Jul 21, 2021

      @hamht This is only used in `development`, so move it into the `development` group.
    • Mai Hoang Thai Ha @hamht

      changed this line in version 14 of the diff

      Jul 21, 2021

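One way to apply this, assuming the comment refers to the `rubocop-rails` line it is attached to, is a `development` group in the Gemfile. The `require: false` option is an addition that the RuboCop README suggests, not something requested in the thread.

```ruby
# Gemfile (fragment)
group :development do
  # Linting is a development-time tool, so it does not need to load with the app.
  gem 'rubocop-rails', '~> 2.11', '>= 2.11.3', require: false
end
```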
  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 21, 2021
    Last updated by Mai Hoang Thai Ha Jul 21, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    1 require 'open-uri'
    2 require 'csv'
    • Thanh Hung Pham @hungpt commented Jul 21, 2021

      @hamht Haven't you removed these yet?

      require 'csv'
      require 'zip'
    • Mai Hoang Thai Ha @hamht

      changed this line in version 14 of the diff

      Jul 21, 2021

  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 21, 2021
    Last updated by Mai Hoang Thai Ha Jul 26, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    1 require 'open-uri'
    2 require 'csv'
    3 require 'zip'
    4
    5 namespace :crawler do
    6 desc 'crawler from CareerBuilder'
    7 task jobs: :environment do
    8 ARGV.each { |a| task a.to_sym { ; } }
    • Thanh Hung Pham @hungpt commented Jul 21, 2021

      @hamht Have you figured out the purpose of this line of code yet?
    • Mai Hoang Thai Ha @hamht commented Jul 21, 2021

      Yes. With this approach, if we run `rake add 1 2`, for example, Rails will run:

      $ rake 1
      $ rake 2

      Since 1 and 2 are not tasks, we get an error unless we define empty tasks for them.
    • Mai Hoang Thai Ha @hamht

      changed this line in version 16 of the diff

      Jul 26, 2021

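For context, Rake also has built-in task arguments, which avoid defining empty tasks for extra words on the command line. The sketch below shows that alternative, not what this merge request does. The `pages` argument name, its `'all'` default, the `received` array, and the stand-in `environment` task are all assumptions for illustration.

```ruby
require 'rake'

# Stand-in for the Rails-provided :environment task (assumption: no Rails here).
task :environment

# Collects what the task received, only so the sketch can be inspected.
received = []

namespace :crawler do
  desc 'Crawl jobs; pass a page count or "all"'
  task :jobs, [:pages] => :environment do |_task, args|
    args.with_defaults(pages: 'all') # used when no argument is given
    received << args[:pages]
  end
end

# Programmatic equivalent of: rake "crawler:jobs[5]"
Rake::Task['crawler:jobs'].invoke('5')

# A task runs only once unless it is re-enabled.
Rake::Task['crawler:jobs'].reenable
Rake::Task['crawler:jobs'].invoke
```

Because the value travels inside the brackets of the task name, Rake does not try to interpret it as another task, so no placeholder tasks are needed.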
  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • 081533cc - move gem to development

    Compare with previous version

    Jul 21, 2021

  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • 492fb257 - fixed rails console

    Compare with previous version

    Jul 22, 2021

  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • 9e051c4d - fixed logic,...

    Compare with previous version

    Jul 26, 2021

  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 26, 2021
    Last updated by Mai Hoang Thai Ha Jul 27, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    104
    105 desc 'crawler industry form CareerBuilder'
    106 task industries: :environment do
    107 parsed_page = Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/tim-viec-lam.html').body)
    108 list_job = parsed_page.css('div.list-of-working-positions ul.list-jobs li a')
    109 list_job.each do |part|
    110 industry = part.text.squish.strip
    111 Industry.find_or_create_by(name: industry)
    112 end
    113 end
    114
    115 desc 'crawler city form CareerBuilder'
    116 task cities: :environment do
    117 parsed_page ||= Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/tim-viec-lam.html').body)
    118 list_location = parsed_page.css('div.main-jobs-by-location ul li')
    119 list_location.each do |part|
    • Thanh Hung Pham @hungpt commented Jul 26, 2021

      @hamht Give this variable a clearer name. What does `part` mean?
    • Mai Hoang Thai Ha @hamht

      changed this line in version 17 of the diff

      Jul 27, 2021

  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 26, 2021
    Last updated by Mai Hoang Thai Ha Jul 27, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    106 task industries: :environment do
    107 parsed_page = Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/tim-viec-lam.html').body)
    108 list_job = parsed_page.css('div.list-of-working-positions ul.list-jobs li a')
    109 list_job.each do |part|
    110 industry = part.text.squish.strip
    111 Industry.find_or_create_by(name: industry)
    112 end
    113 end
    114
    115 desc 'crawler city form CareerBuilder'
    116 task cities: :environment do
    117 parsed_page ||= Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/tim-viec-lam.html').body)
    118 list_location = parsed_page.css('div.main-jobs-by-location ul li')
    119 list_location.each do |part|
    120 city_name = part.text
    121 region = 1
    • Thanh Hung Pham @hungpt commented Jul 26, 2021

      @hamht The `region` values 1 and 0 should be defined as constants in the Region model.

      Edited Jul 26, 2021 by Thanh Hung Pham
    • Mai Hoang Thai Ha @hamht

      changed this line in version 17 of the diff

      Jul 27, 2021

  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 26, 2021
    Last updated by Mai Hoang Thai Ha Jul 27, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    107 parsed_page = Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/tim-viec-lam.html').body)
    108 list_job = parsed_page.css('div.list-of-working-positions ul.list-jobs li a')
    109 list_job.each do |part|
    110 industry = part.text.squish.strip
    111 Industry.find_or_create_by(name: industry)
    112 end
    113 end
    114
    115 desc 'crawler city form CareerBuilder'
    116 task cities: :environment do
    117 parsed_page ||= Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/tim-viec-lam.html').body)
    118 list_location = parsed_page.css('div.main-jobs-by-location ul li')
    119 list_location.each do |part|
    120 city_name = part.text
    121 region = 1
    122 if city_name.include?(key = 'Việc làm tại')
    • Thanh Hung Pham @hungpt commented Jul 26, 2021
      Master

      @hamht Is the `key =` assignment really needed here? Consider just `.include?('Việc làm tại')`.
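
One way to avoid the assignment inside the condition, sketched in plain Ruby. The constant `CITY_PREFIX` and the method `strip_city_prefix` are hypothetical names. `String#sub` is used here as a stand-in for ActiveSupport's `remove` from the diff; note that `sub` only drops the first occurrence.

```ruby
# Hypothetical constant holding the prefix checked in the diff.
CITY_PREFIX = 'Việc làm tại'

# Returns the city name without the prefix, trimmed of surrounding whitespace.
def strip_city_prefix(city_name)
  return city_name.strip unless city_name.include?(CITY_PREFIX)

  city_name.sub(CITY_PREFIX, '').strip
end
```

The prefix is named once, so neither the condition nor the removal needs an inline `key =`.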
    • Mai Hoang Thai Ha @hamht

      changed this line in version 17 of the diff

      Jul 27, 2021

  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 26, 2021
    Last updated by Mai Hoang Thai Ha Jul 27, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    116 task cities: :environment do
    117 parsed_page ||= Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/tim-viec-lam.html').body)
    118 list_location = parsed_page.css('div.main-jobs-by-location ul li')
    119 list_location.each do |part|
    120 city_name = part.text
    121 region = 1
    122 if city_name.include?(key = 'Việc làm tại')
    123 city_name = city_name.remove(key).strip
    124 region = 0
    125 end
    126 city = {
    127 name: city_name,
    128 region: region
    129 }
    130 City.create(
    131 name: city[:name],
    • Thanh Hung Pham @hungpt commented Jul 26, 2021
      Master

      @hamht Refactor this to use the `city_name` and `region` variables directly. There is no need to build the `city` hash.
    • Mai Hoang Thai Ha @hamht

      changed this line in version 17 of the diff

      Jul 27, 2021

  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 26, 2021
    Last updated by Mai Hoang Thai Ha Jul 27, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    71 company_object = Company.find_or_create_by(name: company_name)
    72 job_object = Job.create({ title: job_title,
    73 job_type: job_type,
    74 salary: salary,
    75 experience: experience,
    76 position: level,
    77 expiration_date: expiration_date,
    78 description: description,
    79 benefit: benefits,
    80 requirement: requirement,
    81 other_info: other_info,
    82 company_id: company_object.id,
    83 created_at: update_at,
    84 updated_at: update_at })
    85 industries.map do |industry|
    86 industry_objects = Industry.find_or_create_by(name: industry)
    • Thanh Hung Pham @hungpt commented Jul 26, 2021
      Master

      @hamht Here `Industry.find_or_create_by(name: industry)` returns a single Industry, right? The variable name should be singular. The same applies to `city_objects`.
    • Mai Hoang Thai Ha @hamht

      changed this line in version 17 of the diff

      Jul 27, 2021

  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 26, 2021
    Last updated by Mai Hoang Thai Ha Jul 27, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    72 job_object = Job.create({ title: job_title,
    73 job_type: job_type,
    74 salary: salary,
    75 experience: experience,
    76 position: level,
    77 expiration_date: expiration_date,
    78 description: description,
    79 benefit: benefits,
    80 requirement: requirement,
    81 other_info: other_info,
    82 company_id: company_object.id,
    83 created_at: update_at,
    84 updated_at: update_at })
    85 industries.map do |industry|
    86 industry_objects = Industry.find_or_create_by(name: industry)
    87 job_object.industries << industry_objects
    • Thanh Hung Pham @hungpt commented Jul 26, 2021
      Master

      @hamht It would be better to finish the loop first and then `<<` into the DB once. That reduces the number of times we connect to the DB. The same applies to `job_object.cities << city_objects`.
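
A rough illustration of the difference, using a plain-Ruby stand-in rather than ActiveRecord. `FakeAssociation` and its `writes` counter are hypothetical and only count how often `<<` is called. The sketch assumes the real association's `<<` also accepts an array of records.

```ruby
# Stand-in for an association that counts how often it is written to.
class FakeAssociation
  attr_reader :records, :writes

  def initialize
    @records = []
    @writes = 0
  end

  # Accepts one record or an array of records.
  def <<(records)
    @writes += 1
    @records.concat(Array(records))
    self
  end
end

names = %w[IT Sales Marketing]

# One write per element, as in the reviewed loop.
per_item = FakeAssociation.new
names.each { |name| per_item << "Industry(#{name})" }

# Build the full list first, then write once.
batched = FakeAssociation.new
batched << names.map { |name| "Industry(#{name})" }

puts "per item: #{per_item.writes} writes, batched: #{batched.writes} write"
```

Both variants end up with the same records; only the number of write calls differs.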
    • Mai Hoang Thai Ha @hamht

      changed this line in version 17 of the diff

      Jul 27, 2021

  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 26, 2021
    Last updated by Mai Hoang Thai Ha Jul 27, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    68 company_name = job_page.css('div.job-desc a.job-company-name').text
    69 # Cities
    70 cities = job_page.css('.job-detail-content .detail-box .map p a').map(&:text)
    71 company_object = Company.find_or_create_by(name: company_name)
    72 job_object = Job.create({ title: job_title,
    73 job_type: job_type,
    74 salary: salary,
    75 experience: experience,
    76 position: level,
    77 expiration_date: expiration_date,
    78 description: description,
    79 benefit: benefits,
    80 requirement: requirement,
    81 other_info: other_info,
    82 company_id: company_object.id,
    83 created_at: update_at,
    • Thanh Hung Pham @hungpt commented Jul 26, 2021
      Master

      @hamht Aren't `created_at` and `updated_at` two columns that Rails updates automatically?
    • Mai Hoang Thai Ha @hamht

      changed this line in version 17 of the diff

      Jul 27, 2021

  • phuctmZigexn
    @phuctm started a discussion on the diff Jul 27, 2021
    Last updated by phuctmZigexn Jul 27, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    16 end
    17 (1..total_pages).each do |page|
    18 parsed_page = Nokogiri::HTML(HTTParty.get("https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-trang-#{page}-vi.html").body)
    19 jobs_item = parsed_page.css('div.job-item .job_link')
    20 jobs_item.each do |item|
    21 retries ||= 0
    22 url ||= item.attribute('href').text
    23 job_page = Nokogiri::HTML(HTTParty.get(url).body)
    24 # Job
    25 job_title = job_page.css('div.job-desc h1.title').text
    26 # update_at, job_industries, job_type, salary, experience, level, expiration_date
    27 detail_box_items = job_page.css('.job-detail-content .detail-box ul li')
    28 # init
    29 update_at, job_type, salary, experience, level, expiration_date = ''
    30 industries = []
    31 detail_box_items.each do |info_item|
    • phuctmZigexn @phuctm commented Jul 27, 2021
      Master

      Hà, I think this part would be easier to read if you handled it the same way as the code below that extracts `benefits, description, requirement, other_info`.
    • phuctmZigexn @phuctm commented Jul 27, 2021
      Master

      Sample code for you:

      detail_box_items.each do |info_item|
        key = info_item.css('strong').text.strip
        default_value = info_item.css('p').text.squish
        case key
        when 'Ngày cập nhật'
          update_at = default_value.to_time
        when 'Ngành nghề'
          industries = default_value.split(' , ')
        when 'Hình thức'
          job_type = default_value
        when 'Lương'
          salary = default_value
        when 'Kinh nghiệm'
          experience = default_value.squish
        when 'Cấp bậc'
          level = default_value
        when 'Hết hạn nộp'
          expiration_date = default_value.to_time
        end
      end
  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • a765decd - fixed review

    Compare with previous version

    Jul 27, 2021

  • phuctmZigexn
    @phuctm started a discussion on the diff Jul 27, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    37 when info.include?(key = 'Ngành nghề')
    38 industries = info.squish.remove(key).strip.split(' , ')
    39 when info.include?(key = 'Hình thức')
    40 job_type = info.squish.remove(key).strip
    41 when info.include?(key = 'Lương')
    42 salary = info.squish.remove(key).strip
    43 when info.include?(key = 'Kinh nghiệm')
    44 experience = info.squish.remove(key).strip
    45 when info.include?(key = 'Cấp bậc')
    46 level = info.squish.remove(key).strip
    47 when info.include?(key = 'Hết hạn nộp')
    48 expiration_date = info.squish.remove(key).strip.to_time
    49 end
    50 end
    51 # benefits, description, requirement, other_info
    52 job_detail_rows = job_page.css('section.job-detail-content div.detail-row')
    • phuctmZigexn @phuctm commented Jul 27, 2021
      Master

      This data-extraction part needs refactoring. As far as I can see, this one expression works for all 4 fields: `detail_row.css(':not(h3.detail-title)').map(&:text).map(&:squish)[1..-1].reject(&:blank?).join('---')`

      **Don't Repeat Yourself**
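
A possible shape for that refactor, sketched in plain Ruby so it runs without Nokogiri or ActiveSupport. `FIELD_BY_TITLE`, `join_detail_texts`, and `extract_details` are hypothetical names, and rows are modeled as `[title, texts]` pairs instead of parsed nodes. `gsub(/\s+/, ' ').strip` and `empty?` stand in for `squish` and `blank?`.

```ruby
# Section titles from the diff mapped to the field each one fills.
FIELD_BY_TITLE = {
  'Phúc lợi' => :benefits,
  'Mô tả Công việc' => :description,
  'Yêu Cầu Công Việc' => :requirement,
  'Thông tin khác' => :other_info
}.freeze

# The repeated chain, written once: normalize whitespace, drop the first
# entry, skip empty strings, join with '---'.
def join_detail_texts(texts)
  cleaned = texts.map { |text| text.gsub(/\s+/, ' ').strip }
  (cleaned[1..-1] || []).reject(&:empty?).join('---')
end

# rows: array of [title, texts] pairs (hypothetical stand-in for detail rows).
def extract_details(rows)
  rows.each_with_object({}) do |(title, texts), details|
    field = FIELD_BY_TITLE[title.strip]
    details[field] = join_detail_texts(texts) if field
  end
end
```

Adding a fifth section would then mean one more entry in the hash rather than another copy of the chain.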
  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 27, 2021
    Last updated by Mai Hoang Thai Ha Jul 27, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    96 end
    97 end
    98
    99 desc 'crawler industry form CareerBuilder'
    100 task industries: :environment do
    101 parsed_page = Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/tim-viec-lam.html').body)
    102 list_job = parsed_page.css('div.list-of-working-positions ul.list-jobs li a')
    103 list_job.each do |part|
    104 industry = part.text.squish.strip
    105 Industry.find_or_create_by(name: industry)
    106 end
    107 end
    108
    109 desc 'crawler city form CareerBuilder'
    110 task cities: :environment do
    111 parsed_page ||= Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/tim-viec-lam.html').body)
    • Thanh Hung Pham @hungpt commented Jul 27, 2021
      Master

      @hamht Why is `||=` used here?
    • Mai Hoang Thai Ha @hamht commented Jul 27, 2021
      Master

      Because `rescue` is used here, I went with the `||=` operator.
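
For context, a sketch of the common `rescue`/`retry` idiom in which `||=` matters. It assumes the `||=` lines in the diff sit inside a body that `retry` re-runs, which the quoted context does not show. `with_retries` and `max_retries` are hypothetical names.

```ruby
# Hypothetical helper: runs the block, retrying up to max_retries times.
def with_retries(max_retries: 3)
  # `retry` re-runs the method body; `||=` keeps the current count,
  # whereas a plain `=` would reset it to 0 on every attempt.
  retries ||= 0
  yield retries
rescue StandardError
  retries += 1
  retry if retries <= max_retries
  raise
end

attempts = []
result = with_retries do |attempt|
  attempts << attempt
  raise 'timeout' if attempt < 2 # fail the first two attempts

  'ok'
end
puts "#{result} after #{attempts.size} attempts"
```

Outside of a retried body, `||=` on a fresh local variable behaves like a plain assignment, which is likely why the reviewer asked.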
    • Mai Hoang Thai Ha @hamht

      changed this line in version 20 of the diff

      Jul 27, 2021

  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 27, 2021
    Last updated by Mai Hoang Thai Ha Jul 27, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    100 task industries: :environment do
    101 parsed_page = Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/tim-viec-lam.html').body)
    102 list_job = parsed_page.css('div.list-of-working-positions ul.list-jobs li a')
    103 list_job.each do |part|
    104 industry = part.text.squish.strip
    105 Industry.find_or_create_by(name: industry)
    106 end
    107 end
    108
    109 desc 'crawler city form CareerBuilder'
    110 task cities: :environment do
    111 parsed_page ||= Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/tim-viec-lam.html').body)
    112 list_location = parsed_page.css('div.main-jobs-by-location ul li')
    113 list_location.each do |city|
    114 city_name = city.text
    115 region = City.regions[:international]
    • Thanh Hung Pham @hungpt commented Jul 27, 2021
      Master

      @hamht This can be shortened to `region = :international`; it will still be understood correctly.
    • Mai Hoang Thai Ha @hamht

      changed this line in version 18 of the diff

      Jul 27, 2021

  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 27, 2021
    Last updated by Mai Hoang Thai Ha Jul 27, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    105 Industry.find_or_create_by(name: industry)
    106 end
    107 end
    108
    109 desc 'crawler city form CareerBuilder'
    110 task cities: :environment do
    111 parsed_page ||= Nokogiri::HTML(HTTParty.get('https://careerbuilder.vn/tim-viec-lam.html').body)
    112 list_location = parsed_page.css('div.main-jobs-by-location ul li')
    113 list_location.each do |city|
    114 city_name = city.text
    115 region = City.regions[:international]
    116 if city_name.include?('Việc làm tại')
    117 city_name = city_name.remove('Việc làm tại').strip
    118 region = City.regions[:vietnam]
    119 end
    120 City.create(
    • Thanh Hung Pham @hungpt commented Jul 27, 2021
      Master

      @hamht Are you no longer using `find_or_create_by`? If the rake task is run several times, won't the data be duplicated?
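
The concern can be illustrated with an in-memory stand-in. `FakeCityTable` is hypothetical and is not ActiveRecord; it only mimics the difference between always inserting and looking up first.

```ruby
# In-memory stand-in for a model table; NOT ActiveRecord.
class FakeCityTable
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Always inserts, like calling create on every run.
  def create(attrs)
    @rows << attrs
    attrs
  end

  # Returns the existing row with matching attributes, or inserts a new one.
  def find_or_create_by(attrs)
    @rows.find { |row| attrs.all? { |key, value| row[key] == value } } || create(attrs)
  end
end

# Simulate running the task twice with each approach.
always_create = FakeCityTable.new
2.times { always_create.create(name: 'Hà Nội') }

lookup_first = FakeCityTable.new
2.times { lookup_first.find_or_create_by(name: 'Hà Nội') }

puts "create: #{always_create.rows.size} rows, find_or_create_by: #{lookup_first.rows.size} row"
```

In this stand-in, the second run adds a duplicate only in the `create` variant.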
    • Mai Hoang Thai Ha @hamht

      changed this line in version 18 of the diff

      Jul 27, 2021

  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 27, 2021
    Last updated by Mai Hoang Thai Ha Jul 27, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    57 when 'Phúc lợi'
    58 benefits = detail_row.css(':not(h3.detail-title)').map(&:text).map(&:squish)[1..-1].reject(&:blank?).join('---')
    59 when 'Mô tả Công việc'
    60 description = detail_row.css(':not(h3.detail-title)').map(&:text).map(&:squish)[1..-1].reject(&:blank?).join('---')
    61 when 'Yêu Cầu Công Việc'
    62 requirement = detail_row.css(':not(h3.detail-title)').map(&:text).map(&:squish)[1..-1].reject(&:blank?).join('---')
    63 when 'Thông tin khác'
    64 other_info = detail_row.css(':not(h3.detail-title)').map(&:text).map(&:squish)[1..-1].reject(&:blank?).join('---')
    65 end
    66 end
    67 # Company
    68 company_name = job_page.css('div.job-desc a.job-company-name').text
    69 # Cities
    70 cities = job_page.css('.job-detail-content .detail-box .map p a').map(&:text)
    71 company_object = Company.find_or_create_by(name: company_name)
    72 job_object = Job.create({ title: job_title,
    • Thanh Hung Pham @hungpt commented Jul 27, 2021
      Master

      @hamht Are you no longer using `find_or_create_by`? If the rake task is run several times, won't the data be duplicated?
    • Mai Hoang Thai Ha @hamht

      changed this line in version 18 of the diff

      Jul 27, 2021

  • Thanh Hung Pham
    @hungpt started a discussion on an old version of the diff Jul 27, 2021
    Last updated by Mai Hoang Thai Ha Jul 27, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    55 detail_title = detail_row.css('.detail-title').text.strip
    56 case detail_title
    57 when 'Phúc lợi'
    58 benefits = detail_row.css(':not(h3.detail-title)').map(&:text).map(&:squish)[1..-1].reject(&:blank?).join('---')
    59 when 'Mô tả Công việc'
    60 description = detail_row.css(':not(h3.detail-title)').map(&:text).map(&:squish)[1..-1].reject(&:blank?).join('---')
    61 when 'Yêu Cầu Công Việc'
    62 requirement = detail_row.css(':not(h3.detail-title)').map(&:text).map(&:squish)[1..-1].reject(&:blank?).join('---')
    63 when 'Thông tin khác'
    64 other_info = detail_row.css(':not(h3.detail-title)').map(&:text).map(&:squish)[1..-1].reject(&:blank?).join('---')
    65 end
    66 end
    67 # Company
    68 company_name = job_page.css('div.job-desc a.job-company-name').text
    69 # Cities
    70 cities = job_page.css('.job-detail-content .detail-box .map p a').map(&:text)
    • Thanh Hung Pham @hungpt commented Jul 27, 2021
      Master

      @hamht Move this line next to line 85 so the code is easier to follow.
    • Mai Hoang Thai Ha @hamht

      changed this line in version 18 of the diff

      Jul 27, 2021

  • phuctmZigexn
    @phuctm started a discussion on an old version of the diff Jul 27, 2021
    Last updated by Mai Hoang Thai Ha Jul 27, 2021
    lib/tasks/web_crawler.rake 0 → 100644
    38 industries = info.squish.remove(key).strip.split(' , ')
    39 when info.include?(key = 'Hình thức')
    40 job_type = info.squish.remove(key).strip
    41 when info.include?(key = 'Lương')
    42 salary = info.squish.remove(key).strip
    43 when info.include?(key = 'Kinh nghiệm')
    44 experience = info.squish.remove(key).strip
    45 when info.include?(key = 'Cấp bậc')
    46 level = info.squish.remove(key).strip
    47 when info.include?(key = 'Hết hạn nộp')
    48 expiration_date = info.squish.remove(key).strip.to_time
    49 end
    50 end
    51 # benefits, description, requirement, other_info
    52 job_detail_rows = job_page.css('section.job-detail-content div.detail-row')
    53 benefits, description, requirement, other_info = []
    • phuctmZigexn @phuctm commented Jul 27, 2021
      Master

      `benefits, description, requirement, other_info`: in the DB these 4 columns are of type `text`, so why are they initialized as `[]` here? After the join they become strings anyway.

      => When this is saved to the DB, are the values of these 4 columns correct?
    • phuctmZigexn @phuctm commented Jul 27, 2021
      Master

      Also, the data fetched for these 4 columns looks a bit off :)))
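
As background to the question above, this is how Ruby's multiple assignment treats the two initializer lines quoted in the diff. The method names are hypothetical wrappers so that the results can be inspected.

```ruby
# A single right-hand value goes to the first name; the rest become nil.
def single_value_assignment
  update_at, job_type, salary = ''
  [update_at, job_type, salary]
end

# An empty array on the right leaves every name nil.
def empty_array_assignment
  benefits, description, requirement, other_info = []
  [benefits, description, requirement, other_info]
end

# One way to give each name its own empty string.
def explicit_defaults
  benefits, description, requirement, other_info = Array.new(4) { '' }
  [benefits, description, requirement, other_info]
end

p single_value_assignment
p empty_array_assignment
p explicit_defaults
```

So `= []` does not make the four variables arrays; each one stays `nil` until a later branch assigns it.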
    • Mai Hoang Thai Ha @hamht

      changed this line in version 18 of the diff

      Jul 27, 2021

  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • cdd2eeb9 - fixed review part 2

    Compare with previous version

    Jul 27, 2021

  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • 8328deb3 - fixed review part 3

    Compare with previous version

    Jul 27, 2021

  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • b8abb807 - add logger

    Compare with previous version

    Jul 27, 2021

  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • c0f80e93 - used start_with, next if empty field

    Compare with previous version

    Jul 28, 2021

  • Mai Hoang Thai Ha @hamht

    added 1 commit

    • 135b4c82 - add logger to empty job title

    Compare with previous version

    Jul 28, 2021

  • phuctmZigexn @phuctm

    mentioned in commit d71790ab

    Jul 28, 2021

  • phuctmZigexn @phuctm

    merged

    Jul 28, 2021

3 participants
Reference: hamht/VenJob!4