Merged
Opened Feb 20, 2020 by Nguyen Ngoc Nghia @nghiann

Feature/crawl data


Check out, review, and merge locally

Step 1. Fetch and check out the branch for this merge request

git fetch origin
git checkout -b feature/crawl_data origin/feature/crawl_data

Step 2. Review the changes locally

Step 3. Merge the branch and fix any conflicts that come up

git checkout master
git merge --no-ff feature/crawl_data

Step 4. Push the result of the merge to GitLab

git push origin master

Note that pushing to GitLab requires write access to this repository.

Tip: You can also check out merge requests locally by following GitLab's guidelines.

  • Discussion 10
  • Commits 11
  • Changes 8
  • Nguyen Ngoc Nghia @nghiann

    added 2 commits

    • 0ba64d74 - fix job_workplace nil
    • b05cfa0e - create crawl log

    Compare with previous version

    Feb 20, 2020

  • Nguyen Ngoc Nghia @nghiann

    added 1 commit

    • 032a9417 - delete unused file

    Compare with previous version

    Feb 20, 2020

  • Van Hau Le
    @haulv started a discussion on an old version of the diff Feb 21, 2020
    Automatically resolved by Nguyen Ngoc Nghia with a push Feb 21, 2020
    app/services/crawl_data.rb 0 → 100644
    require "nokogiri"
    require "open-uri"
    require "resolv-replace"

    class CrawlData
      def initialize
    • Van Hau Le @haulv (Master) commented Feb 21, 2020

      @nghiann Since no params are passed in, you can just remove this!
    • Nguyen Ngoc Nghia @nghiann

      changed this line in version 4 of the diff

      Feb 21, 2020

  • Nguyen Ngoc Nghia @nghiann

    resolved all discussions

    Feb 21, 2020

  • Nguyen Ngoc Nghia @nghiann

    added 1 commit

    • 0f713e18 - delete unused initialize method

    Compare with previous version

    Feb 21, 2020

  • Nguyen Ngoc Nghia @nghiann

    added 1 commit

    • 2d0503f4 - remove unnecessary html tag

    Compare with previous version

    Feb 21, 2020

  • Nguyen Ngoc Nghia @nghiann

    added 1 commit

    • cb600a23 - refactoring job crawler code

    Compare with previous version

    Feb 28, 2020

  • Van Hau Le
    @haulv started a discussion on the diff Feb 28, 2020
    Resolved by Nguyen Ngoc Nghia Mar 02, 2020
    app/services/crawl_data.rb 0 → 100644
    require "nokogiri"
    require "open-uri"
    require "resolv-replace"
    require "openssl"
    OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
    class CrawlData
      def crawl_web
        page = Nokogiri::HTML.parse(open(Settings.crawl.base_url, ssl_verify_mode: nil))
        total_job = page.css("div.ais-stats h1.col-sm-10 span").text.gsub(",", "").to_f
    • Van Hau Le @haulv (Master) commented Feb 28, 2020

      @nghiann return if total_job == 0

      Edited Mar 02, 2020
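The suggestion is a guard clause: stop early when the listing reports no jobs. Because `0` is truthy in Ruby, the guard has to compare (`== 0` or `zero?`); a single `=` would assign, and the method would then always return at that line. A minimal, stdlib-only sketch of the idea follows; `parse_total_jobs` and `crawl_summary` are hypothetical names, and the input string stands in for the text the crawler reads from `div.ais-stats h1.col-sm-10 span`.

```ruby
# Hypothetical helper: turn the job-count text shown on the listing page
# (e.g. "12,345") into an Integer by dropping thousands separators.
def parse_total_jobs(text)
  text.delete(",").strip.to_i
end

# Hypothetical caller showing the guard clause. `zero?` compares;
# `total_jobs = 0` would assign, and since 0 is truthy in Ruby the
# method would always return here.
def crawl_summary(stats_text)
  total_jobs = parse_total_jobs(stats_text)
  return :nothing_to_crawl if total_jobs.zero?

  "#{total_jobs} jobs to crawl"
end
```

Using `to_i` rather than the `to_f` in the reviewed code is a choice made for this sketch, since a job count is a whole number.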
  • Van Hau Le
    @haulv started a discussion on an old version of the diff Feb 28, 2020
    Automatically resolved by Nguyen Ngoc Nghia with a push Mar 02, 2020
    app/services/crawl_data.rb 0 → 100644
    crawl_job_title_logger.info "Crawl at #{Time.current}"

    (1..Settings.crawl.fixed_total_page).each do |each_page|
      page = Nokogiri::HTML.parse(open(URI.encode("https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-trang-#{each_page}-vi.html")))
      (0..49).each do |j|
        job_url = page.css(".jobtitle h3 a @href")[j].text

        job_page = Nokogiri::HTML.parse(open(URI.encode(job_url)))

        # Job code
        job_code = job_url.split("/").last.split(".")[-2]

        # Company code
        company_code = job_url.split("/").last.split("-").last.split(".")[-2].strip

        next if job_page.css(".LeftJobCB").nil?
    • Van Hau Le @haulv (Master) commented Feb 28, 2020

      @nghiann Move this check to just below job_page = Nokogiri::HTML.parse(open(URI.encode(job_url)))
    • Nguyen Ngoc Nghia @nghiann

      changed this line in version 8 of the diff

      Mar 02, 2020

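Moving the check up means a page without the expected job section is skipped before the job and company codes are computed. A related detail: Nokogiri's `css` returns a `NodeSet`, which can be empty but is never `nil`, so an emptiness test is what actually detects a missing `.LeftJobCB` block. The stdlib-only sketch below shows that ordering; `crawl_job_urls` and the `fetch_nodes` lambda are hypothetical stand-ins for fetching a page and running the CSS query.

```ruby
# Hypothetical sketch: fetch_nodes stands in for
# Nokogiri::HTML.parse(...).css(".LeftJobCB"); like a NodeSet, it returns
# a possibly empty collection and never nil.
def crawl_job_urls(job_urls, fetch_nodes)
  job_urls.each_with_object([]) do |url, crawled|
    nodes = fetch_nodes.call(url)
    # Skip right after fetching, before any further parsing work.
    # `nodes.nil?` would never be true here; `empty?` is the real check.
    next if nodes.empty?

    # ...job code / company code parsing would happen here...
    crawled << url
  end
end

fetch_nodes = ->(url) { url.include?("expired") ? [] : ["<div class=\"LeftJobCB\">"] }
crawl_job_urls(["job-1.html", "expired-job.html", "job-2.html"], fetch_nodes)
# => ["job-1.html", "job-2.html"]
```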
  • Van Hau Le
    @haulv started a discussion on an old version of the diff Feb 28, 2020
    Automatically resolved by Nguyen Ngoc Nghia with a push Mar 02, 2020
    app/services/crawl_data.rb 0 → 100644
    job_page = Nokogiri::HTML.parse(open(URI.encode(job_url)))

    # Job code
    job_code = job_url.split("/").last.split(".")[-2]

    # Company code
    company_code = job_url.split("/").last.split("-").last.split(".")[-2].strip

    next if job_page.css(".LeftJobCB").nil?

    job = JobHtml.new(job_page).parse_job

    crawl_job_title_logger.info "#{job[:title]}"

    next if job[:workplace].nil?
    • Van Hau Le @haulv (Master) commented Feb 28, 2020

      @nghiann next if job[:workplace].blank?
    • Nguyen Ngoc Nghia @nghiann

      changed this line in version 8 of the diff

      Mar 02, 2020

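`nil?` only catches a missing value, while Rails' `blank?` (from ActiveSupport) also treats `false`, empty or whitespace-only strings, and empty arrays and hashes as blank, so an empty workplace list is skipped as well. Because ActiveSupport is not loaded in a plain Ruby script, the sketch below uses a hypothetical, simplified `blank_value?` that covers only the types relevant here; it is not the ActiveSupport implementation.

```ruby
# Simplified, hypothetical stand-in for ActiveSupport's Object#blank?,
# covering only nil, booleans, strings, arrays and hashes.
def blank_value?(value)
  case value
  when nil, false then true
  when String then value.strip.empty?
  when Array, Hash then value.empty?
  else false
  end
end

workplaces = [nil, [], "  ", ["Hà Nội"]]
workplaces.map(&:nil?)                 # => [true, false, false, false]
workplaces.map { |w| blank_value?(w) } # => [true, true, true, false]
```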
  • Van Hau Le
    @haulv started a discussion on an old version of the diff Feb 28, 2020
    Automatically resolved by Nguyen Ngoc Nghia with a push Mar 02, 2020
    app/services/crawl_data.rb 0 → 100644
    job_code = job_url.split("/").last.split(".")[-2]

    # Company code
    company_code = job_url.split("/").last.split("-").last.split(".")[-2].strip

    next if job_page.css(".LeftJobCB").nil?

    job = JobHtml.new(job_page).parse_job

    crawl_job_title_logger.info "#{job[:title]}"

    next if job[:workplace].nil?

    job[:workplace].each do |city_name|
      city_id = city_id(city_name)
      company_id = company_id(company_code, job[:company_name], job[:company_address], job[:company_description])
    • Van Hau Le @haulv (Master) commented Feb 28, 2020

      @nghiann

      company = get_company(company_code, job[:company_name], job[:company_address], job[:company_description])

      def get_company(code, name, address, description)
        company = Company.find_or_initialize_by(code: code)
        company.update(name: name, address: address, description: description)
        company
      end
    • Nguyen Ngoc Nghia @nghiann

      changed this line in version 8 of the diff

      Mar 02, 2020

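The point of the suggested `get_company` is that `update` returns `true`/`false`, so the method ends by returning the record itself and callers can write `get_company(...).id`. To show that shape without a database, the sketch below swaps ActiveRecord for a hypothetical in-memory `FakeCompany` class whose `find_or_initialize_by` and `update` only mimic the ActiveRecord methods of the same name.

```ruby
# Hypothetical in-memory stand-in for an ActiveRecord model; it only mimics
# the two ActiveRecord methods used by get_company.
class FakeCompany
  STORE = {} # code => record, plays the role of the companies table

  attr_accessor :code, :name, :address, :description

  # Return the stored record for this code, or a new unsaved one.
  def self.find_or_initialize_by(code:)
    STORE[code] || new.tap { |company| company.code = code }
  end

  # Assign attributes, "save" the record, and return true, like
  # ActiveRecord's update on success.
  def update(attrs)
    attrs.each { |key, value| public_send("#{key}=", value) }
    STORE[code] = self
    true
  end
end

# Same shape as the reviewer's suggestion: find or build, update, then
# return the record itself rather than update's boolean result.
def get_company(code, name, address, description)
  company = FakeCompany.find_or_initialize_by(code: code)
  company.update(name: name, address: address, description: description)
  company
end

first = get_company("C001", "Acme", "Hà Nội", "old")
again = get_company("C001", "Acme", "Hà Nội", "new")
again.equal?(first) # => true
again.description   # => "new"
```

The company code `"C001"` and the other values are made up for illustration.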
  • Van Hau Le
    @haulv started a discussion on an old version of the diff Feb 28, 2020
    Automatically resolved by Nguyen Ngoc Nghia with a push Mar 02, 2020
    app/services/crawl_data.rb 0 → 100644
    # Company code
    company_code = job_url.split("/").last.split("-").last.split(".")[-2].strip

    next if job_page.css(".LeftJobCB").nil?

    job = JobHtml.new(job_page).parse_job

    crawl_job_title_logger.info "#{job[:title]}"

    next if job[:workplace].nil?

    job[:workplace].each do |city_name|
      city_id = city_id(city_name)
      company_id = company_id(company_code, job[:company_name], job[:company_address], job[:company_description])
      job_id = job_id(job_code, job[:title], job[:salary],
    • Van Hau Le @haulv (Master) commented Feb 28, 2020

      @nghiann Same as the comment above (get_company).
    • Nguyen Ngoc Nghia @nghiann

      changed this line in version 8 of the diff

      Mar 02, 2020

  • Van Hau Le
    @haulv started a discussion on an old version of the diff Feb 28, 2020
    Automatically resolved by Nguyen Ngoc Nghia with a push Mar 02, 2020
    app/services/crawl_data.rb 0 → 100644
    def industry_id(name)
      industry = Industry.find_or_create_by!(name: name)
      industry.id
    end

    def city_id(name)
      name = name.strip
      City.find_or_create_by(name: name, region: "Việt Nam").id
    end

    def job_id(code = nil, title, salary, description, requirement, level, post_date, expiration_date, company_id)
      if expiration_date.nil?
        job = Job.find_or_initialize_by(title: job_title, company_id: company_id)
      else
        job = Job.find_or_initialize_by(code: code)
      end
    • Van Hau Le @haulv (Master) commented Feb 28, 2020

      @nghiann

      attrs = expiration_date.nil? ? {title: job_title, company_id: company_id} : {code: code}
      job = Job.find_or_initialize_by attrs
    • Nguyen Ngoc Nghia @nghiann

      changed this line in version 8 of the diff

      Mar 02, 2020

  • Nguyen Ngoc Nghia @nghiann

    added 1 commit

    • 6e007fe3 - view truncate description

    Compare with previous version

    Feb 28, 2020

  • Nguyen Ngoc Nghia @nghiann

    added 1 commit

    • 37105061 - refactoring code

    Compare with previous version

    Mar 02, 2020

  • Nguyen Ngoc Nghia @nghiann

    resolved all discussions

    Mar 02, 2020

  • Van Hau Le
    @haulv started a discussion on an old version of the diff Mar 02, 2020
    Automatically resolved by Nguyen Ngoc Nghia with a push Mar 02, 2020
    app/services/crawl_data.rb 0 → 100644
    job_code = job_url.split("/").last.split(".")[-2] || ""

    # Company code
    company_code = job_page.css(".viewmorejob a @href").present? ?
      job_page.css(".viewmorejob a @href").text.split("/").last.split("-")[-2].strip : ""

    crawl_job_title_logger.info "#{job[:title]}"

    job[:workplace].each do |city_name|
      city_id = get_city(city_name).id
      company_id = get_company(company_code, job[:company_name], job[:company_address], job[:company_description]).id
      job_id = get_job(job_code, job[:title], job[:salary],
                       job[:description], job[:requirement],
                       job[:level], job[:post_date],
                       job[:expiration_date], company_id).id
      CityJob.find_or_create_by!(job_id: job_id, city_id: city_id)
    • Van Hau Le @haulv (Master) commented Mar 02, 2020

      @nghiann

      job[:company_id] = company_id
      saved_job = save_job(job)
      CityJob.find_or_create_by!(job_id: saved_job.id, city_id: city_id)

      Edited Mar 02, 2020 by Van Hau Le
    • Nguyen Ngoc Nghia @nghiann

      changed this line in version 10 of the diff

      Mar 02, 2020

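In this suggestion the hash key has to be the symbol `:company_id`; with a bare `company_id`, Ruby uses the local variable's value (the numeric id) as the key, so the job attributes would never contain a `company_id` entry. Likewise, the `CityJob` row needs the id of the record returned by `save_job`, since the parsed `job` appears to be a plain Hash (it is read with `job[:title]`) and has no `id` method. A small stdlib-only illustration; `SAMPLE_JOB`, the id `42`, and both helper names are hypothetical.

```ruby
# Hypothetical parsed job hash, shaped like the one read with job[:title].
SAMPLE_JOB = { title: "Developer", workplace: ["Hà Nội"] }.freeze

# What `job[company_id] = company_id` does: the variable's value is the key.
def with_variable_key(job, company_id)
  job.merge(company_id => company_id)
end

# What `job[:company_id] = company_id` does: the symbol is the key.
def with_symbol_key(job, company_id)
  job.merge(company_id: company_id)
end

with_variable_key(SAMPLE_JOB, 42).key?(:company_id) # => false
with_symbol_key(SAMPLE_JOB, 42)[:company_id]        # => 42
SAMPLE_JOB.respond_to?(:id)                         # => false
```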
  • Van Hau Le
    @haulv started a discussion on an old version of the diff Mar 02, 2020
    Automatically resolved by Nguyen Ngoc Nghia with a push Mar 02, 2020
    app/services/crawl_data.rb 0 → 100644
      company = Company.find_or_initialize_by(code: code)
      company.update(name: name, address: address, description: description)
      company
    end

    def get_industry(name)
      industry = Industry.find_or_create_by!(name: name)
      industry
    end

    def get_city(name)
      name = name.strip
      City.find_or_create_by(name: name, region: "Việt Nam")
    end

    def get_job(code = nil, title, salary, description, requirement, level, post_date, expiration_date, company_id)
    • Van Hau Le @haulv (Master) commented Mar 02, 2020

      @nghiann a lot of params!
    • Nguyen Ngoc Nghia @nghiann

      changed this line in version 9 of the diff

      Mar 02, 2020

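A long positional list is easy to misorder, and `get_job(code = nil, title, ...)` adds a trap: Ruby fills a leading optional parameter only when every argument is supplied, and with one argument missing the remaining values shift into the required parameters instead of raising. The stdlib-only sketch below trims the signature to three parameters to show this next to a keyword-argument variant; both method names are hypothetical.

```ruby
# Trimmed, hypothetical version of the reviewed signature: an optional
# parameter placed before required ones.
def positional_job(code = nil, title, company_id)
  { code: code, title: title, company_id: company_id }
end

# Hypothetical keyword-argument variant: every value is named at the call
# site, and a missing required keyword raises ArgumentError.
def keyword_job(title:, company_id:, code: nil)
  { code: code, title: title, company_id: company_id }
end

positional_job("JOB1", "Developer", 7)
# => {code: "JOB1", title: "Developer", company_id: 7}
positional_job("JOB1", "Developer") # caller forgot company_id
# => {code: nil, title: "JOB1", company_id: "Developer"}
keyword_job(title: "Developer", company_id: 7)
# => {code: nil, title: "Developer", company_id: 7}
```

Calling `keyword_job(code: "JOB1", title: "Developer")` raises `ArgumentError` for the missing `company_id` rather than silently shifting values.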
  • Van Hau Le
    @haulv started a discussion on the diff Mar 02, 2020
    Resolved by Nguyen Ngoc Nghia Mar 02, 2020
    app/services/crawl_data.rb 0 → 100644
    end

    def get_industry(name)
      industry = Industry.find_or_create_by!(name: name)
      industry
    end

    def get_city(name)
      name = name.strip
      City.find_or_create_by(name: name, region: "Việt Nam")
    end

    def get_job(code = nil, title, salary, description, requirement, level, post_date, expiration_date, company_id)
      attrs = expiration_date.nil? ? {title: job_title, company_id: company_id} : {code: code}
      job = Job.find_or_initialize_by attrs
    • Van Hau Le @haulv (Master) commented Mar 02, 2020

      @nghiann

      def save_job(job_attrs)
        attrs = expiration_date.nil? ? {title: job_title, company_id: company_id} : {code: code}
        job = Job.find_or_initialize_by attrs
        job.update_attributes(job_attrs)

        job
      end

      Edited Mar 02, 2020 by Van Hau Le
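With a single `job_attrs` argument, the values that pick the lookup key have to be read from that hash too: `expiration_date`, `job_title`, `company_id` and `code` are not parameters of `save_job`, so unless the class defines methods with those names they would raise `NameError`. The stdlib-only sketch below follows the same flow under two assumptions: the hash carries the keys `:code`, `:title`, `:company_id` and `:expiration_date`, and a hypothetical in-memory `FakeJob` stands in for the `Job` model.

```ruby
# Hypothetical in-memory stand-in for the Job model.
class FakeJob
  ROWS = [] # plays the role of the jobs table

  attr_reader :attrs

  def initialize(attrs)
    @attrs = attrs
  end

  # Mimics find_or_initialize_by: first row matching every given pair,
  # otherwise a new record built from those pairs.
  def self.find_or_initialize_by(conditions)
    ROWS.find { |job| conditions.all? { |key, value| job.attrs[key] == value } } ||
      new(conditions.dup)
  end

  # Mimics update: merge the attributes and persist the record once.
  def update(new_attrs)
    @attrs = @attrs.merge(new_attrs)
    ROWS << self unless ROWS.include?(self)
    true
  end
end

# One-argument save_job: the lookup key is read from job_attrs itself.
# Jobs without an expiration date are matched by title and company,
# the rest by their job code (the same rule as the reviewer's ternary).
def save_job(job_attrs)
  lookup =
    if job_attrs[:expiration_date].nil?
      { title: job_attrs[:title], company_id: job_attrs[:company_id] }
    else
      { code: job_attrs[:code] }
    end
  job = FakeJob.find_or_initialize_by(lookup)
  job.update(job_attrs)
  job
end
```

Saving the same job twice, for example `save_job(code: "J1", title: "Dev", company_id: 1, expiration_date: "2020-03-31")` followed by a call with a different salary, returns the same record with the new value merged in.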
  • Nguyen Ngoc Nghia @nghiann

    added 1 commit

    • ab38f5e4 - pass 1 param into save_job

    Compare with previous version

    Mar 02, 2020

  • Nguyen Ngoc Nghia @nghiann

    added 1 commit

    • 54e0368f - pass 1 param into save_job

    Compare with previous version

    Mar 02, 2020

  • Nguyen Ngoc Nghia @nghiann

    added 1 commit

    • 872bce8d - pass 1 param into save_job

    Compare with previous version

    Mar 02, 2020

  • Nguyen Ngoc Nghia @nghiann

    resolved all discussions

    Mar 02, 2020

  • Van Hau Le @haulv

    mentioned in commit 2f65fef8ef455cfa8d736ba8a0d7632dfb3463e5

    Mar 02, 2020
  • Van Hau Le @haulv

    merged

    Mar 02, 2020

Assignee: Nguyen Ngoc Nghia @nghiann
Milestone: None
2 participants
Reference: nghiann/VeNJOB!21