Nguyen Ngoc Nghia / VeNJOB / Commits / 2d0503f4
Commit 2d0503f4 authored Feb 21, 2020 by nnnghia98
remove unnecessary html tag
parent 0f713e18
Showing 2 changed files with 13 additions and 12 deletions
app/services/crawl_data.rb +13 -10
config/application.rb +0 -2
app/services/crawl_data.rb @ 2d0503f4
 require "nokogiri"
 require "open-uri"
 require "resolv-replace"
 require "openssl"
 OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
 class CrawlData
   def crawl_web
     page = Nokogiri::HTML.parse(open(Settings.crawl.base_url))
     total_job = page.css("div.ais-stats h1.col-sm-10 span").text.gsub(",", "").to_f
     total_page = (total_job / 50).floor
     fixed_total_page = 20
     crawl_job_title_logger = ActiveSupport::Logger.new("log/crawl_data.log")
     crawl_job_title_logger.info "Crawl at #{Time.current}"
     (1..fixed_total_page).each do |each_page|
       page = Nokogiri::HTML.parse(open(URI.encode("https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-trang-#{each_page}-vi.html")))
       (0..49).each do |j|
-        job_url = page.css("span.jobtitle h3 a @href")[j].text
+        job_url = page.css(".jobtitle h3 a @href")[j].text
         job_page = Nokogiri::HTML.parse(open(URI.encode(job_url)))
         # Job code
         job_code = job_url.split("/").last.split(".")[-2]
-        next if job_page.css("div.LeftJobCB").nil?
+        next if job_page.css(".LeftJobCB").nil?
         # Job title
-        job_title = job_page.css("div.top-job-info h1").text.strip
+        job_title = job_page.css(".top-job-info h1").text.strip
         crawl_job_title_logger = ActiveSupport::Logger.new("log/crawl_data.log")
         crawl_job_title_logger.info "#{job_title}"
         # Job post date
-        job_post_date = job_page.css("div.datepost span").text
+        job_post_date = job_page.css(".datepost span").text
         job_salary, job_position, job_expiration_date, job_industries, job_level = ""
         job_workplace = []
-        detail_job_new = job_page.css("ul.DetailJobNew li p")
+        detail_job_new = job_page.css(".DetailJobNew li p")
         (0..detail_job_new.count - 1).each do |detail_part|
           detail = detail_job_new[detail_part].text
...
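Review note on the `job_salary, job_position, ... = ""` line in the hunk above: in Ruby, a multiple assignment with a single non-array value on the right assigns that value to the first name only, and the remaining names become `nil`. The sketch below illustrates this with hypothetical variable names (not taken from the commit) and shows one possible way to give every name its own empty string. Whether the elided code later overwrites these variables is not visible in this diff.

```ruby
# Hypothetical names for illustration; not part of the commit.

# A single string on the right-hand side: only the first name receives it.
first, second, third = ""
# first is "", second and third are nil

# One possible alternative: build a separate empty string for each name.
a, b, c = Array.new(3) { "" }
# a, b and c are three distinct empty strings
```

The block form of `Array.new` is used here so that each variable gets its own string object rather than three references to the same one.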
@@ -62,18 +65,18 @@ class CrawlData
         company_name, company_email, company_address, company_desc, company_code = ""
         # Company full name
-        unless job_page.css("div.tit_company").nil?
+        unless job_page.css(".tit_company").nil?
           company_name = job_page.css("div.tit_company").text.strip
         end
         # Company code
         company_code = job_url.split("/").last.split("-").last.split(".")[-2].strip
         # Company address
-        unless job_page.css("p.TitleDetailNew label")[0].nil?
+        unless job_page.css(".TitleDetailNew label")[0].nil?
           company_address = job_page.css("p.TitleDetailNew label")[0].text.strip
         end
         # Company description
-        company_desc = job_page.css("span#emp_more p").text.strip
+        company_desc = job_page.css("#emp_more p").text.strip
         job_workplace.each do |city_name|
           city_id = city_id(city_name)
...
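Review note on the `company_code` line in the hunk above: the chain `split("/").last.split("-").last.split(".")[-2].strip` calls `strip` on `nil` whenever the last URL segment has fewer than two dot-separated parts, which raises `NoMethodError`. A minimal defensive sketch follows. The helper name `code_from_url` and the sample URLs are hypothetical, and the real URL shape on the crawled site is an assumption based only on how the commit splits the string.

```ruby
# Hypothetical helper; mirrors the split chain used for company_code
# but returns nil instead of raising when the URL has an unexpected shape.
def code_from_url(url)
  last_segment = url.to_s.split("/").last.to_s
  # Take the text after the final "-", then split it on "."
  parts = last_segment.split("-").last.to_s.split(".")
  # The code is the second-to-last dot-separated part, if there is one.
  parts.length >= 2 ? parts[-2].strip : nil
end

# Made-up URLs, for illustration only.
code_from_url("https://example.com/jobs/sample-title.35A4E900.html") # => "35A4E900"
code_from_url("https://example.com/jobs/no-extension")               # => nil
```

A caller could then skip the record with `next if code.nil?` instead of letting the whole crawl abort on one malformed link.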
config/application.rb @ 2d0503f4
 require_relative 'boot'
-require 'openssl'
 require 'rails/all'
 # Require the gems listed in Gemfile, including any gems
 # you've limited to :test, :development, or :production.
-OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
 Bundler.require(*Rails.groups)
 module Venjob
...
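Review note on `OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE`, which still appears in app/services/crawl_data.rb: reassigning the constant affects any code in the process that reads `VERIFY_PEER`, not only the crawler. If relaxed verification is really needed for one host, it can be scoped to a single connection instead. The sketch below uses the standard library `Net::HTTP`; the helper name `build_https_client` and its `verify:` parameter are hypothetical and not part of the commit. It only configures the client and opens no connection.

```ruby
require "net/http"
require "openssl"

# Hypothetical helper: builds an HTTPS client whose verification mode is set
# on this one connection object, leaving the OpenSSL constants untouched.
def build_https_client(host, verify: true)
  http = Net::HTTP.new(host, 443)
  http.use_ssl = true
  http.verify_mode = verify ? OpenSSL::SSL::VERIFY_PEER : OpenSSL::SSL::VERIFY_NONE
  http
end

# Default: certificates are verified.
client = build_https_client("careerbuilder.vn")

# Opt-out is explicit and limited to this object.
insecure_client = build_https_client("careerbuilder.vn", verify: false)
```

For code that stays on open-uri, the `:ssl_verify_mode` option accepted by `URI.open` serves the same purpose on a per-request basis.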