Commit 716b0bd9 (project: venjob_nth)
Authored Jul 28, 2020 by Ngô Trung Hưng
fix -part 3
parent 814a4af9
Pipeline #722 canceled with stages in 0 seconds
Showing 2 changed files with 58 additions and 70 deletions
lib/src/interface_web.rb  +58 -69
lib/tasks/crawler.rake  +0 -1
lib/src/interface_web.rb @ 716b0bd9
@@ -5,6 +5,7 @@ require 'open-uri'
 # Crawler data
 class InterfaceWeb
   COMPANY_SECURITY = 1
+  NUMBER_LINK = 1
   SIZE_LI_INTERFACE_5 = 10
   INTERNATIONAL = 0
   DOMESTIC = 1
@@ -32,10 +33,9 @@ class InterfaceWeb
     File.write('tmp/link.txt', website_jobs[0])
     data << website_companies << website_jobs
   end
   def link_job_and_companies
-    @link_job_and_companies ||= crawl_link(2)
+    @link_job_and_companies ||= crawl_link(NUMBER_LINK)
   end
   def self.safe_link(url)
@@ -55,7 +55,6 @@ class InterfaceWeb
     data_list_cities.each_with_index do |val, index|
       area = index > RANGE ? INTERNATIONAL : DOMESTIC
       City.find_or_create_by(name: val) do |city|
-        city.name = val
         city.area = area
       end
     end
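`find_or_create_by(name: val)` already assigns `name` to a newly created record, which is why the `city.name = val` line could be dropped from the block. A toy in-memory stand-in illustrates this, assuming a hypothetical `ToyCity` class that only imitates the lookup-then-create behaviour and is not ActiveRecord:

```ruby
# Hypothetical stand-in for a model class; it only imitates the
# "find by attributes, otherwise create with them" behaviour.
class ToyCity
  attr_accessor :name, :area

  @records = []

  class << self
    attr_reader :records

    def find_by(name:)
      records.find { |record| record.name == name }
    end

    def find_or_create_by(name:)
      existing = find_by(name: name)
      return existing if existing

      record = new(name: name)        # the lookup attribute is set here
      yield record if block_given?    # the block only runs for new records
      records << record
      record
    end
  end

  def initialize(name:)
    @name = name
  end
end

DOMESTIC = 1

first = ToyCity.find_or_create_by(name: 'Hanoi') { |city| city.area = DOMESTIC }
block_ran_again = false
second = ToyCity.find_or_create_by(name: 'Hanoi') { |_city| block_ran_again = true }
```

The block only needs to set attributes that are not part of the lookup, such as `area`; on the second call the existing record is returned and the block is skipped.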
@@ -82,7 +81,6 @@ class InterfaceWeb
     begin
       if name.present? && address.present? && desc.present?
         Company.find_or_create_by(name: name.strip) do |company|
-          company.name = name.strip
           company.address = address
           company.short_description = desc
         end
@@ -96,100 +94,94 @@ class InterfaceWeb
   private
-  def add_data(name, company_name, city_name, created_date, expiration_date, salary, industry_name, description, level, exprience)
-    id_company = Company.find_by name: company_name
+  def add_data(data)
+    id_company = Company.find_by name: data[:company_name]
     id_company = id_company.present? ? id_company.id : COMPANY_SECURITY
-    id_job = Job.create!(name: name,
+    id_job = Job.create!(name: data[:name],
                          company_id: id_company,
-                         level: level,
-                         experience: exprience,
-                         salary: salary,
-                         create_date: created_date,
-                         expiration_date: expiration_date,
-                         description: description)
-    make_foreign_industries_table(industry_name, id_job.id)
-    make_foreign_cities_table(city_name, id_job.id)
+                         level: data[:level],
+                         experience: data[:exprience],
+                         salary: data[:salary],
+                         create_date: data[:created_date],
+                         expiration_date: data[:expiration_date],
+                         description: data[:description])
+    make_foreign_industries_table(data[:industry_name], id_job.id)
+    make_foreign_cities_table(data[:city_name], id_job.id)
   rescue StandardError => e
     puts e
   end
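The change above collapses the ten positional parameters of `add_data` into a single hash. A minimal sketch of the same pattern outside Rails follows, assuming a hypothetical `build_job_attributes` helper that is not in the commit. It reads the hash with `Hash#fetch`, so a missing or misspelled key (for example `:experience` instead of the `:exprience` spelling used in the diff) raises `KeyError` instead of silently producing `nil`:

```ruby
# Hypothetical helper, not part of the commit: maps the crawler's data hash
# onto the attribute names passed to Job.create! in the diff.
def build_job_attributes(data, company_id)
  {
    name: data.fetch(:name),
    company_id: company_id,
    level: data.fetch(:level),
    experience: data.fetch(:exprience), # key spelling follows the diff
    salary: data.fetch(:salary),
    create_date: data.fetch(:created_date),
    expiration_date: data.fetch(:expiration_date),
    description: data.fetch(:description)
  }
end

# Sample values are made up for illustration.
data = {
  name: 'Ruby Developer',
  level: 'Junior',
  exprience: '1 year',
  salary: '1000 USD',
  created_date: '28/07/2020',
  expiration_date: '28/08/2020',
  description: '<p>Job description</p>'
}

attributes = build_job_attributes(data, 1)

# A hash without the expected key fails loudly.
missing_key_error =
  begin
    build_job_attributes(data.reject { |key, _| key == :exprience }, 1)
    nil
  rescue KeyError => e
    e
  end
```

Whether strict `fetch` is preferable to `data[:key]` depends on how tolerant the crawler should be of pages that lack a field; the commit itself uses `data[:key]`.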
-  private
   def crawl_data_jobs_interface_1(page)
-    name = page.search('.apply-now-content .job-desc .title').text
-    company_name = page.search('.apply-now-content .job-desc .job-company-name').text
+    data = {}
+    data[:name] = page.search('.apply-now-content .job-desc .title').text
+    data[:company_name] = page.search('.apply-now-content .job-desc .job-company-name').text
     location = []
     length = page.search('.detail-box .map p a').size
     length.times do |n|
       location << page.search(".detail-box .map p a:nth-child(#{n + 1})").text
     end
-    city_name = location.join(',')
-    created_date = page.search('.item-blue .detail-box:nth-child(1) ul li:nth-child(1) p')[0].text
-    expiration_date = page.search('.item-blue .detail-box ul li:last')[1].text.delete!("[\n,\t,\r]").split(' ').last
-    salary = page.search('.item-blue .detail-box:nth-child(1) ul li:nth-child(1) p')[1].text
+    data[:city_name] = location.join(',')
+    data[:created_date] = page.search('.item-blue .detail-box:nth-child(1) ul li:nth-child(1) p')[0].text
+    data[:expiration_date] = page.search('.item-blue .detail-box ul li:last')[1].text.delete!("[\n,\t,\r]").split(' ').last
+    data[:salary] = page.search('.item-blue .detail-box:nth-child(1) ul li:nth-child(1) p')[1].text
     industries = page.search('.item-blue .detail-box:nth-child(1) ul li:nth-child(2) a').text
     industries = industries.delete!("[\n,\t,\r]").split(' ').select(&:present?)
-    industry_name = industries.join(',')
-    description = page.search('.tabs .tab-content .detail-row:nth-child(n)').to_s
+    data[:industry_name] = industries.join(',')
+    data[:description] = page.search('.tabs .tab-content .detail-row:nth-child(n)').to_s
     get_level = page.search('.item-blue .detail-box:last ul li:nth-child(3)').text.delete!("[\n,\t,\r]").lstrip.split('Cấp bậc')
     get_level = get_level[1].to_s.strip
     if get_level.blank?
       g_level = page.search('.item-blue .detail-box:last ul li:nth-child(2)').text.delete!("[\n,\t,\r]").lstrip.split('Cấp bậc')
-      level = g_level[1].to_s.strip
+      data[:level] = g_level[1].to_s.strip
     else
-      g_level = get_level
-      level = g_level
+      data[:level] = get_level
     end
     exp = page.search('.item-blue .detail-box:last ul li:nth-child(2)').text.delete!("[\n,\t,\r]").split('Kinh nghiệm')
     exp = exp[1].to_s.strip
-    exprience = exp
-    add_data(name, company_name, city_name, created_date, expiration_date, salary, industry_name, description, level, exprience)
+    data[:exprience] = exp
+    add_data(data)
   end
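The crawler methods chain `.split` directly onto `String#delete!`. `delete!` returns `nil` when it removes nothing, and its argument is a character set, so `"[\n,\t,\r]"` also strips literal brackets and commas. A minimal sketch of a safer variant follows, assuming a hypothetical `strip_control_chars` helper that is not in the commit:

```ruby
# Hypothetical helper, not part of the commit.
# Removes newlines, tabs and carriage returns and always returns a String.
def strip_control_chars(text)
  text.to_s.delete("\n\t\r")
end

clean_input = '28/08/2020'

# String#delete! returns nil when no character was removed, so chaining
# .split onto it raises NoMethodError for input that is already clean.
bang_result = clean_input.dup.delete!("[\n,\t,\r]")

# The non-bang form can be chained safely.
safe_last_token = strip_control_chars("Expires: \n\t28/08/2020").split(' ').last

# The character set used in the diff also contains '[', ',' and ']'.
over_stripped = 'IT, Software [remote]'.delete("[\n,\t,\r]")
```

Here `bang_result` is `nil`, which is the case that would break the chained `.split` calls; in the commit such failures are caught by the `rescue StandardError` in `add_data` only if they occur inside that method, not in the crawl methods themselves.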
-  private
   def crawl_data_jobs_interface_2(page)
-    name = page.search('.apply-now-content .job-desc .title').text
-    company_name = page.search('.top-job .top-job-info .tit_company').text
+    data = {}
+    data[:name] = page.search('.apply-now-content .job-desc .title').text
+    data[:company_name] = page.search('.top-job .top-job-info .tit_company').text
     locations = []
     length = page.search('.info-workplace .value a').size
     length.times do |n|
       locations << page.search(".info-workplace .value a:nth-child(#{n + 1})").text
     end
-    city_name = locations.join(',')
-    created_date = ''
+    data[:city_name] = locations.join(',')
+    data[:created_date] = ''
     expiration_date = page.search('.info li:nth-child(4)').text
-    expiration_date = expiration_date.blank? ? '' : expiration_date.delete!("[\n,\t,\r]").split(' ').last
-    salary = page.search('.info li:nth-child(3)').text.split('Lương').last.strip
-    industry_name = page.search('.info li:nth-child(5) .value').text
-    description = page.search('.left-col').to_s
+    data[:expiration_date] = expiration_date.blank? ? '' : expiration_date.delete!("[\n,\t,\r]").split(' ').last
+    data[:salary] = page.search('.info li:nth-child(3)').text.split('Lương').last.strip
+    data[:industry_name] = page.search('.info li:nth-child(5) .value').text
+    data[:description] = page.search('.left-col').to_s
     lv = page.search('.boxtp .info li:nth-child(2)').text
-    level = lv.blank? ? '' : lv.delete!("[\n,\t,\r]").strip.split('Cấp bậc').last.strip
+    data[:level] = lv.blank? ? '' : lv.delete!("[\n,\t,\r]").strip.split('Cấp bậc').last.strip
     exp = page.search('.info li:nth-child(6)').text
-    exprience = exp.blank? ? '' : exp.delete!("[\n,\t,\r]").split('Kinh nghiệm').last.strip
-    add_data(name, company_name, city_name, created_date, expiration_date, salary, industry_name, description, level, exprience)
+    data[:exprience] = exp.blank? ? '' : exp.delete!("[\n,\t,\r]").split('Kinh nghiệm').last.strip
+    add_data(data)
   end
-  private
   def crawl_data_jobs_interface_5(page)
-    name = page.search('.info-company h1').text
-    company_name = page.search('.info-company .text-job h2').text
-    city_name = page.search('.DetailJobNew ul li:nth-child(1) a').text
-    created_date = ''
-    expiration_date = page.search('.DetailJobNew li:nth-child(9) span').text.strip
-    salary = page.search('.DetailJobNew li:nth-child(3) span').text.strip
-    industry_name = page.search('.DetailJobNew li:nth-child(2) span').text.strip
-    description = page.search('.left-col .detail-row')
-    level = page.search('.DetailJobNew ul li:nth-child(6) span').text.strip
-    exprience = page.search('.DetailJobNew li:nth-child(5) span').text.strip
-    add_data(name, company_name, city_name, created_date, expiration_date, salary, industry_name, description, level, exprience)
+    data = {}
+    data[:name] = page.search('.info-company h1').text
+    data[:company_name] = page.search('.info-company .text-job h2').text
+    data[:city_name] = page.search('.DetailJobNew ul li:nth-child(1) a').text
+    data[:created_date] = ''
+    data[:expiration_date] = page.search('.DetailJobNew li:nth-child(9) span').text.strip
+    data[:salary] = page.search('.DetailJobNew li:nth-child(3) span').text.strip
+    data[:industry_name] = page.search('.DetailJobNew li:nth-child(2) span').text.strip
+    data[:description] = page.search('.left-col .detail-row')
+    data[:level] = page.search('.DetailJobNew ul li:nth-child(6) span').text.strip
+    data[:exprience] = page.search('.DetailJobNew li:nth-child(5) span').text.strip
+    add_data(data)
   end
-  private
   def make_foreign_industries_table(data, id_job)
     unless data.blank? && id_job.blank?
       content = data.split(',')
       content.each do |val|
         val.gsub!('&', '&') if val.include?('&')
@@ -200,16 +192,13 @@ class InterfaceWeb
     end
   end
-  private
   def make_foreign_cities_table(data, id_job)
-    unless data.blank? && id_job.blank?
-      cities = data.split(',')
-      cities.each do |city|
-        data_city = City.find_by name: city.strip
-        id_cities = data_city.blank? ? City.create!(name: city.strip, area: DOMESTIC).id : data_city.id
-        CityJob.create!(job_id: id_job, city_id: id_cities)
-      end
-    end
+    return if data.blank? && id_job.blank?
+    cities = data.split(',')
+    cities.each do |city|
+      data_city = City.find_by name: city.strip
+      id_cities = data_city.blank? ? City.create!(name: city.strip, area: DOMESTIC).id : data_city.id
+      CityJob.create!(job_id: id_job, city_id: id_cities)
+    end
   end
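The guard in `make_foreign_cities_table` returns early only when both `data` and `id_job` are blank, because the two checks are joined with `&&`. If the intent is to skip the work when either value is missing, `||` would express that. The sketch below contrasts the two in plain Ruby, assuming hypothetical helpers (`blank_value?`, `city_names_and`, `city_names_or`) that are not in the commit; `blank_value?` only approximates Rails' `blank?`:

```ruby
# Hypothetical approximation of Rails' blank? for nil and empty values.
def blank_value?(value)
  value.nil? || (value.respond_to?(:empty?) && value.empty?)
end

# Same shape as the guard in the diff: returns early only if BOTH are blank.
def city_names_and(data, id_job)
  return [] if blank_value?(data) && blank_value?(id_job)

  data.split(',').map(&:strip)
end

# Alternative: returns early if EITHER value is blank.
def city_names_or(data, id_job)
  return [] if blank_value?(data) || blank_value?(id_job)

  data.split(',').map(&:strip)
end

names = city_names_or('Hanoi, Da Nang', 42)
skipped = city_names_or(nil, 42)

# With &&, a nil data value and a present id_job fall through to nil.split.
and_guard_error =
  begin
    city_names_and(nil, 42)
    nil
  rescue NoMethodError => e
    e
  end
```

In the commit the callers always pass a string built with `join` or `.text`, so the `&&` form may never fail in practice; the sketch only shows what each operator guards against.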
lib/tasks/crawler.rake @ 716b0bd9
@@ -7,7 +7,6 @@ require 'src/interface_web'
 namespace :crawler do
   task populate: :environment do
     Company.find_or_create_by(name: 'Bảo mật') do |company|
-      company.name = 'Bảo mật'
       company.address = 'Vui lòng xem trong mô tả công việc'
       company.short_description = 'Vui lòng xem trong mô tả công việc'
     end