Ngô Trung Hưng / Venjob_HungNT
Commit 81cfe475 authored Jul 21, 2020 by Ngo Trung Hung
fix
parent 9387da00
Showing 4 changed files with 24 additions and 18 deletions
config/database.yml (+1 -1)
lib/src/crawler.rb (+15 -13)
lib/src/interface_web.rb (+6 -2)
lib/tasks/crawler.rake (+2 -2)
config/database.yml
@@ -14,7 +14,7 @@ default: &default
   encoding: utf8
   pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
   username: root
-  password: '1'
+  password: '12345678'
   socket: /var/run/mysqld/mysqld.sock
...
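The `pool` line in this file already reads its value from the environment through ERB. The same pattern could be applied to `password`, so that a password change does not require a commit. The following is only a sketch under assumptions: the variable name `DATABASE_PASSWORD`, the `TEMPLATE` constant, and the helper `render_database_config` are hypothetical and do not appear in the commit.

```ruby
require 'erb'
require 'yaml'

# Hypothetical template mirroring the keys shown in the diff.
# DATABASE_PASSWORD is an assumed variable name, not one used by the project.
TEMPLATE = <<~YAML
  default:
    pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
    username: root
    password: <%= ENV.fetch("DATABASE_PASSWORD") { "" }.inspect %>
YAML

# Evaluate the ERB tags first, then parse the result as YAML.
def render_database_config(template = TEMPLATE)
  YAML.safe_load(ERB.new(template).result)
end
```

Calling `.inspect` wraps the value in double quotes, so an all-digit password is still parsed as a YAML string rather than an integer.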
lib/src/crawler.rb
 require 'open-uri'
 require 'src/interface_web'
 class Clawler
-  @page = Nokogiri::HTML(URI.open('https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-vi.html'))
+  @@page = Nokogiri::HTML(URI.open('https://careerbuilder.vn/viec-lam/tat-ca-viec-lam-vi.html'))
   # PILL DATA CITIES
   def self.make_cities
-    @data_list_cities = []
-    data = @page.search("#location option")
+    puts "Crawling data location...\n.\n.\n."
+    data_list_cities = []
+    data = @@page.search("#location option")
     list_cities = data.to_s.split("</option>")
     list_cities.each do |x|
-      @data_list_cities << x.gsub(/(^<[\w\D]*>)/, '').gsub(/\n/, '').rstrip
+      data_list_cities << x.gsub(/(^<[\w\D]*>)/, '').gsub(/\n/, '').rstrip
     end
-    @data_list_cities.length.times do |i|
+    puts "Save data to database...\n"
+    data_list_cities.length.times do |i|
       area = i > 69 ? 0 : 1
-      name = (@data_list_cities[i].to_s)
+      name = (data_list_cities[i].to_s)
       City.create!(name: name, area: area)
     end
   end
   #PIL DATA INDUSTRIES
   def self.make_industries
-    @data_list_industries = []
-    data = @page.search("#industry option")
+    puts "Crawling data industries...\n.\n.\n."
+    data_list_industries = []
+    data = @@page.search("#industry option")
     list_industries = data.to_s.split("</option>")
     list_industries.each do |x|
-      @data_list_industries << x.gsub(/(^<[\w\D]*>)/, '').gsub(/\n/, '').strip
+      data_list_industries << x.gsub(/(^<[\w\D]*>)/, '').gsub(/\n/, '').strip
     end
-    @data_list_industries.length.times do |i|
+    puts "Save data to database...\n"
+    data_list_industries.length.times do |i|
-      name = @data_list_industries[i].to_s
+      name = data_list_industries[i].to_s
       if name.include?('&')
         name.gsub!('&', '&')
       end
...
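This diff moves the parsed page from `@page` to `@@page`. As background, the sketch below contrasts the two kinds of variable in plain Ruby. The class names `PageHolder` and `SubHolder` and the string values are made up for illustration and are not part of the project. Under standard Ruby semantics, a `@page` assigned in the class body is an instance variable of the class object and is visible inside `def self.` methods, while `@@page` is additionally shared with subclasses and instances.

```ruby
class PageHolder
  @page  = 'class-level instance variable' # belongs to the PageHolder object itself
  @@page = 'class variable'                # shared with subclasses and instances

  # Inside a class method, self is the class, so @page resolves to the value above.
  def self.ivar_page
    @page
  end

  def self.cvar_page
    @@page
  end

  # Inside an instance method, @page is the instance's own (unset) variable.
  def instance_view
    [@page, @@page]
  end
end

# A subclass has its own, separate @page but shares @@page.
class SubHolder < PageHolder; end
```

With these definitions, `PageHolder.ivar_page` and `PageHolder.cvar_page` both return their strings, `SubHolder.ivar_page` is `nil`, and `SubHolder.cvar_page` still returns the shared value.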
lib/src/interface_web.rb
 class Interface_web
   # func get "n" link company & job
+  debugger
   def self.crawl_link_for_companies_jobs(page)
+    puts "Crawling link on page...\nPLease wait...\n"
     data = []
     website_companies = []
     website_jobs = []
...
@@ -18,12 +20,14 @@ class Interface_web
     website_jobs = website_jobs.join(",")
     website_jobs = website_jobs.split(",")
     website_jobs = website_jobs.select{|val| val != ''}
+    puts "Result:\nCompany: #{website_companies.length} link\nJob : #{website_jobs} link"
     data << website_companies << website_jobs
   end
-  @crawl_link_for_companies_jobs = crawl_link_for_companies_jobs(5)
+  @crawl_link_for_companies_jobs = crawl_link_for_companies_jobs(1)
   def self.get_link_job_and_companies
-    @crawl_link_for_companies_jobs ||= crawl_link_for_companies_jobs(5)
+    @crawl_link_for_companies_jobs ||= crawl_link_for_companies_jobs(1)
   end
   def self.base_link(url)
...
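`get_link_job_and_companies` relies on `||=` to reuse a previously computed result. The sketch below isolates that memoization pattern with a counter, so it is easy to see that the expensive call runs only once. `LinkCache`, `crawl_links`, and the example.com URLs are hypothetical stand-ins, not the project's code. Here the crawl is deferred to the first call of `links` rather than run in the class body at load time.

```ruby
class LinkCache
  @calls = 0

  class << self
    attr_reader :calls

    # Stand-in for the real crawl; it only counts how often it runs.
    def crawl_links(pages)
      @calls += 1
      Array.new(pages) { |i| "https://example.com/page/#{i + 1}" }
    end

    # ||= assigns only while @links is nil or false, so the crawl runs once.
    def links
      @links ||= crawl_links(1)
    end
  end
end

first  = LinkCache.links
second = LinkCache.links # served from @links, no second crawl
```

Note that `||=` recomputes whenever the cached value is `nil` or `false`, so it suits results that are always truthy, such as arrays.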
lib/tasks/crawler.rake
@@ -2,8 +2,8 @@ require 'src/crawler'
 namespace :db do
   task populate: :environment do
     # Clawler.make_industries
-    Clawler.make_cities
+    # Clawler.make_cities
     Clawler.make_companies
-    Clawler.make_jobs
+    # Clawler.make_jobs
   end
 end