Venjob_HungNT
Commit 564970e4, authored Jul 22, 2020 by Ngô Trung Hưng
import_data_from_csv
parent ad336f02
Showing 10 changed files with 143 additions and 31 deletions (+143 -31)
Gemfile                    +1    -0
Gemfile.lock               +1    -0
config/database.yml        +1    -1
jobs.zip                   +0    -0
lib/csv/jobs.csv           +0    -0
lib/src/crawler.rb         +10   -14
lib/src/ftp.rb             +107  -5
lib/src/interface_web.rb   +3    -7
lib/src/unzip.rb           +12   -0
lib/tasks/crawler.rake     +8    -4
Gemfile @ 564970e4
@@ -21,6 +21,7 @@ gem 'turbolinks', '~> 5'
 # Build JSON APIs with ease. Read more: https://github.com/rails/jbuilder
 gem 'jbuilder', '~> 2.5'
 gem 'nokogiri'
+gem 'rubyzip'
 # Use Redis adapter to run Action Cable in production
 # gem 'redis', '~> 4.0'
 # Use ActiveModel has_secure_password
Gemfile.lock @ 564970e4
@@ -239,6 +239,7 @@ DEPENDENCIES
   rails (~> 5.2.4, >= 5.2.4.3)
   rails_12factor
   rubocop
+  rubyzip
   sass-rails (~> 5.0)
   selenium-webdriver
   spring
config/database.yml @ 564970e4
@@ -14,7 +14,7 @@ default: &default
   encoding: utf8
   pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
   username: root
-  password: '12345678'
+  password: '1'
   socket: /var/run/mysqld/mysqld.sock
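Review note: this hunk only swaps one hard-coded password for another. Since the file already reads `RAILS_MAX_THREADS` through ERB, the password could be read the same way. The following is a minimal sketch of that idea, not part of the commit; `DATABASE_PASSWORD` is an assumed variable name and the template is a cut-down stand-in for the real file.

```ruby
require 'erb'
require 'yaml'

# Hypothetical template: DATABASE_PASSWORD is an assumed variable name,
# not something this commit defines.
DATABASE_TEMPLATE = <<~YAML
  default:
    username: root
    password: "<%= ENV.fetch('DATABASE_PASSWORD', '') %>"
YAML

# Render the ERB template, then parse the result as YAML. This approximates
# how Rails evaluates config/database.yml before reading it.
def render_database_config(template = DATABASE_TEMPLATE)
  YAML.safe_load(ERB.new(template).result)
end
```

The quotes around the ERB tag keep an all-digit password such as `1` a string instead of letting YAML read it as an integer.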
jobs.zip @ ad336f02
deleted 100644 → 0
File deleted

jobs.csv → lib/csv/jobs.csv @ 564970e4
File moved
lib/src/crawler.rb @ 564970e4
@@ -13,10 +13,12 @@ class Clawler
       data_list_cities << x.gsub(/(^<[\w\D]*>)/, '').gsub(/\n/, '').rstrip
     end
     puts "Save data to database...\n------------------------"
-    data_list_cities.length.times do |i|
-      area = i > 69 ? 0 : 1
-      name = (data_list_cities[i].to_s)
-      City.create!(name: name, area: area)
+    data_list_cities.each_with_index do |val, index|
+      area = index > 69 ? 0 : 1
+      City.find_or_create_by(name: val) do |city|
+        city.name = val
+        city.area = area
+      end
     end
   end
@@ -30,21 +32,15 @@ class Clawler
       data_list_industries << x.gsub(/(^<[\w\D]*>)/, '').gsub(/\n/, '').strip
     end
     puts "Save data to database...\n------------------------"
-    data_list_industries.length.times do |i|
-      name = data_list_industries[i].to_s
-      if name.include?('&amp;')
-        name.gsub!('&amp;', '&')
-      end
-      Industry.create!(name: name)
+    data_list_industries.each do |val|
+      val.gsub!('&amp;', '&') if val.include?('&amp;')
+      Industry.find_or_create_by(name: val) { |industry| industry.name = val }
     end
   end
   # FILL DATA COMPANIES
   def self.make_companies
-    # Company.create!(name: "Bảo mật",
-    # address: "Vui lòng xem trong mô tả công việc",
-    # short_description: "Vui lòng xem trong mô tả công việc")
     Company.find_or_create_by(name: 'Bảo mật', address: 'Vui lòng xem trong mô tả công việc') do |company|
       company.name = 'Bảo mật'
       company.address = 'Vui lòng xem trong mô tả công việc'
@@ -62,6 +58,7 @@ class Clawler
       end
     end
   end
   # FILL DATA JOBS
   def self.make_jobs
     Job.update_all(newdata: 0)
@@ -130,5 +127,4 @@ class Clawler
                   city_id: id_cities)
       end
     end
   end
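Review note: the ampersand substitution in `make_industries` appears to undo HTML escaping in scraped industry names (an assumption based on the surrounding tag-stripping code). If so, Ruby's standard `CGI.unescapeHTML` decodes `&amp;` and the other basic entities in one call. The helper name `clean_label` below is hypothetical; this is a sketch, not the commit's method.

```ruby
require 'cgi'

# Decode HTML entities in a scraped label and trim surrounding whitespace.
# Returns a new string; the argument itself is not modified.
def clean_label(raw)
  CGI.unescapeHTML(raw.to_s).strip
end
```

Because the helper returns a new string, it avoids mutating the array elements in place the way `gsub!` does.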
lib/src/ftp.rb @ 564970e4
 require 'net/ftp'
+require 'src/unzip'
+require 'csv'
 class FTP_sever
   CONTENT_SERVER_DOMAIN_NAME = '192.168.1.156'
   CONTENT_SERVER_USER_NAME = 'training'
   CONTENT_SERVER_USER_PASSWORD = 'training'
   def self.donwload_csv
-    Net::FTP.open(CONTENT_SERVER_DOMAIN_NAME, CONTENT_SERVER_USER_NAME, CONTENT_SERVER_USER_PASSWORD) do |ftp|
-      debugger
-      @file = ftp.getbinaryfile('jobs.zip')
-      @file.save!
-      puts "#{Time.now} << 'Donwload jobs.zip'"
+    Net::FTP.open(CONTENT_SERVER_DOMAIN_NAME, CONTENT_SERVER_USER_NAME, CONTENT_SERVER_USER_PASSWORD) do |ftp|
+      ftp.getbinaryfile('jobs.zip')
+      begin
+        extract_zip('./jobs.zip', 'lib/csv')
+        File.delete('./jobs.zip') if File.exist?('./jobs.zip')
+        puts "Unzip done\n"
+      rescue
+        puts "File not found\n"
+      end
     end
   end
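Review note: `getbinaryfile('jobs.zip')` with a single argument writes to a local file named after the remote one in the current working directory, and the bare `rescue` reports every failure as "File not found". A minimal sketch of a more explicit download step follows. The helper name `fetch_archive`, the environment variable names, and the `tmp/jobs.zip` path are all assumptions for illustration.

```ruby
# Fetch a remote archive over an already-open FTP session and return the
# local path. `ftp` is expected to respond to getbinaryfile(remote, local),
# as a Net::FTP session does.
def fetch_archive(ftp, remote_name, local_path)
  ftp.getbinaryfile(remote_name, local_path)
  # Fail loudly if the transfer did not leave a file behind.
  raise IOError, "download of #{remote_name} produced no file" unless File.exist?(local_path)

  local_path
end

# Possible usage (hypothetical environment variable names):
#   Net::FTP.open(ENV.fetch('FTP_HOST'), ENV.fetch('FTP_USER'), ENV.fetch('FTP_PASSWORD')) do |ftp|
#     fetch_archive(ftp, 'jobs.zip', 'tmp/jobs.zip')
#   end
```

Passing the session in as an argument keeps the credentials out of the class body and lets the step be exercised with a stand-in object.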
+  def self.data_csv
+    donwload_csv()
+    table = CSV.parse(File.read("lib/csv/jobs.csv"), headers: true)
+  end
+  # puts table['name']
+  # puts table['company name'].size
+  # puts table['company province'].size
+  ##puts table['category'].size
+  # puts table['company address'].size
+  # puts table['level'].size
+  # puts table['salary'].size
+  # puts table['benefit'].size
+  # puts table['requirement'].size
+  # puts table['description'].size
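Review note: `data_csv` relies on `CSV.parse(..., headers: true)` returning a table that can be indexed by header name, which is what the commented-out `puts` lines probe. The sketch below shows that access pattern on a made-up sample; the real `jobs.csv` is not shown in this commit, so the rows here are assumptions.

```ruby
require 'csv'

# Hypothetical sample data; only the header names come from the commit.
SAMPLE_CSV = <<~CSV
  name,category,work place
  Ruby Developer,"IT, Software","[""Ho Chi Minh""]"
  Accountant,Finance,
CSV

# Parse with headers so columns can be looked up by name.
def sample_table
  CSV.parse(SAMPLE_CSV, headers: true)
end

table = sample_table
categories = table['category']     # every value of the "category" column, as an array
work_places = table['work place']  # empty unquoted cells come back as nil
first_name = table[0]['name']      # index by row first, then by header
```

The `nil` for empty cells matters for the import methods: calling `strip` or `<<` on such a cell raises `NoMethodError`.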
+  def self.parse_csv_industries(data)
+    puts 'Import data industries . . .'
+    industries = []
+    data['category'].each do |val|
+      industries << val.strip
+    end
+    industries.each do |val|
+      val.gsub!(',', '/') if val.include?(',')
+      val.gsub!('/', ' / ')
+      Industry.find_or_create_by(name: val) { |industry| industry.name = val }
+    end
+    puts 'Done parse csv industries'
+  end
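Review note: `parse_csv_industries` turns commas into slashes and then pads every slash with spaces, so a value that already reads `A / B` ends up with doubled spaces, and a `nil` category raises on `strip`. A minimal sketch of a single-pass alternative follows; the helper name `normalize_category` is hypothetical.

```ruby
# Split a category cell on commas or slashes (with any surrounding
# whitespace), drop empty parts, and join the rest with " / ".
# Returns a new string and tolerates nil input.
def normalize_category(raw)
  raw.to_s
     .split(%r{\s*[,/]\s*})
     .map(&:strip)
     .reject(&:empty?)
     .join(' / ')
end
```

Applying the helper twice gives the same result as applying it once, which keeps `find_or_create_by(name: ...)` lookups stable across repeated imports.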
+  def self.parse_csv_cities(data)
+    puts 'Import data cities . . .'
+    arr_city = ''
+    cities = data['work place'].select { |val| val.present? }
+    cities.uniq!
+    arr_city = cities.map { |val| val.delete("[]\"") }
+    arr_city.each do |val|
+      if !val.blank?
+        City.find_or_create_by(name: val) do |city|
+          city.name = val
+          city.area = 1
+        end
+      end
+    end
+  end
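Review note: `val.delete("[]\"")` strips brackets and double quotes, which suggests the `work place` column holds a bracketed, quoted list. If a cell can list more than one city (an assumption; the data is not shown), the current code would store the whole comma-separated string as a single city name. A sketch that splits the cell first, with the hypothetical helper name `city_names`:

```ruby
# Turn a "work place" cell such as ["Ha Noi", "Da Nang"] into individual
# city names. The bracketed, comma-separated cell format is an assumption.
def city_names(cell)
  cell.to_s
      .delete('[]"')      # remove the bracket and quote characters
      .split(',')
      .map(&:strip)
      .reject(&:empty?)
end
```

Each returned name could then go through `City.find_or_create_by(name: ...)` exactly as in the method above.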
+  def self.parse_csv_companies(data)
+    puts 'Import data companies . . .'
+    data['company name'].each_with_index do |name, index|
+      begin
+        Company.find_or_create_by(name: name.strip) do |company|
+          company.name = name.strip
+          company.address = data['company address'][index]
+          company.short_description = data['benefit'][index]
+        end
+      rescue => exception
+        puts '---'
+      end
+    end
+    puts 'Done import data companies'
+  end
+  def self.parse_csv_jobs(data)
+    Job.update_all(newdata: 0)
+    data['name'].each_with_index do |name, index|
+      desc = data['requirement'][index] << '\n' << data['description'][index]
+      id_company = Company.find_by name: data['company name'][index].to_s.strip
+      if id_company.blank?
+        id_company = 1
+      else
+        id_company = id_company.id
+      end
+      id_job = Job.create!(name: name,
+                           company_id: id_company,
+                           level: data['level'][index],
+                           experience: "",
+                           salary: data['salary'][index],
+                           create_date: Time.now,
+                           expiration_date: "",
+                           description: desc,
+                           newdata: 1)
+    end
+  end
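Review note: in `parse_csv_jobs`, `'\n'` is single-quoted, so it appends a literal backslash followed by `n` instead of a newline. The `<<` operator also mutates the `requirement` cell in place and raises `NoMethodError` when that cell is `nil`. A minimal sketch of a non-mutating alternative, using the hypothetical helper name `build_description`:

```ruby
# Join the requirement and description cells with a real newline.
# nil or blank parts are skipped, and neither input string is modified.
def build_description(requirement, description)
  [requirement, description]
    .map { |part| part.to_s.strip }
    .reject(&:empty?)
    .join("\n")
end
```

The double-quoted `"\n"` is what produces an actual line break in the stored description.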
+  def self.import_data_from_csv
+    data = data_csv()
+    parse_csv_industries(data)
+    parse_csv_cities(data)
+    parse_csv_companies(data)
+    # parse_csv_jobs(data)
+  end
 end
\ No newline at end of file
lib/src/interface_web.rb @ 564970e4
@@ -23,9 +23,9 @@ class Interface_web
     data << website_companies << website_jobs
   end
-  @crawl_link_for_companies_jobs = crawl_link_for_companies_jobs(15)
+  # @crawl_link_for_companies_jobs = crawl_link_for_companies_jobs(3)
   def self.get_link_job_and_companies
-    @crawl_link_for_companies_jobs ||= crawl_link_for_companies_jobs(15)
+    @crawl_link_for_companies_jobs ||= crawl_link_for_companies_jobs(1)
   end
   def self.base_link(url)
@@ -172,8 +172,7 @@ class Interface_web
     add_data(@name, @company_name, @city_name, @created_date, @expiration_date, @salary, @industry_name, @description, @level, @exprience)
   end
-  def self.crawl_data_jobs_interface_5(page)
-    # page = base_link(url)
+  def self.crawl_data_jobs_interface_5(page)
     @name << page.search(".info-company h1").text
     @company_name << page.search(".info-company .text-job h2").text
@@ -225,9 +224,6 @@ class Interface_web
     end
     @data
   end
 end
lib/src/unzip.rb @ 564970e4
0 → 100644
+require 'zip'
+def extract_zip(file, destination)
+  FileUtils.mkdir_p(destination)
+  Zip::File.open(file) do |zip_file|
+    zip_file.each do |f|
+      fpath = File.join(destination, f.name)
+      zip_file.extract(f, fpath) unless File.exist?(fpath)
+    end
+  end
+end
\ No newline at end of file
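Review note: `extract_zip` joins each entry name onto the destination without checking where the result points, so an entry named with `../` segments could resolve outside `lib/csv`. Whether rubyzip already guards against this depends on the version in use, so the following is only a defensive sketch. The helper name `safe_destination` is hypothetical.

```ruby
# Resolve an archive entry name against the destination directory and
# refuse any path that ends up outside that directory.
def safe_destination(destination, entry_name)
  root = File.expand_path(destination)
  target = File.expand_path(entry_name, root)
  return target if target.start_with?(root + File::SEPARATOR)

  raise ArgumentError, "entry #{entry_name.inspect} escapes #{root}"
end
```

Inside `extract_zip`, `fpath = safe_destination(destination, f.name)` could take the place of the plain `File.join` call. Note also that the `unless File.exist?(fpath)` guard means a previously extracted `jobs.csv` is never overwritten by a newer download.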
lib/tasks/crawler.rake @ 564970e4
 require 'src/crawler'
+require 'src/ftp'
 namespace :db do
   task populate: :environment do
-    Clawler.make_industries
-    Clawler.make_cities
-    Clawler.make_companies
-    Clawler.make_jobs
+    # Clawler.make_industries
+    # Clawler.make_cities
+    # Clawler.make_companies
+    # Clawler.make_jobs
   end
+  task csv: :environment do
+    FTP_sever.import_data_from_csv
+  end
 end