MySQL student table info

mysql> desc student;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id    | int(11)     | NO   | PRI | NULL    |       |
| name  | varchar(10) | YES  |     | NULL    |       |
| phone | varchar(20) | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
 
mysql> select * from student;
+----+-------+---------------+
| id | name  | phone         |
+----+-------+---------------+
|  1 | smith | 010-1111-1111 |
|  2 | kelly | 010-2222-2222 |
+----+-------+---------------+
2 rows in set (0.00 sec)
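
For reference, a table with this schema could have been created with DDL along these lines (a sketch; the original CREATE TABLE statement isn't shown here):

mysql> CREATE TABLE student (
    ->   id    INT NOT NULL,
    ->   name  VARCHAR(10),
    ->   phone VARCHAR(20),
    ->   PRIMARY KEY (id)
    -> );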
 

[hadoop@master ~]$ cd /usr/local/sqoop/
[hadoop@master sqoop]$ vi student_import.sh
 

I created the student_import.sh file as above and wrote it as follows (note that in a Sqoop options file, each option and each value goes on its own line):

import
--username
hadoop
--password
'your-password'
--connect
jdbc:mysql://xxx.xxx.xxx.35:3309/member?serverTimezone=Asia/Seoul
--table
student
--split-by
id
-m
1
 
 

For reference, the options are written as follows.

--connect jdbc:mysql://<master-address>:<mysql-port>/<database>?serverTimezone=Asia/Seoul   = JDBC connection: MySQL host, database, and session time zone
--username <mysql-user>                                        = MySQL user to authenticate as
--password <mysql-password>                                    = password for that MySQL user
--query 'select * from sample_table WHERE $CONDITIONS'         = free-form query to import instead of --table ($CONDITIONS is required so Sqoop can inject its split predicates)
--split-by <column>                                            = table column used to split the work across mappers
--target-dir <directory>   e.g. hdfs://localhost/user/hadoop/  = destination HDFS directory (if omitted, data is written under the user's HDFS home directory, here /user/hadoop/<table>)
-m <n>                                                         = number of parallel map tasks
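
For comparison, the same single-table import can be run directly on the command line instead of through an options file. This is a sketch using the same placeholder host, port, and password as above:

sqoop import \
    --connect "jdbc:mysql://xxx.xxx.xxx.35:3309/member?serverTimezone=Asia/Seoul" \
    --username hadoop \
    --password 'your-password' \
    --table student \
    --split-by id \
    -m 1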

import-all-tables
--username
<mysql-username>
--password
<your-password>
--connect
jdbc:mysql://192.168.0.35:3309/<database>?serverTimezone=Asia/Seoul
# every table is written under this HDFS directory
--warehouse-dir
/user/hadoop/EdgeDB
-m
1
 
 
 

Unlike the previous file, this one imports an entire database rather than a single table: import-all-tables pulls every table in the named database, and --warehouse-dir gives the parent HDFS directory under which each table gets its own subdirectory.
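
To run it, save the options to a file (db_import.sh is a hypothetical name here) and pass it with --options-file exactly as before:

[hadoop@master sqoop]$ sqoop --options-file db_import.sh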

 

1. To confirm that Sqoop and MySQL are properly connected, list the MySQL databases with the following command.

[hadoop@master ~]$ sqoop list-databases --connect jdbc:mysql://xxx.xxx.xxx.35:3309 --username hadoop --password 'your-password'
Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/05/09 17:33:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/05/09 17:33:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/05/09 17:33:26 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
member
mysql
performance_schema
 

HBase, HCatalog, and Accumulo aren't needed here, so those warnings can be ignored.
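
The log also warns that putting --password on the command line is insecure; -P prompts for it interactively instead. Table-level access can be verified the same way (a sketch reusing the placeholder connection details from above):

sqoop list-tables \
    --connect jdbc:mysql://xxx.xxx.xxx.35:3309/member?serverTimezone=Asia/Seoul \
    --username hadoop -P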

 

Now let's use the student_import.sh file created earlier to import the student table from the member database.

[hadoop@master sqoop]$ sqoop --options-file student_import.sh
Warning: /usr/local/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/05/09 17:44:16 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/05/09 17:44:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/05/09 17:44:16 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/05/09 17:44:16 INFO tool.CodeGenTool: Beginning code generation
19/05/09 17:44:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
19/05/09 17:44:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
19/05/09 17:44:16 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-hadoop/compile/3ca02518569f69f0effb4492d2b8319c/student.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/05/09 17:44:17 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/3ca02518569f69f0effb4492d2b8319c/student.jar
19/05/09 17:44:17 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/05/09 17:44:17 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/05/09 17:44:17 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/05/09 17:44:17 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/05/09 17:44:17 INFO mapreduce.ImportJobBase: Beginning import of student
19/05/09 17:44:17 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
19/05/09 17:44:17 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/05/09 17:44:17 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/05/09 17:44:18 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.0.35:8032
19/05/09 17:44:20 INFO db.DBInputFormat: Using read commited transaction isolation
19/05/09 17:44:20 INFO mapreduce.JobSubmitter: number of splits:1
19/05/09 17:44:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1557362273260_0003
19/05/09 17:44:20 INFO impl.YarnClientImpl: Submitted application application_1557362273260_0003
19/05/09 17:44:20 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1557362273260_0003/
19/05/09 17:44:27 INFO mapreduce.Job:  map 0% reduce 0%
19/05/09 17:44:33 INFO mapreduce.Job:  map 100% reduce 0%
19/05/09 17:44:34 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=142093
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=87
                HDFS: Number of bytes written=44
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=5970
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=2985
                Total vcore-milliseconds taken by all map tasks=2985
                Total megabyte-milliseconds taken by all map tasks=4584960
        Map-Reduce Framework
                Map input records=2
                Map output records=2
                Input split bytes=87
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=75
                CPU time spent (ms)=950
                Physical memory (bytes) snapshot=248098816
                Virtual memory (bytes) snapshot=3030355968
                Total committed heap usage (bytes)=351272960
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=44
19/05/09 17:44:34 INFO mapreduce.ImportJobBase: Transferred 44 bytes in 16.2221 seconds (2.7123 bytes/sec)
19/05/09 17:44:34 INFO mapreduce.ImportJobBase: Retrieved 2 records.
 

Check the HDFS web UI:

http://<ip-address>:50070

Check the YARN web UI:

http://<ip-address>:8088
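
The same information is available from the shell; for example, the YARN CLI can list the finished application (a standard Hadoop command, nothing specific to this setup):

[hadoop@master sqoop]$ yarn application -list -appStates FINISHED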

Finally, check the imported student file in HDFS with the following command:

[hadoop@master sqoop]$ hdfs dfs -cat /user/hadoop/student/*
1,smith,010-1111-1111
2,kelly,010-2222-2222
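
Each mapper writes one part file, plus a _SUCCESS marker from MapReduce; since this job ran with -m 1 there should be a single part-m-00000, which a directory listing will confirm:

[hadoop@master sqoop]$ hdfs dfs -ls /user/hadoop/student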
 

Import reference:

http://hochul.net/blog/datacollector_apache_sqoop_from_rdbms2/

 


That wraps up the hadoop-sqoop-mysql integration.
