mysql> select * from employee;
+-------+--------------+--------+
| empno | empname      | salary |
+-------+--------------+--------+
|   101 | Ram          |   5000 |
|   102 | Hari         |   7000 |
|   104 | Vamshi       |   7000 |
|   103 | Revathy      |   7000 |
|   105 | Jaya         |   9000 |
|   106 | Suresh       |   8000 |
|   107 | Ramesh       |   9000 |
|   108 | Prasana      |  10000 |
|   109 | Ramsamy      |  20000 |
|   110 | Singaram     |  30000 |
|   200 | ramanathan   |  30000 |
|   201 | Victor       |  33000 |
|   202 | Naveen       |  33000 |
|   203 | Karthik      |  33000 |
|   204 | Karthikeyan  |  33000 |
|   205 | Somasundaram |  43000 |
|   301 | Test1        |  50000 |
|   302 | Test2        |  60000 |
|   303 | Test3        |  70000 |
+-------+--------------+--------+
Command in Sqoop
sqoop import --connect jdbc:mysql://<hostname>/test --username <username> --password <password> --table employee \
  --direct --verbose \
  --split-by salary
With the above command, Sqoop takes min(salary) and max(salary) and writes the rows to HDFS with 10 records in the first file, 3 records in the second file, 3 records in the third file and 3 records in the last file.
15/07/03 17:32:37 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`salary`), MAX(`salary`) FROM employee
15/07/03 17:32:37 DEBUG db.IntegerSplitter: Splits: [5,000 to 70,000] into 4 parts
15/07/03 17:32:37 DEBUG db.IntegerSplitter: 5,000
15/07/03 17:32:37 DEBUG db.IntegerSplitter: 21,250
15/07/03 17:32:37 DEBUG db.IntegerSplitter: 37,500
15/07/03 17:32:37 DEBUG db.IntegerSplitter: 53,750
15/07/03 17:32:37 DEBUG db.IntegerSplitter: 70,000
15/07/03 17:32:37 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '`salary` >= 5000' and upper bound '`salary` < 21250'
15/07/03 17:32:37 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '`salary` >= 21250' and upper bound '`salary` < 37500'
15/07/03 17:32:37 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '`salary` >= 37500' and upper bound '`salary` < 53750'
15/07/03 17:32:37 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '`salary` >= 53750' and upper bound '`salary` <= 70000'
15/07/03 17:32:37 INFO mapreduce.JobSubmitter: number of splits:4
I would like to know how it decides the number of records that go into each file. Is this customizable?
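The split boundaries in the log above follow from simple arithmetic on the min and max of the `--split-by` column. The following is a minimal Python sketch that reproduces those boundaries and the generated range conditions. It is not Sqoop's actual implementation; the function names `split_points` and `where_clauses` are hypothetical, and it assumes the range divides evenly, as it does here.

```python
def split_points(lo, hi, parts):
    """Return parts + 1 evenly spaced boundaries from lo to hi.

    Illustrative only: assumes (hi - lo) is divisible by parts.
    """
    step = (hi - lo) // parts
    points = [lo + i * step for i in range(parts)]
    points.append(hi)  # the last boundary is always the column maximum
    return points


def where_clauses(column, points):
    """Build one range condition per split, mirroring the log output."""
    clauses = []
    last = len(points) - 2
    for i in range(len(points) - 1):
        # Only the final split includes its upper bound.
        upper_op = "<=" if i == last else "<"
        clauses.append(
            f"`{column}` >= {points[i]} AND `{column}` {upper_op} {points[i + 1]}"
        )
    return clauses


points = split_points(5000, 70000, 4)
print(points)
for clause in where_clauses("salary", points):
    print(clause)
```

Each range is handled by one map task and produces one output file, so the number of records per file depends on how the salary values are distributed across the ranges, not on the row count. In this sketch `parts` stands in for the number of map tasks, which Sqoop lets you set with `-m` / `--num-mappers`; choosing a different `--split-by` column changes the min and max that get divided.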
1 Answer
The salary range is 5000 - 70000 (i.e. min 5000, max 70000). This range is divided into four parts. So,