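1. Data Collection (Project 1)
Overview
- When a user clicks a page, the resulting data is stored in a.log. (This project skips that step; the data is already in a.log.)
- Java code writes the data from a.log into project.log.
- Flume collects the logs: it watches project.log for changes and writes the newly appended user data to HDFS.
Sample data already in a.log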
120.191.181.178 - - 2018-02-18 20:24:39 "POST https://www.taobao.com/item/b HTTP/1.1" 203 69172 https:
149.74.183.133 - - 2018-09-24 19:38:17 "GET https://www.taobao.com/register HTTP/1.0" 300 72815 https:
58.9.92.122 - - 2018-08-30 11:28:15 "GET https://www.taobao.com/list/ HTTP/1.0" 203 17119 https:
77.56.72.210 - - 2018-05-13 18:11:22 "POST https://www.taobao.com/category/b HTTP/1.0" 201 17843 https:
217.147.196.74 - - 2018-08-22 18:06:01 "GET https://www.taobao.com/category/c HTTP/1.0" 501 95033 https:
37.146.124.65 - - 2018-05-08 02:12:24 "POST https://www.taobao.com/category/d HTTP/1.1" 203 47329 https:
167.108.24.171 - - 2018-08-20 12:19:01 "POST https://www.taobao.com/recommand HTTP/1.1" 302 80056 https:
94.69.229.202 - - 2018-07-27 13:46:37 "GET https://www.taobao.com/recommand HTTP/1.1" 501 63116 https:
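To make the record layout concrete, here is a minimal sketch of splitting one such line into fields with a regular expression. The class name `LogLineParser` and the field interpretation (client IP, timestamp, HTTP method, URL, status code, byte count) are assumptions read off the sample lines, not something the project defines; the trailing `https:` fragment is accepted but ignored.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the project: parses one line of a.log.
public class LogLineParser {
    // Assumed layout: ip - - yyyy-MM-dd HH:mm:ss "METHOD url protocol" status bytes [rest]
    private static final Pattern LINE = Pattern.compile(
            "^(\\S+) \\S+ \\S+ (\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}) "
            + "\"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\d+)(?: (.*))?$");

    // Returns {ip, timestamp, method, url, status, bytes}, or null if the line does not match.
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group(1), m.group(2), m.group(3), m.group(4), m.group(6), m.group(7)
        };
    }

    public static void main(String[] args) {
        String sample = "120.191.181.178 - - 2018-02-18 20:24:39 "
                + "\"POST https://www.taobao.com/item/b HTTP/1.1\" 203 69172 https:";
        String[] fields = parse(sample);
        System.out.println(String.join(" | ", fields));
    }
}
```

Returning null for non-matching lines lets a caller skip malformed records instead of failing the whole run.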
Writing the log collection script
[root@node1 dataCollect]
# Name the components of the agent "dataCollect"
dataCollect.sources = s1
dataCollect.channels = c1
dataCollect.sinks = k1

# Source: follow project.log and emit each appended line as an event.
# This path must be the file the Java program appends to; the Java code
# in this post writes to /root/project/dataCollect/project.log.
dataCollect.sources.s1.type = exec
dataCollect.sources.s1.command = tail -F /opt/project/dataCollect/project.log

# Channel: in-memory buffer between source and sink
dataCollect.channels.c1.type = memory
dataCollect.channels.c1.capacity = 20000
dataCollect.channels.c1.transactionCapacity = 10000
dataCollect.channels.c1.byteCapacity = 1048576000

# Sink: write events to HDFS, one directory per day
dataCollect.sinks.k1.type = hdfs
dataCollect.sinks.k1.hdfs.path = hdfs://node1:9000/project/%Y%m%d/
dataCollect.sinks.k1.hdfs.filePrefix = project-
dataCollect.sinks.k1.hdfs.fileSuffix = .log
dataCollect.sinks.k1.hdfs.round = true
dataCollect.sinks.k1.hdfs.roundValue = 24
dataCollect.sinks.k1.hdfs.roundUnit = hour
dataCollect.sinks.k1.hdfs.useLocalTimeStamp = true
dataCollect.sinks.k1.hdfs.batchSize = 5000
dataCollect.sinks.k1.hdfs.fileType = DataStream
dataCollect.sinks.k1.hdfs.rollInterval = 21600
dataCollect.sinks.k1.hdfs.rollSize = 134217700
dataCollect.sinks.k1.hdfs.rollCount = 0
dataCollect.sinks.k1.hdfs.minBlockReplicas = 1

# Wire the source and the sink to the channel
dataCollect.sources.s1.channels = c1
dataCollect.sinks.k1.channel = c1
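The numeric settings in this configuration are easier to judge in human units. The sketch below only does the arithmetic: `rollInterval` is in seconds, `rollSize` and `byteCapacity` are in bytes. The class name `FlumeSettings` is hypothetical, and reading `rollSize` as "just under one 128 MiB HDFS block" is an assumption about intent, not something the post states.

```java
// Hypothetical helper: converts the values from the config above into readable units.
public class FlumeSettings {
    static final long ROLL_INTERVAL_SECONDS = 21600L;   // hdfs.rollInterval
    static final long ROLL_SIZE_BYTES = 134217700L;     // hdfs.rollSize
    static final long BYTE_CAPACITY_BYTES = 1048576000L; // channel byteCapacity
    static final long BLOCK_128_MIB = 128L * 1024 * 1024; // 134217728 bytes

    static double hours(long seconds) {
        return seconds / 3600.0;
    }

    static double mib(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        System.out.println("rollInterval = " + hours(ROLL_INTERVAL_SECONDS) + " hours");
        System.out.println("rollSize     = " + mib(ROLL_SIZE_BYTES) + " MiB");
        System.out.println("byteCapacity = " + mib(BYTE_CAPACITY_BYTES) + " MiB");
        System.out.println("rollSize is " + (BLOCK_128_MIB - ROLL_SIZE_BYTES)
                + " bytes below 128 MiB");
    }
}
```

With `rollCount = 0`, the event count never triggers a roll, so a file is closed either after 6 hours or once it approaches 128 MiB, whichever comes first.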
Using Java code to write the user behavior data into project.log
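package com.sxuek;

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class SimPro {
    public static void main(String[] args) throws IOException, InterruptedException {
        // try-with-resources closes both files even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                     new FileInputStream("/root/project/dataCollect/a.log"), StandardCharsets.UTF_8));
             // second constructor argument "true" opens project.log in append mode
             BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream("/root/project/dataCollect/project.log", true), StandardCharsets.UTF_8))) {
            Random random = new Random();
            String line = br.readLine();
            while (line != null) {
                // wait a random 0-4999 ms, then emit a random batch of 1-10000 lines
                int time = random.nextInt(5000);
                int count = 1 + random.nextInt(10000);
                Thread.sleep(time);
                System.out.println("Waited " + time + " ms; " + count
                        + " users clicked the site and generated user behavior log data");
                // stop at end of file so that null is never written
                for (int i = 0; i < count && line != null; i++) {
                    bw.write(line);
                    bw.newLine();
                    line = br.readLine();
                }
                // flush once per batch so tail -F sees the new lines
                bw.flush();
            }
        }
    }
}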
Package as a jar, upload it, and run
Start collecting
flume-ng agent -n dataCollect -f dataCollect.conf -Dflume.root.logger=INFO,console
java -jar untitled.jar SimPro.java
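To check the result, it helps to know which HDFS directory the sink should be writing to. Because the configuration sets `hdfs.useLocalTimeStamp = true`, the `%Y%m%d` escape in `hdfs.path` is filled from the agent's local clock at write time, not from the date inside each log line. The sketch below builds that path for a given date; the class and method names are hypothetical, and the base URI is copied from the configuration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: computes the directory that hdfs.path resolves to for a date.
public class ExpectedHdfsDir {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    static String dirFor(LocalDate date) {
        // Mirrors hdfs.path = hdfs://node1:9000/project/%Y%m%d/
        return "hdfs://node1:9000/project/" + date.format(DAY) + "/";
    }

    public static void main(String[] args) {
        // Directory for events written today on the machine running the agent
        System.out.println(dirFor(LocalDate.now()));
    }
}
```

Listing the printed directory on HDFS should show files named with the `project-` prefix and the `.log` suffix once the sink has written its first batch.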