What is Machine Data?
Machine data is data generated by machines, equipment, and devices, which is either recorded on local storage media or transmitted wirelessly to remote processors. With the miniaturisation of sensing devices and the rise of the Internet of Things in a device-connected world, more and more machine data is available for exploitation.
Machine data comes in many forms, and log files are the most common source of machine data today across all industries. The main types of machine data are listed below:
1. Application Logs
Common examples include server performance logs, error logs, syslog, firewall security logs and data generated by any application, collected as either complete log files or just updates, which can be joined, transformed and aggregated as needed.
2. Application Server Logs
Many application servers generate log files using standard logging frameworks such as log4j. The data contains not only critical insights into application and application server operation and performance, but also the transaction information that offers insights into business transactions and in particular fraud and security problems.
3. Call Detail Records for Telecom Services
Call Detail Records (CDRs) and IP Data Records (IPDRs) are generated by telecoms network equipment for every call and session, and contain the information necessary to produce billing records. They also contain information that can be used to identify service quality and customer experience issues, particularly when joined in real time with location data.
4. Clickstream Data
Clickstream data captures users’ activity on websites. It contains valuable information on visitor activity that can be used, for example, to measure customer experience, drive real-time advertisement placement and detect the exit pages of a website.
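As a rough illustration of exit page detection, the sketch below counts the last page seen in each session. The record layout (session id, timestamp, page) and the sample events are assumptions made for this example, not a standard clickstream format.

```python
from collections import Counter


def exit_pages(events):
    """Count how often each page is the last one viewed in a session.

    `events` is an iterable of (session_id, timestamp, page) tuples.
    This layout is an assumption for the sketch; real clickstream
    records usually carry many more fields.
    """
    last_seen = {}
    for session_id, timestamp, page in events:
        current = last_seen.get(session_id)
        # Keep the page with the latest timestamp for each session.
        if current is None or timestamp >= current[0]:
            last_seen[session_id] = (timestamp, page)
    return Counter(page for _, page in last_seen.values())


# Made-up sample events; they do not need to arrive in time order.
events = [
    ("s1", 1, "/home"),
    ("s1", 2, "/pricing"),
    ("s2", 1, "/home"),
    ("s2", 3, "/checkout"),
    ("s2", 2, "/cart"),
    ("s3", 1, "/pricing"),
]
exit_counts = exit_pages(events)
print(exit_counts.most_common())
```

Because only the latest event per session is kept, the same loop can run over a stream without first sorting it.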
5. GPS Data
GPS data records the exact position of a device at a specific time. GPS events can be transformed easily into position and movement information. Telecommunications, transportation, logistics and telematics rely on the accurate and sophisticated processing of GPS information.
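One simple way to turn GPS events into movement information is to compute the great-circle distance between consecutive fixes and divide by the elapsed time. The sketch below uses the haversine formula on a spherical Earth; the fix layout (Unix seconds, latitude, longitude) and the function names are assumptions for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speeds_mps(fixes):
    """Average speed in metres per second between consecutive fixes.

    `fixes` is a list of (unix_seconds, lat, lon) tuples sorted by time;
    this layout is an assumption for the sketch.
    """
    speeds = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(fixes, fixes[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speeds.append(haversine_m(lat0, lon0, lat1, lon1) / dt)
    return speeds


# Made-up fixes: two points on the equator, one minute apart.
fixes = [(0, 0.0, 0.0), (60, 0.0, 0.01)]
speeds = speeds_mps(fixes)
print(speeds)
```

A production system would also need to handle GPS noise and gaps in coverage, which this sketch ignores.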
6. IP Router Syslog
All IP network equipment from the major Internet Service Providers uses syslog to capture connection status, capacity information, routing information, failure alerts, security alerts and performance data. When processed in real time with relevant data from other sources, this data provides a unique insight into the operation of the network and offers a platform for predictive analytics and forecasting.
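A first step in processing syslog messages is often to decode the priority value at the start of each message. In the syslog protocol this value encodes the facility and the severity as facility * 8 + severity. The sketch below decodes it; the sample message and the function name are made up for illustration.

```python
import re

# Severity names by numeric code (0 to 7), as defined by the syslog protocol.
SEVERITIES = [
    "emergency", "alert", "critical", "error",
    "warning", "notice", "informational", "debug",
]

PRI_RE = re.compile(r"^<(\d{1,3})>")


def decode_pri(line):
    """Return (facility, severity_name) for a syslog line, or None.

    None is returned when the line has no leading <PRI> field or the
    value is outside the valid range 0..191.
    """
    match = PRI_RE.match(line)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:
        return None
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity]


# Hypothetical router message, invented for this example.
sample = "<187>Jan 15 10:23:45 router1 link: interface ge-0/0/1 down"
decoded = decode_pri(sample)
print(decoded)
```

Filtering on the decoded severity is a cheap way to separate failure alerts from routine status messages before joining them with other sources.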
7. Sensor Data
The availability of low-cost, intelligent sensors and the emergence of 3G and 4G wireless technology have driven a huge increase in the volume of sensor data. This is further amplified by the need to extract operational intelligence from the data in real time. Examples include industrial automation plants, smart metering, environmental monitoring and the oil and natural gas industry.
8. SCADA Data
Supervisory Control and Data Acquisition (SCADA) is the data management infrastructure for industrial control systems. SCADA systems produce an immense volume of measurement data, status information and failure alerts, and are widely deployed for remote equipment and process monitoring across the smart grid, oil and gas, transportation and utilities sectors.