Since the year 2000, 50% of Fortune 500 companies have disappeared as they failed to stay relevant in the digital economy. The time for excuses is over. Connected Insurance USA (http://bit.ly/35CWXoA) is bringing together over 700 senior insurance leaders to redefine the future of insurance. Join the revolution in Chicago, November 20-21, and use our discount code 5054BDS100 to save $100 on your registration >> http://bit.ly/35CWXoA. #CIUSA #InsuranceNexus #connectedinsurance
As insurance joins the digital arms race, cutting-edge technologies are proving the difference between success and failure. Connected Insurance USA (Chicago, Nov 20-21, http://bit.ly/35CWXoA) is where the world’s leading insurance carriers are shunning inaction and innovating their way to success. For this week only, register using our discount code 5054BDS100 and save $100 on your booking >> http://bit.ly/35CWXoA. #CIUSA #InsuranceNexus #connectedinsurance

Connected Insurance USA (Chicago, Nov 20-21, http://bit.ly/35CWXoA) is bringing together over 700 senior insurance leaders to define the future of insurance. Over 500 CEOs, COOs, SVPs and VPs from across Claims, Product, Customer, Technology and Innovation have already confirmed their places – have you? Save $100 on your booking using our discount code 5054BDS100 and join the insurance revolution >> http://bit.ly/35CWXoA. #CIUSA #InsuranceNexus #connectedinsurance

Just 5 weeks to go until Connected Insurance USA (Chicago, Nov 20-21, http://bit.ly/35CWXoA) and with over 500 confirmed, places are running low. Join @USAA, @Liberty Mutual Insurance and @American Family Insurance at the only community that’s dedicated to truly transforming insurance organizations for the future. Use our discount code 5054BDS100 to save $100 on your booking and secure your place at the heart of the #connectedinsurance revolution >> http://bit.ly/35CWXoA. #CIUSA #InsuranceNexus
Connected Insurance Report: Industry Weighs in on Future of Technology in Insurance

To some, it is magic. To insurance, it is reality. The ability to accurately discern the past and predict the future based on nothing but data points and the long-lived experience of actuaries and adjusters has served the industry well up to now, allowing insurance to become the multi-billion-dollar industry it is today. The past few years, however, have witnessed a dramatic shift in this picture, prompted by the advent of the Internet of Things: technologies that collect, record and transmit live, granular data about their environment.

This increase in the quality and quantity of available data is already having some profound effects; the process of writing policies can now be far better informed by what is known about the risk level of an individual or entity, as opposed to simply what is known about the claims generated by an entire class of risk. Consequently, it is now possible to assess claims more accurately and efficiently, and even prevent them from arising, based on high-quality, objective data. This, in turn, has created the need for changes in how insurers and customers interact both before and after a claim, as well as in the internal structure, operations and hiring processes of the carriers themselves.

Already operating in an environment of squeezed profits, high regulation and low consumer trust, the industry is witnessing something of a perfect storm at present. The tools for insurance carriers to stay relevant and appeal to today’s consumer do now exist, but uncertainty over how best to implement such profound strategic transformation is holding many back.

To provide a comprehensive overview of the progress and prospects of Connected Insurance, Insurance Nexus has produced the Connected Insurance Report, an in-depth study of the progress of insurance technology globally, today and in the future. The Connected Insurance Report is based partly on a survey of over 500 people working in insurance and related industries, as well as the exclusive insights of 20 renowned thought-leaders, including Matteo Carbone (Founder and Director of the IoT Insurance Observatory), Cecilia Sevillano (Head of Smart Homes Solutions, Swiss Re), Boris Collignon (Vice President Strategy, Innovation and Strategic Partnerships, Desjardins General Insurance Group) and more. Access the Connected Insurance Report today for in-depth insights, analyses and case studies on the technology-led transformation of insurance, including:
The Connected Insurance Report was researched and produced by Insurance Nexus in collaboration with the IoT Insurance Observatory. It is the first of its kind to conceive of insurance IoT holistically, as a paradigm shift necessitating changes in insurer business models, organisational structures and technology stacks. Insurance Nexus surveyed the experiences of more than 500 insurers and reinsurers to assess where they sit in the connected insurance market and to extract the challenges they face and their stories of success. Along with a panel of 20 industry leaders who have been operating at the sharp end of the IoT revolution, Insurance Nexus looked at these hurdles and opportunities and pulled them apart to provide readers with case studies and actionable insights to help guide decision-making as the industry tackles its own strategic milestones.

Contact
Mariana Dumont
Head of USA Operations
Insurance Nexus
Phone: +44 (0) 207 422 4369
Toll Free: 1 800 814 3459 Ext: 4369
Email: [email protected]

Insurance Nexus is part of FC Business Intelligence Ltd. FC Business Intelligence Ltd is a registered company in England and Wales. Registered number 04388971, 7-9 Fashion Street, London, E1 6PX, UK. Insurance Nexus is the central hub for insurance executives. Through in-depth industry analysis, targeted research, niche events and quality content, we provide the industry with a platform to network, discuss, learn and shape the future of the insurance industry.

###

Meet Dark Data, The Ignored Prodigal Elder Brother Of Big Data
AMBIKA CHOUDHURY
APR 13, 2019

Dark data is any data that is essentially ignored and remains stored without any indexing. It eventually becomes invisible to researchers, which ultimately results in it being lost. This data is generally unstructured because it has been collected by organisations unknowingly and has never been used for any decision-making or made available to the public. Bob Picciano, Senior VP of Analytics at IBM, told a news portal, “Data that is difficult to work with creates a high barrier to entry. People typically forego trying to get any information out of it. About 90% of data generated by most sensors and other sources on the market never get utilised, and 60% of that data loses its true value within milliseconds.”

How Is It Generated?
The main reason dark data arises is that organisations collect large amounts of data but analyse very little of it. Data is generated every moment: each time a user clicks a link or visits a site, data is created that organisations can analyse to improve their business. But they utilise only a small, structured portion of it, stored in databases, while the rest remains unstructured and is lost among other unindexed data. According to reports, 7.5 sextillion gigabytes of data are generated worldwide every single day, of which 6.75 septillion megabytes end up as dark data. This dark data remains stored in the files of data repositories without being analysed or processed. Another reason dark data accumulates is the lack of proper analytical tools that support other formats of data, so it cannot be analysed for decision-making.

Importance Of Dark Data In Big Data
Dark data is a part of Big Data. Data considered dark can come from various logs, emails, old documents, ex-employee information, statements, ID numbers, etc. With the advent of Big Data, frameworks like Hadoop came into the picture and have been growing exponentially.
These frameworks have been used by organisations to process large volumes of data, including dark data. According to one report, by 2020 the digital universe is expected to reach 44 zettabytes, and the IoT will see explosive growth to 20.8 billion connected devices; that volume will be 269 times greater than the amount of data transmitted to data centres from end-user devices and 49 times higher than total data-centre traffic. Since dark data is a subset of Big Data, it can be analysed to discover valuable insights within an organisation, insights far greater than those organisations are currently gaining. Dark data can be used for various purposes: for instance, the large amounts of data generated by servers, networking equipment, firewalls, etc. can be used to analyse the security of the network environment. Organisations can also use dark data to uncover patterns and other relationships that support decision-making.

Bottom Line
Data is one of the primary outputs of any organisation. While a small portion of it is carefully used by researchers for decision-making, it is also crucial to reduce dark data as much as possible by pruning, auditing, etc. An increase in dark data only brings an increase in storage costs as well as security risks.

Source: https://www.analyticsindiamag.com/meet-dark-data-the-ignored-prodigal-elder-brother-of-big-data/

Disclaimer: The above article is published here in addition to providing a link in other pages of the Big Data Space website so that visitors can still read the article in the event of a broken link to the original article.

As a media partner to the upcoming event, the Connected Claims USA Summit to be held on June 5-6, 2019 in Chicago, USA, Big Data Space (BDS) is privileged to have access to an insightful article with great quotes from insurance executives, created in conjunction with Insurance AI and Analytics USA, which BDS has been given permission to share with its visitors.
A Roadmap to the Future of Insurance: Real-World Case Studies and Expert Analysis from QBE, USAA, Hippo and more at Insurance AI & Analytics USA

2018 saw unprecedented advances in the investment and deployment of artificial intelligence capabilities within the insurance industry, and 2019 promises to be no different. With the Insurance AI and Analytics USA Summit kicking off in Chicago, May 2-3, we spoke to some of our speakers to gauge their thoughts on the challenges insurance carriers face in implementing AI, and how the summit will help insurance carriers address these issues through holistic, innovative and proven strategies.

“AI is impacting insurance at an unprecedented pace”, claims Bilal Parviz, Vice President Product Development at Arch MI, and “with innovative new players entering the industry and the traditional players trying to catch up,” says Manulife VP of Group Advanced Analytics, Eugene Wen, “the industry is going to experience further significant change.” “It’s impossible to open a magazine without seeing hype about analytics changing every aspect of your life,” William Dubyak, VP Analytics for Product Development and Innovation at USAA, agrees. “The Insurance AI and Analytics USA Summit is the optimal place to cut through the noise, hear the latest thinking from industry leaders in analytics, and compare the best practices with colleagues.”

Although good progress has been made to date, there is a definite sense that we are only at the tip of the AI iceberg. In the eyes of Novarica VP of Research Consulting, Chuck Gomez, “each year, the topic of AI gets more interesting as emerging technology evolves and adoption rates go up, indicating that more can be accomplished with progress. While the subject of analytics has been around insurers for a while…there’s still a lot to learn about analytics centered around underwriting, claims and customer service. It is conferences like Insurance AI and Analytics that help to further progress every year by bringing great thinkers and enquiring minds together.”

On a strategic level, Insurance AI and Analytics USA will address distinct applications of AI, with separate tracks focusing on Transforming Underwriting, Delivering Customer-Centric Claims and Creating Outstanding Customer Experiences. Thomas Sheffield, SVP, Specialty Claims at QBE, who will speak on the Claims track, is particularly excited to explore the future of AI-powered claims. “From a claims perspective,” he says, “our next ten or twenty years will be defined by how well we embrace technology, artificial intelligence and the nearly boundless opportunities that arise from those advancements. The Insurance AI and Analytics USA Summit brings critical insights and guidance to professionals who are improving their own skills as well as those responsible for designing the future architecture for claims.”

While conducting our research for the Insurance AI and Analytics USA agenda, it was striking how many carriers cited difficulties in actually embedding the technology they recognize as so integral to their future success. To provide a clear, strategic implementation framework for carriers, Insurance AI and Analytics USA will feature real-world case studies from Hippo Insurance, American Family Insurance and QBE, amongst others.
“Having the opportunity to hear real business cases and then asking questions that will provide meaningful impact to your organization is a valuable asset,” says Chuck Gomez, and with over 25 hours of case studies, discussions and keynotes from over 40 expert speakers, attendees will leave equipped with the skills and insights to flourish in 2019 and beyond. With over 200 attendees confirmed and another 250 expected, there has never been a better time to register you and your team at Insurance AI & Analytics USA. Group discounts are still available, and more information can be found on the event website: https://events.insurancenexus.com/analyticsusa/ or by getting in touch: [email protected]

Data Lakes vs Data Warehouses
Data Warehouses, traditionally popular for business intelligence tasks, are being replaced by less-structured Data Lakes, which allow more flexibility. By Sundeep Sanghavi (Co-founder & CEO, DataRPM), June 2014.

There is a phenomenal shift happening now in the enterprise data world: data warehouses, which have been the foundation for business intelligence and data discovery for several decades, are being made obsolete by the emergence of data lakes. The limitation of data warehouses is that they store data from various sources in specific, static structures and categories that dictate, at the very point of entry, the kind of analysis that is possible on that data. This was sufficient during the early stages of the evolution of business intelligence, when analysis was primarily done on proprietary databases and the scope was restricted to canned reports and dashboards with limited, pre-defined interaction paths. This approach has started to fall apart in the world of big data discovery, where it is very difficult to ascertain upfront all the intelligence and insights one would be able to derive from the variety of sources (proprietary databases, files, third-party tools, social media and the web) that keep cropping up on a regular basis. One may have an initial list of questions during the setup phase, but the real questions only emerge once one starts analyzing the data. The ability to navigate from a starting question or data point in different directions, slicing and dicing the data in whatever ad-hoc way the train of thought of the analysis demands, is essential for real data discovery.

For example, someone may start with a question like “What was the total revenue last year from North America?” on a transaction database, then want to slice the result by “North American states” and further by the “demographics of the buyer” from the CRM database, then proceed to correlate with the “ad campaigns” from the ad platforms to analyze the effectiveness of marketing spend, and then navigate from there to evaluate the impact of the efficiency and timeliness of their “delivery logistics” on repeat sales by using the GPS data of vehicles. All of these analyses happen on an ad-hoc basis, with new data sources added on the fly as the user's thought process requires for decision-making. Therefore the traditional approach of manually curated data warehouses, which provide a limited window onto the data and are designed to answer only specific questions identified at design time, no longer makes sense for data discovery in today's big data world.
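As a rough illustration of this train-of-thought, cross-source slicing (a sketch not from the original article), the following Python/pandas snippet joins hypothetical extracts; the file paths, column names and reporting year are purely illustrative assumptions.

```python
import pandas as pd

# Hypothetical raw extracts sitting in a data lake; the paths and columns
# are illustrative assumptions, not part of the original article.
transactions = pd.read_parquet("lake/raw/transactions.parquet")  # customer_id, state, amount, order_date
crm = pd.read_parquet("lake/raw/crm_customers.parquet")          # customer_id, age_band, segment

# Starting question: total revenue last year from North America.
last_year = transactions[transactions["order_date"].dt.year == 2023]
print("Total revenue:", last_year["amount"].sum())

# Slice the result by state, a cut decided on the fly rather than at design time.
by_state = last_year.groupby("state")["amount"].sum().sort_values(ascending=False)
print(by_state.head())

# Pull in a second source (the CRM extract) and slice by buyer demographics.
enriched = last_year.merge(crm, on="customer_id", how="left")
by_demographic = enriched.groupby(["state", "age_band"])["amount"].sum()
print(by_demographic.head())
```

Because the lake keeps the raw extracts, the CRM join and the demographic cut can be bolted on mid-analysis without redesigning a warehouse schema up front.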
This is where data lakes excel and why the world is now shifting away from data warehouses to data lakes. A data lake is a hub or repository of all the data that an organization has access to, where the data is ingested and stored in as close to raw form as possible, without enforcing any restrictive schema. This provides an unrestricted view of the data, so anyone can run ad-hoc queries and perform cross-source navigation and analysis on the fly. Successful data lake implementations respond to queries in real time and provide users with an easy and uniform access interface to the disparate sources of data.

DataRPM offers an integrated data lake and data discovery platform for the modern business. We are to the world of big data discovery for enterprises what Google is to the world of information discovery for the web. We have pioneered a Smart Machine that delivers the following:
With DataRPM the machines do all the heavy lifting that the implementation of data lakes and a big data discovery platform entails. All that a business needs to do is specify the connection parameters for the different data sources and then start asking questions in natural language to get answers that they can interact with and collaborate on in place with stakeholders, anytime and anywhere. DataRPM empowers any user to become a data scientist and leverage the true power of data intelligence in the fastest, easiest, most affordable, scalable and natural way, thereby delivering data democracy in any organization.

Source: https://www.kdnuggets.com/2014/06/data-lakes-vs-data-warehouses.html

Disclaimer: The above article is published here in addition to providing a link in other pages of the Big Data Space website so that visitors can still read the article in the event of a broken link to the original article.

Database vs Data Warehouse
You've got data. Your competitors have data. We all have data, and lots of it. It's a data-driven world, and that leaves us with a question: what do we do with it all?

Database overview
The short answer to our question of what to do with all that data is to put it in a database. A database is the basic building block of your data solution. Data has to live somewhere, and for most applications, that's a database. It's basically an organized collection of data. Typically, the type of database used for this is an OLTP (online transaction processing) database. But there's more to the picture than storing information from one source or application. Today's business is built on data, and OLTP databases aren't typically designed to excel at running analysis across very large data sets consisting of multiple data sources. As you begin to accumulate more and more data from multiple sources, and need to do things like transform and perform analysis on it, having the data from your multiple, disparate sources stored in and across multiple OLTP databases can become a liability. Performing separate analysis on each data source is inefficient and costly at best. You'll need a better place to keep data from all of those data sources — a place that allows you to maintain a single repository of, and run analytics on, all your data sources and streams simultaneously.

Data warehouse overview
A better answer to our question is to centralize the data in a data warehouse. A data warehouse is basically a database (or group of databases) specially designed to store, filter, retrieve, and analyze very large collections of data. Data warehouses are OLAP (online analytical processing) based and designed for analysis. The modern approach is to put data from all of your databases (and data streams) into a monolithic data warehouse. This allows you to perform visualization and analysis one time — on the bulk of your data simultaneously rather than multiple times on smaller chunks — without having to merge or reconcile the results.

For data warehouses, the choice is between on-premise and cloud-based solutions. On-premise data warehouses (think Oracle, IBM, Teradata, etc.) typically excel at flexibility and security. You have more control over management and configuration when you host the servers or have direct access to them. Cloud-based data warehouses (such as Amazon Redshift, Google BigQuery, Snowflake, etc.) provide more scalability and lower entry and maintenance costs. You can spin up (and pay for) additional computing power and storage only when you need it, for example. Further, the resources are always available so you can get up and running quickly, without having to wait for new hardware or capacity to be purchased, installed, and brought online. We talk about how to pick a data warehouse in A Guide to Selecting the Right Cloud Data Warehouse.

How they stack up
Database: Used for storing data from one or a limited number of applications or sources.
Pros: Processing digital transactions, established technology
Cons: Reporting, visualization, and analysis cannot be performed across a very large integrated set of data sources and streams
Data warehouse: Used for aggregating data from many different data sources, and making that data available for visualization, reporting, and analysis. Purpose-built for analysis.
Pros: Better support for reporting, analysis, big data, data retrieval, and visualization; designed to store data from any number of data sources
Cons: Costly compared to a single database; data must be prepared/configured prior to ingestion; (for cloud data warehouses) less control over access and security configuration

What works best for you?
If you're dealing with more than one (or just a few) applications and data sources, you'll likely find that OLTP databases and RDBMSs are not a good solution. Here's the thing: the number of data sources and data streams is growing every day. The proliferation of new cloud and SaaS offerings is resulting in a flood of data crucial to your business. Keeping all of that data in its siloed sources causes problems with analysis. How can you know what you have? How can you find what you need? How can you analyze it all? Once you start having to sync data from multiple databases, you've reached the point where you should consider implementing some kind of extract, transform, load (ETL) process to move your data from your databases and data sources/streams to a single data warehouse.

Conclusion
Ultimately, today's data-driven business environment relies on speedy, thorough analysis. For many companies, that means getting your data quickly and accurately from potentially many different databases (and other data sources/streams) into a powerful, cloud-based data warehouse — possibly with some transformation along the way. If that's your situation, Alooma has you covered. Alooma is an enterprise ETL platform built to enable real-time data migration from all your data sources. We can help you collect, extract, transform, validate, and load your data into your data warehouse, for insights never before possible. Find out more about how we can be the answer to your data questions. Let us help you get your data into a data warehouse and working for you today.

Source: https://www.alooma.com/blog/database-vs-data-warehouse

Disclaimer: The above article is published here in addition to providing a link in other pages of the Big Data Space website so that visitors can still read the article in the event of a broken link to the original article.

What Types of Questions Can Data Science Answer?
Data science has enabled us to solve complex and diverse problems by using machine learning and statistical algorithms. Here we have enumerated the common applications of supervised, unsupervised and reinforcement learning techniques.

Machine learning (ML) is the motor that drives data science. Each ML method (also called an algorithm) takes in data, turns it over, and spits out an answer. ML algorithms do the part of data science that is the trickiest to explain and the most fun to work with. That's where the mathematical magic happens. ML algorithms can be grouped into families based on the type of question they answer. These families can help guide your thinking as you formulate your razor-sharp question.

Is this A or B?
This family is formally known as two-class classification. It's useful for any question that has just two possible answers: yes or no, on or off, smoking or non-smoking, purchased or not. Lots of data science questions sound like this or can be rephrased to fit this form. It's the simplest and most commonly asked data science question. Here are a few typical examples.
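As a rough illustration of a two-class classifier (a sketch assuming scikit-learn, not taken from the original article), the synthetic data below stands in for any labeled yes/no problem, such as a hypothetical customer-churn question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a labeled yes/no dataset (e.g. "will this customer churn?").
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train on labeled examples, then answer the yes/no question for new cases.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Is this A or B?", clf.predict(X_test[:5]))      # 0/1 answers
print("Held-out accuracy:", clf.score(X_test, y_test))
```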
Is this A or B or C or D?
This algorithm family is called multi-class classification. As its name implies, it answers a question that has several (or even many) possible answers: which flavor, which person, which part, which company, which candidate. Most multi-class classification algorithms are just extensions of two-class classification algorithms. Here are a few typical examples.
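A hedged sketch of multi-class classification, again assuming scikit-learn and using its bundled iris dataset purely as an illustration (not from the original article):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Three possible answers instead of two: which species is this flower?
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The two-class method extends to several classes (one-vs-rest / multinomial).
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Which class?", clf.predict(X_test[:5]))
```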
Is this Weird?
This family of algorithms performs anomaly detection. They identify data points that are not normal. If you are paying close attention, you may have noticed that this looks like a binary classification question: it can be answered yes or no. The difference is that binary classification assumes you have a collection of examples of both the yes and the no cases; anomaly detection doesn't. This is particularly useful when what you are looking for occurs so rarely that you haven't had a chance to collect many examples of it, like equipment failures. It's also very helpful when there is a lot of variety in what constitutes "not normal," as there is in credit card fraud detection. Here are some typical anomaly detection questions.
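For a flavour of anomaly detection without labels, here is a small illustrative sketch (assuming scikit-learn's IsolationForest; the sensor-reading scenario is a made-up example, not from the original article):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))   # plenty of ordinary readings
odd = rng.uniform(low=6.0, high=8.0, size=(5, 2))        # a handful of unusual ones
readings = np.vstack([normal, odd])

# No labels are supplied; the model only learns what "normal" looks like.
detector = IsolationForest(contamination=0.01, random_state=0).fit(readings)
flags = detector.predict(readings)                       # -1 = anomaly, 1 = normal
print("Flagged as weird:", int((flags == -1).sum()), "of", len(readings), "points")
```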
How Much / How Many?
When you are looking for a number instead of a class or category, the algorithm family to use is regression.
Usually, regression algorithms give a real-valued answer; the answers can have lots of decimal places or even be negative. For some questions, especially questions beginning "How many…", negative answers may have to be re-interpreted as zero and fractional values re-interpreted as the nearest whole number.

Multi-Class Classification as Regression
Sometimes questions that look like multi-value classification questions are actually better suited to regression. For instance, "Which news story is the most interesting to this reader?" appears to ask for a category—a single item from the list of news stories. However, you can reformulate it to "How interesting is each story on this list to this reader?" and give each article a numerical score. Then it is a simple thing to identify the highest-scoring article. Questions of this type often occur as rankings or comparisons.
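Returning to the plain "how much / how many" case, a minimal regression sketch (assuming scikit-learn and synthetic data, not from the original article) also shows the re-interpretation of negative and fractional answers described above:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Synthetic "how much?" problem, e.g. predicting next month's demand.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
model = LinearRegression().fit(X, y)

predictions = model.predict(X[:5])                  # real-valued, possibly negative
# For a "how many?" question, clip negatives to zero and round to whole units.
counts = np.round(np.clip(predictions, 0, None))
print("Raw answers:   ", np.round(predictions, 2))
print("As whole units:", counts)
```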
Two-Class Classification as Regression
It may not come as a surprise that binary classification problems can also be reformulated as regression. (In fact, under the hood some algorithms reformulate every binary classification as regression.) This is especially helpful when an example can belong partly to A and partly to B, or has a chance of going either way. When an answer can be partly yes and partly no, probably on but possibly off, then regression can reflect that. Questions of this type often begin "How likely…" or "What fraction…"
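A hedged illustration of treating a two-class question as a "how likely" question, assuming scikit-learn's logistic regression on synthetic data (not from the original article):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=8, random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Instead of a hard yes/no, ask "how likely is yes?" -- a number between 0 and 1.
probabilities = clf.predict_proba(X[:5])[:, 1]
print(["{:.0%}".format(p) for p in probabilities])
```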
As you may have gathered, the families of two-class classification, multi-class classification, anomaly detection, and regression are all closely related. They all belong to the same extended family, supervised learning. They have a lot in common, and often questions can be modified and posed in more than one of them. What they all share is that they are built using a set of labeled examples (a process called training), after which they can assign a value or category to unlabeled examples (a process called scoring). Entirely different sets of data science questions belong in the extended algorithm families of unsupervised and reinforcement learning.

How is this Data Organized?
Questions about how data is organized belong to unsupervised learning. There is a wide variety of techniques that try to tease out the structure of data. One family of these performs clustering, a.k.a. chunking, grouping, bunching, or segmentation. They seek to separate a data set into intuitive chunks. What makes clustering different from supervised learning is that there is no number or name that tells you what group each point belongs to, what the groups represent, or even how many groups there should be. If supervised learning is picking out planets from among the stars in the night sky, then clustering is inventing constellations. Clustering tries to separate out data into natural "clumps," so that a human analyst can more easily interpret it and explain it to others. Clustering always relies on a definition of closeness or similarity, called a distance metric. The distance metric can be any measurable quantity, such as difference in IQ, number of shared genetic base pairs, or miles-as-the-crow-flies. Clustering questions all try to break data into more nearly uniform groups.
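As an illustrative sketch of clustering (assuming scikit-learn's k-means and synthetic blob data, not from the original article), note that no labels are supplied and the number of clusters is a choice made by the analyst:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled data: nothing tells the algorithm what the groups are or what they mean.
X, _ = make_blobs(n_samples=600, centers=4, random_state=0)

# The analyst still chooses the number of clusters and, implicitly, a distance
# metric (plain Euclidean distance in this sketch).
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print("Cluster sizes:", [int((kmeans.labels_ == k).sum()) for k in range(4)])
```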
Another family of unsupervised learning algorithms is called dimensionality reduction techniques. Dimensionality reduction is another way to simplify the data: to make it easier to communicate, faster to compute with, and easier to store. At its core, dimensionality reduction is all about creating a shorthand for describing data points. A simple example is GPA. A college student's academic strength is measured in dozens of classes by hundreds of exams and thousands of assignments. Each assignment says something about how well that student understands the course material, but a full listing of them would be way too much for any recruiter to digest. Luckily, you can create a shorthand just by averaging all the scores together. You can get away with this massive simplification because students who do very well on one assignment or in one class typically do well in others. By using GPA rather than the full portfolio, you do lose richness. For instance, you wouldn't know if the student is stronger in math than English, or if she scored better on take-home programming assignments than on in-class quizzes. But what you gain is simplicity, which makes it a lot easier to talk about and compare students' strengths. Dimensionality reduction-related questions are usually about factors that tend to vary together.
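To illustrate the GPA analogy in code (a sketch with simulated student scores, assuming scikit-learn's PCA; not from the original article), a single principal component plays the role of the grade-point average:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 200 simulated students, 30 assignment scores each; scores for the same student
# tend to move together because they reflect one underlying ability.
ability = rng.normal(size=(200, 1))
scores = 70 + ability @ (10 * np.ones((1, 30))) + rng.normal(scale=3.0, size=(200, 30))

# Compress 30 correlated columns into one summary number, much like a GPA.
pca = PCA(n_components=1)
summary = pca.fit_transform(scores)
print("Variance captured by the single component: {:.0%}".format(pca.explained_variance_ratio_[0]))
```

The single component captures most of the variation precisely because scores belonging to the same student tend to vary together.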
What Should I Do Now?
A third extended family of ML algorithms focuses on taking actions. These are called reinforcement learning (RL) algorithms. They are a little different from the supervised and unsupervised learning algorithms. A regression algorithm might predict that the high temperature will be 98 degrees tomorrow, but it doesn't decide what to do about it. An RL algorithm goes the next step and chooses an action, such as pre-refrigerating the upper floors of the office building while the day is still cool. RL algorithms were originally inspired by how the brains of rats and humans respond to punishment and rewards. They choose actions, trying very hard to choose the action that will earn the greatest reward. You have to provide them with a set of possible actions, and they need to get feedback after each action on whether it was good, neutral, or a huge mistake. Typically, RL algorithms are a good fit for automated systems that have to make a lot of small decisions without a human's guidance. Elevators, heating, cooling, and lighting systems are excellent candidates. RL was originally developed to control robots, so anything that moves on its own, from inspection drones to vacuum cleaners, is fair game. Questions that RL answers are always about what action should be taken, although the action is usually taken by a machine.
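As a toy illustration of the reinforcement-learning loop of choosing actions and learning from reward feedback, here is a simple epsilon-greedy bandit sketch in plain NumPy (the building-cooling actions and reward values are made-up assumptions, not from the original article):

```python
import numpy as np

# Toy setup: three possible actions (e.g. pre-cool now, pre-cool in an hour,
# do nothing), each with an unknown average reward hidden from the learner.
true_rewards = [0.2, 0.5, 0.8]
estimates = np.zeros(3)
counts = np.zeros(3)
rng = np.random.default_rng(0)

for step in range(2000):
    # Mostly exploit the best-looking action, occasionally explore another one.
    if rng.random() < 0.1:
        action = int(rng.integers(3))
    else:
        action = int(np.argmax(estimates))
    reward = rng.normal(true_rewards[action], 0.1)    # feedback after acting
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print("Learned action values:", np.round(estimates, 2))  # approaches the true rewards
```

Note that the learner starts with no data at all and improves its estimates purely through trial and error, which matches the point made in the next paragraph.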
RL usually requires more effort to get working than other algorithm types because it's so tightly integrated with the rest of the system. The upside is that most RL algorithms can start working without any data. They gather data as they go, learning from trial and error. The next and final post in this series will give lots of specific examples of sharp data science questions and the algorithm family best suited to each. Stay tuned.

Source: https://www.kdnuggets.com/2015/09/questions-data-science-can-answer.html

Disclaimer: The above article is published here in addition to providing a link in other pages of the Big Data Space website so that visitors can still read the article in the event of a broken link to the original article.

The world’s most valuable resource is no longer oil, but data
The data economy demands a new approach to antitrust rules
May 6th 2017

A NEW commodity spawns a lucrative, fast-growing industry, prompting antitrust regulators to step in to restrain those who control its flow. A century ago, the resource in question was oil. Now similar concerns are being raised by the giants that deal in data, the oil of the digital era. These titans—Alphabet (Google’s parent company), Amazon, Apple, Facebook and Microsoft—look unstoppable. They are the five most valuable listed firms in the world. Their profits are surging: they collectively racked up over $25bn in net profit in the first quarter of 2017. Amazon captures half of all dollars spent online in America. Google and Facebook accounted for almost all the revenue growth in digital advertising in America last year.

Such dominance has prompted calls for the tech giants to be broken up, as Standard Oil was in the early 20th century. This newspaper has argued against such drastic action in the past. Size alone is not a crime. The giants’ success has benefited consumers. Few want to live without Google’s search engine, Amazon’s one-day delivery or Facebook’s newsfeed. Nor do these firms raise the alarm when standard antitrust tests are applied. Far from gouging consumers, many of their services are free (users pay, in effect, by handing over yet more data). Take account of offline rivals, and their market shares look less worrying. And the emergence of upstarts like Snapchat suggests that new entrants can still make waves. But there is cause for concern. Internet companies’ control of data gives them enormous power. Old ways of thinking about competition, devised in the era of oil, look outdated in what has come to be called the “data economy” (see Briefing). A new approach is needed.

Quantity has a quality all its own
What has changed? Smartphones and the internet have made data abundant, ubiquitous and far more valuable. Whether you are going for a run, watching TV or even just sitting in traffic, virtually every activity creates a digital trace—more raw material for the data distilleries. As devices from watches to cars connect to the internet, the volume is increasing: some estimate that a self-driving car will generate 100 gigabytes per second. Meanwhile, artificial-intelligence (AI) techniques such as machine learning extract more value from data. Algorithms can predict when a customer is ready to buy, a jet-engine needs servicing or a person is at risk of a disease. Industrial giants such as GE and Siemens now sell themselves as data firms. This abundance of data changes the nature of competition.
Technology giants have always benefited from network effects: the more users Facebook signs up, the more attractive signing up becomes for others. With data there are extra network effects. By collecting more data, a firm has more scope to improve its products, which attracts more users, generating even more data, and so on. The more data Tesla gathers from its self-driving cars, the better it can make them at driving themselves—part of the reason the firm, which sold only 25,000 cars in the first quarter, is now worth more than GM, which sold 2.3m. Vast pools of data can thus act as protective moats.

Access to data also protects companies from rivals in another way. The case for being sanguine about competition in the tech industry rests on the potential for incumbents to be blindsided by a startup in a garage or an unexpected technological shift. But both are less likely in the data age. The giants’ surveillance systems span the entire economy: Google can see what people search for, Facebook what they share, Amazon what they buy. They own app stores and operating systems, and rent out computing power to startups. They have a “God’s eye view” of activities in their own markets and beyond. They can see when a new product or service gains traction, allowing them to copy it or simply buy the upstart before it becomes too great a threat. Many think Facebook’s $22bn purchase in 2014 of WhatsApp, a messaging app with fewer than 60 employees, falls into this category of “shoot-out acquisitions” that eliminate potential rivals. By providing barriers to entry and early-warning systems, data can stifle competition.

Who ya gonna call, trustbusters?
The nature of data makes the antitrust remedies of the past less useful. Breaking up a firm like Google into five Googlets would not stop network effects from reasserting themselves: in time, one of them would become dominant again. A radical rethink is required—and as the outlines of a new approach start to become apparent, two ideas stand out.

The first is that antitrust authorities need to move from the industrial era into the 21st century. When considering a merger, for example, they have traditionally used size to determine when to intervene. They now need to take into account the extent of firms’ data assets when assessing the impact of deals. The purchase price could also be a signal that an incumbent is buying a nascent threat. On these measures, Facebook’s willingness to pay so much for WhatsApp, which had no revenue to speak of, would have raised red flags. Trustbusters must also become more data-savvy in their analysis of market dynamics, for example by using simulations to hunt for algorithms colluding over prices or to determine how best to promote competition (see Free exchange).

The second principle is to loosen the grip that providers of online services have over data and give more control to those who supply them. More transparency would help: companies could be forced to reveal to consumers what information they hold and how much money they make from it. Governments could encourage the emergence of new services by opening up more of their own data vaults or managing crucial parts of the data economy as public infrastructure, as India does with its digital-identity system, Aadhaar. They could also mandate the sharing of certain kinds of data, with users’ consent—an approach Europe is taking in financial services by requiring banks to make customers’ data accessible to third parties.
Rebooting antitrust for the information age will not be easy. It will entail new risks: more data sharing, for instance, could threaten privacy. But if governments don’t want a data economy dominated by a few giants, they will need to act soon. This article appeared in the Leaders section of the print edition under the headline “The world’s most valuable resource”.

Source: https://www.economist.com/news/leaders/21721656-data-economy-demands-new-approach-antitrust-rules-worlds-most-valuable-resource

Disclaimer: The above article is published here in addition to providing a link in other pages of the Big Data Space website so that visitors can still read the article in the event of a broken link to the original article.

I had the idea to start a page on Big Data related jobs for the benefit of young visitors who may soon be choosing what to study at university. Big Data and the like have taken the world by storm, and the world we live in is only going to generate more data, so there will be more career opportunities around the Big Data domain. I have finally created that page. Below I display some salary trends for Big Data Analytics specialists for young readers to appreciate (source of images: https://au.indeed.com).
In fact, this is not just for young readers. It can be beneficial for experienced working adults who are interested in making a career change. Is this possible? I will write more on this. Do come back for the next article on making a career change to Big Data related professions.
Author: We are writing to share what we read about Big Data and related subjects with readers from around the world.