Read The Times Australia

Daily Bulletin

Size doesn’t matter in Big Data, it’s what you ask of it that counts

  • Written by: The Conversation Contributor
image

Big Data is changing the way we do science today. Traditionally, data were collected manually by scientists making measurements, using microscopes or surveys. These data could be analysed by hand or using simple statistical software on a PC.

Big Data has changed all that. These days, tremendous volumes of information are being generated and collected through new technologies, be they large telescope arrays, DNA sequencers or Facebook.

The data is vast, but the kinds of data and the formats they take are also new. Consider the hourly clicks on Facebook, or the daily searches on Google. As a result, Big Data offers scientists the ability to perform powerful analyses and make new discoveries.

The problem is that Big Data hasn’t yet changed the way many researchers ask scientific questions. In biology in particular, where tools like genome sequencing are generating tremendous amounts of data, biologists might not be asking the right kinds of questions that Big Data can answer.

Questions

Asking questions is what scientists do. Biologists ask questions about the living world, such as “how many species are there?” or “what are the evolutionary relationships between rats, bats and primates?”.

The way we ask questions says a lot about the type of information we use. For example, systematists like myself study the diversity and relationship between the many species of creatures throughout evolutionary history.

We have tended to use physical characteristics, like teeth and bones, to classify mammals into taxonomic groups. These shared characteristics allow us to recognise new species and identify existing ones.

Enter Big Data, and cheap DNA sequencing technology. Now systematists have access to new forms of information, such as whole genomes, which have drastically changed the way we do systematics. But it hasn’t changed the way many systematists frame their questions.

Biologists are expecting big things from Big Data, but they are finding out that it initially delivers only so much. Rather than find out what these limitations are and how they can shape our questions, many biologists have responded by gathering more and more data. Put simply: scientists have been lured by size.

Size matters

Quantity is often seen as a benchmark of success. The more you have, the better your study will be.

This thinking stems from the idealistic view of complete datasets with unbiased sampling. Statisticians call this “n = all”, which represents a data set that contains all the information.

If all the data was available, then scientists wouldn’t have the problem of missing or corrupted data. A real world example would be a complete genome sequence.

Having all the data would tell us everything, right? Not exactly.

From 2004 to 2006, J. Craig Venter led an expedition to sample genomes in sea water from the North Atlantic. He concluded he had found 1,800 species.

Not so fast. He did, in fact, find thousands of unique genomes, but to determine whether they are new species will require Venter and his team to compare and diagnose each organism, as well as name them.

So, in answer to the question: “how many species are there in this bucket of water?”, Big Data gave the answer of 1.045 billion base pairs. But 1.045 billion base pairs could mean any number of species.

Size doesn’t matter, it is what we ask of our data that counts.

Wrong questions

Asking impossible questions has been the bane of Big Data across many fields of research. For example, Google Flu Trends, an initiative launched by Google to predict flu epidemics weeks before the Centers for Disease Control and Prevention (CDC), made the mistake of asking a traditionally framed question: “when will the next flu epidemic hit North America?”.

The data analysed were non-traditional, namely the number and frequency of Google search terms. When compared to CDC data, it was discovered that Google Flu Trends missed the 2009 epidemic and over-predicted flu trends by more than double between 2012 and 2013.

In 2013, Google Flu Trends was abandoned as being unable to answer the questions we were asking of it. Some statisticians blamed sampling bias, others blamed the lack of transparency regarding the Google search terms. Another reason could simply be that the question asked was inappropriate given the non-traditional data collected.

Big Data is being misunderstood, and this is limiting our ability to find meaningful answers to our questions. Big Data is not a replacement for traditional methods and questions. Rather, it is a supplement.

Biologists also need to adjust the questions aimed at Big Data. Unlike traditional data, Big Data cannot give a precise answer to a traditionally framed question.

Instead Big Data sends the scientist onto a path to bigger and bigger discoveries. Big and traditional data can be used together can enable biologists to better navigate their way down the path of discovery.

If Venter actually took the next step and examined those sea creatures, we could make a historic discovery. If Google Flu Trends asked “what do the frequency and number of Google search terms tell us?”, then we may make an even bigger discovery.

As we incorporate Big Data into the existing scientific line of enquiry, we also need to accommodate appropriate questions. Until then, biologists are stuck with impossible answers to the wrong questions.

Authors: The Conversation Contributor

Read more http://theconversation.com/size-doesnt-matter-in-big-data-its-what-you-ask-of-it-that-counts-55571

Business News

Inside the Icon: The BridgeMuseum Officially Opens at the Sydney Harbour Bridge

A bold new way to experience one of Australia’s most recognisable landmarks has arrived, with BridgeClimb Sydney officially opening the all-new BridgeMuseum.  Located inside the Sydney Harbour Brid...

Daily Bulletin - avatar Daily Bulletin

Is Your Brand Showing Up in AI Search? Most Melbourne Brands Aren't.

The New Front Door Nobody Told You About Something changed. Quietly. Without a press release. The way buyers find businesses in Australia has been rewired. Not replaced, rewired. Google isn't dead...

Daily Bulletin - avatar Daily Bulletin

How Australian Businesses Can Measure SEO ROI

SEO can feel vague when you are staring at a dashboard full of numbers that do not clearly connect to revenue. The key is to measure the right signals in the right order, then tie them back to outcome...

Daily Bulletin - avatar Daily Bulletin

How Commercial Roller Shutters Improve Site Security Without Slowing Operations

Security upgrades can be frustrating when they make everyday work harder. A door that takes too long to open, creates bottlenecks at shift change, or fails at the worst time can turn “better protectio...

Daily Bulletin - avatar Daily Bulletin

Why a Document Destruction Service Still Matters for Modern Businesses

Businesses generate large volumes of information every day, from staff records and contracts to invoices, reports and customer files. While attention often focuses on how documents are stored, the way...

Daily Bulletin - avatar Daily Bulletin

Bicycle Rack Safety and Space-Smart Storage

Bike storage problems usually show up as small annoyances first: tangled handlebars, scratched frames, and bikes that topple when you pull one out. Over time, those issues become safety risks, especia...

Daily Bulletin - avatar Daily Bulletin

How to Tell if a Childcare Centre Is a Good Fit for Your Child

Choosing childcare can feel like you’re making a huge decision with limited information. Tours are short, centres are often on their best behaviour, and your child might act differently in a new space...

Daily Bulletin - avatar Daily Bulletin

Car Import Timeline: What Usually Happens at Each Stage

Importing a car into Australia can feel confusing because multiple agencies and checkpoints are involved, and the timeline is shaped as much by paperwork quality as it is by shipping speed. The most u...

Daily Bulletin - avatar Daily Bulletin

Portable Toilet Hygiene Standards Explained: Clean vs Sanitised vs Disinfected

In portable toilet servicing, the words clean, sanitised, and disinfected often get used as if they mean the same thing. They don’t. And that difference matters because a unit can look tidy and still ...

Daily Bulletin - avatar Daily Bulletin

The Daily Magazine

Gold Migration Lawyers in Liquidation: How the Closure Affects Your ART Appeal

If your appeal was with Gold Migration Lawyers, a recent change to how the Tribunal decides cases ...

The pressure cooker: life in urban Australia in 2026

Australian cities have always been demanding. Long commutes, rising housing costs, busy schedules a...

What Actually Makes a Good Criminal Lawyer in Melbourne

Most people only think about this question once. That is usually too late. Most people charged wi...

Why Working With A Chatswood Tutor Can Improve Academic Performance

Academic expectations continue increasing for students across primary school, high school, and senio...

Is It Worth Getting Solar Panels in Melbourne?

The real question is not whether solar works in Melbourne. It works. The question is what it is co...

How A Diploma Of Project Management Builds Practical Skills For Modern Work Environments

Developing the ability to plan, execute, and deliver outcomes efficiently is a key requirement in to...

How to Choose the Right Football for Every Level

Choosing a football may seem straightforward, but the right option depends on who will be using it a...

What to Ask a Wedding Photographer Before You Book

Booking a wedding photographer can feel deceptively simple: you like the photos, you like the vibe...

Why Stress Relief For Dogs Is Essential For Emotional Balance And Long-Term Wellbeing

Managing emotional health is just as important as physical care when it comes to pets, which is why ...