Processing Overview

Anonymous community submission
Format standardisation
De-duplication and abuse filtering
Structured storage and aggregation
System-wide update propagation

Submission Pipeline

All data in the Reverseau dataset originates from community submissions. When an individual receives a phone call or SMS, they may submit a report describing their experience with that number. Reports are submitted anonymously — no registration, name, or email is required. This anonymous submission model has been in place since the platform launched in 2014.

Reverseau does not independently verify the factual accuracy of individual submissions; instead, the dataset reflects aggregated community-reported experiences.

Each submission contains the following structured fields:

Phone number — the number being reported, normalised to the Australian 10-digit format
Caller type classification — selected from a fixed set of categories (see Reporting Signal Evaluation)
Safety rating — a reporter-submitted numerical indicator reflecting the individual's perception of the interaction. The displayed aggregate rating on a number page represents a calculated summary of multiple community submissions and may incorporate weighting factors such as recency and reporting volume.
Written description — a free-text account of the call experience
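
As an illustration only, the four fields above could be modelled as a small validated record. The class and field names, the category values, and the 1 to 5 rating scale are assumptions made for this sketch; the actual category set is defined in Reporting Signal Evaluation and the real rating scale is not stated here.

```python
from dataclasses import dataclass
from enum import Enum


class CallerType(Enum):
    # Hypothetical categories; the real fixed set is documented elsewhere.
    SCAM = "scam"
    TELEMARKETER = "telemarketer"
    LEGITIMATE = "legitimate"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Submission:
    phone_number: str        # already normalised to 10 digits
    caller_type: CallerType  # one value from the fixed category set
    safety_rating: int       # assumed scale: 1 (unsafe) to 5 (safe)
    description: str         # free-text account of the call

    def __post_init__(self):
        number = self.phone_number
        if not (number.isascii() and number.isdigit() and len(number) == 10):
            raise ValueError("phone_number must be exactly 10 digits")
        if not 1 <= self.safety_rating <= 5:
            raise ValueError("safety_rating must be between 1 and 5")
```

Note that the record carries no reporter identity fields, which matches the anonymous submission model described above.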

Format Standardisation

Submitted phone numbers are standardised on intake. Each number is normalised to a consistent 10-digit Australian format by removing spaces, dashes, and parentheses and by converting the +61 country code prefix to its national form. This ensures that all reports for the same number are aggregated together regardless of how the number was entered.
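
A minimal sketch of this kind of normalisation follows. The function name and the exact rules are assumptions, not Reverseau's actual implementation; in particular, restoring the leading 0 after a +61 prefix is how Australian national numbers are conventionally written, but the platform's own handling of edge cases is not documented here.

```python
import re


def normalise_au_number(raw: str) -> str:
    """Return a 10-digit national-format number, or raise ValueError."""
    # Drop everything that is not a digit: spaces, dashes, parentheses, "+".
    digits = re.sub(r"\D", "", raw)

    if raw.strip().startswith("+61"):
        digits = digits[2:]           # remove the country code
        if len(digits) == 9:          # e.g. "412345678" needs its leading 0
            digits = "0" + digits

    if len(digits) != 10:
        raise ValueError(f"cannot normalise {raw!r} to 10 digits")
    return digits
```

With this sketch, "+61 412 345 678", "0412-345-678", and "(04) 1234 5678" all map to the same key, so reports entered in different styles land on one number page.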

De-Duplication

The system applies de-duplication logic to prevent the same reporter from submitting multiple identical reports for the same number within a defined time window. De-duplication operates on a combination of submission metadata (technical and session-based signals) and content similarity. It is designed to reduce repetitive submissions from the same source while preserving independent reports from unrelated individuals.
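
The check described above can be sketched as follows. Everything specific here is an assumption for illustration: the 24-hour window, the 0.9 similarity threshold, the opaque `source_key` standing in for the metadata and session signals, and the use of Python's `difflib.SequenceMatcher` as the similarity measure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from difflib import SequenceMatcher

WINDOW = timedelta(hours=24)   # assumed time window
SIMILARITY_THRESHOLD = 0.9     # assumed cut-off for "identical" text


@dataclass
class Report:
    phone_number: str
    source_key: str            # hypothetical fingerprint of the submitting source
    description: str
    submitted_at: datetime


def is_duplicate(new: Report, recent: list[Report]) -> bool:
    """True if `new` repeats an earlier report from the same source."""
    for old in recent:
        same_number = old.phone_number == new.phone_number
        same_source = old.source_key == new.source_key
        in_window = abs(new.submitted_at - old.submitted_at) <= WINDOW
        if not (same_number and same_source and in_window):
            continue  # reports from other sources are never suppressed
        similarity = SequenceMatcher(
            None, old.description.lower(), new.description.lower()
        ).ratio()
        if similarity >= SIMILARITY_THRESHOLD:
            return True
    return False
```

Requiring a matching source before comparing text is what keeps two unrelated people who describe the same scam call in similar words from being collapsed into one report.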

Abuse Filtering

All submissions pass through an automated pre-screening layer that checks for:

Spam content and repetitive submissions
Personal identifying information (names, addresses, government ID numbers)
Profanity, hate speech, and guideline violations
Duplicate or near-duplicate content

Reports that pass automated screening are published promptly. Reports flagged as potentially problematic are held for human review. Automated screening assists moderation but does not independently determine final classification outcomes. Human moderator decisions take precedence over automated assessments. Moderation actions are applied in accordance with published platform guidelines.
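
The routing described above (publish clean reports, hold flagged ones for a human) might look like the sketch below. The patterns, the blocked-term list, and the repetition heuristic are all placeholder assumptions, far simpler than a production filter; the point is only that the automated layer attaches flags and routes the report, leaving the final decision to a moderator.

```python
import re
from enum import Enum


class Outcome(Enum):
    PUBLISH = "publish"
    HOLD_FOR_REVIEW = "hold_for_review"


# Hypothetical, deliberately crude patterns for illustration only.
PII_PATTERNS = [
    re.compile(r"\b\d+\s+\w+\s+(street|st|road|rd|avenue|ave)\b", re.I),  # street address
    re.compile(r"\b\d{8,11}\b"),                                          # long ID-like digit run
]
BLOCKED_TERMS = {"exampleblockedword"}  # placeholder for a real term list


def pre_screen(description: str) -> tuple[Outcome, list[str]]:
    """Return the routing outcome plus the reasons shown to a reviewer."""
    flags = []
    if any(p.search(description) for p in PII_PATTERNS):
        flags.append("possible_pii")
    words = re.findall(r"[a-z']+", description.lower())
    if any(w in BLOCKED_TERMS for w in words):
        flags.append("blocked_term")
    # Very low vocabulary variety suggests spam-like repetition.
    if len(words) >= 5 and len(set(words)) / len(words) < 0.3:
        flags.append("repetitive")
    outcome = Outcome.HOLD_FOR_REVIEW if flags else Outcome.PUBLISH
    return outcome, flags
```

A flagged report is held rather than rejected, which reflects the rule that automated screening assists moderation but does not determine the final outcome.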

For details on how automated systems are used in this pipeline, see the AI Transparency Statement.

Storage & Aggregation

Accepted reports are stored as individual records linked to the reported phone number. The aggregate profile for each number (displayed classification, safety rating, and report count) is recalculated as new reports are received. Aggregation calculations consider reporting volume, recency, and category distribution, evaluated against thresholds defined in the Reporting Signal Evaluation Framework. As a result, a number's displayed classification may change over time as more community data accumulates.
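
To make the role of volume, recency, and category distribution concrete, here is one possible way such a recalculation could work. The exponential half-life weighting, the 90-day half-life, and the three-report minimum are assumptions chosen for the example; the actual thresholds and weighting are defined in the Reporting Signal Evaluation Framework and are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass

HALF_LIFE_DAYS = 90.0  # assumed: a report's weight halves every 90 days
MIN_REPORTS = 3        # assumed: minimum volume before a classification is shown


@dataclass
class AcceptedReport:
    category: str
    rating: float
    age_days: float


def aggregate(reports: list[AcceptedReport]) -> dict:
    """Recompute the displayed profile for one number from all its reports."""
    if not reports:
        return {"classification": None, "rating": None, "report_count": 0}

    # Recency: newer reports count for more.
    weights = [0.5 ** (r.age_days / HALF_LIFE_DAYS) for r in reports]
    total = sum(weights)
    rating = sum(w * r.rating for w, r in zip(weights, reports)) / total

    # Category distribution: weighted vote across categories.
    votes = defaultdict(float)
    for w, r in zip(weights, reports):
        votes[r.category] += w
    top_category = max(votes, key=votes.get)

    # Volume: withhold a classification until enough reports exist.
    classification = top_category if len(reports) >= MIN_REPORTS else None
    return {
        "classification": classification,
        "rating": rating,
        "report_count": len(reports),
    }
```

Because the function is rerun over the full report list each time, a number with a single old report can show no classification at first and then acquire or change one as newer reports arrive.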

Telecommunications allocation metadata is maintained separately from community reporting data and does not indicate confirmed caller ownership.

AI-Assisted Summaries

Phone detail pages may display a summary paragraph describing reported activity. These summaries are generated from aggregated community report data and public telecommunications records using AI-assisted tools. AI-assisted summaries do not introduce new factual claims beyond the underlying community submissions and public allocation data. Summaries are clearly labelled with the disclosure: "AI-assisted analysis based on community reports and public data." They are regenerated periodically and subject to human review.

Full details are documented in the AI Transparency Statement.

Update Propagation

When a new report is accepted, the following updates propagate through the system:

The report is stored as an individual record
The number's aggregate classification and rating are recalculated
The number appears on the recently updated feed
Related aggregation pages (state, service type, prefix) reflect the updated data
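
The four steps above can be sketched with an in-memory stand-in for the real storage layer. The class, its attributes, the feed size, the 4-digit prefix, and the idea of marking related pages as stale are all hypothetical; the aggregate is reduced to a report count to keep the example short.

```python
from collections import defaultdict, deque


class InMemoryStore:
    """Toy store showing the order of updates when a report is accepted."""

    def __init__(self, feed_size: int = 50):
        self.reports = defaultdict(list)           # number -> individual records
        self.aggregates = {}                       # number -> aggregate profile
        self.recent_feed = deque(maxlen=feed_size) # most recently updated first
        self.stale_pages = set()                   # pages needing a refresh

    def accept(self, number: str, report: dict, state: str, service_type: str) -> None:
        # 1. Store the report as an individual record.
        self.reports[number].append(report)
        # 2. Recalculate the aggregate (placeholder: report count only).
        self.aggregates[number] = {"report_count": len(self.reports[number])}
        # 3. Move the number to the front of the recently updated feed.
        if number in self.recent_feed:
            self.recent_feed.remove(number)
        self.recent_feed.appendleft(number)
        # 4. Mark related aggregation pages so they pick up the new data.
        self.stale_pages.update({
            ("state", state),
            ("service_type", service_type),
            ("prefix", number[:4]),  # assumed prefix length
        })
```

Earlier records are only ever appended to, never overwritten, which is consistent with historical reports remaining available in the archive.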

Historical records remain accessible as part of the permanent dataset archive.

Community Reporting & Processing Model