Methodology

Community Reporting & Processing Model

How reports are submitted, normalised, de-duplicated, and stored within the Reverseau dataset.

Processing Overview

  1. Anonymous community submission
  2. Format standardisation
  3. De-duplication and abuse filtering
  4. Structured storage and aggregation
  5. System-wide update propagation

Submission Pipeline

All report data in the Reverseau dataset originates from community submissions. When an individual receives a phone call or SMS, they may submit a report describing their experience with that number. Reports are submitted anonymously: no registration, name, or email address is required. This anonymous submission model has been in place since the platform launched in 2014.

Reverseau does not independently verify the factual accuracy of individual submissions; instead, the dataset reflects aggregated community-reported experiences.

Each submission contains the following structured fields:

Format Standardisation

Submitted phone numbers undergo format standardisation upon intake. Numbers are normalised to a consistent 10-digit Australian format, removing spaces, dashes, country codes (+61), and parentheses. This ensures that all reports for the same number are correctly aggregated regardless of how the number was entered.
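The normalisation step can be illustrated with a minimal Python sketch. This is not Reverseau's actual implementation: the function name `normalise_au_number` is hypothetical, and the sketch assumes that a leading 0 is restored after the +61 country code is stripped (which is what yields 10 digits) and that only geographic and mobile numbers beginning with 0 are in scope.

```python
import re
from typing import Optional


def normalise_au_number(raw: str) -> Optional[str]:
    """Return a 10-digit national number such as '0412345678', or None.

    Illustrative only: the rules below are assumptions, not the platform's code.
    """
    # Drop everything that is not a digit: spaces, dashes, parentheses, '+'.
    digits = re.sub(r"\D", "", raw)

    # Strip the country code. Length 12 covers inputs like '+61 (0)412 345 678'.
    if digits.startswith("61") and len(digits) in (11, 12):
        digits = digits[2:]

    # After '+61' the leading 0 is normally omitted, so put it back.
    if len(digits) == 9:
        digits = "0" + digits

    # Assumption: only 10-digit numbers starting with 0 are accepted here.
    if len(digits) == 10 and digits.startswith("0"):
        return digits
    return None


if __name__ == "__main__":
    for sample in ["+61 412 345 678", "(02) 9876-5432", "0412-345-678"]:
        print(sample, "->", normalise_au_number(sample))
```

Under these assumptions, differently formatted entries of the same number collapse to one key, which is what allows their reports to be aggregated together.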

De-Duplication

The system applies de-duplication logic to prevent the same reporter from submitting multiple identical reports for the same number within a defined time window. It operates on a combination of submission metadata (technical and session-based signals) and content similarity, and is designed to reduce repetitive submissions from the same source while preserving independent reports from unrelated individuals.
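A rough sketch of how such a check might combine a time window, a source signal, and content similarity is shown here. Everything specific is an assumption for illustration: the 24-hour window, the 0.9 similarity threshold, the `source_key` field (a stand-in for whatever submission metadata is used), and the choice of `difflib.SequenceMatcher` as the similarity measure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from difflib import SequenceMatcher

WINDOW = timedelta(hours=24)   # assumed value; the actual window is not published here
SIMILARITY_THRESHOLD = 0.9     # assumed value


@dataclass(frozen=True)
class Report:
    number: str            # normalised phone number
    source_key: str        # hypothetical opaque token derived from submission metadata
    text: str
    submitted_at: datetime


def is_duplicate(new: Report, recent: list) -> bool:
    """True if `new` repeats a recent report from the same source for the same number."""
    for old in recent:
        same_source = old.number == new.number and old.source_key == new.source_key
        in_window = abs(new.submitted_at - old.submitted_at) <= WINDOW
        if not (same_source and in_window):
            continue  # reports from other sources are always kept
        ratio = SequenceMatcher(None, old.text.lower(), new.text.lower()).ratio()
        if ratio >= SIMILARITY_THRESHOLD:
            return True
    return False
```

Because the source signal must match before similarity is even considered, two unrelated people describing the same call in similar words are both retained in this sketch.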

Abuse Filtering

All submissions pass through an automated pre-screening layer that checks for:

Reports that pass automated screening are published promptly. Reports flagged as potentially problematic are held for human review. Automated screening assists moderation but does not independently determine final classification outcomes. Human moderator decisions take precedence over automated assessments. Moderation actions are applied in accordance with published platform guidelines.
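The routing rule described above, including the precedence of human decisions, can be sketched as follows. The status names, the `REJECTED` outcome, and the function `resolve_status` are hypothetical; the automated checks themselves are deliberately left out and represented only as a list of flags.

```python
from enum import Enum
from typing import List, Optional


class Status(Enum):
    PUBLISHED = "published"
    HELD_FOR_REVIEW = "held_for_review"
    REJECTED = "rejected"  # assumed possible moderator outcome


def resolve_status(automated_flags: List[str],
                   moderator_decision: Optional[Status] = None) -> Status:
    """Decide a report's status; a human decision always overrides automation."""
    if moderator_decision is not None:
        return moderator_decision
    if automated_flags:
        # Automation never rejects on its own; it only holds for review.
        return Status.HELD_FOR_REVIEW
    return Status.PUBLISHED
```

The point of the design is that automated screening can only publish or hold a report, while any final outcome for a held report comes from a moderator.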

For details on how automated systems are used in this pipeline, see the AI Transparency Statement.

Storage & Aggregation

Accepted reports are stored as individual records linked to the reported phone number. The aggregate profile for each number (displayed classification, safety rating, and report count) is recalculated as new reports are received. Aggregation considers reporting volume, recency, and category distribution against the evaluation thresholds defined in the Reporting Signal Evaluation Framework. As a result, a number's displayed classification may change over time as more community data accumulates.
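To make the interaction of volume, recency, and category distribution concrete, here is one possible way such a recalculation could work. It is a sketch under stated assumptions, not the Reporting Signal Evaluation Framework itself: the minimum report count, the 90-day half-life, the 50% share threshold, and the category labels are all invented for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MIN_REPORTS = 3         # assumed volume threshold
HALF_LIFE_DAYS = 90.0   # assumed: a report's weight halves every 90 days
MIN_SHARE = 0.5         # assumed share of weight needed for a clear classification


def classify(reports, now: datetime) -> str:
    """`reports` is a list of (category, submitted_at) tuples for one number."""
    if len(reports) < MIN_REPORTS:
        return "insufficient data"

    # Recency: older reports contribute exponentially less weight.
    weights = defaultdict(float)
    for category, submitted_at in reports:
        age_days = max((now - submitted_at).total_seconds() / 86400.0, 0.0)
        weights[category] += 0.5 ** (age_days / HALF_LIFE_DAYS)

    # Category distribution: the leading category must hold enough of the weight.
    top, top_weight = max(weights.items(), key=lambda kv: kv[1])
    if top_weight / sum(weights.values()) >= MIN_SHARE:
        return top
    return "mixed"
```

With these example parameters, four year-old "telemarketing" reports are outweighed by a single recent "scam" report, which shows how a displayed classification can shift as new data arrives.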

Telecommunications allocation metadata is maintained separately from community reporting data and does not indicate confirmed caller ownership.

AI-Assisted Summaries

Phone detail pages may display a summary paragraph describing reported activity. These summaries are generated from aggregated community report data and public telecommunications records using AI-assisted tools. AI-assisted summaries do not introduce new factual claims beyond the underlying community submissions and public allocation data. Summaries are clearly labelled with the disclosure: "AI-assisted analysis based on community reports and public data." They are regenerated periodically and subject to human review.

Full details are documented in the AI Transparency Statement.

Update Propagation

When a new report is accepted, the following updates propagate through the system:

  1. The report is stored as an individual record
  2. The number's aggregate classification and rating are recalculated
  3. The number appears on the recently updated feed
  4. Related aggregation pages (state, service type, prefix) reflect the updated data
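The four steps above can be sketched with a small in-memory model. The class `Dataset`, its attribute names, and the use of the first four digits as the "prefix" are assumptions made for the example; state and service-type pages are omitted because deriving them requires allocation data not described here.

```python
from collections import Counter, defaultdict, deque


class Dataset:
    """Toy in-memory model of update propagation (illustrative only)."""

    def __init__(self, feed_size: int = 50):
        self.records = defaultdict(list)        # number -> individual reports, never deleted
        self.profiles = {}                      # number -> aggregate profile
        self.recent = deque(maxlen=feed_size)   # recently updated feed, newest first
        self.prefix_counts = defaultdict(int)   # prefix -> accepted report count

    def accept(self, number: str, category: str) -> None:
        # 1. Store the report as an individual record.
        self.records[number].append(category)

        # 2. Recalculate the number's aggregate profile.
        counts = Counter(self.records[number])
        self.profiles[number] = {
            "report_count": len(self.records[number]),
            "top_category": counts.most_common(1)[0][0],
        }

        # 3. Move the number to the front of the recently updated feed.
        if number in self.recent:
            self.recent.remove(number)
        self.recent.appendleft(number)

        # 4. Update a related aggregation page (prefix only in this sketch).
        self.prefix_counts[number[:4]] += 1
```

Keeping the individual records in an append-only list mirrors the idea that historical reports stay available while the derived views are recomputed.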

Historical records remain accessible as part of the permanent dataset archive.

Related Documentation