Processing Overview
- Anonymous community submission
- Format standardisation
- De-duplication and abuse filtering
- Structured storage and aggregation
- System-wide update propagation
Submission Pipeline
All data in the Reverseau dataset originates from community submissions. When an individual receives a phone call or SMS, they may submit a report describing their experience with that number. Reports are submitted anonymously — no registration, name, or email is required. This anonymous submission model has been in place since the platform launched in 2014.
Reverseau does not independently verify the factual accuracy of individual submissions; instead, the dataset reflects aggregated community-reported experiences.
Each submission contains the following structured fields:
- Phone number — the number being reported, normalised to the Australian 10-digit format
- Caller type classification — selected from a fixed set of categories (see Reporting Signal Evaluation)
- Safety rating — a reporter-submitted numerical indicator reflecting the individual's perception of the interaction. The displayed aggregate rating on a number page represents a calculated summary of multiple community submissions and may incorporate weighting factors such as recency and reporting volume.
- Written description — a free-text account of the call experience
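As an illustration only, the four fields above could be modelled as a small record type. The class name, field names, and the validation rule are assumptions made for this sketch; the actual category set and rating scale are defined elsewhere and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """One anonymous community report. No reporter identity is stored."""

    phone_number: str   # normalised 10-digit Australian number
    caller_type: str    # one of the fixed categories (set not reproduced here)
    safety_rating: int  # reporter-submitted indicator; scale is not assumed
    description: str    # free-text account of the call experience

    def __post_init__(self) -> None:
        # Only the documented 10-digit format is checked in this sketch.
        number = self.phone_number
        if not (len(number) == 10 and number.isascii() and number.isdigit()):
            raise ValueError("phone_number must be exactly 10 digits")
```

The record deliberately has no name, email, or account field, which mirrors the anonymous submission model described above.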
Format Standardisation
Submitted phone numbers are standardised on intake. Each number is normalised to a consistent 10-digit Australian format: spaces, dashes, and parentheses are removed, and the +61 country code is replaced with the national leading 0. This ensures that all reports for the same number are aggregated together regardless of how the number was entered.
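A minimal sketch of this kind of normalisation is shown below. The function name and the exact rules are assumptions for illustration, not the platform's implementation. Note that dropping +61 from an international number leaves nine digits, so the sketch restores the national leading zero to reach the 10-digit form.

```python
import re


def normalise_au_number(raw: str) -> str:
    """Return the 10-digit national form of an Australian number (sketch)."""
    # Keep ASCII digits only: drops spaces, dashes, parentheses, and "+".
    digits = re.sub(r"[^0-9]", "", raw)

    # "+61 412 345 678"    becomes "61412345678"  (11 digits)
    # "+61 (0)412 345 678" becomes "610412345678" (12 digits)
    if digits.startswith("61") and len(digits) in (11, 12):
        digits = digits[2:]
        if len(digits) == 9:
            digits = "0" + digits  # restore the national leading zero

    if len(digits) != 10:
        raise ValueError(f"not a 10-digit Australian number: {raw!r}")
    return digits
```

Under these assumptions, "+61 412 345 678", "0412-345-678", and "(0412) 345 678" all map to the same key, so their reports aggregate on one number page.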
De-Duplication
The system applies de-duplication logic to prevent the same reporter from submitting multiple identical reports for the same number within a defined time window. De-duplication combines technical submission metadata, session-based signals, and content similarity. The aim is to reduce repetitive submissions from the same source while preserving independent reports from unrelated individuals.
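The following sketch shows one way such a check could combine a source signal, a time window, and content similarity. The 24-hour window, the 0.9 similarity threshold, the opaque source_key field, and the use of difflib are all assumptions chosen for illustration; the real signals and thresholds are not published here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from difflib import SequenceMatcher

WINDOW = timedelta(hours=24)   # assumed time window
SIMILARITY_THRESHOLD = 0.9     # assumed content-similarity cut-off


@dataclass(frozen=True)
class Report:
    source_key: str        # hypothetical opaque key derived from session signals
    phone_number: str      # normalised 10-digit number
    description: str
    submitted_at: datetime


def is_duplicate(new: Report, recent: list[Report]) -> bool:
    """True if `new` repeats an earlier report from the same source."""
    for old in recent:
        same_source = old.source_key == new.source_key
        same_number = old.phone_number == new.phone_number
        in_window = abs(new.submitted_at - old.submitted_at) <= WINDOW
        if not (same_source and same_number and in_window):
            continue  # unrelated reporters are never compared on content
        similarity = SequenceMatcher(
            None, old.description.lower(), new.description.lower()
        ).ratio()
        if similarity >= SIMILARITY_THRESHOLD:
            return True
    return False
```

Because content is compared only after the source, number, and window all match, two unrelated people who describe the same call in similar words are both kept.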
Abuse Filtering
All submissions pass through an automated pre-screening layer that checks for:
- Spam content and repetitive submissions
- Personal identifying information (names, addresses, government ID numbers)
- Profanity, hate speech, and guideline violations
- Duplicate or near-duplicate content
Reports that pass automated screening are published promptly. Reports flagged as potentially problematic are held for human review. Automated screening assists moderation but does not independently determine final classification outcomes. Human moderator decisions take precedence over automated assessments. Moderation actions are applied in accordance with published platform guidelines.
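The routing described above can be sketched as a list of pluggable checks that either clear a report for publication or hold it for review. The two checks below are toy placeholders, and the status strings and function names are hypothetical; none of this reflects the actual screening rules.

```python
import re
from typing import Callable, Optional

# A check returns a flag name if the text looks problematic, otherwise None.
Check = Callable[[str], Optional[str]]


def check_repetition(text: str) -> Optional[str]:
    # Toy spam heuristic: the same word five or more times in a row.
    if re.search(r"\b(\w+)(?:\s+\1\b){4,}", text, re.IGNORECASE):
        return "repetitive_content"
    return None


def check_id_number(text: str) -> Optional[str]:
    # Toy PII heuristic: a standalone nine-digit sequence.
    if re.search(r"\b\d{3}\s?\d{3}\s?\d{3}\b", text):
        return "possible_id_number"
    return None


CHECKS: list[Check] = [check_repetition, check_id_number]


def prescreen(text: str, checks: list[Check]) -> tuple[str, list[str]]:
    """Automated layer: publishes clean reports, holds flagged ones."""
    flags = []
    for check in checks:
        flag = check(text)
        if flag is not None:
            flags.append(flag)
    return ("hold_for_review" if flags else "publish", flags)


def resolve_status(automated: str, moderator_decision: Optional[str]) -> str:
    """A human decision, when present, overrides the automated outcome."""
    return moderator_decision if moderator_decision is not None else automated
```

In this sketch the automated layer never rejects a report outright. It can only publish or hold, which keeps the final decision on flagged content with a human moderator.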
For details on how automated systems are used in this pipeline, see the AI Transparency Statement.
Storage & Aggregation
Accepted reports are stored as individual records linked to the reported phone number. The aggregate profile for each number, including displayed classification, safety rating, and report count, is recalculated as new reports are received. Aggregation calculations consider reporting volume, recency, and category distribution, evaluated against thresholds defined in the Reporting Signal Evaluation Framework. Because the profile is recalculated with each accepted report, a number's displayed classification may change over time as more community data accumulates.
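To make the interaction of volume, recency, and category distribution concrete, here is a hedged sketch of one possible aggregation. The exponential half-life of 180 days, the minimum of three reports before a classification is shown, and the category labels used in the usage note are assumptions for illustration; the actual weighting and thresholds are defined in the Reporting Signal Evaluation Framework.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

HALF_LIFE_DAYS = 180.0  # assumed: a report's weight halves every 180 days
MIN_REPORTS = 3         # assumed volume threshold for showing a classification


@dataclass(frozen=True)
class Report:
    caller_type: str
    safety_rating: int
    submitted_at: datetime


def aggregate(reports: list[Report], now: datetime) -> dict:
    """Recalculate a number's aggregate profile from all accepted reports."""
    if not reports:
        return {"report_count": 0, "classification": None, "rating": None}

    category_weight: dict[str, float] = defaultdict(float)
    weighted_sum = 0.0
    total_weight = 0.0
    for report in reports:
        age_days = max((now - report.submitted_at).total_seconds() / 86400, 0.0)
        weight = 0.5 ** (age_days / HALF_LIFE_DAYS)  # recency weighting
        category_weight[report.caller_type] += weight
        weighted_sum += weight * report.safety_rating
        total_weight += weight

    classification = None
    if len(reports) >= MIN_REPORTS:  # volume threshold
        classification = max(category_weight, key=category_weight.get)

    return {
        "report_count": len(reports),
        "classification": classification,
        "rating": weighted_sum / total_weight,
    }
```

With these assumptions, two reports from today in a placeholder category "scam" outweigh a single report from a year ago in a placeholder category "legitimate", so the displayed classification follows the recent majority while the older report still contributes to the rating.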
Telecommunications allocation metadata is maintained separately from community reporting data and does not indicate confirmed caller ownership.
AI-Assisted Summaries
Phone detail pages may display a summary paragraph describing reported activity. These summaries are generated from aggregated community report data and public telecommunications records using AI-assisted tools. AI-assisted summaries do not introduce new factual claims beyond the underlying community submissions and public allocation data. Summaries are clearly labelled with the disclosure: "AI-assisted analysis based on community reports and public data." They are regenerated periodically and subject to human review.
Full details are documented in the AI Transparency Statement.
Update Propagation
When a new report is accepted, the following updates propagate through the system:
- The report is stored as an individual record
- The number's aggregate classification and rating are recalculated
- The number appears on the recently updated feed
- Related aggregation pages (state, service type, prefix) reflect the updated data
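The four propagation steps above can be sketched with a small in-memory store. The class, its attribute names, the feed size, and the page-key strings are hypothetical, and the aggregate here is reduced to a report count as a stand-in for the full recalculation.

```python
from collections import defaultdict, deque


class ReportStore:
    """In-memory stand-in for the storage and propagation steps (sketch)."""

    def __init__(self, feed_size: int = 50) -> None:
        self.reports = defaultdict(list)                 # number -> records
        self.aggregates = {}                             # number -> profile
        self.recently_updated = deque(maxlen=feed_size)  # newest first
        self.stale_pages = set()                         # pages to refresh

    def accept(self, number: str, report: dict, page_keys: list[str]) -> None:
        # 1. Store the report as an individual record.
        self.reports[number].append(report)
        # 2. Recalculate the aggregate (placeholder: report count only).
        self.aggregates[number] = {"report_count": len(self.reports[number])}
        # 3. Move the number to the front of the recently updated feed.
        if number in self.recently_updated:
            self.recently_updated.remove(number)
        self.recently_updated.appendleft(number)
        # 4. Mark related state, service type, and prefix pages for refresh.
        self.stale_pages.update(page_keys)
```

A call such as store.accept("0412345678", {"description": "..."}, ["service:mobile", "prefix:0412"]) would then touch all four structures in one step.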
Historical records remain accessible as part of the permanent dataset archive.
Related Documentation
- Reporting Signal Evaluation Framework — how classifications are determined
- Transparency & Data Integrity — moderation, corrections, and dispute handling
- Data Limitations — scope and interpretation boundaries