The Mozilla Common Voice project, which aims to help teach machines how real people speak.

Common Voice

This is the web app for Mozilla Common Voice, a platform for collecting speech donations in order to create public domain datasets for training voice recognition-related tools.

Upcoming releases

Type                        Release cadence          More info
Platform code & sentences   Monthly, or as needed    Release notes
Dataset                     Quarterly                Dataset metadata

How to contribute

🎉 First off, thanks for taking the time to contribute! This project would not be possible without people like you. 🎉

There are many ways to get involved with Common Voice - you don't have to know how to code to contribute!

  • To add or correct the translation of the web interface, please use the Mozilla localization platform Pontoon. Please note, we do not accept any direct pull requests for changing localization content.
  • For information on how to add or edit sentences in Common Voice, see SENTENCES.md
  • For instructions on setting up a local development environment, see DEVELOPMENT.md
  • For information on how to add a new language to Common Voice, see LANGUAGE.md
  • For information on how to get in contact with existing language communities, see COMMUNITIES.md

For more general guidance on building your own language community using Mozilla voice tools, please refer to the Mozilla Voice Community Playbook.

Discussion

For general discussion (feedback, ideas, random musings), head to our Discourse Category.

For bug reports or specific feature requests, please use the GitHub issue tracker.

For live chat, join us on Matrix.

Licensing and content source

This repository is released under MPL (Mozilla Public License) 2.0.

Most of the sentence text in /server/data either comes directly from user submissions in our Sentence Collector or was scraped from Wikipedia using our extractor tool. It is released under a CC0 (public domain) Creative Commons license.

Any files that follow the pattern europarl-VERSION-LANG.txt (such as europarl-v7-de.txt) were extracted, with our thanks, from the Europarl Corpus, which features transcripts of proceedings in the European Parliament.
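As an illustration of the naming pattern above, here is a minimal Python sketch that recognizes such filenames and splits out the version and language parts. The helper name `parse_europarl_name` and the regular expression are assumptions inferred from the single example europarl-v7-de.txt; they are not part of the Common Voice codebase.

```python
import re

# Hypothetical pattern, inferred from the example "europarl-v7-de.txt".
# VERSION is assumed to contain no hyphen; LANG is everything up to ".txt".
EUROPARL_RE = re.compile(r"^europarl-(?P<version>[^-]+)-(?P<lang>.+)\.txt$")


def parse_europarl_name(filename):
    """Return (version, lang) for europarl-VERSION-LANG.txt names, else None."""
    match = EUROPARL_RE.match(filename)
    if match is None:
        return None
    return match.group("version"), match.group("lang")


print(parse_europarl_name("europarl-v7-de.txt"))      # ('v7', 'de')
print(parse_europarl_name("sentence-collector.txt"))  # None
```

A check like this could help separate Europarl-derived files from CC0 sentence files when auditing the contents of /server/data, assuming the naming convention is followed consistently.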

Citation

If you use the data in a published academic work, we would appreciate it if you cited the following article:

  • Ardila, R., Branson, M., Davis, K., Henretty, M., Kohler, M., Meyer, J., Morais, R., Saunders, L., Tyers, F. M. and Weber, G. (2020) "Common Voice: A Massively-Multilingual Speech Corpus". Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). pp. 4211–4215

The BibTeX entry is:

@inproceedings{commonvoice:2020,
  author = {Ardila, R. and Branson, M. and Davis, K. and Henretty, M. and Kohler, M. and Meyer, J. and Morais, R. and Saunders, L. and Tyers, F. M. and Weber, G.},
  title = {Common Voice: A Massively-Multilingual Speech Corpus},
  booktitle = {Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)},
  pages = {4211--4215},
  year = 2020
}

Cross Browser Testing

This project is tested with BrowserStack.