Hi everyone 👋!

This is Husein Zolkepli. I spend my time developing Malaya, a Natural-Language-Toolkit library for Bahasa Malaysia powered by Deep Learning Tensorflow, and publishing more Bahasa Malaysia datasets and corpora in Malay-Dataset.

Full documentation,

Malaya can do all of the following in fewer than 5 lines of code:

  1. Text Augmentation
  2. Dependency Parsing
  3. Emotion Analysis
  4. Entity Recognition
  5. Text Generator
  6. Hyperlocal Language Detection
  7. Text Normalizer
  8. Num2Word
  9. Part-of-Speech Recognition
  10. Relevancy Analysis
  11. Sentiment Analysis
  12. Text Similarity
  13. Spelling Correction
  14. Stemming
  15. Subjectivity Analysis
  16. Abstractive and Extractive Summarization
  17. Topic Modelling
  18. Toxicity Analysis
  19. Word2Num
  20. WordVector

Malaya has also released pretrained Bahasa models; check them out at Malaya/pretrained-model.

Or you can try them with the Hugging Face 🤗 Transformers library:

  1. ALBERT
  2. BERT
  3. GPT2
  4. T5
  5. Tiny-BERT
  6. XLNet

Malaya's development has already been recognized by MDEC and MIGHT, in their words, 'to prepare Malaysia for Industry 4.0'. Modern NLP is all about intelligently interfacing humans with machines.

We have spent more than RM100k to release pre-trained and fine-tuned models. If you run a business and use the Malaya library or Malay-Dataset in a revenue-generating product, it makes business sense to sponsor Malaya development. If you do research and have found the Malaya library or Malay-Dataset really helpful, feel free to donate.

Here is what I plan to do with more funding:

  1. Pay for AWS S3 traffic! I store checkpoints and large datasets in S3.
  2. Pay linguists to validate my datasets and corpora and to improve active learning.
  3. Maintaining these projects and developing new features takes a considerable amount of time and `universal exchange`, and I am currently exploring the possibility of working on Malaya full time.