Mr. Seeker

Finetuning Picard v2 (Fairseq)

Feb 15, 2022

Hi all,

I am not the guy who usually writes blogs and stuff, so bear with me. I am also not writing this using an AI, but all by myself. I finally overcame the big training hurdle and found an excellent way to get things going. I am currently working on the Fairseq 6.7B model, which is a significant step up from the 2.7B models I have worked with so far. Before I start talking about it, I want to give a general notice first.

General notice:
When I made Picard and Shinen using GPT-Neo 2.7B, they only took around 30 hours of training per model. Each model cost approximately 45 USD to create, which I could afford without too much pain in my wallet. When I released them to the public, they were pretty okay models compared to the more "private" models that NovelAI and others are using.

Now that I am building Picard version 2, I realise that I might have underestimated the actual cost of building big models. Compared to 2.7B, a 6.7B model is about 2.5 times larger and requires almost 2.5 times more training time. I hoped I could get away with doubling up on the hardware, but doubling the hardware alone is not enough.

I am not the guy who usually asks for money, but in this case I have to say that I need all the donations I can get to cover the cost of building new models. In the case of GPT-J models, if someone is willing to help me out with some one-on-one time setting things up and getting them running using TRC, that would be great. For the fairseq models, however, it is a different story:

I can only build 6.7B fairseq models if enough people help cover the costs of the GPU servers, or if the models get converted so they can be trained on TPU. I do not like to beg for money, but renting servers with GPUs does cost quite a bit. I am already happy that I could get a cheap rental, but while writing this post I realised that 20% more training data also means 20% more grinding. I will, at some point, release 6.7B models to the public, but it won't be this month. If you are willing to donate more than 20 EUR, would like to send money by wire transfer, or have a server with A6000s or better gathering dust somewhere, please get in touch with me directly to discuss further details. Please note that a "gaming rig" does not qualify; I will be trying to stress your server to the max.

Some technical details about Picard:

Total ebooks used: 2210
Total size of books: 1.02 GB
Num examples = 128740
Num Epochs to train = 1
Preferred Author's Note: "[Genre: <|>]"
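
For context on those numbers: 128,740 examples at the usual 2048-token block size works out to roughly 264 million tokens, which lines up with about 1 GB of plain text at roughly 4 characters per token (assuming standard 2048-token training blocks, which is an assumption on my part). And as a rough sketch of what the Author's Note does in practice, here is how a genre tag like "[Genre: sci-fi]" can be injected a few lines above the end of the context before generation; the insertion depth and the example genre are illustrative, not the exact KoboldAI settings.

# Rough sketch: injecting an Author's Note into the context before generation.
# The depth of 3 lines and the genre value are illustrative assumptions.
def apply_authors_note(context: str, genre: str, depth: int = 3) -> str:
    note = f"[Genre: {genre}]"
    lines = context.splitlines()
    if len(lines) <= depth:
        return note + "\n" + context
    # Slip the note in a few lines above the end of the visible context.
    return "\n".join(lines[:-depth] + [note] + lines[-depth:])

story = (
    "The airlock hissed open.\n"
    "Nobody was waiting on the other side.\n"
    "She stepped through anyway.\n"
    "The corridor lights flickered."
)
print(apply_authors_note(story, "sci-fi"))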

Some background information about Picard:
The original model was a GPT-Neo 2.7B model, created using hand-curated novels. The authors range from A. American to Zac Harrison, and the dataset is primarily focused on sci-fi and fantasy. Each story was converted from epub to txt format and then scanned and cleaned by hand(!) to ensure that the AI would deliver that perfect novel each time. The way the dataset is built also helps guarantee that if you want to write a book, the model gives you proper novel prose and not some random "I hope you like this story, please buy ..." that would kill the immersion and the fun.
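
For anyone curious what the epub-to-txt step looks like before any of the hand-cleaning happens, here is a rough sketch using the ebooklib and BeautifulSoup libraries. This is not the exact pipeline I used, just the general idea: pull the XHTML chapters out of the epub and strip the markup.

# Rough sketch of an epub -> txt conversion step (ebooklib + BeautifulSoup).
# Not the exact Picard pipeline; the hand-cleaning still happens afterwards.
import ebooklib
from ebooklib import epub
from bs4 import BeautifulSoup

def epub_to_text(path: str) -> str:
    book = epub.read_epub(path)
    chapters = []
    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        # Each document item is an XHTML chapter; strip the markup.
        soup = BeautifulSoup(item.get_content(), "html.parser")
        chapters.append(soup.get_text(separator="\n"))
    return "\n\n".join(chapters)

with open("book.txt", "w", encoding="utf-8") as f:
    f.write(epub_to_text("book.epub"))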

When I saw how much fun people had using Picard, I also made Shinen: an alternative to the Horni and Horni-Ln models made by finetuneanon, which felt a bit tame for my taste. I checked various websites and scraped all the good stuff I could find. The resulting dataset currently has around 37.5k short stories, and the newer version also adds all kinds of furry lore to the mix. Shinen is only "automatically" cleaned and is not that good for writing stories (Picard does a much better job at that).

Picard v2:

Picard v2 is a bit different. Instead of going for the GPT-Neo 2.7B model again, I am now going for the fairseq version first. The model is similar to the one NovelAI is using, but with fewer parameters. The reason is that the 13B model is huge (20 GB), and to train it I need much larger servers with NVLink enabled. I also decided to go for the fairseq model first because, in tests, I found that the GPT-Neo family did not do that well with storytelling. Since the model will focus on writing novels, I decided against building on GPT-J first. That does not mean I won't release a GPT-J version at all; it just takes time.
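
If you want to try the base model yourself before my finetune lands, the dense fairseq checkpoints have community conversions on Hugging Face; the model id "KoboldAI/fairseq-dense-6.7B" in the sketch below is my assumption for the base, not necessarily the exact checkpoint I train from. In fp16 the 6.7B weights alone are around 13 GB, and finetuning needs a lot more memory on top of that for gradients and optimizer state, which is where the big NVLink servers come in.

# Rough sketch: loading a dense fairseq checkpoint for inference via
# Hugging Face transformers. The model id is an assumed community
# conversion, not necessarily the exact base checkpoint used for Picard v2.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KoboldAI/fairseq-dense-6.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).cuda()

prompt = "[Genre: sci-fi]\nThe colony ship dropped out of warp above a dead world."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(
    **inputs, max_new_tokens=60, do_sample=True, top_p=0.9, temperature=0.8
)
print(tokenizer.decode(output[0], skip_special_tokens=True))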

The main difference between Picard V1 and V2 will be the dataset. Whereas V1 focuses on fantasy and science fiction, V2 adds a couple of new genres to the mix: Horror, Romance, Young Adult, Thriller, Detective and Action, to name a few. It also introduces a lot more authors. I reused the original V1 dataset, cleaned it a bit more, and added many new books. The new books are manually checked and cleaned more thoroughly than the original V1 material. I hope it makes a difference.

ModronAI:

To recover costs and keep enough funding for future projects, I will be starting a new project quite soon, called "ModronAI". The project will be Patreon-funded because I want to keep buymeacoffee as a donation page and a way to say "thank you" for the models I am creating. Unlike Picard (which is more of a "free beer for everyone" project), ModronAI will give Patreon supporters full access to its library (and its models), while everyone else will only have access to the community-made ones.

Unlike how I built Picard, I plan to turn this one into a "living" dataset. This means that your input, stories, and world information will eventually become part of the dataset used to train the model. It also means I need the stories you make with it, and I hope to create a constantly evolving world. It should become a fun experience for explorers, dungeon masters, and story writers, since they will help make this world a reality. More information will be released later, but do get in touch with me if you want to help out in the initial stages (mostly related to writing the "genesis").

Final note:

I will still keep making models, don't worry about that. The only thing currently holding me back from making big models is the cost. I might do a GPT-J run in the near future, but I need someone with the expertise and time who is willing to guide me through all the steps required.
