Skip to content

Instantly share code, notes, and snippets.

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@MihailCosmin
MihailCosmin / cuda_11.8_installation_on_Ubuntu_22.04
Last active May 7, 2024 00:38 — forked from primus852/cuda_11.7_installation_on_Ubuntu_22.04
Instructions for CUDA v11.8 and cuDNN 8.7 installation on Ubuntu 22.04 for PyTorch 2.0.0
#!/bin/bash
### steps ####
# verify the system has a cuda-capable gpu
# download and install the nvidia cuda toolkit and cudnn
# setup environmental variables
# verify the installation
###
### to verify your gpu is cuda enable check
@superseb
superseb / rke2-commands.md
Last active May 7, 2024 00:38
RKE2 commands

RKE2 commands

Install

curl -sL https://get.rke2.io | sh
systemctl daemon-reload
systemctl start rke2-server
@leocomelli
leocomelli / git.md
Last active May 7, 2024 00:38
Lista de comandos úteis do GIT

GIT

Estados

  • Modificado (modified);
  • Preparado (staged/index)
  • Consolidado (comitted);

Ajuda

I had a large dataset in postgis and wanted to avoid the hassle of first exporting it to a several GB geojson file before tiling it with Tippecanoe.

ogr2ogr -f GeoJSON /dev/stdout \                                                                            
PG:"host=localhost dbname=postgres user=postgres password=thepassword" \
-sql "select * from a, roi where a.geom && roi.geom" \
| docker run -i -v ${PWD}:/data tippecanoe:latest tippecanoe \
--output=/data/yourtiles.mbtiles
@dibmartins
dibmartins / install-chrome-ubuntu
Created February 8, 2017 16:12
install-chrome-ubuntu
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
sudo apt-get update
sudo apt-get install google-chrome-stable
@OrionReed
OrionReed / dom3d.js
Last active May 7, 2024 00:34
3D DOM viewer, copy-paste this into your console to visualise the DOM topographically.
// 3D Dom viewer, copy-paste this into your console to visualise the DOM as a stack of solid blocks.
// You can also minify and save it as a bookmarklet (https://www.freecodecamp.org/news/what-are-bookmarklets/)
(() => {
const SHOW_SIDES = false; // color sides of DOM nodes?
const COLOR_SURFACE = true; // color tops of DOM nodes?
const COLOR_RANDOM = false; // randomise color?
const COLOR_HUE = 190; // hue in HSL (https://hslpicker.com)
const MAX_ROTATION = 180; // set to 360 to rotate all the way round
const THICKNESS = 20; // thickness of layers
const DISTANCE = 10000; // ¯\\_(ツ)_/¯
@mlocati
mlocati / exceptions-tree.php
Created March 9, 2017 10:58
Throwable and Exceptions tree
<?php
if (!function_exists('interface_exists')) {
die('PHP version too old');
}
$throwables = listThrowableClasses();
$throwablesPerParent = splitInParents($throwables);
printTree($throwablesPerParent);
if (count($throwablesPerParent) !== 0) {
die('ERROR!!!');
@wojteklu
wojteklu / clean_code.md
Last active May 7, 2024 00:25
Summary of 'Clean code' by Robert C. Martin

Code is clean if it can be understood easily – by everyone on the team. Clean code can be read and enhanced by a developer other than its original author. With understandability comes readability, changeability, extensibility and maintainability.


General rules

  1. Follow standard conventions.
  2. Keep it simple stupid. Simpler is always better. Reduce complexity as much as possible.
  3. Boy scout rule. Leave the campground cleaner than you found it.
  4. Always find root cause. Always look for the root cause of a problem.

Design rules