Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (reinforcement learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning": learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@mobilemind
mobilemind / git-tag-delete-local-and-remote.sh
Last active April 18, 2024 14:55
How to delete a git tag locally and remotely
# delete local tag '12345'
git tag -d 12345
# delete remote tag '12345' (e.g., the copy on GitHub)
git push origin :refs/tags/12345
# alternative approach: delete the remote tag first, then the local one
git push --delete origin tagName
git tag -d tagName
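
The two commands above can also be wrapped in a small script. The following is a minimal Python sketch, not part of the original gist: the function names `tag_delete_commands` and `delete_tag` are hypothetical, and the remote is assumed to be called `origin` as in the commands above. It defaults to a dry run, so nothing is deleted unless `dry_run=False` is passed.

```python
import subprocess


def tag_delete_commands(tag, remote="origin"):
    """Build the git commands that delete `tag` locally and on `remote`."""
    return [
        ["git", "tag", "-d", tag],                      # delete the local tag
        ["git", "push", remote, f":refs/tags/{tag}"],   # delete the remote tag
    ]


def delete_tag(tag, remote="origin", dry_run=True):
    """Return the commands; run them only when dry_run is False."""
    commands = tag_delete_commands(tag, remote)
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if git exits non-zero
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in delete_tag("12345"):
        print(" ".join(cmd))
```

Using the explicit `:refs/tags/...` refspec avoids ambiguity if a branch happens to share the tag's name.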
@ptheywood
ptheywood / gs-windows.md
Last active April 18, 2024 14:53
Instructions for PDF compression via Ghostscript on Windows

Ghostscript PDF compression on Windows

Installation

  1. Download and install Ghostscript for Windows (http://downloads.ghostscript.com/public/gs916w32.exe)
  2. Optional - Add the Ghostscript directory to the PATH environment variable
    • Control Panel > System > Advanced System Settings > Environment Variables
    • Add ;C:\Program Files (x86)\gs\gs9.16\bin to the end of the PATH variable
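
To check whether the optional PATH step has already been done, a small helper can inspect a PATH-style string. This is a minimal Python sketch under stated assumptions: the helper names `path_contains` and `append_to_path` are hypothetical, and the directory is the default install location quoted above. It only works on strings and does not modify any system setting.

```python
import ntpath

# Default install directory taken from the step above.
GS_BIN = r"C:\Program Files (x86)\gs\gs9.16\bin"


def _norm(entry):
    """Normalize a Windows path entry for case-insensitive comparison."""
    return ntpath.normcase(ntpath.normpath(entry.strip().strip('"')))


def path_contains(path_value, directory, sep=";"):
    """Return True if `directory` is one of the entries in `path_value`."""
    target = _norm(directory)
    return any(
        _norm(entry) == target
        for entry in path_value.split(sep)
        if entry.strip()
    )


def append_to_path(path_value, directory, sep=";"):
    """Return `path_value` with `directory` appended, unless already present."""
    if path_contains(path_value, directory, sep):
        return path_value
    if not path_value:
        return directory
    return path_value.rstrip(sep) + sep + directory


if __name__ == "__main__":
    import os
    print(path_contains(os.environ.get("PATH", ""), GS_BIN))
```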

Usage

@piense
piense / example.py
Created October 21, 2023 17:57
Azure Devops Push
import base64
import json
import urllib
import requests
org_name = "your org"
project_name = "your project"
repo_name = "test"
PAT = "a_pat"
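
The preview stops before `PAT` is used. One common way to authenticate against the Azure DevOps REST API is HTTP Basic auth with an empty username and the personal access token as the password, which would explain the `base64` import. The sketch below only illustrates that encoding; the helper name `basic_auth_header` is hypothetical, and this is an assumption about how the script continues, not the author's code.

```python
import base64


def basic_auth_header(pat):
    """Build a Basic auth header from a personal access token (empty username)."""
    token = base64.b64encode(f":{pat}".encode("ascii")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # "a_pat" is the placeholder value from the snippet above.
    print(basic_auth_header("a_pat"))
```

A dictionary in this shape can be passed as the `headers` argument of a `requests` call.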

2004 (all are available)

| Available | Heading | Update URL | Release Date | Release Version |
| --- | --- | --- | --- | --- |
| | June 18, 2020—KB4567523 (OS Build 19041.331) | KB4567523 | 2020-06-18 | OS Build 19041.331 |
| | June 9, 2020—KB4557957 (OS Build 19041.329) | KB4557957 | 2020-06-09 | OS Build 19041.329 |

1909

Same as 1903 below.

/** Emulate `cmd1 | cmd2 | more` pipeline using recursion.
http://stackoverflow.com/questions/20434124/recursive-piping-in-unix-environment
*/
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
<?xml version="1.0" encoding="UTF-8"?>
<scriptfile>
<settings program="actiona" version="3.10.1" scriptVersion="1.1.0" os="GNU/Linux"/>
<actions>
<action name="ActionClick" version="1.0.0"/>
<action name="ActionGoto" version="1.0.0"/>
<action name="ActionKey" version="1.0.0"/>
<action name="ActionKeyboardKeyCondition" version="1.0.0"/>
</actions>
<parameters/>
@fideloper
fideloper / certbot.sh
Last active April 18, 2024 14:43
Certbot on Ubuntu, wildcard subdomains via CloudFlare DNS challenge
# Used on Ubuntu 18.04 and 20.04
# Find instructions for other OSes here: https://certbot.eff.org/instructions
# Install Certbot via Snaps
sudo snap install core; sudo snap refresh core
sudo snap install --classic certbot
sudo ln -s /snap/bin/certbot /usr/bin/certbot
# Install DNS CloudFlare plugin
sudo snap set certbot trust-plugin-with-root=ok
@rawc0der
rawc0der / crd2jsonschema.sh
Last active April 18, 2024 14:43
Extract openapi JSON schema from Kubernetes CRD manifest
#!/bin/bash
# Small utility function based on yq to extract openAPIV3Schema from CRD
# example: crd2jsonschema.sh ./crd-alertmanager.yaml
set -e
function crd2jsonschema() {
set -e
local xkgroup="x-kubernetes-group-version-kind"
local document="$1"
local openAPIV3Schema=$(mktemp -u)
@notlin4
notlin4 / without-auth_e-book_tutorial_免登入電子書教學.md
Last active April 18, 2024 14:41 — forked from aliyaliu368/without-auth_e-book_tutorial_免登入電子書教學.md
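Login-free tutorial for teaching e-books and related tools | This script bypasses the front-end identity verification of Taiwanese e-books and teaching tools so that they can be used without a teacher account. Supports four major publishers: 翰林 (Hanlin), 南一 (Nani), 康軒 (Kang Hsuan), and 何嘉仁 (Hess) | Do not use this script for malicious purposes such as copying answers or infringing rights; if you use it, you bear all consequences and risks yourself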

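Login-free tutorial for teaching e-books and related tools

By using this script, you agree to this disclaimer

Disclaimer

Do not use this script for malicious purposes such as copying answers or infringing rights. If you use this script, you bear all consequences and risks yourself.

Introduction

This script bypasses the front-end verification of Taiwanese e-books and teaching tools so that they can be used without a teacher account.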