Instructor: Justin Cappos
Institute: NYU
Code: CS9223
Stream: Cybersecurity
Level: Graduate
Semester: 2 (Spring)
Year: 2023-2024

Lectures

1

OpenSSF

Learn about git.

Get OpenSSF gold or silver badge. Gold is tough.

OpenSSF Scorecard - run it on a project and it gives you a score.

Code coverage for testing above 90% is good enough.

Reproducible builds project - you can work on this project. If two people build the same project on two different systems, they should get the same build.

Hermetic builds - the build runs in a sandboxed environment; it doesn't allow you to curl a script and run it directly.

CI/CD systems - integrate all of these processes.

in-toto - attestations: a signed statement saying "I did this; here is what I put into the code and this is what got built from it." SLSA checks for in-toto attestations and assigns levels, similar to the OpenSSF badge.

SBOMs - Just says what's in here.

The US government mandated SBOMs instead of in-toto attestations as a compromise, to make adoption easier. SBOMs are not being used properly; there's an ongoing project to make SBOMs work better. SLSA, a subproject of in-toto, is now being used by the government.

Packaging - tarballs, RPM and Debian packages. You can append files to tar and zip archives, and this led to a vulnerability: signatures were checked against the first version of a file in the tarball, but when the tarball was uncompressed, the latest version got used.
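The append trick can be reproduced with Python's stdlib tarfile module. This is a toy sketch (function names are my own): it builds an archive containing two entries with the same name, so a naive verifier that hashes the first entry it finds sees the "signed" bytes, while extraction keeps whichever entry comes last.

```python
import io
import tarfile

def build_tar_with_duplicate(name: str, first: bytes, second: bytes) -> bytes:
    """Create a tar archive containing two entries with the same name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for data in (first, second):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def first_member_bytes(raw: bytes, name: str) -> bytes:
    """What a naive verifier might hash: the first matching entry."""
    with tarfile.open(fileobj=io.BytesIO(raw)) as tar:
        for member in tar:
            if member.name == name:
                return tar.extractfile(member).read()

def extracted_bytes(raw: bytes, name: str) -> bytes:
    """What actually lands on disk: later entries overwrite earlier ones."""
    with tarfile.open(fileobj=io.BytesIO(raw)) as tar:
        data = None
        for member in tar:
            if member.name == name:
                data = tar.extractfile(member).read()  # last one wins
        return data
```

The gap between what was verified and what was extracted is exactly the vulnerability described above.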

Dependency management

3 different ways:

  1. Vendoring - everyone who distributes the software ships their own copy of each dependency, like npm: a copy per project.
  2. Package manager with pinning - use fixed versions per project. Dependabot can use this to email you about updated versions you should incorporate.
  3. use foo (latest, or whatever already exists) - you can use latest, but that's a problem: sudden upstream changes can break your code.
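A toy sketch of why pinning beats "latest" (the function names and the tuple-based spec format are invented for illustration): a pinned spec always resolves to the same version, while "latest" silently changes as new releases appear upstream.

```python
def parse(v: str) -> tuple:
    """Turn '1.2.0' into (1, 2, 0) so versions sort numerically."""
    return tuple(int(p) for p in v.split("."))

def select_version(available: list, spec) -> str:
    """Toy version selection:
       ('pinned', '1.2.0') -> exactly that version (reproducible builds)
       ('latest',)         -> newest available (can break without warning)."""
    if spec[0] == "pinned":
        return spec[1] if spec[1] in available else None
    return sorted(available, key=parse)[-1]  # 'latest'
```

With `["1.0.0", "1.2.0", "2.0.0"]` available, the pinned spec keeps returning "1.2.0" forever, while "latest" returns "2.0.0" today and something else after the next release.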

2

CTO of Kusari (startup - supply chain security)

https://github.com/mlieberman85/talks

Open data formats: SBOMs and in-toto layouts.

SBOMs record dependencies. SLSA - build logs in an open format. VEX - whether a vulnerability is actually exploitable.

Scorecard - my idea matches?

All of the above produce open-format data which can be analysed and verified.

GUAC, OSV, Deps.dev - databases for the security data that was generated.

Turtles all the way down - every piece has to be secure.

There's a lot of material (many data files) that has to be generated, stored and verified - how to build SLSA provenance, how an SBOM is generated - so many tools.

Skootrs helps - there are many tools for producing the docs and security files, and something new keeps coming up. Skootrs makes it all easy by doing it for the developers.

The more configurable you make something, the more it breaks when integrating with other tools.

Skootrs

Making the security specifications general: S2C2F, OSCAL.


Dennis

SPDX, CycloneDX, SWID (not that popular) - formats for SBOMs

OSV - Open Source Vulnerabilities (database of vulnerabilities)

SBOM needs to be mapped to list of known vulnerabilities.

spdx-to-osv - a tool to map an SBOM (SPDX) to vulnerabilities.

in-toto - attests to the integrity and verifiability of all actions performed.

An in-toto layout defines how the actions, their functionaries and clients are laid out (the flow diagram). Tag, build and package are actions; the layout assigns functionaries (people or machines) to particular actions. Clients can verify the signed chain.

SBOMit - An SBOM document that is verifiable with in-toto attestations and in-toto layout.

protobom - adds in-toto attestations to SBOMs.

3

TUF, Uptane and adaptive keys

TUF

Software updates have to be secured.

Use TUF

Container image content integrity. One way: put a signature (signed with a private key) inside the package. But how do you get the public key?

  • targets.json - all foo packages have to be signed by key A.
  • root.json - signs targets.json with another key (chain of trust); only the root key has to be trusted.
  • snapshot.json - for updates to foo, this file lists which versions are available and which is latest.
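The chain of trust can be sketched in a few lines. This is a toy: it uses HMAC over canonical JSON as a stand-in "signature" (real TUF uses asymmetric keys, expiry dates and richer metadata), and all key names here are invented.

```python
import hashlib
import hmac
import json

def sign(key: bytes, payload: dict) -> str:
    """Toy 'signature': HMAC over canonical JSON (real TUF is asymmetric)."""
    blob = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, blob, hashlib.sha256).hexdigest()

ROOT_KEY = b"root-key-kept-offline"   # hypothetical keys for the sketch
TARGETS_KEY = b"targets-key"
KEYS = {hashlib.sha256(TARGETS_KEY).hexdigest(): TARGETS_KEY}  # toy registry

# root.json: the only thing a client must trust a priori;
# it names the key that signs targets.json
root = {"targets_key_id": hashlib.sha256(TARGETS_KEY).hexdigest()}
root_sig = sign(ROOT_KEY, root)

# targets.json: maps package names to expected hashes
targets = {"foo-1.0.tar.gz": hashlib.sha256(b"foo package bytes").hexdigest()}
targets_sig = sign(TARGETS_KEY, targets)

def verify_package(name: str, data: bytes) -> bool:
    """Walk the chain: root metadata, then targets metadata, then the package."""
    if sign(ROOT_KEY, root) != root_sig:              # trust anchor
        return False
    tkey = KEYS[root["targets_key_id"]]               # root delegates to this key
    if sign(tkey, targets) != targets_sig:            # delegated metadata
        return False
    return hashlib.sha256(data).hexdigest() == targets.get(name)
```

The point of the structure: the client only hard-codes trust in the root key; everything else is verified transitively.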

Important keys should be kept offline.

TUF requires physical keys etc.

Sigstore - creates a tree of hashes (Merkle tree). Sigstore's Fulcio complements TUF, helping with the signing process, since TUF signing is physical.

Uptane

Software updates in cars

Updates have to be over-the-air

Electronic Control Units (ECUs, on the CAN bus) - don't have internet access.

Adaptive keys

Organisations share their certificates with CDNs

Authenticated data structures: RSA accumulators, Merkle trees.
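A minimal Merkle tree over a list of byte strings can be sketched with hashlib (promoting an odd node is one common convention; production transparency logs typically follow RFC 6962 instead). Changing any leaf changes the root, which is what makes the root a compact commitment to the whole list.

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256 digest as raw bytes."""
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    """Compute the Merkle root of a list of byte strings.
    Each level pairs adjacent nodes; an unpaired node is promoted."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(h(level[i] + level[i + 1]))
            else:
                nxt.append(level[i])  # odd node promoted unchanged
        level = nxt
    return level[0]
```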

4

Atoms of Confusion

Currently uses a tool implemented in Clojure to find the atoms of confusion. It only finds them; it doesn't automatically fix them. It is like a linter that uses an Abstract Syntax Tree (AST) to understand patterns.

Coccinelle is a tool used on the Linux source code to transform bits of code ([Coccinelle](https://coccinelle.gitlabpages.inria.fr/website/)). It uses pattern matching plus some semantics, which is harder to understand, but it makes macros easier to deal with - something that is harder for the Clojure-based linter.

They are looking for someone to understand Coccinelle and use it to resolve common atoms of confusion in software repositories - basically, to write Coccinelle rules for the atoms of confusion.

Using neural networks for finding atoms of confusion? Needs to be explored.

Sigstore

Software signing

OCI registries - indexed by content-addressable hash (used not just for storing container images but other artifacts as well).

cosign - can sign container images and even individual blobs. gitsign - signing git commits.

OAuth flows and OIDC tokens - some OAuth providers support OIDC (Google), some don't (GitHub). Sigstore wants a common spec, so they run their own service that provides OIDC on top of OAuth providers that don't support it.

Fulcio - prove your identity using an OIDC token and you get an ephemeral code-signing certificate. Identities can be of different types: 1. user identity, 2. CI/CD identity (some job needs permission). Different certificates are issued depending on which identity asks for the code-signing certificate.

The certificate is ephemeral because we don't want to maintain a private key for a long time.

Whenever the code-signing certificate is used to sign code, you send whatever you signed along with the public certificate so the signature can be verified by Sigstore, which then logs it to Rekor. You can search Rekor to check whether a container image was signed properly.

Demo: cosign can sign a container image by its sha digest and store a .sig file (the certificate) alongside the image. You authenticate to cosign using an OIDC token. For user identities you give your email plus the OIDC provider that vouches for it; you must name the provider because the email alone doesn't say who the authoritative owner of your email domain is.

Transparency logs - Merkle trees.

The trust is still there, but it is shifted to people who actually care about security. This could be compromised if all those security people colluded to do something bad, or if they got hacked.

We are still trusting the OIDC providers.

Start signing your packages using Sigstore!

5

Robusto

in-toto - how the software was made and its composition (verified). TUF - the TLS of software packages (secure distribution; who should be trusted at the current time). Sigstore - signed distribution.

Robusto is a combination of all three. (The tech stack of software supply chain security)

Compromise resilience - design the system knowing that getting in is not impossible, just a question of when. It can always be hacked; you just make it difficult. Give attackers such a hard time that they don't choose you.

in-toto

Sign the things that go in and out of each step:

  • DAG - the supply chain is a DAG.
  • Policy - defines who does what.
  • .link files - attestations.
  • materials - inputs; products - outputs (the files and their hashes).
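The core content of a .link record can be sketched as hashes of materials and products plus the functionary who ran the step. This is a toy (function names are invented, and the record is unsigned, unlike real in-toto links, which are signed by the functionary's key):

```python
import hashlib

def hash_files(files: dict) -> dict:
    """Record {path: sha256-hex} for a set of artifacts (given here as bytes)."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}

def make_link(step: str, materials: dict, products: dict, signer: str) -> dict:
    """A toy .link-style record: inputs, outputs, and who performed the step."""
    return {
        "step": step,
        "signed_by": signer,           # real links carry a signature, not a name
        "materials": hash_files(materials),
        "products": hash_files(products),
    }
```

Verification then consists of checking that one step's products hash-match the next step's materials, walking the DAG defined by the layout.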

Yubikey - hardware key (can be used for signing)

Trust on first use - when you ssh somewhere, you get the public key and are asked whether you want to trust it. That's the first time; once trusted, that public key is trusted every time after.
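TOFU is easy to sketch: accept and record an unknown key the first time, then require it to match ever after. This is a toy version of ssh's known_hosts behaviour (names are mine):

```python
import hashlib

known_hosts = {}  # host -> key fingerprint recorded on first connection

def tofu_check(host: str, public_key: bytes) -> str:
    """Trust-on-first-use: record an unknown key, then insist it never changes."""
    fp = hashlib.sha256(public_key).hexdigest()
    if host not in known_hosts:
        known_hosts[host] = fp          # first contact: trust and remember
        return "trusted-first-use"
    if known_hosts[host] == fp:
        return "ok"
    return "KEY CHANGED - possible MITM"
```

The weakness is exactly the first connection: if an attacker is in the middle then, their key is the one that gets pinned.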

The Update Framework helps verify in-toto policies against the packages being distributed: it matches the policies with the distributed packages. TUF uses delegation - not everything is signed by humans; some signing is delegated to machines. It tells you who the right person to get the package from is, and which policies apply to the package.

We can use a Yubikey for long-lived keys when humans are doing the signing, to put the points of trust in the humans. This is the alternative approach.

How do we map the policy to the version number along with the package names?

6

Preston

Talk by someone who has done application and systems security.

Apps break when they are deployed: many edge cases keep coming up that break them, and then you have to fix the application iteratively.

You want to capture the edge cases and test them in your local environment. You also want to use these test cases for all your other applications too.

Simulating Environmental Anomalies

2 components:

  • Mutator - sets up the simulated environment: it monitors until it finds a system call, then makes that call fail by returning a bad value.
  • Checker - did the application deal with the environment correctly? The checker monitors the application's response to the failed system call.

The mutators and checkers have to be constructed by understanding what caused an anomaly in an application. They should be shared as much as possible so that others can use these test cases.
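The mutator/checker split can be sketched with plain dependency injection rather than real system-call interposition (CrashSimulator intercepts actual syscalls; everything here, including the function names, is a hypothetical stand-in):

```python
def failing_read(path: str) -> bytes:
    """Mutator: a stand-in for a system call that always fails."""
    raise OSError("simulated environmental anomaly")

def working_read(path: str) -> bytes:
    """The normal environment: the call succeeds."""
    return b"key=value"

def app_under_test(read_fn) -> str:
    """A tiny 'application'; a robust app catches the error and degrades."""
    try:
        read_fn("/etc/app.conf")
        return "loaded config"
    except OSError:
        return "fell back to defaults"

def checker(result: str) -> bool:
    """Checker: did the application survive the anomaly sensibly?"""
    return result == "fell back to defaults"
```

Running the app against the mutator and passing the outcome to the checker is one test case; sharing pairs like this is the reuse the notes describe.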

Crash Simulator

  • Made by him in 2015.
  • It was built on 32-bit Linux, so it is not used now because it has weird dependencies.
  • **If you want, you can rebuild this tool properly on 64-bit with proper dependencies, and then build the repository of mutators and checkers that people can use. These tests could then be added to CI/CD pipelines for testing anomalies iteratively.**
  • Based on a modified version of Mozilla's record-and-replay debugger (rr).
  • Process-set pools (like Apache uses) parallelize the simulation.

Many anomalies were found:

  • Anomaly 1: file types should be handled properly. Some files can be infinitely long, like /dev/urandom. aspell is a spell checker shipped with many Linux distributions and used by many applications; passing /dev/urandom to aspell broke it.
  • Anomaly 2: cross-disk file movement. For same-disk moves, mv just renames; cross-disk moves involve a lot of processing, and many tools that reuse this code have bugs.

PORT

  • Captures and monitors the order of system calls and library calls.
  • Define the pattern of system calls you want to see, then check for it in the checker.
  • PORT is implemented using a domain-specific language. The transformer is like the mutator: it changes the pattern of system calls to break the application.
  • An easier way of writing checkers and mutators than in CrashSimulator, so it can be adopted more easily.

Master's thesis (a one-year project under Justin Cappos) - get CrashSimulator working again on 64-bit. These are open source: go to SSLabs and find the research paper to reach the GitHub links. Apache 2 is the license SSLabs ships its open-source software under.

gittuf

Do you really want to trust GitHub and GitLab to tell you that all security policies were followed?

How to enforce these policies? Gittuf

Branches can point anywhere inside git. You can point a branch at code that was in a test branch and is broken, and a developer won't realise and will push it to main.

gittuf:

  • Reference state log - a transparency log of all policies; defines what code each branch points to.
  • You also want to know the names of the people who signed the commits; the keys are also managed by gittuf.

7

Justin's points:

  • Nice: if your tool catches domains, only those get whitelisted on the firewall.
  • Library interposition has to be done - DAST.
  • Run on 100 repositories and see how your tool helps.
  • gittuf - an interface has been provided.
  • The domain-resolution verification part of domain-catcher can be used by adversaries to mislead it: they make the resolution fail the first time, so domain-catcher doesn't classify it as a URL and it gets committed, then make it resolve later.
  • Assume you are given the information (diff, commit message) in the arguments of a standardised function. The function names are not yet standardised, so create one or ask Aditya to define and provide one.

domain catcher integration with gittuf

Just checking whether something is a comment can be done on the gittuf side. The CST part could be added, but a Lua implementation would have to be written and it would be bulky. You can also do both: a regex check on the client side, and CST plus taint tracking in a GitHub Action.

8

It'll be tough to analyse the code to get the correct file types, and the cases where extensions don't tell you the language are very few.

Also, one idea to think about, building on what Cappos said: allow only those URLs through the firewall which are found by domain-catcher. Is there a way to configure the firewall on the GitHub runner to allow only the domains caught by domain-catcher?

9

Venu project

  • GPL 3.0 license for a documentation project?
  • It's not applicable for documentation projects. It's for software projects.
  • But it's legal to use it for documentation projects. Just not suitable.
  • Other options?
    • GNU Free Documentation License (GFDL) - It's for documentation projects.
  • This case is called License inconsistencies.

Existing solutions

  • licensee/licensee (Used by GitHub)

lincc

  • mapping of extensions to licenses
  • Different sets of extensions are mapped to the license they should have

working

  • collects existing licenses
  • then checks if the extensions that are present match the license they should have
  • Gives score and statistics on the match

lincc score = number of compliant files / total number of files. A badge is assigned based on the score:

  • score >= 90 - green badge
  • score 50 - 89 - yellow badge
  • score < 50 - red badge
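The score and badge computation described above is straightforward (function names are my own; the thresholds are the ones from the notes):

```python
def lincc_score(compliant: int, total: int) -> float:
    """Percentage of files whose extension maps to the declared license."""
    return 100.0 * compliant / total if total else 0.0

def badge(score: float) -> str:
    """Badge thresholds from the notes: >=90 green, 50-89 yellow, <50 red."""
    if score >= 90:
        return "green"
    if score >= 50:
        return "yellow"
    return "red"
```

For example, a repo with 95 of 100 compliant files gets a green badge; 60 of 100 gets yellow.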

Example for testing

  • github.com/MicrosoftDocs/azure-docs - Documentation project
    • Dual licensed
    • excellent score
  • storybookjs/storybook - software project
    • MIT license
    • red badge
    • because MIT is a software license and the same license is being applied to documentation files like .md

licensee is very opinionated - it only uses a fixed set of regexes to detect license files; otherwise it doesn't detect the license.

Questions:

  1. How are you analysing the license? If the LICENSE file says one license applies to these files and another license applies to those, how will your tool be configured?

    It doesn't analyse the LICENSE file content; it just uses the mapping to give the score.

DC Project update

  • It would be better if I can somehow configure the runtime enforcement
  • Only allow domains that are caught by domain-catcher
  • Configure GitHub runner to be restricted based on URLs caught
  • in-toto attestations to allow only people with these privileges to configure domain-catcher rules to allow more URLs.
  • Example: Allowing a new regex based match for a configuration file that mentions URLs
  • Try to defeat obfuscation by only allowing unobfuscated and expected URLs to work

Dependency resolution project

  • SAT solver - Boolean satisfiability problem
  • Backtracking algorithm
    • Try every single version from new to old until there is a conflict and then backtrack.
  • SAT solver

    • Make the NP-hard problem easier by putting some constraints on the dependency resolution
    • CNF form of the boolean satisfiability problem
    • The DPLL algorithm is old; there are modern SAT solvers now
  • Anaconda dependency resolution

    • Boolean expression that allows only one version of a package to be installed
    • Assign smaller weights to newer versions in the boolean expression, so the SAT solver prefers newer versions when minimising the expression's final value.
    • A SAT solver's result can change even if you add a package that has no dependency relationship with the currently installed set.
    • Test and production environments have different dependencies; when fetching dependencies, the production versions returned by the SAT solver can differ from those in testing, which can lead to bugs.
  • Pip dependency resolution (Cargo, Swift too)

    • Uses a backtracking algorithm
    • It is more predictable than SAT solvers

SAT solvers vs backtracking:

  • Converting dependency resolution into a SAT problem adds complexity.
  • SAT solvers are faster, but the added complexity can itself lead to slowness.
  • Backtracking is slower but more predictable, and fast enough.
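A minimal backtracking resolver in the style described above (the index format and the use of integer versions are invented for illustration): try versions newest-first, recurse into their dependencies, and backtrack on conflict.

```python
def resolve(requirements, index, picked=None):
    """Toy backtracking dependency resolver.
    requirements: list of (package, allowed_versions_set)
    index: {pkg: {version: {dep_pkg: allowed_versions_set}}}
    Returns {pkg: version} or None if unsatisfiable."""
    picked = dict(picked or {})
    if not requirements:
        return picked
    (pkg, allowed), rest = requirements[0], requirements[1:]
    if pkg in picked:
        # Already chosen: consistent -> continue, conflict -> fail this branch.
        return resolve(rest, index, picked) if picked[pkg] in allowed else None
    for version in sorted(allowed & set(index.get(pkg, {})), reverse=True):
        deps = list(index[pkg][version].items())
        result = resolve(deps + rest, index, {**picked, pkg: version})
        if result is not None:
            return result                 # first (newest-first) solution wins
    return None                           # exhausted: backtrack
```

Because it always tries newest-first in a fixed order, the same inputs give the same answer every run - the predictability the notes credit backtracking with.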

10

  1. Whitespace handling is dependent on programming-language semantics.

    1. Different contributors use different OS (CRLF vs LF)
    2. Does not handle line wrapping
    3. Is not configured for programming languages
  2. When you try to enforce your license for the wrong files, it may not get enforced properly.

  3. SAT solvers are faster, backtracking gives the same result every time.

  4. Do the detection while you build the package; otherwise you'll have to download the whole package, extract it, and then analyse it.

  5. Config files may exist that can be consulted for fetching the URLs. domain-catcher has limitations: in applications where URLs come from user input, those URLs should not be blocked, and they cannot be defined in the domain-catcher rules.

  6. should find bugs that really matter, the false positives should be less.

  7. -Wall could be useful

  8. When install scripts are complicated, it is tough to review them for vulnerable code.

  9. Fuzzing a webserver requires permission, server could crash.

  10. You trust the enclave more; the trust is now rooted in the enclave.

The test will have 6 to 8 questions: half from your projects and half from supply chain.

  1. GitHub Actions should contain checks that are better performed in the cloud; global checks should be done in a GitHub Action. Checks that you want all your developers to run should go in a pre-commit hook.

  2. Attestations - input, output and who did what. Attestations have signatures too, but they are much more; a signature is just verification.

  3. stick all data in GUAC and then use it for analysis. Go to CVE listing and see what vulnerabilities are there in the versions used (Got from SBOM). Up to you to decide what you can do with the metadata to compromise the organisation.

  4. Delegation - from notes

  5. Fulcio and Rekor and how they work. Or just talk about what an ephemeral key is: short-lived, expires after a period of time.

  6. Compiler errors (syntax errors) can be caught by static analysis; dynamic analysis can be used to find runtime errors with different inputs. Name collisions are hidden from dynamic analysis.

  7. SLSA notes

  8. SBOM - provides a list of dependencies and their versions, tells us what goes in the software

  9. You can define who has rights for what. Define all the policies for managing the repo.

  10. Multiparty signing can be done with gittuf (not part of the answer). gitsign supports OIDC tokens for signing. With plain git signing you have no way to know which keys to use; with gittuf you can define that this developer may sign this commit.

  11. Open-ended question - will be on the exam: attacks that motivated Sigstore, in-toto, SBOM signing, etc. An attack occurred, and in response people started to use in-toto, etc.

11

Just One Turtle can help prevent attacks like the xz attack because it limits the system during the compilation process.

JOT (Just One Turtle) - uses an in-memory file system: it stores the file blocks in memory instead of on disk. The difference is that there is no persistence.

A bitmask (a list of 0s and 1s) is used to represent which memory addresses are already filled.
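A bitmask block allocator can be sketched with a single Python integer as the bit field (the class and method names are mine, not JOT's): bit i set means block i is in use; allocation finds the first clear bit.

```python
class BitmapAllocator:
    """Toy block allocator: one bit per block, 1 = in use."""

    def __init__(self, nblocks: int):
        self.bits = 0          # integer used as a bit field
        self.nblocks = nblocks

    def alloc(self) -> int:
        """Return the index of the first free block, marking it used."""
        for i in range(self.nblocks):
            if not (self.bits >> i) & 1:
                self.bits |= 1 << i
                return i
        raise MemoryError("no free blocks")

    def free(self, i: int) -> None:
        """Clear block i's bit so it can be reused."""
        self.bits &= ~(1 << i)
```

Freeing a block and allocating again reuses the lowest free index, which is all the bookkeeping an in-memory block store needs.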

12

Dhanraj presentation

Participate in open-source programs - contributors work on issues in your repo. Sign up as an organisation.

Girlscript summer of code Social winter of code

Mayank presentation

  • Even Rust depends on C/C++, so memory-safety issues still exist.
  • Use-after-free is still very common.
  • Compiler options can be used for hardening.
  • _FORTIFY_SOURCE (passed as -D_FORTIFY_SOURCE) - a new level was released recently; many big projects discovered bugs by using it.