• 0 Posts
  • 25 Comments
Joined 2 years ago
Cake day: July 9th, 2023

  • Could you let me know what sort of models you’re using? Everything I’ve tried has basically been so bad it was quicker and more reliable to do the job myself. Most of the models can barely write boilerplate code accurately and securely, let alone anything even moderately complex.

    I’ve tried to get them to analyse code too, and that’s hit and miss at best, even with small programs. I’d have no faith at all that they could handle anything larger; the answers they give would be confident and wrong, which is easy to spot with something small, but much harder to catch with a large, multi-process system spread over a network. It’s hard enough for humans, who have actual context, understanding and domain knowledge, to do it well, and I’ve personally not seen any evidence that an LLM (which is what I’m assuming you’re referring to) could do anywhere near as well. I don’t doubt that they flag some issues, but without a comprehensive human review of the system architecture, implementation and code, you can’t be sure what they’ve missed, and if you’re going to do that anyway, you’ve done the job yourself!

    Having said that, I’ve no doubt that things will improve. Programming languages have well-defined syntaxes, so they should be some of the easiest types of text for an LLM to parse and build a context from. If that can be combined with enough domain knowledge, a description of the deployment environment and a model that’s actually trained and tuned for code analysis and security auditing, it might be possible to get similar results to humans.


  • I’m unlikely to do a full code audit, unless something about it doesn’t pass the ‘sniff test’. I will often go over the main code flows, the issue tracker, mailing lists and comments, positive or negative, from users on other forums.

    I mean, if you’re not doing that, what are you doing, just installing it and using it??!? Where’s the fun in that? (I mean this at least semi-seriously; you learn a lot about the software you’re running if you put in some effort to understand it.)


  • ‘AI’ as we currently know it is terrible at this sort of task. It’s not capable of understanding the flow of the code in any meaningful way, and tends to raise entirely spurious issues (see the problems the curl author has with being overwhelmed by them, for example). It also won’t spot actually malicious code that’s been included with any sort of care, nor would it find intentional behaviour that would be harmful or counterproductive in the particular scenario you want to use the program in.


  • Before you can decide on how to do this, you’re going to have to make a few choices:

    Authentication and Access

    There are two main ways to expose a git repo, HTTPS or SSH, and they both have pros and cons here:

    • HTTPS A standard sort of protocol to proxy, but you’ll need to make sure you set up authentication on the proxy properly so that only those who should have access can get it. The git client will need to store a username and password to talk to the server, or you’ll have to enter them on every request. gitweb is a CGI that provides a basic, but useful, web interface.

    • SSH Simpler to set up, and authentication is a solved problem. Proxying it isn’t hard: just forward the port to any of the backend servers, which avoids decrypting on the proxy. You will want to use the same host key on all the servers though, or SSH will refuse to connect. Doesn’t require any special setup.
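
    As a concrete illustration of the SSH option, the proxy only needs to pass the TCP connection through to a backend. Here’s a minimal sketch using HAProxy (the choice of HAProxy, and the hostnames and addresses, are my assumptions; it also assumes the proxy’s own sshd listens on a different port):

    ```
    # /etc/haproxy/haproxy.cfg (sketch) - plain TCP pass-through for git-over-SSH
    frontend git_ssh
        bind *:22
        mode tcp
        default_backend git_servers

    backend git_servers
        mode tcp
        # 'backup' servers only receive traffic when the primary is down,
        # so all pushes normally land on git1
        server git1 10.0.0.11:22 check
        server git2 10.0.0.12:22 check backup
        server git3 10.0.0.13:22 check backup
    ```

    Conveniently, that also gives you the ‘all requests go to server1 unless it’s down’ behaviour that matters for replication below.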

    Replication

    Git is a distributed version control system, so you could replicate it at that level; alternatively you could use a replicated filesystem, or a simple file-based replication. Each has its own trade-offs.

    • Git replication Using git pull to replicate between repositories is probably going to be your most reliable option, as it’s the job git was built for, and it doesn’t rely on messing with its underlying files directly. The one caveat is that, if you push to different servers in quick succession, you may cause a merge conflict, which would break your replication. The cleanest way to deal with that is to have the load balancer send all requests to server1 if it’s up, and only switch to the next server if all the prior ones are down. That way writes will all be going to the same place. Then set up replication in a loop, with server2 pulling from server1, server3 pulling from server2, and so on up to server1 pulling from server5. With frequent pulls, changes that are committed to server1 will quickly replicate to all the other servers. This would effectively be a shared-nothing solution, as none of the servers are sharing resources, which would make it easier to geographically separate them. The load balancer could be replaced by a CNAME record in DNS, with a daemon that updates it to point to the correct server.

    • Replicated filesystem Git stores its data in a fairly simple file structure, so placing that on a replicated filesystem such as GlusterFS or Ceph would mean multiple servers could use the same data. From experience, this sort of thing is great when it’s working, but can be fragile and break in unexpected ways. You don’t want to be up at 2am trying to fix a file replication issue if you can avoid it.

    • File replication. This is similar to the git replication option, in that you have to be very aware of the risk of conflicts. A similar strategy would probably work, but I’m not sure it brings you any advantages.

    I think my preferred solution would be to have SSH access to the git servers and to set up pull-based replication on a fairly fast schedule (where fast is relative to how frequently you push changes). You mention having a VPS as one of the servers, so you might want to push changes to that rather than have it be able to connect to your internal network.
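
    As a rough sketch of what that pull-based replication could look like on each server (the repository path, remote name and schedule are assumptions, and I’ve used a mirror fetch rather than a literal git pull since the server-side repositories will normally be bare):

    ```
    # One-time setup on server2 (repeat the same pattern on each server,
    # pulling from the previous one in the loop):
    git clone --mirror ssh://git@server1/srv/git/project.git /srv/git/project.git

    # Cron entry on server2: fetch from server1 every minute.
    # A --mirror clone fetches +refs/*:refs/* so branches and tags stay in sync.
    * * * * * git --git-dir=/srv/git/project.git remote update --prune
    ```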

    A useful property of git is that, if the server is missing changesets you can just push them again. So if a server goes down before your last push gets replicated, you can just push again once the system has switched to the new server. Once the first server comes back online it’ll naturally get any changesets it’s missing and effectively ‘heal’.


  • notabot@lemm.ee to Selfhosted@lemmy.world · Testing vs Prod (3 months ago)

    I manage all my homelab infra stuff via ansible and run services via kubernetes. All the ansible playbooks are in git, so I can roll back if I screw something up, and I test it on a sacrificial VM first when I can. Running services in kubernetes means I can spin up new instances and test them before putting them live.
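
    For example (the playbook and host names here are made up), a change can be dry-run and then applied against just the test VM before it goes anywhere near the real machines:

    ```
    # Show what would change on the test VM without touching anything
    ansible-playbook site.yml --limit test-vm --check --diff

    # Apply to the test VM only, then to everything once it looks right
    ansible-playbook site.yml --limit test-vm
    ansible-playbook site.yml
    ```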

    Working like that makes it all a lot more relaxing as I can be confident in my changes, and back them out if I still get it wrong.


  • It’s a non-starter for me because I sync my notes, and sometimes a subset of my notes, to multiple devices and multiple programs. For instance, I might use Obsidian, Vim and tasks.md to access the same repository, with all the documents synced between my desktop and server, and a subset synced to my phone. I also have various scripts to capture data from other sources and write it out as markdown files. Trying to sync all of this to a database that is then further synced around seems overly complicated to say the least, and would basically just be using Trillium as a file store, which I’ve already got.

    I’ve also been burnt by various export/import systems either losing information or storing it in an incompatible way.






  • While I agree with most people here that finding a keyboard and screen would be the easiest option, you do have a couple of other options:

    • Use a preseed file A preseed file lets the installer run completely automatically, without user intervention. Get it to install a basic system with SSH and take it from there (there’s a sketch of one after this list). You’ll want to test the install in a VM, where you can see what’s going on, before letting it run on the real server. More information here: https://wiki.debian.org/DebianInstaller/Preseed

    • Boot from a live image with SSH Take a look at https://wiki.debian.org/LiveCD, in particular ‘Debian Live’. It looks like ssh is included, but you’d want to check the service comes up on boot. You can then SSH to the machine and install to the hard drive that way. Again, test on a VM until you know you have the image working and know how to run the install, then write it to a USB key and boot the target server from that.
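
    To give a flavour of the preseed option above, here’s a minimal sketch. The values are examples only, and important parts (partitioning, mirror setup and so on) are omitted; the Debian example-preseed.txt covers the full set of options.

    ```
    # Minimal preseed sketch - incomplete, values are examples only
    d-i debian-installer/locale string en_US.UTF-8
    d-i keyboard-configuration/xkb-keymap select us
    d-i netcfg/choose_interface select auto
    d-i passwd/root-login boolean false
    d-i passwd/make-user boolean true
    d-i passwd/username string admin
    d-i passwd/user-password-crypted password [crypted hash here]
    # Pull in SSH so the machine is reachable after the unattended install
    d-i pkgsel/include string openssh-server
    d-i finish-install/reboot_in_progress note
    ```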

    This all assumes the target server has USB or CD at the top of its boot order. If it doesn’t, you’ll have to change that first, either with a keyboard and screen, or via a remote management interface such as IPMI.




  • Ah, ok. You’ll want to specify two AllowedIPs ranges on the clients: 192.168.178.0/24 for your network, and 10.0.0.0/24 for the other clients. Then you’re going to need to add a couple of routes:

    • On the phone, a route to 192.168.178.0/24 via the wireguard address of your home server
    • On your home network router, a route to 10.0.0.0/24 via the local address of the machine that is connected to the wireguard vpn. (Unless it’s your router/gateway that is connected)

    You’ll also need to ensure IP forwarding is enabled on both the VPS and your home machine.
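
    On Linux boxes that would look roughly like the following (the LAN address of the WireGuard machine, 192.168.178.10, is just an example, and the route command assumes a Linux-based router):

    ```
    # On the VPS and on the home machine terminating the tunnel:
    # allow them to forward packets between the VPN and the LAN
    sysctl -w net.ipv4.ip_forward=1

    # On the home network router: send VPN traffic via the machine
    # that holds the WireGuard connection (example LAN address below)
    ip route add 10.0.0.0/24 via 192.168.178.10
    ```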


  • Sort of. If you’re using wg-quick then it serves two purposes: one, as you say, is to indicate what is routed over the link, and the second (and the only one, if you’re setting up the connection directly) is to limit what incoming packets are accepted.

    It definitely can be a bit confusing as most people are using the wg-quick script to manage their connections and so the terminology isn’t obvious, but it makes more sense if you’re configuring the connection directly with wg.
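
    For instance, when driving the interface directly, AllowedIPs is just a per-peer setting you pass to wg, and it’s acting as the filter on which source addresses that peer may use (the key and address here are placeholders):

    ```
    wg set wg0 peer <peer-public-key> allowed-ips 10.0.0.3/32
    ```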


  • The allowed IP ranges on the server indicate what private addresses the clients can use, so you should have a separate one for each client. They can be /32 addresses as each client only needs one address and, I’m assuming, doesn’t route traffic for anything else.

    The allowed IP range on each client indicates what private address the server can use, but as the server is also routing traffic for other machines (the other client for example) it should cover those too.

    Apologies that this isn’t better formatted, but I’m away from my machine. For example, on your setup you might use:

    On home server:
      Address 192.168.178.2
      AllowedIPs 192.168.178.0/24

    On phone:
      Address 192.168.178.3
      AllowedIPs 192.168.178.0/24

    On VPS:
      Address 192.168.178.1
      Home server peer: AllowedIPs 192.168.178.2/32
      Phone peer: AllowedIPs 192.168.178.3/32
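
    Laid out as wg-quick style config files, that would look roughly like this (keys, endpoint and port are placeholders, and anything beyond the addresses above is an assumption; the phone’s config is the same shape as the home server’s, with .3 as its Address):

    ```
    # VPS: /etc/wireguard/wg0.conf
    [Interface]
    Address = 192.168.178.1/24
    ListenPort = 51820
    PrivateKey = <vps-private-key>

    [Peer]
    # home server
    PublicKey = <home-server-public-key>
    AllowedIPs = 192.168.178.2/32

    [Peer]
    # phone
    PublicKey = <phone-public-key>
    AllowedIPs = 192.168.178.3/32

    # Home server: /etc/wireguard/wg0.conf
    [Interface]
    Address = 192.168.178.2/24
    PrivateKey = <home-server-private-key>

    [Peer]
    PublicKey = <vps-public-key>
    Endpoint = vps.example.com:51820
    AllowedIPs = 192.168.178.0/24
    ```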





  • notabot@lemm.ee to Selfhosted@lemmy.world · User management (2 years ago)

    I use an LDAP server, as it’s pretty much designed for exactly this task. You can tell PAM to authenticate and authorise from it to manage logins to the physical machines, and web apps typically either have a straightforward way to use LDAP, or support ‘external’ auth, with your web server handling the authentication and authorisation for it.

    OpenLDAP is a solid, easily self-hosted server. If you like working from the shell it has everything you need; if you prefer a GUI there are a variety of desktop and web-based management frontends available.
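
    As an illustration, a user entry for this kind of setup is just an LDIF record carrying both the person attributes and the POSIX attributes PAM needs (the DN, names and numbers here are invented):

    ```
    dn: uid=alice,ou=people,dc=example,dc=org
    objectClass: inetOrgPerson
    objectClass: posixAccount
    objectClass: shadowAccount
    uid: alice
    cn: Alice Example
    sn: Example
    uidNumber: 10001
    gidNumber: 10001
    homeDirectory: /home/alice
    loginShell: /bin/bash
    userPassword: {SSHA}<hashed password>
    ```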