
Extracting virtual host from the file name #673

Closed
SeLLeRoNe opened this issue Mar 1, 2017 · 17 comments

Comments

@SeLLeRoNe

SeLLeRoNe commented Mar 1, 2017

Hi there,
I'm trying to implement GoAccess to monitor my server and all the hosted domains.

The only problem is that I don't have the virtual host inside the log file; it's in the filename itself (as with the DirectAdmin control panel, for instance).

So I have all the logs in /var/log/httpd/domains/, and they look like this:

home.test-server.it.error.log
home.test-server.it.log
test-domain.it.error.log
test-domain.it.log

I've managed to create scripts to find each file and import it. Overall everything is working, but I don't have per-virtual-host information, so I get one huge output that's only useful as an overall view. It would be nice to see statistics on a per-domain (virtual host) basis.

Does anyone have a hint on how to achieve that?

Also, I have a question: using the options to save the db and load it to keep historical data is very useful, but what if I load the same file multiple times? Won't that merge incorrectly, increasing the visit counts even though those visits were already counted? Is there any way to check for already-processed data?

Thanks everyone
Best regards
Andrea

@allinurl
Owner

allinurl commented Mar 1, 2017

Are you looking to generate one report per domain, or one big report with a virtual host panel containing the info about all domains?

As far as loading data from the disk, #334 will address the double parsing of the log.

@SeLLeRoNe
Author

Well, both options would be nice. The first would allow me to create a report per domain to show to domain owners (but I think that can easily be done with a script that generates an output file on a per-domain basis).
The one I'm trying to achieve now is a full report for the whole server (for my internal use) that shows all the virtual hosts' info.

Thanks for the reference to issue #334 :)
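The per-domain script idea above can be sketched as a small bash loop. This is a hypothetical example: the /tmp directory, filenames, and report destination are made up, and the actual goaccess invocation is left as a comment so the loop can be dry-run:

```shell
# Dry-run sketch: one report per domain, skipping *.error.log files.
# /tmp/goaccess-demo-logs and the report path are made-up examples.
logdir=/tmp/goaccess-demo-logs
mkdir -p "$logdir"
touch "$logdir/home.test-server.it.log" \
      "$logdir/test-domain.it.log" \
      "$logdir/test-domain.it.error.log"
for f in "$logdir"/*.log; do
  case "$f" in *.error.log) continue ;; esac  # skip error logs
  domain=$(basename "$f" .log)                # strip the .log extension
  # A real run might look like:
  # goaccess "$f" -o "/var/www/reports/$domain.html" --log-format=COMBINED
  echo "would report $domain from $f"
done
```

Each access log yields one report named after its domain, while the error logs are never touched.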

@allinurl
Owner

allinurl commented Mar 1, 2017

You can probably do some pre-processing before the data is consumed by goaccess, e.g.,

awk '{n=split(FILENAME, a, "/"); print a[n], $0}' /var/log/httpd/domains/domain.com.log

and use the %v specifier in your goaccess log-format. For instance (assuming it's a combined log format),

awk '{n=split(FILENAME, a, "/"); print a[n], $0}' /var/log/httpd/domains/domain.com.log | goaccess --log-format='%v %h %^[%d:%t %^] "%r" %s %b "%R" "%u"' --date-format='%d/%b/%Y' --time-format=%T
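To see what the awk stage emits on its own, here is a self-contained check (the sample filename and log line are made up for illustration):

```shell
# Create a sample log whose filename carries the vhost, then run the
# same awk one-liner; it prepends the bare filename to every record.
mkdir -p /tmp/goaccess-demo
printf '127.0.0.1 - - [01/Mar/2017:09:12:14 -0600] "GET / HTTP/1.1" 200 512 "-" "curl"\n' \
  > /tmp/goaccess-demo/test-domain.it.log
awk '{n=split(FILENAME, a, "/"); print a[n], $0}' /tmp/goaccess-demo/test-domain.it.log
# First field of each output line is now "test-domain.it.log"
```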

@allinurl allinurl changed the title Virtual host name in the file name Parsing virtual host from the file name Mar 1, 2017
@SeLLeRoNe
Author

SeLLeRoNe commented Mar 2, 2017

Thanks for that, it's definitely something, aside from the fact that it includes ".log" in the virtual host name :) Any hint on how to strip the extension from the filename? It would always be a .log extension, if that helps.

@allinurl
Owner

allinurl commented Mar 2, 2017

This should do it:

awk '{n=split(FILENAME, a, "/"); sub(/\.log/, "", a[n]); print a[n], $0}' /var/log/httpd/domains/domain.com.log | goaccess --log-format='%v %h %^[%d:%t %^] "%r" %s %b "%R" "%u"' --date-format='%d/%b/%Y' --time-format=%T
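The added sub() call deletes the first ".log" it finds in the captured filename, which for names like these is the trailing extension. The extraction step can be checked on its own; the demo below uses an end-anchored /\.log$/ variant, which is slightly safer in case a domain name ever contained ".log" in the middle:

```shell
# Show the extension-stripping step alone on the filenames from this thread.
printf 'home.test-server.it.log\ntest-domain.it.log\n' \
  | awk '{sub(/\.log$/, "", $0); print}'
# Prints: home.test-server.it
#         test-domain.it
```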

@allinurl allinurl changed the title Parsing virtual host from the file name Extracting virtual host from the file name Mar 2, 2017
@SeLLeRoNe
Author

Thank you very much :) Now it shows better information :)
There is one suggestion I would make: where a single file is shown (e.g. Static Requests, Not Found URLs, Requested Files), it would be better in a multi-virtual-host environment to have the full URL, so it's easier to understand which domain those requests/errors refer to.

Besides that, it's quite a bit better now :)

@allinurl
Owner

allinurl commented Mar 2, 2017

Glad that worked.

Right now you can prepend the vhost to all requested files. Please take a look at the man page, Virtual Hosts examples at the bottom of the page.

#435 will add this natively.

Closing this, feel free to reopen it if needed.

@allinurl allinurl closed this as completed Mar 2, 2017
@SeLLeRoNe
Author

Thanks, but I actually cannot find the information you're referring to in the man page; which section should I look at more closely?
Also, I saw that it is possible to start GoAccess as a service to have live data. In that case, how would I start it so it parses multiple files (or, even better, folders), adding the virtual host as I did for the static file generation?

Thanks again for your time and patience :)

@allinurl
Owner

allinurl commented Mar 2, 2017

Assuming your log format looks like:

192.168.1.43 - - [01/Mar/2017:09:12:14 -0600] "GET /awesome.php?v1.0 HTTP/1.1" 200 5967 "-" "github-camo (ra2z6a90)"

then

awk '{n=split(FILENAME, a, "/"); sub(/\.log/, "", a[n]); print a[n], $0}' /var/log/httpd/domains/domain.com.log | awk '$8=$1" "$8' | goaccess --log-format='%v %h %^[%d:%t %^] "%r" %s %b "%R" "%u"' --date-format='%d/%b/%Y' --time-format=%T
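The second awk stage copies the vhost (field 1) in front of the request path (field 8, inside the quoted "%r"). Fed one pre-processed sample line (made up, matching the format above), it behaves like this:

```shell
# Field 8 is the request path inside the quoted request string; the
# assignment $8=$1" "$8 puts the vhost in front of it.
printf 'test-domain.it 192.168.1.43 - - [01/Mar/2017:09:12:14 -0600] "GET /awesome.php?v1.0 HTTP/1.1" 200 5967 "-" "agent"\n' \
  | awk '$8=$1" "$8'
# The request portion now reads: "GET test-domain.it /awesome.php?v1.0 HTTP/1.1"
```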

That's right, you can have live data like the demo page. Upstream has added the ability to parse multiple log files in real time, but in v1.1.1 you can only display live data for one log file. For instance, for live stats you can run it as:

goaccess -f access.log -o /usr/share/nginx/www/rt.goaccess.io/report.html --real-time-html --ws-url=your_server_host

Take a look at the FAQ for info on live stats.

@SeLLeRoNe
Author

OK, thanks, the first solution for the files list worked perfectly.

Too bad I cannot run the live server with multiple files, but it's OK.

An additional useful thing would be the ability to filter by virtual host, in order to see the stats for a single domain :)

For now, thanks :)

@allinurl
Owner

allinurl commented Mar 2, 2017

#117 will add the filtering from the UI. Otherwise, the upcoming version will allow you to do live stats as:

tail -f access.log | grep -i --line-buffered 'firefox' | goaccess --log-format=COMBINED -

Stay tuned :)

@SeLLeRoNe
Author

But still from a single file :) Folder parsing (like -d /var/log/httpd/domains/*.log) would be better, to parse multiple logs at once and have very rich live data (though I guess it would also be resource-consuming).

Maybe the ability to pull the info from an rsyslog MySQL DB would help in a centralized logging environment as well :)

@allinurl
Owner

allinurl commented Mar 2, 2017

The upcoming version will allow you to do just that + live stats:

goaccess --log-format=COMBINED /var/log/httpd/domains/*.log

Once #117 is implemented, you should be able to filter for a particular vhost within the UI.

@SeLLeRoNe
Author

Oh, that's cool :)
May I also suggest an option to exclude filenames that contain specific characters?
For instance, in /var/log/httpd/domains/ I've got two different kinds of logs: .error.log and .log.
While .log is the normal access log, which already gives all the needed information, .error.log is just for more detailed errors (a 500, PHP code errors, and so on), so it would be good to be able to not capture those :)

Thanks

@allinurl
Owner

allinurl commented Mar 2, 2017

You should be able to find specific files, e.g.,

find ./ -name "*.log" -and -not -name "*.error.log"

You may need to code something in bash and loop through the files.

I don't think goaccess should be doing file processing, especially since there are a bunch of tools that can handle this pretty well. Thanks for the suggestion though.

@SeLLeRoNe
Author

Sure, I definitely agree, but when running goaccess live as a service, that kind of pre-filtering wouldn't be possible, I think :)

allinurl added a commit that referenced this issue Sep 29, 2022
The command line option --fname-as-vhost allows users to specify a
POSIX regex and extract only part of the filename to use as a vhost.

e.g., --fname-as-vhost='[a-z]*\.[a-z]*' can be used to extract
awesome.com.log => awesome.com.

#2384
#673
@allinurl
Owner

allinurl commented Sep 29, 2022

I was able to push a commit that implements this. For flexibility, it uses a POSIX regex to extract the vhost from the filename. Here's how it works, e.g.:

Assuming /path/awesome.com.log using --fname-as-vhost='.*' will extract awesome.com.log as vhost.
Assuming /path/awesome.com.log using --fname-as-vhost='.' will extract a as vhost.
Assuming /path/awesome.com.log using --fname-as-vhost='[a-z]*\.[a-z]*' will extract awesome.com as vhost.
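The extraction takes the leftmost match of the regex against the bare filename. A pattern can be sanity-checked outside goaccess with grep -oE as an approximation (goaccess's own POSIX regex matching may differ in edge cases):

```shell
# Leftmost match of the POSIX ERE against the filename; grep -o prints
# every match, so head -n1 keeps only the first one.
echo "awesome.com.log" | grep -oE '[a-z]*\.[a-z]*' | head -n1
# Prints: awesome.com
```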

You can use multiple log files and it will work the same way (assuming the same pattern).

Feel free to build from development and let me know how it goes.

It will be deployed in the upcoming release. Stay tuned.
