This is simply an update that reduces the scope of API Proposal IV based on feedback from network participants and budget concerns. We also made other adjustments based on historical performance and responsiveness. These changes are being implemented immediately with our remaining budget.
The provided nodes will be load balanced; one or two load balancers may be used, but at least one is guaranteed. Queries are only routed to active, healthy nodes; unhealthy nodes are automatically removed from rotation and reintroduced once they recover. Each team maintains and is responsible for its own geographically distributed nodes.
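The health-aware routing described above can be sketched as a round-robin balancer that skips nodes currently marked unhealthy. This is a minimal illustration, not the actual secret.express load balancer: the `Node` and `HealthAwareBalancer` names are hypothetical, and health state is assumed to be reported by a separate health checker.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """One backend API node behind the load balancer (hypothetical model)."""
    url: str
    healthy: bool = True


@dataclass
class HealthAwareBalancer:
    """Round-robin over nodes, skipping any node currently marked unhealthy."""
    nodes: list[Node]
    _cursor: int = 0

    def report_health(self, url: str, healthy: bool) -> None:
        # Called by an (assumed) external health checker; a node marked
        # healthy again is automatically back in rotation.
        for node in self.nodes:
            if node.url == url:
                node.healthy = healthy

    def next_node(self) -> Node:
        # Visit each node at most once per call, continuing from the last pick.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._cursor % len(self.nodes)]
            self._cursor += 1
            if node.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


lb = HealthAwareBalancer([Node("node-a"), Node("node-b"), Node("node-c")])
lb.report_health("node-b", False)   # node-b fails its health check
picks = [lb.next_node().url for _ in range(4)]
# picks == ["node-a", "node-c", "node-a", "node-c"]
lb.report_health("node-b", True)    # node-b recovers and is reintroduced
after = [lb.next_node().url for _ in range(2)]
# after == ["node-a", "node-b"]
```

A real deployment would run the health checks itself (for example, polling each node on an interval); keeping them outside the balancer here just makes the routing rule easy to see.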
Reporting & Endpoints
RPC Endpoint: https://rpc.secret.express/
LCD Endpoint: https://lcd.secret.express/
Traffic Report Server Statistics
Node status: https://status.secret.express/
Given the large data volume generated daily for the traffic report, only a partial report is available, so be sure to check the date range it covers. Additionally, the node status reporting application is still in development and may not go back as far as people would like; this will improve over time, and it is currently in a pretty stable state.
The teams involved and their node budgets are Secret Saturn (16 nodes), Delta Flyer (16 nodes), Trivium (10 nodes), and Quiet Monkey Mind (10 nodes). Payments will be awarded based only on the nodes actually provided each month. If a provider does not maintain a node, no payment will be made for that unprovided node, and the leftover funds will carry forward to the following month.
The SLA we provide ensures we promptly address issues in the cluster. If noticeable API availability issues arise at any point due to capacity, we will discuss with the community to determine the best path forward, one that respects market conditions and therefore affordability.
With 4 teams, this proposal will cost $23,400 (52 nodes x $150 x 3 months) and provide 52 total API nodes to the community.
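The budget arithmetic and the "pay only for provided nodes" rule can be checked with a small sketch. The node counts, $150 rate, and 3-month term come from the text above; the `monthly_payout` helper is hypothetical, and two details are assumptions not stated in the proposal: providers are not paid for nodes beyond their budget, and carried-forward funds are simply tallied (how they are later spent is not specified).

```python
from __future__ import annotations

RATE_PER_NODE_MONTH = 150  # USD per node per month, from the proposal text
MONTHS = 3
BUDGET_NODES = {
    "Secret Saturn": 16,
    "Delta Flyer": 16,
    "Trivium": 10,
    "Quiet Monkey Mind": 10,
}

total_nodes = sum(BUDGET_NODES.values())                   # 52
total_budget = total_nodes * RATE_PER_NODE_MONTH * MONTHS  # 23400


def monthly_payout(budgeted: int, provided: int,
                   rate: int = RATE_PER_NODE_MONTH) -> tuple[int, int]:
    """Return (payment, carried_forward) for one provider for one month.

    Only nodes actually provided are paid; the budget for unprovided nodes
    rolls forward. Capping at the budgeted count is an assumption.
    """
    paid_nodes = min(provided, budgeted)
    payment = paid_nodes * rate
    carried_forward = (budgeted - paid_nodes) * rate
    return payment, carried_forward


# Hypothetical month: a 10-node provider runs only 9 nodes.
print(monthly_payout(10, 9))  # (1350, 150)
```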
We just want to make the proposers aware, in good time, of the vote they can expect from us going into the next API proposal period.
We have decided to vote No on any following API proposal unless:
- The # of nodes is again significantly reduced - We have seen from personal and external data that single Secret nodes can handle up to 3m queries a month without problems. This very much depends on the type of queries, however, so treating it as the sole truth would be stupid. Still, this indicates the API prop might be able to operate with as few as 10-20 nodes under current load. Therefore I think trialing a period with ~30 nodes could be a good way to better understand the needed coverage.
- Additional effort is placed into optimizing load balancing and caching. This could include: caching for frequently requested queries, geo-distribution of traffic, faster auto-healing via integrated load-balancing techniques/retries, and more.
- An effort is made to synchronize the tech stack and service across the different participants. This is clearest in the differences in how much history each node keeps (often less than a day at present), but there are other settings the team could align across all participants to deliver a more consistent service.
- Delta/Moonstash is no longer a leading member of the team - We have reservations about his leadership style on past API proposals, specifically around the choices made in the previous revision and the way comments about the performance/redundancy of the proposal were handled. **This is not a comment on the team not doing additional testing, as we can understand why they deemed that out of scope.
- New leadership is chosen with more significant experience in running high-uptime SLA API infrastructure. We think Consensus One could be this party, but we are open to any suggestions from the API team or community.
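Two of the optimizations listed above, caching frequently requested queries and retrying on another node, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how secret.express works today: `TTLCache` and `fetch_with_retries` are hypothetical names, the 6-second TTL is arbitrary, and `send` stands in for a real HTTP call to an RPC/LCD node.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


class TTLCache:
    """Cache query results per key for `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so the example can use a fake clock
        self._store: dict[str, tuple[float, object]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], object]) -> object:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: serve from cache, no node is queried
        value = fetch(key)
        self._store[key] = (now, value)
        return value


def fetch_with_retries(path: str, nodes: list[str],
                       send: Callable[[str, str], object]) -> object:
    """Try each node in order and return the first successful response.

    `send(node, path)` is assumed to raise on failure, so a broken node
    is skipped instead of failing the query.
    """
    last_error: Optional[Exception] = None
    for node in nodes:
        try:
            return send(node, path)
        except Exception as err:  # a real client should catch narrower errors
            last_error = err
    raise RuntimeError(f"all nodes failed for {path}") from last_error


# Usage with a fake clock and fake transport (both hypothetical).
fake_now = [0.0]
calls: list[str] = []


def fake_fetch(key: str) -> str:
    calls.append(key)
    return f"result for {key}"


def flaky_send(node: str, path: str) -> str:
    if node == "node-a":
        raise ConnectionError("node-a is down")
    return f"{node}:{path}"


cache = TTLCache(ttl=6.0, clock=lambda: fake_now[0])
cache.get_or_fetch("latest-block", fake_fetch)  # miss: fetches
fake_now[0] = 3.0
cache.get_or_fetch("latest-block", fake_fetch)  # hit: served from cache
fake_now[0] = 7.0
cache.get_or_fetch("latest-block", fake_fetch)  # expired: fetches again
# len(calls) == 2
answer = fetch_with_retries("/status", ["node-a", "node-b"], flaky_send)
# answer == "node-b:/status"
```

A short TTL keeps cached answers close to chain head while still absorbing bursts of identical queries; which queries are safe to cache is a judgment call the sketch does not make.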
We could also see an alternative solution arise where SCRT Labs hires a dedicated team to deliver a Keplr endpoint (compensated via delegations or an SCRT amount; either could probably work). I expect Cosmos to have many teams that could operate such a service for as little as $5,000-7,500 a month. I even think these teams could deliver a more consistent and reliable service than any crowdsourced community API can. With potential add-ons and rate limiting, this could also be a good overflow hub for Secret dApps, something the current express endpoint is not always able to provide, as far as I have heard. **Just to be clear, I have not asked SCRT Labs to do this, nor can we mandate it, but it's an alternative I think is worth exploring again if they are open to it.
Please don’t take this post as a mandate to change all of these items; other participants will have different opinions, and our vote alone will not make or break your proposal. But I do hope this feedback helps improve this service for the network.
single Secret nodes can handle up to 3m queries a month without problems.
Just clarifying here - a single node can handle up to 3m queries per day with no problems.
An effort is taken to synchronize the techstack and service across different participants.
Here are a few examples:
- have consistent pruning settings
- have consistent tx_index settings (I’ve encountered several instances where I couldn’t query scrt express because one node had indexing disabled)
- have a higher rate limit
- have all nodes support state sync
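Checking settings like these across the cluster can be partly automated from each node's RPC `/status` response. The sketch below reads two fields that CometBFT/Tendermint `/status` exposes: `node_info.other.tx_index` and the `earliest_block_height`/`latest_block_height` pair in `sync_info`, used here only as a rough proxy for how much history a node retains. The `audit_nodes` name, the `min_ratio` threshold, and the sample data are hypothetical; fetching the JSON from each node is left out so the example stays self-contained.

```python
from __future__ import annotations


def audit_nodes(statuses: dict[str, dict],
                min_ratio: float = 0.5) -> dict[str, list[str]]:
    """Flag per-node settings that diverge, given each node's /status `result`.

    `min_ratio` is a hypothetical threshold: nodes keeping less than this
    fraction of the longest retention in the cluster are flagged.
    """
    issues: dict[str, list[str]] = {name: [] for name in statuses}
    retained: dict[str, int] = {}
    for name, result in statuses.items():
        if result["node_info"]["other"].get("tx_index") != "on":
            issues[name].append("tx_index disabled")
        sync = result["sync_info"]
        # Heights are JSON strings in the RPC response, so convert them.
        retained[name] = (int(sync["latest_block_height"])
                          - int(sync["earliest_block_height"]))
    longest = max(retained.values(), default=0)
    for name, blocks in retained.items():
        if blocks < longest * min_ratio:
            issues[name].append(f"retains {blocks} blocks (cluster max {longest})")
    # Only report nodes that actually have something flagged.
    return {name: found for name, found in issues.items() if found}


# Hypothetical, trimmed /status results for two nodes.
sample = {
    "node-a": {"node_info": {"other": {"tx_index": "on"}},
               "sync_info": {"latest_block_height": "1000",
                             "earliest_block_height": "1"}},
    "node-b": {"node_info": {"other": {"tx_index": "off"}},
               "sync_info": {"latest_block_height": "1000",
                             "earliest_block_height": "900"}},
}
print(audit_nodes(sample))
# {'node-b': ['tx_index disabled', 'retains 100 blocks (cluster max 999)']}
```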
52 nodes implies extremely low per-node utilization; Stargaze’s official node provider, for example, handles upwards of 300M hits per day using 3 nodes. It’s not an apples-to-apples comparison by any stretch of the imagination, but it does add a little perspective to the ask.
The transparency report shows 140M requests over 11 days. Rounded up, that’s 13M hits per day, or roughly 250,000 hits per node per day.
With that little utilization, I’d suggest reducing the node count by a factor of 3 or 4: 12-20 nodes rather than the current 52.
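The utilization figures above can be reproduced directly. Every input here is a number quoted in the thread (140M requests over 11 days, 52 nodes, and the "up to 3m queries per day" per-node figure); the capacity floor is only a rough lower bound, since it ignores query mix, redundancy, and geographic spread.

```python
import math

REQUESTS = 140_000_000   # transparency report total, as quoted in the thread
DAYS = 11
NODES = 52

per_day = REQUESTS / DAYS            # about 12.7M
per_node_per_day = per_day / NODES   # about 245K

# Rounding up to 13M/day, as the comment does, gives exactly 250K per node.
rounded_per_day = math.ceil(per_day / 1_000_000) * 1_000_000
rounded_per_node = rounded_per_day / NODES

# Reducing the count by a factor of 3 or 4, as suggested.
reduced = {factor: NODES / factor for factor in (3, 4)}  # about 17.3 and 13.0

# Capacity floor from the per-node figure quoted in the thread; the query
# mix matters a lot, and no redundancy margin is included.
PER_NODE_CAPACITY = 3_000_000
min_nodes = math.ceil(per_day / PER_NODE_CAPACITY)  # 5
```

The gap between the raw floor (5 nodes) and the suggested 12-20 is the headroom left for failover, attack traffic, and uneven query cost.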
I would like to take a moment to provide some important clarifications and context regarding recent discussions, particularly in response to comments made by Lavender Five, so that the community is fully aware of the nuances surrounding our proposal:
Leadership Role: Alex is the lead on this proposal; my role within it primarily covers providing essential tooling (which I developed without compensation), load balancing, and nodes.
Consensus One: regarding the suggestion that they could lead, it should be noted that Consensus One is not mentioned in the proposal. While I hold great respect for them, it appears that Lavender Five did not fully review the proposal or prior uptime stats before making their suggestions.
Commitment to Count Reduction: I have consistently advocated for reducing the count, and I have no objections to reducing it further. In fact, we have already communicated our intention to conduct rolling assessments and remain open to further reductions. This is something L5 is aware of, yet did not mention, and continues to blame me for. Furthermore, I was never responsible for decisions around the count. L5 was told various times that it wasn’t my personal call and that I wanted the count lower, and yet they continue to say this knowing it’s false. That said, while it’s true other networks need fewer nodes, it’s historically been quite different for Secret. We know this from firsthand experience.
Professional Experience: Addressing concerns about my professional experience: in my primary role outside of crypto, I am responsible for ensuring the reliability of more machines than there are nodes deployed across the entire Secret network. Beyond that, I have significant experience running high-uptime SLA API infrastructure on Secret and have implemented systems that have handled tens of billions of requests over the years. Few others with such experience would offer what I/we do at the steep discounts the API team offers.
Proven Reliability: Our auto-healing solution consistently outperforms others on the network. When endpoints from different providers, including Lavender Five, encounter issues attributed to our cluster, further investigation often reveals that the problem lies with other providers’ out-of-date nodes erroneously associated with our cluster. So I’m confused why we are being asked to change much of anything about the top-performing solution on the network. We have no complaints from users regarding the load balancer, and while some of the optimizations mentioned by Dylan seem fine, no one on the API team should be asked to operate at a loss on the time they spend, especially when users have not raised any actual issues.
I hope these clarifications provide a clearer perspective on my involvement and the proposal’s dynamics. I’m happy to provide my expertise, tooling, etc., as long as the community allows.
To add to nodefather’s answer, which I wholeheartedly agree with:
We’ll definitely improve the consistency of the settings between the nodes in the cluster to ensure that all API users have a consistent experience. Thanks for that valuable input.
For state sync, I would advise you to DM me for direct IPs, as state syncing over load-balanced endpoints is usually slower and less reliable.
I will only post a short comment:
We are, of course, aware that Consensus One is not active on the current revised proposal, and we have seen the uptime stats and the reasoning for his departure.
I don’t think it is fair to compare the accessibility of our (single-node, rate-limited) public endpoint to a paid, load-balanced solution, nor is it the issue at hand.
Thank you for responding!
In case for state sync I would advise to please DM me for direct IPs as statesyncing over loadbalanced endpoints is usually slower and less reliable.
Yeah, that’s the issue I’d like to see addressed. Requiring direct IP address access creates a gatekeeping issue for new people joining the network. How will they know to DM you? It’d be better to simply have all nodes maintain state sync history.
I have no skin in the game here, I don’t know the history of what happened.
Scrt express is undeniably a better service for the network than anything that has existed previously. I am in no way, shape, or form suggesting the community members providing it haven’t delivered on what they promised: an effective, high-uptime endpoint. Nor am I suggesting it be disbanded, or for folks to vote NO on the proposal.
Edit, due to hitting Enter too soon:
My previous comment was about ways to improve it in the future and make it leaner. I don’t see a benefit in having 4 different entities representing scrt express.
I don’t know what you are referring to specifically, nor do we have to go into it, but in case it is about our potential participation, I will clarify:
Lavender.Five Nodes is not looking to participate as a provider for Secret.express and we will continue to deliver our public endpoint no matter the outcome of the upcoming API proposal.
To properly account for all traffic that hits the nodes but is not recorded in the transparency log, I’ve decided to set my subcluster API endpoints (XXX.mainnet.secretsaturn.net) to use the secret.express LB.
That way we should get a more accurate reading of the complete traffic that hits the API nodes.
After discussing this with Ian, I want to give some insight into the decision not to lower the node count sooner:
Ian, as a co-lead, advocated internally for a lower node count. I, also a co-lead, was adamant about keeping the same node count back then in case of bigger DDoS attacks or spamming, and to be prepared for higher API loads in the near future.
I apologize for this and I now see that we should have lowered it somewhat sooner.
As discussed in Secret Governance, Alex turned down an initial offer for me to formalize this in an on-chain signal proposal, and I have since changed my stance and will not be doing business with Alex going forward.
As per my original comment, which is now deleted:
“and if I have any stances outlined here that change I’ll make the time to post an update.” Due to Alex rejecting my offer to formalize the change, among other factors, I am merely making my promised update in the thread.
We’ve reduced our cluster’s node count by 9 (effectively 10), implemented slightly ahead of schedule to identify any potential issues early. I anticipate no major problems regarding API availability and have already initiated some performance improvements to address any arising challenges. As always, we will ensure the API is operational for the duration of the budget, which is projected to be Feb or March 2024. Historical data can be accessed for further details as usual.
I maintain the internal stance I have held for several months: the node count was inappropriate for the given traffic. Now that the issues that prevented further reductions are no longer present, proposal 259 (8/11/23) will last 7-8 months in total by the time its budget is depleted, over 2x longer than it was originally budgeted for.
No. You still rugged the API proposal.
It would be nice if you took being fired more gracefully like Mook did.
- 259 was not put up by you.
- You never controlled funds.
- The API proposal is still delivering on the terms.
- As Trivium pointed out, they were paid for 3 months, which is what the proposal’s term was for; you were paid for that amount too.
- My post you reference where I “hand you everything” says I would communicate if I changed my stance, and I changed my stance.
You are saying rugged simply because you were fired, not because the terms of the proposal are not being met. And I get why… it’s the same reason you disallowed lowering the node count for as long as you did: you were basing it on what you could personally afford, as our internal discussions still show.
From discussion with API team about lowering node count.
Me saying I’d communicate if I changed my mind.
Alex lying about why he wouldn’t lower the node count.
- You rugged the Api prop funds
- Still rugged the funds
- You extended your own runway with it as well, while handing the lead over to me.
- Still rugged, you’re just doing whatever you like with the funds
You still rugged the funds and the whole API proposal to try to keep yourself relevant in this ecosystem. Shameful behavior all around.
Let’s see what proposal 288 will bring.
1-5: no, but I understand why you think that. I make zero profit from the API proposal; it exclusively covers the cost to provide public infrastructure and nothing more (~$20-$50). Nothing you say changes that.
As shown above… I have no reason to want to remain relevant providing a service I make nothing on. If I did make money, then you might have a fair argument, but alas, you are literally the one who refused to lower the node count based on what you could afford (these messages from DMs and from the private room all still exist).